tinypandas

tinypandas is a small pandas-like DataFrame library for Crystal that reads tab-separated, CSV, and VCF files.

Installation

Add the dependency to your shard.yml:

dependencies:
  tinypandas:
    github: orangeSi/tinypandas

Run shards install

Features

1. Reads tab-separated, CSV, and VCF files.
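
To make the tab-separated case concrete, here is a minimal sketch of how such a file could be turned into a column-oriented table using only the Crystal standard library. It does not use tinypandas and is not its implementation; the variable names (`text`, `cells`, `table`) and the choice of a nested Hash are assumptions made for illustration, and every data cell is assumed to be an integer.

```crystal
# Inline copy of a small tab-separated table: a "#" comment line, a header row
# whose first cell is empty, then one labelled row per line.
text = "# note\n\tA1\tA3\tA2\nB1\t1\t3\t2\nB2\t7\t2\t8\nB3\t4\t9\t5\n"

# Drop blank lines and comment lines, then split each remaining line on tabs.
lines = text.lines.reject { |line| line.blank? || line.starts_with?("#") }
cells = lines.map { |line| line.split("\t") }

columns = cells.first[1..] # header row without the empty corner cell
rows = cells[1..]          # data rows; the first cell of each is the row label

# Column-oriented result: column name => {row label => Int32 value}.
# Assumption for this sketch: every data cell parses as an integer.
table = columns.map_with_index { |name, i|
  {name, rows.map { |row| {row[0], row[i + 1].to_i} }.to_h}
}.to_h

puts table["A2"]["B3"]
```

Keeping the data column-first means a lookup reads as column, then row label, the same order as the `df["A2"]["B3"]` access in the Usage section.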

Usage

The test code is in example/test.cr and looks like this:

require "tinypandas"

pd = Tinypandas.new

# Assumption: the input path is the first command-line argument (the example is run as ./test demo.xls).
ifile = ARGV[0]

## read a tab-separated file
# Signature: read_table(filepath_or_buffer : String, sep = "\t", delimiter : String = "\n",
#   header : HeaderType = 0, index_col : IndexColType = 0, comment : String|Regex = "#",
#   skiprows : SkiprowsType = false, skip_blank_lines : Bool = true)
df = pd.read_table(ifile, sep: "\t")

puts "df is #{df}\n"

puts "df.to_str is\n#{df.to_str}\n"

puts "df[A2][B3] is #{df["A2"]["B3"]}\n"

puts "df[df[A2]>=5].to_str is"
puts df[df["A2"]>=5].to_str

puts "df[df[A3]==9][A2].to_str is "
puts df[df["A3"]==9]["A2"].to_str

puts "df[df[A3]>=3][A2].to_str is "
puts df[df["A3"]>=3]["A2"].to_str

t = df["A2"]
puts "t = df[A2]is #{t}"
puts "t>2 is #{t>2}"

puts "df.t.to_str is\n#{df.t.to_str}"

puts "df.t[B3][A1] is "
puts df.t["B3"]["A1"]


## read a VCF file
df = pd.load_vcf("demo.vcf")
puts "df.head(1).to_s is\n"
puts df.head(1).to_s
puts "\n"

## read a CSV file
df = pd.load_csv("sample.csv")
puts "df is #{df}\n"
puts "df.to_str is\n#{df.to_str}\n"
puts "df[col2][2] is #{df["col2"]["2"]}\n"


## convert Array(Array) to DataFrame
data = [[1,2,3],[4,5,6],[6,7,8]]
df = DataFrame.new(data, columns: ["c1","c2","c3"]) # read_array_by_row: true
puts "\nArray(Array()):#{data} to DataFrame:\n#{df.to_s}"

## read Hash(String, Array()) as DataFrame
data = {"c1"=>[1,2,3], "c2"=>[4,5,6], "c3"=>[6,7,8]}
df = DataFrame.new(data)
puts "\nHash(String, Array()):#{data} to DataFrame:\n#{df.to_s}"

Then build the example: cd example; crystal build test.cr --release

$ cat demo.xls
# note
	A1	A3	A2
B1	1	3	2
B2	7	2	8
B3	4	9	5

Running ./test demo.xls or ./test demo.xls.gz then prints:

## support seprate by tab format file
intpu file demo.xls

df is DataFrame(@dict={"A1" => Series(@dict={"B1" => 1, "B2" => 7, "B3" => 4}), "A3" => Series(@dict={"B1" => 3, "B2" => 2, "B3" => 9}), "A2" => Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})}, @index=["B1", "B2", "B3"], @columns=["A1", "A3", "A2"])

df.to_str is
	A1	A3	A2
B1	1	3	2
B2	7	2	8
B3	4	9	5

df[A2][B3] is 5
df[df[A2]>=5].to_str is
	A1	A3	A2
B2	7	2	8
B3	4	9	5

df[df[A3]==9][A2].to_str is 
B3	5

df[df[A3]>=3][A2].to_str is 
B1	2
B3	5
t = df[A2]is Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})
t>2 is Series(@dict={"B2" => 8, "B3" => 5})

df.t.to_str is
	B1	B2	B3
A1	1	7	4
A3	3	2	9
A2	2	8	5

df.t[B3][A1] is 
4

## support vcf format file
df.head(1).to_s is
	#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00096	HG00097	HG00099
0	MT	10	.	T	C	100	fa	VT=S;AC=3	GT	0	0	0

## support csv format file
df is DataFrame(@dict={"date" => Series(@dict={"0" => "2020-02-01 12:00:02", "1" => "2020-02-01 12:00:07", "2" => "2020-02-01 12:00:12", "3" => "2020-02-01 12:00:17", "4" => "2020-02-01 12:00:22", "5" => "2020-02-01 12:00:27", "6" => "2020-02-01 12:00:32", "7" => "2020-02-01 12:00:37"}), "col1" => Series(@dict={"0" => 66808, "1" => 66873, "2" => 66875, "3" => 66874, "4" => 66881, "5" => 66858, "6" => 66905, "7" => 66885}), "col2" => Series(@dict={"0" => 0.68, "1" => 0.67, "2" => 0.65, "3" => 0.67, "4" => 0.67, "5" => 0.66, "6" => 0.64, "7" => 0.66}), "col3" => Series(@dict={"0" => "TRUE", "1" => "FALSE", "2" => "TRUE", "3" => "FALSE", "4" => "TRUE", "5" => "FALSE", "6" => "TRUE", "7" => "FALSE"}), "col4" => Series(@dict={"0" => "str1", "1" => "str2", "2" => "str3", "3" => "str4", "4" => "str5", "5" => "str6", "6" => "str7", "7" => "str8"})}, @index=["0", "1", "2", "3", "4", "5", "6", "7"], @columns=["date", "col1", "col2", "col3", "col4"])
df.to_str is
	date	col1	col2	col3	col4
0	2020-02-01 12:00:02	66808	0.68	TRUE	str1
1	2020-02-01 12:00:07	66873	0.67	FALSE	str2
2	2020-02-01 12:00:12	66875	0.65	TRUE	str3
3	2020-02-01 12:00:17	66874	0.67	FALSE	str4
4	2020-02-01 12:00:22	66881	0.67	TRUE	str5
5	2020-02-01 12:00:27	66858	0.66	FALSE	str6
6	2020-02-01 12:00:32	66905	0.64	TRUE	str7
7	2020-02-01 12:00:37	66885	0.66	FALSE	str8

df[col2][2] is 0.65

Array(Array()):[[1, 2, 3], [4, 5, 6], [6, 7, 8]] to DataFrame:
	c1	c2	c3
0	1	2	3
1	4	5	6
2	6	7	8

Hash(String, Array()):{"c1" => [1, 2, 3], "c2" => [4, 5, 6], "c3" => [6, 7, 8]} to DataFrame:
	c1	c2	c3
0	1	4	6
1	2	5	7
2	3	6	8
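
The `df[df["A2"]>=5]` filter in the output above can be read as two steps: a comparison on one column picks row labels, and then every column is restricted to those labels. The sketch below reproduces that idea on plain Hash values from the Crystal standard library. It is an illustration only, not how tinypandas implements filtering, and the names `table`, `kept`, and `filtered` are hypothetical.

```crystal
# Column-oriented copy of demo.xls: column name => {row label => value}.
# Plain Hash values only; this does not use tinypandas itself.
table = {
  "A1" => {"B1" => 1, "B2" => 7, "B3" => 4},
  "A3" => {"B1" => 3, "B2" => 2, "B3" => 9},
  "A2" => {"B1" => 2, "B2" => 8, "B3" => 5},
}

# Step 1: the comparison keeps the row labels whose A2 value is >= 5.
kept = table["A2"].select { |_label, value| value >= 5 }.keys

# Step 2: every column is restricted to the surviving row labels.
filtered = table.transform_values do |column|
  column.select { |label, _value| kept.includes?(label) }
end

# Print in the same tab-separated layout as the output above.
header = [""] + filtered.keys
puts header.join("\t")
kept.each do |label|
  row = [label] + filtered.keys.map { |name| filtered[name][label].to_s }
  puts row.join("\t")
end
```

With the demo.xls values this keeps rows B2 and B3, matching the `df[df[A2]>=5].to_str` output shown earlier.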


Development

TODO: Write development instructions here

Contributing

1. Fork it (https://github.com/orangeSi/tinypandas/fork)
2. Create your feature branch (git checkout -b my-new-feature)
3. Commit your changes (git commit -am 'Add some feature')
4. Push to the branch (git push origin my-new-feature)
5. Create a new Pull Request

Contributors

orangeSi - creator and maintainer
