"Chromosome" and "Strand" columns become "object" dtype instead of "category" after .merge_overlaps() #37

Open
jeanmonet opened this issue Jul 5, 2024 · 1 comment

jeanmonet commented Jul 5, 2024

Hi, I've found this dtype discrepancy before & after applying merge_overlaps():

gtfpr.remove_nonloc_columns().dtypes

Chromosome    category
Start            int64
End              int64
Strand        category
dtype: object

However, the Chromosome and Strand columns come back as object dtype after merge_overlaps():

gtfpr.remove_nonloc_columns().merge_overlaps().dtypes

Chromosome    object
Start          int64
End            int64
Strand        object
dtype: object

Is this expected behavior or is it a bug?
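For anyone hitting this in the meantime, one possible workaround is to cast the two columns back to category after the call. The sketch below uses plain pandas only; the `merged` frame is a hypothetical stand-in for the output of merge_overlaps(), and it assumes that output can be handled like an ordinary pandas DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for a merged result whose Chromosome and Strand
# columns are no longer categorical (not actual pyranges output).
merged = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1", "chr2"],
        "Start": [100, 500, 200],
        "End": [300, 900, 400],
        "Strand": ["+", "-", "+"],
    }
)

# Cast only the two affected columns back to category; Start and End
# keep their integer dtype.
restored = merged.astype({"Chromosome": "category", "Strand": "category"})

print(restored.dtypes)
```

Note that `astype` returns a plain pandas DataFrame here; whether the cast preserves a library-specific subclass is not something this sketch checks.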


In addition, .join_ranges with join_type="left" returns Start_b, End_b and other columns as float64, whereas join_type="inner" keeps them at their original int64 dtype:

joined = gtf_ext.join_ranges(fragments, join_type="left")

	Chromosome	Start	End	Start_b	End_b	barcode	count	ucount
0	chr1	55418	65419	56893.0	57061.0	CATGGATTCTTGCAGG-1	3.0	1.0
1	chr1	55418	65419	57033.0	57135.0	TTGTGCGAGTCATTTC-1	1.0	1.0


joined = gtf_ext.join_ranges(fragments, join_type="inner")


	Chromosome	Start	End	Start_b	End_b	barcode	count	ucount
0	chr1	55418	65419	56893	57061	CATGGATTCTTGCAGG-1	3	1
1	chr1	55418	65419	64419	64681	CGCACACAGCGTGCGT-1	1	1
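A likely explanation for the float64 columns (an assumption on my part, not confirmed in this thread) is the usual pandas behavior for left joins: rows without a match get NaN, and an int64 column cannot hold NaN, so it is upcast to float64. The sketch below reproduces that with plain `DataFrame.merge` on hypothetical toy frames, and shows the nullable `Int64` dtype as one way to keep integers alongside missing values.

```python
import pandas as pd

# Hypothetical toy frames; "key" stands in for whatever the join matches on.
left = pd.DataFrame({"key": ["a", "b"], "Start": [10, 20]})
right = pd.DataFrame({"key": ["a"], "Start_b": [15]})

# Inner join: every kept row has a match, so Start_b stays integer.
inner = left.merge(right, on="key", how="inner")

# Left join: row "b" has no match, Start_b gets NaN and is upcast to float.
left_joined = left.merge(right, on="key", how="left")

print(inner["Start_b"].dtype)
print(left_joined["Start_b"].dtype)

# The nullable Int64 dtype stores integers and represents the gap as <NA>.
nullable = left_joined.astype({"Start_b": "Int64"})
print(nullable)
```

If this is indeed the cause, casting the affected columns to `Int64` after the left join would restore integer values without dropping the unmatched rows.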
Collaborator

endrebak commented Jul 5, 2024

Definitely not intended behavior. Will fix when I'm done with my PhD revision.
