Improve aggregation #23

Merged
nebfield merged 5 commits into dev from improve_aggregation
Jun 12, 2024

nebfield
Member

@nebfield nebfield commented Jun 12, 2024

  • Fix missing logging messages in verbose mode
  • Explicitly load one DataFrame (DF) into memory at a time
  • Use and export natural sorting functions to add chromosomes in a logical way
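For readers unfamiliar with natural sorting, here is a minimal sketch of the idea behind the last bullet. It is not the code from this PR: the function name `natural_sort_key` is hypothetical, and the ordering of non-numeric labels (plain alphabetical, after all numeric ones) is an assumption made for illustration.

```python
import re


def natural_sort_key(label):
    """Hypothetical key function: split a label into digit and text chunks.

    Digit chunks are compared as integers, so "2" sorts before "10".
    Each chunk is tagged (0 for numbers, 1 for text) so that integers and
    strings are never compared with each other directly.
    """
    chunks = [tok for tok in re.split(r"(\d+)", str(label)) if tok]
    return [(0, int(tok)) if tok.isdigit() else (1, tok.lower()) for tok in chunks]


chromosomes = ["10", "2", "X", "1", "MT", "Y", "22"]

# Plain string sorting puts "10" before "2"
lexical = sorted(chromosomes)

# Natural sorting keeps numeric chromosomes in numeric order
natural = sorted(chromosomes, key=natural_sort_key)
```

With a plain `sorted()` the chromosome `"10"` lands before `"2"`; with the key function the numeric labels come out as 1, 2, 10, 22, followed by the text labels.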

@nebfield nebfield marked this pull request as ready for review June 12, 2024 08:48
@nebfield nebfield requested a review from smlmbrt June 12, 2024 08:56
@nebfield nebfield requested a review from smlmbrt June 12, 2024 14:06
@nebfield nebfield merged commit 1664852 into dev Jun 12, 2024
6 checks passed
@nebfield nebfield deleted the improve_aggregation branch June 12, 2024 15:00
nebfield added a commit that referenced this pull request Jun 12, 2024
* Don't perform ancestry adjustments/keep AVG columns.

* Edit expected column list

* Bump versions (calc=0.2.0;utils=1.1.0)

* simplify polygenicscore class (remove batches until we run into problems)

* Fix pgscatalog.match performance regression (#22)

* drop pyarrow support, it doesn't scale well, and be more consistent about public path properties

* refactor to use polars for reading and writing IPC files to improve scalability

* fix map_elements deprecation warning

* update lockfiles

* fix weird path -> is_path refactor that broke this test

* missed one >_>

* fix pyproject

* update dockerfile

* fix exception handling when one score fails matching

* fix merging scoring files with different column sets

* set pgscatalog package logging levels to INFO

* Improve aggregation (#23)

* export key functions for sorting chromosomes / effect types

* use new key functions for sorting

* reduce memory usage during aggregation

* fix doctest output

* make aggregation steps clearer

* bump minor version of pgscatalog.core

* minor version bump pgscatalog.match

---------

Co-authored-by: Benjamin Wingfield <bwingfield@ebi.ac.uk>
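
The "reduce memory usage during aggregation" item can be illustrated with a rough sketch. It is not the implementation in this PR. It assumes that aggregation means summing a per-sample score column across several per-chromosome tables, and it uses only the standard library as a stand-in for DataFrames. The function name `aggregate_sums`, the column names `sample` and `SUM`, and the tab-separated layout are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_sums(sources, id_col="sample", value_col="SUM"):
    """Sum value_col per id_col across sources, opening one table at a time.

    Each item in sources is a zero-argument callable that returns an open
    text handle, so no table is read before its turn comes.
    """
    totals = defaultdict(float)
    for open_source in sources:
        with open_source() as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                totals[row[id_col]] += float(row[value_col])
        # The handle is closed here, before the next table is opened
    return dict(totals)


# Two tiny in-memory tables standing in for per-chromosome score files
chr1 = "sample\tSUM\nA\t0.5\nB\t1.0\n"
chr2 = "sample\tSUM\nA\t0.25\nB\t2.0\n"
sources = [lambda text=text: io.StringIO(text) for text in (chr1, chr2)]

totals = aggregate_sums(sources)
```

Only the running totals persist between tables, so peak memory depends on the number of samples and not on the number of input tables.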
2 participants