Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add dataprocessor. #1256

Merged
merged 3 commits into from
Mar 3, 2023

Conversation

joernweissenborn
Copy link
Member

@joernweissenborn joernweissenborn commented Feb 23, 2023

Adds a (chainable) PreProcessingPipeline with (initial) support for CorrectBaselineAverage and CorrectBaselineValue.

@github-actions
Copy link
Contributor

Binder 👈 Launch a binder notebook on branch joernweissenborn/pyglotaran/feature/datapreprocessing

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2023

Benchmark is done. Checkout the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff v0.6.0 vs. main

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [6c3c390e]       [517f485c]
     <v0.6.0>                   
!      59.8±0.9ms           failed      n/a  BenchmarkOptimize.time_optimize(False, False, False)
!        129±30ms           failed      n/a  BenchmarkOptimize.time_optimize(False, False, True)
!        61.1±1ms           failed      n/a  BenchmarkOptimize.time_optimize(False, True, False)
!       68.2±30ms           failed      n/a  BenchmarkOptimize.time_optimize(False, True, True)
!        77.4±1ms           failed      n/a  BenchmarkOptimize.time_optimize(True, False, False)
!        84.1±7ms           failed      n/a  BenchmarkOptimize.time_optimize(True, False, True)
!        78.1±1ms           failed      n/a  BenchmarkOptimize.time_optimize(True, True, False)
!       86.8±30ms           failed      n/a  BenchmarkOptimize.time_optimize(True, True, True)
             205M             208M     1.02  IntegrationTwoDatasets.peakmem_optimize
-      2.27±0.05s       1.44±0.01s     0.64  IntegrationTwoDatasets.time_optimize

Benchmark diff main vs. PR

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [517f485c]       [9473bf56]
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, False, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, False, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, True, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, True, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, False, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, False, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, True, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, True, True)
             208M             208M     1.00  IntegrationTwoDatasets.peakmem_optimize
       1.44±0.01s       1.46±0.04s     1.01  IntegrationTwoDatasets.time_optimize

@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch 3 times, most recently from 92c0927 to 2cecc05 Compare February 23, 2023 23:19
@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch 2 times, most recently from e78d409 to c4629ba Compare February 26, 2023 06:12
@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch from c4629ba to b4248d0 Compare February 28, 2023 01:18
@joernweissenborn joernweissenborn marked this pull request as ready for review February 28, 2023 01:18
@joernweissenborn joernweissenborn requested review from jsnel and a team as code owners February 28, 2023 01:18
@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch 3 times, most recently from 2b85d3d to 4896747 Compare February 28, 2023 01:40
@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch 3 times, most recently from c897f82 to fecbfdf Compare February 28, 2023 01:56
@codecov
Copy link

codecov bot commented Feb 28, 2023

Codecov Report

Patch coverage: 100.0% and project coverage change: +0.1 🎉

Comparison is base (517f485) 88.3% compared to head (9473bf5) 88.4%.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1256     +/-   ##
=======================================
+ Coverage   88.3%   88.4%   +0.1%     
=======================================
  Files        104     107      +3     
  Lines       5100    5145     +45     
  Branches     852     853      +1     
=======================================
+ Hits        4505    4550     +45     
  Misses       478     478             
  Partials     117     117             
Impacted Files Coverage Δ
glotaran/io/preprocessor/__init__.py 100.0% <100.0%> (ø)
glotaran/io/preprocessor/pipeline.py 100.0% <100.0%> (ø)
glotaran/io/preprocessor/preprocessor.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch from fecbfdf to d49cf9d Compare February 28, 2023 20:40
Copy link
Member

@jsnel jsnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start, but I'd like to see a bit more meaning full tests.
The CorrectBaselineValue method is ok, just needs more tests.
The CorrectBaseLineAverage case needs revision, and more extensive tests.

glotaran/io/preprocessor/pipeline.py Outdated Show resolved Hide resolved
glotaran/io/preprocessor/pipeline.py Outdated Show resolved Hide resolved
glotaran/io/preprocessor/preprocessor.py Outdated Show resolved Hide resolved
@joernweissenborn joernweissenborn force-pushed the feature/datapreprocessing branch 2 times, most recently from cb0c70a to 9210faa Compare March 1, 2023 16:41
Copy link
Member

@s-weigand s-weigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start, just got my usual nitpicks.

I guess for now there be no other way to compose pipelines than copypasting a file together or combining the action attributes?

compose_pipeline = PreProcessingPipeline(pl1.actions[:-1] + pl2.actions[-2:])

glotaran/io/preprocessor/preprocessor.py Outdated Show resolved Hide resolved
glotaran/io/preprocessor/preprocessor.py Outdated Show resolved Hide resolved
Copy link
Member

@s-weigand s-weigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound something else (sorry for not catching it before 😢)

glotaran/io/preprocessor/__init__.py Show resolved Hide resolved
s-weigand
s-weigand previously approved these changes Mar 1, 2023
Copy link
Member

@s-weigand s-weigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, thanks for adding the preprocessing. 🎉

@s-weigand s-weigand force-pushed the feature/datapreprocessing branch from 1ea0664 to dd020ed Compare March 1, 2023 19:40
jsnel
jsnel previously approved these changes Mar 3, 2023
Copy link
Member

@jsnel jsnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be the most intuitive way of doing preprocessing, but it works (for the limited cases currently defined), and it doesn't conflict with anything, so I'm ok with merging it in.

I do think improvements are possible, but that can be tackled in a future PR.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 3, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link
Member

@jsnel jsnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approve changes following dropped commit.

@s-weigand s-weigand merged commit 3792fd3 into glotaran:main Mar 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants