Allowing to use Memory in Pipeline #52

Merged
merged 27 commits into from
Aug 6, 2024

Conversation

c-w-feldmann
Collaborator

No description provided.

@c-w-feldmann c-w-feldmann added the type: bug Something isn't working label Jul 12, 2024
@c-w-feldmann c-w-feldmann self-assigned this Jul 12, 2024
@c-w-feldmann c-w-feldmann linked an issue Jul 12, 2024 that may be closed by this pull request
4 tasks
@c-w-feldmann c-w-feldmann changed the title first functional fix Allowing to use Memory in Pipeline Jul 12, 2024
@c-w-feldmann c-w-feldmann marked this pull request as ready for review July 15, 2024 11:49
Collaborator

@JochenSiegWork JochenSiegWork left a comment

I have some problems with the design of the unit tests. These should be fixed before merging.

tests/test_pipeline.py

# Compare results
self.assertTrue(np.allclose(pred1, pred2))
self.assertLess(time2, time1)
Collaborator

Asserting on timings in unit tests is generally considered bad practice. Timings depend on many external circumstances, such as machine load and hardware, so correct code can still show bad runtimes and fail the test. Instead, I would only test that the results are correct.

You could also add a test that checks that the cached element is used and not recreated every time the pipeline is called. For example, wrap the featurization element in a wrapper class that counts executions of the fit_transform method, like:

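class MyWrapper:
    # Wraps an element and counts how often fit_transform is called.

    def __init__(self, element):
        self.counter = 0
        self.element = element

    def fit_transform(self, X, y):
        self.counter += 1
        return self.element.fit_transform(X, y)

featurization_element_wrapped = MyWrapper(MorganFP())
pipeline = Pipeline([
    ..,  # preceding steps elided
    ("featurizer", featurization_element_wrapped),
    ..,  # following steps elided
])
pipeline.fit(X, y)

# The wrapped element should have been fitted exactly once.
self.assertEqual(featurization_element_wrapped.counter, 1)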

Note that this is only expected to work when the execution is not parallelized.
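A minimal, self-contained sketch of the counting idea, using only the standard library. `CountingFeaturizer` and `CachedStep` are hypothetical stand-ins (a toy featurizer and a toy in-memory cache), not the project's `MorganFP` or `Pipeline`, so this only illustrates the shape of such a test: assert on results and on the call count, not on wall-clock time.

```python
import unittest


class CountingFeaturizer:
    """Hypothetical stand-in for a featurization element; counts fit_transform calls."""

    def __init__(self):
        self.counter = 0

    def fit_transform(self, X, y=None):
        self.counter += 1
        # Toy featurization: one feature per input, the string length.
        return [[len(x)] for x in X]


class CachedStep:
    """Hypothetical in-memory cache keyed on the inputs (stand-in for pipeline memory)."""

    def __init__(self, element):
        self.element = element
        self._cache = {}

    def fit_transform(self, X, y=None):
        key = (tuple(X), None if y is None else tuple(y))
        if key not in self._cache:
            # Only reached on a cache miss.
            self._cache[key] = self.element.fit_transform(X, y)
        return self._cache[key]


class TestCaching(unittest.TestCase):
    def test_cached_element_is_not_recomputed(self):
        featurizer = CountingFeaturizer()
        step = CachedStep(featurizer)
        X, y = ["CCO", "c1ccccc1"], [0, 1]

        first = step.fit_transform(X, y)
        second = step.fit_transform(X, y)

        # Same results from both calls, and the element ran only once.
        self.assertEqual(first, second)
        self.assertEqual(featurizer.counter, 1)
```

The counter lives on the wrapped object here because the toy cache keys only on the input data; a cache that also hashes the element's state would see the counter change.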

Collaborator Author

Implemented. I had to make the counter a global variable, as otherwise the hash of the object would change. It is kind of ugly; do you have a better idea?
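A rough sketch of why an instance-level counter interferes with a state-based hash while a module-level counter does not. `state_hash` is a simplified, hypothetical stand-in for how a cache might derive a key from an object's attributes; it is not the hashing the caching backend actually uses, and `n_bits` is an illustrative parameter.

```python
import hashlib


def state_hash(obj):
    """Simplified stand-in for a cache key: hash of the instance attributes."""
    state = repr(sorted(vars(obj).items()))
    return hashlib.sha256(state.encode()).hexdigest()


class InstanceCounter:
    """Counter stored on the instance, so it is part of the hashed state."""

    def __init__(self):
        self.counter = 0

    def fit_transform(self, X, y=None):
        self.counter += 1
        return X


N_CALLS = 0  # module-level counter, not part of any instance's state


class GlobalCounter:
    """Counter stored in a module-level variable, outside the hashed state."""

    def __init__(self):
        self.n_bits = 2048  # hypothetical parameter, for illustration only

    def fit_transform(self, X, y=None):
        global N_CALLS
        N_CALLS += 1
        return X


instance_counted = InstanceCounter()
hash_before_instance = state_hash(instance_counted)
instance_counted.fit_transform([1, 2, 3])
hash_after_instance = state_hash(instance_counted)

global_counted = GlobalCounter()
hash_before_global = state_hash(global_counted)
global_counted.fit_transform([1, 2, 3])
hash_after_global = state_hash(global_counted)

print(hash_before_instance == hash_after_instance)  # False
print(hash_before_global == hash_after_global)      # True
```

With the instance counter, every call changes the key, so a cache keyed this way would never register a hit; the module-level counter leaves the object's state untouched.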

@c-w-feldmann
Collaborator Author

This currently fails docsig. Presumably this is a docsig bug, as it only occurs with version 0.59.0. See this issue.

@c-w-feldmann c-w-feldmann merged commit cc98b76 into main Aug 6, 2024
14 checks passed
@c-w-feldmann c-w-feldmann deleted the 51-pipeline-does-fails-when-setting-memory branch August 6, 2024 12:41
Labels
type: bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline does fails when setting memory
2 participants