
Expanded derivatives #85

Merged: briandesilva merged 18 commits into master from expanded-derivatives on Aug 30, 2020
Conversation

briandesilva
Member

Expands numerical differentiation capabilities by wrapping the methods in the derivative package. The PySINDy versions of each method have the same names as the objects in derivative, but with "Differentiator" appended to the end. I created an example notebook comparing all the differentiation options with and without noise. I also added some basic unit tests, but didn't make them stringent because derivative already has its own suite of tests.
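For example, usage would look something like this (the class name below is hypothetical, assuming derivative exposes a Spectral object whose wrapped version is named SpectralDifferentiator per the convention above):

import numpy as np
from pysindy import SINDy
# Hypothetical wrapped class name following the "Differentiator" suffix convention;
# the exact set of wrapped classes depends on what derivative exposes.
from pysindy.differentiation import SpectralDifferentiator

t = np.linspace(0, 10, 1000)
x = np.stack([np.sin(t), np.cos(t)], axis=-1)

model = SINDy(differentiation_method=SpectralDifferentiator())
model.fit(x, t=t)
model.print()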

For now I decided to leave in our original FiniteDifference and SmoothedFiniteDifference methods because

  • they allow the endpoints to be dropped
  • they seem to perform better in some instances
  • there isn't a method in derivative that implements smoothed finite differences (apply smoothing, then apply a finite difference method)

See #58 for more context.

I'm curious whether @andgoldschmidt and @Ohjeah have any thoughts about this implementation.

@briandesilva briandesilva requested a review from Ohjeah July 20, 2020 20:32
@briandesilva briandesilva self-assigned this Jul 20, 2020
@codecov-commenter

codecov-commenter commented Jul 20, 2020

Codecov Report

Merging #85 into master will increase coverage by 0.02%.
The diff coverage is 97.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #85      +/-   ##
==========================================
+ Coverage   95.35%   95.37%   +0.02%     
==========================================
  Files          18       19       +1     
  Lines         968      995      +27     
==========================================
+ Hits          923      949      +26     
- Misses         45       46       +1     
Impacted Files Coverage Δ
pysindy/differentiation/base.py 100.00% <ø> (ø)
pysindy/feature_library/polynomial_library.py 97.72% <ø> (ø)
pysindy/optimizers/sr3.py 91.00% <ø> (ø)
pysindy/optimizers/stlsq.py 92.53% <ø> (ø)
pysindy/differentiation/sindy_derivative.py 96.15% <96.15%> (ø)
pysindy/differentiation/__init__.py 100.00% <100.00%> (ø)
pysindy/feature_library/__init__.py 100.00% <100.00%> (ø)
pysindy/optimizers/__init__.py 100.00% <100.00%> (ø)
pysindy/pysindy.py 96.88% <100.00%> (ø)
pysindy/utils/__init__.py 100.00% <100.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0800089...622f5c3. Read the comment docs.

@andgoldschmidt

FiniteDifference in pysindy is better than the one in the derivative package. I'll work to improve the derivative implementation en route to v1.0.

I believe SmoothedFiniteDifference in pysindy is similar to the SavitzkyGolay derivative in the package. Both fit a polynomial to a window of the data. The difference is whether the fitted polynomial is used to compute the derivative exactly (derivative) or is piped into another derivative method (pysindy).

I'm not sure which functionality is desired, but if the pipe goes to finite differences, I think the methods should be equivalent up to the numerical error of the finite-difference method. (If you pipe into a method with additional fits/regularization, they will differ.)
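To make the comparison concrete, here is a minimal sketch of the two pipelines using only scipy and numpy (the window length of 15 and cubic fit are arbitrary illustration values, not defaults from either package):

import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0, 2 * np.pi, 200)
dt = t[1] - t[0]
x = np.sin(t) + 0.01 * np.random.randn(t.size)

# pysindy-style SmoothedFiniteDifference: smooth first, then finite-difference the result.
x_smooth = savgol_filter(x, window_length=15, polyorder=3)
dx_piped = np.gradient(x_smooth, dt)

# derivative-style SavitzkyGolay: differentiate the local polynomial fit directly.
dx_direct = savgol_filter(x, window_length=15, polyorder=3, deriv=1, delta=dt)

# Away from the endpoints the two agree up to the finite-difference error.
print(np.max(np.abs(dx_piped[10:-10] - dx_direct[10:-10])))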

@Ohjeah
Collaborator

Ohjeah commented Jul 28, 2020

I am slightly worried about the maintenance overhead this implementation causes. Every change in the derivative package would also have to be mirrored in pysindy.

I'd prefer to use dxdt and pass through derivative_args and derivative_kwargs, as well as link to the derivative Sphinx docs as soon as they're ready.
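For reference, a sketch of what calling dxdt directly looks like (the kind strings and keyword names below are from memory of derivative's README and should be checked against its docs):

import numpy as np
from derivative import dxdt

t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t)

# Central finite differences using one point on each side.
x_dot_fd = dxdt(x, t, kind="finite_difference", k=1)

# Spectral derivative.
x_dot_spectral = dxdt(x, t, kind="spectral")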

@briandesilva
Member Author

@Ohjeah, I see what you mean and I agree that the code would be more maintainable if we were to use derivative objects more directly.

I'd prefer to use dxdt and pass through derivative_args and derivative_kwargs, as well as link to the derivative Sphinx docs as soon as they're ready.

I'm not sure I understand what this solution would look like. Are you envisioning that, rather than specifying a class

model = SINDy(differentiation_method=SpectralDerivative(...))

the user would supply some keywords?

model = SINDy(derivative_kwargs={...})

Or maybe we could write a simple wrapper class for all derivative objects, similar to what we did with the optimizers?

optimizer = SINDyOptimizer(self.optimizer, unbias=unbias)

Or were you thinking something along the lines of having the user pass in a derivative class directly, which SINDy would use unmodified?

I'm not sure it's a good enough reason to outweigh the maintainability concern you raised, but the nice thing about the current implementation is that it allows parameters of the differentiators to be included when performing sklearn-style cross-validation.

@andgoldschmidt, I think SmoothedFiniteDifference and SavitzkyGolay can coexist. You're right about how the two methods are similar, but if the user specifies a different smoother in SmoothedFiniteDifference, then they'll give different results. I could change the default smoother that's used to be something other than scipy's savgol_filter, but it appears to do well in practice.

Once you've updated FiniteDifference in derivative, I can switch PySINDy to use that implementation by default. Actually, maybe the best thing to do would be to use derivative objects exclusively and do away with pysindy.differentiation completely, pending reasonable resolutions to the concerns I raised above. I could contribute a SmoothedFiniteDifference class to your package to help ease the transition.

@andgoldschmidt

andgoldschmidt commented Jul 28, 2020

  1. I like the approach of SINDyDerivative("kind", args, kwargs) as a wrapper on the derivative dxdt("kind", args, kwargs) function. Echoing my understanding of @Ohjeah's comment, this should allow you to maintain the additional features you desire without requiring unique wrappers for each derivative object. This keeps you safe once derivative has a 1.0 interface--any new methods in derivative 1.x that satisfy the 1.0 interface can be included in PySINDy without breaking anything.

I do agree that a wrapper for PySINDy is important, especially because

it allows parameters of the differentiators to be included when performing sklearn-style cross-validation.

is a great feature--honestly essential for users to truly integrate some of these derivative methods into the PySINDy framework. Also, a wrapper lets you create default args if that's appealing.

Let me know if derivative should make changes to better accommodate integration with cross validation.

  2. If there are other common tools for filtering data besides a savgol filter, maybe that belongs as a separate utility. That is, this seems less like a derivative method and more like a pre-processing step to me.

@Ohjeah
Collaborator

Ohjeah commented Jul 29, 2020

I agree with you both. Wrapping is required to ensure the cross-validation functionality, and maybe also specific input validation and error handling. The wrapping should be flexible enough, though, to allow seamlessly upgrading the derivative package dependency.

@briandesilva
Member Author

Derivative wrapper

I'll take a stab at implementing a universal wrapper for derivative objects. I'm a little worried that if the PySINDy wrapper only interfaces with the dxdt function, then cross-validation will be difficult. When dxdt is called it builds a differentiation object on the fly, then immediately applies it to the data. I'll dig into how Scikit-learn's cross-validation methods set different parameter values to see if our wrapper implementation will work. I'll get back to you about any changes that need to be made in derivative.

Smoothed finite differences

The main reason I included this method in the initial release was that Floris highlighted it as being relatively simple, effective, and easy to implement. I agree that we could just leave it to the user to smooth their data before feeding it to a SINDy instance. Let me deal with the derivative wrapper first, then I can better judge what to do with SmoothedFiniteDifference.

@Ohjeah
Collaborator

Ohjeah commented Aug 4, 2020

@briandesilva You can run a grid search over **kwargs parameters if you modify the set_params() method, similar to how xgboost does it. Here is a little mockup:

from sklearn import datasets
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class Model(BaseEstimator):
    def __init__(self, nested_kwargs=None):
        # Everything passed at construction is stored in a single kwargs dictionary.
        self.kwargs = {"nested_kwargs": nested_kwargs}

    def get_params(self, deep=True):
        # Expose the nested kwargs so that clone() can round-trip them
        # (recent sklearn versions raise if a declared __init__ parameter
        # is not stored as an attribute of the same name).
        return dict(self.kwargs)

    def fit(self, x, y=None, **fit_params):
        # Stand-in for real fitting; just show which kwargs the grid search set.
        print(self.kwargs)
        return self

    def score(self, x, y):
        # Dummy score so GridSearchCV can rank candidates.
        return 0

    def set_params(self, **params):
        """Set the parameters of this estimator.

        Modification of the sklearn method to allow unknown kwargs. This allows using
        the full range of xgboost parameters that are not defined as member variables
        in sklearn grid search.

        Returns
        -------
        self
        """
        if not params:
            # Simple optimization to gain speed (inspect is slow)
            return self

        for key, value in params.items():
            if hasattr(self, key):
                setattr(self, key, value)
            else:
                # Unknown parameters are routed into the kwargs dictionary.
                self.kwargs[key] = value

        return self


X, y = datasets.load_iris(return_X_y=True)

m = Model()
p = {"nested_kwargs": [{"a": 1}, {"a": 2}]}
grid = GridSearchCV(m, p)
print(grid)
grid.fit(X, y)

Model would be the derivative wrapper and nested_kwargs the object passed to the dxdt method.
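If it helps, here is a rough sketch of how that mockup could map onto a dxdt wrapper (the class name and details are illustrative, not necessarily what the final implementation should look like):

import numpy as np
from derivative import dxdt
from sklearn.base import BaseEstimator


class DerivativeWrapper(BaseEstimator):
    """Illustrative wrapper in the spirit of the mockup above: every option is
    stored in a single kwargs dictionary and forwarded to dxdt."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def set_params(self, **params):
        # Unknown keys go straight into the dxdt keyword dictionary,
        # so a parameter search can vary them without dedicated attributes.
        self.kwargs.update(params)
        return self

    def get_params(self, deep=True):
        # Expose the nested kwargs so clone() can round-trip them.
        return self.kwargs

    def __call__(self, x, t):
        # Differentiate along axis 0 (time along rows), assuming dxdt's axis keyword.
        return dxdt(x, t, axis=0, **self.kwargs)


t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t).reshape(-1, 1)
x_dot = DerivativeWrapper(kind="spectral")(x, t)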

@briandesilva
Member Author

Okay, I finally got around to updating the implementation based on @Ohjeah's suggestion. I added unit tests and updated the derivative and sklearn example notebooks.

The solution ended up being both lightweight and flexible enough to account for changes in the derivative package. Let me know what you think @andgoldschmidt.

@Ohjeah
Collaborator

Ohjeah commented Aug 24, 2020

LGTM, we should also use an intersphinx link to the derivative.py documentation.

@briandesilva
Member Author

Okay, I'm going to go ahead and merge this and create a new release.

@briandesilva briandesilva reopened this Aug 30, 2020
@briandesilva briandesilva merged commit c7c7829 into master Aug 30, 2020
@briandesilva briandesilva deleted the expanded-derivatives branch August 30, 2020 00:48
jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request Apr 30, 2024