Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Func1-derived tabulated functions #797

Merged
merged 11 commits into from
Apr 5, 2020

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Jan 22, 2020

Checklist

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • The pull request is ready for review

If applicable, fill in the issue number this pull request is fixing

Fixes Cantera/enhancements#17

Changes proposed in this pull request

  • use existing C++ Func1.h framework
  • create Python interface for new tabulated Func1 object and add unit tests

Implementing a tabulated function as a class derived from Func1 allows for a straight-forward integration into the existing framework without further code changes. An implementation in the C++ layer has the advantage that this solution is usable from (or at least can be extended to) all platforms.

Use case and benchmark

In [1]: import cantera as ct
   ...: import numpy as np
   ...: arr = np.array([[0, 2], [1, 1], [2, 0]])

In [2]: fcn1 = ct.Func1(arr)
   ...: fcn1(1.3)
Out[2]: 0.7

In [3]: fcn2 = ct.Func1(arr, interpolation='previous')
   ...: fcn2(1.3)
Out[3]: 1.0

In [4]: %timeit fcn1(1.3)
The slowest run took 29.33 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 122 ns per loop

In [5]: time = arr[:, 0]
   ...: fval = arr[:, 1]
   ...: fcn3 = lambda t: np.interp(t, time, fval)

In [6]: %timeit fcn3(1.3)
The slowest run took 9.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.46 us per loop

In [7]: fcn4 = ct.Func1(fcn3)

In [8]: %timeit fcn4(1.3)
The slowest run took 13.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.76 us per loop

@ischoegl ischoegl changed the title Implement tabulated functions for func1 Implement Func1-derived tabulated functions Jan 22, 2020
@codecov
Copy link

codecov bot commented Jan 22, 2020

Codecov Report

Merging #797 into master will increase coverage by 0.02%.
The diff coverage is 50.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
+ Coverage   71.39%   71.42%   +0.02%     
==========================================
  Files         372      372              
  Lines       43482    43580      +98     
==========================================
+ Hits        31045    31128      +83     
- Misses      12437    12452      +15     
Impacted Files Coverage Δ
include/cantera/numerics/Func1.h 0.54% <10.00%> (+0.26%) ⬆️
src/numerics/Func1.cpp 8.44% <59.18%> (+7.81%) ⬆️
src/oneD/Domain1D.cpp 58.46% <0.00%> (-3.85%) ⬇️
include/cantera/oneD/Sim1D.h 62.50% <0.00%> (ø)
test/kinetics/kineticsFromYaml.cpp 97.75% <0.00%> (+0.10%) ⬆️
include/cantera/oneD/StFlow.h 91.83% <0.00%> (+1.13%) ⬆️
src/oneD/Sim1D.cpp 73.01% <0.00%> (+1.28%) ⬆️
src/kinetics/KineticsFactory.cpp 96.34% <0.00%> (+1.34%) ⬆️
src/oneD/StFlow.cpp 89.52% <0.00%> (+2.58%) ⬆️
include/cantera/oneD/OneDim.h 38.29% <0.00%> (+8.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b449e83...6e637a9. Read the comment docs.

@ischoegl ischoegl requested a review from speth January 22, 2020 21:11
@ischoegl ischoegl force-pushed the tabulated-func1 branch 2 times, most recently from 2a089c3 to 5a88863 Compare January 23, 2020 16:23
@bryanwweber
Copy link
Member

This is great, and I'm glad the implementation turned out to be so simple. I disagree, however, that interpolation should be done between values. In my opinion, the nearest previous value should be returned. If we want to include interpolation, it should be an option that can be passed, IMO, e.g.,

ct.Func1(arr, interpolation=None)  # or 'None'?
ct.Func1(arr, interpolation='linear')
ct.Func1(arr, interpolation='quadratic')

etc.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 23, 2020

@bryanwweber ... happy that this may be useful.

The main reason why I implemented a linear interpolation is that it is more physical/flexible, and already contains the previous value case via:

ct.Func1([0, 1, 1, 2], [0, 0, 1, 1]) # using the two argument version for clarity

Beyond, I'm unsure about the fate of Func1.h, which defines methods for derivatives, etc. that are not broken out to Python. As I'm working from within that framework, things aren't straight-forward, and I'd rather not get into this deeper than absolutely necessary.

PS: while I have a strong preference for interpolation (we don't necessarily have to mimic CHEMKIN here), adding the 'previous value' alternative is still relatively simple. I do not think that it's worth going into quadratic or spline variants in this PR.

@ischoegl ischoegl force-pushed the tabulated-func1 branch 4 times, most recently from 8a9d8f7 to 0fb3ece Compare January 27, 2020 14:55
@ischoegl
Copy link
Member Author

@bryanwweber ... I ended up adding the non-interpolating option per your suggestion, i.e.

ct.Func1([0, 1, 1, 2], [0, 0, 1, 1], interpolation='previous') # CHEMKIN style
ct.Func1([0, 1, 1, 2], [0, 0, 1, 1], interpolation='linear')
ct.Func1([0, 1, 1, 2], [0, 0, 1, 1]) # defaults to 'linear'

I am proposing to leave the default as linear as it is more intuitive. The interpolation method strings match scipy.interpolate.interp1d to future-proof potential additions beyond this PR.

@bryanwweber
Copy link
Member

@ischoegl Thanks for adding the "previous" method! I appreciate you taking the time to consider my suggestion and add that. I want to advocate for having the "previous" method be the default, let me know what you think.

I think there is also a physical justification for the "previous" method being the default, beyond compatibility with CHEMKIN (which I agree is not strictly necessary). The justification I'm thinking of is that by assuming a linear variation, we are assuming more about the input data than the user has provided us. We are assuming that if the user were to add additional values in between, they would fall on a straight line between the two surrounding points. However, there is no particular reason we should assume this; the data could just as easily vary quadratically, sinusoidally, etc (note, I'm not suggesting these other options be implemented here). So on that basis, the case with the fewest assumptions on our part is that the data do not vary between the provided points. IMO this is the more reasonable assumption to make. What do you think?

@ischoegl
Copy link
Member Author

ischoegl commented Jan 27, 2020

I want to advocate for having the "previous" method be the default, let me know what you think.

Sure! I am mainly following precedent for default options of scipy.interpolation.interp1d or MATLAB interp1d, i.e. imho 'linear' is the expected behavior. There are also some practical reasons: this approach avoids discontinuous inputs when integrating ODE's. As a physical example, positions resulting from velocity inputs become C2 continuously differentiable, which avoids unphysical 'dirac'-type accelerations.

@bryanwweber
Copy link
Member

As a physical example, positions resulting from velocity inputs become C2 continuously differentiable, which avoids unphysical 'dirac'-type accelerations.

Hmm, this makes a lot of sense as well. In the past, I've always set the maximum time step in the integration to be the smallest time interval between data points, which I think helps with this problem. I wonder how much difference it would make practically.

I am mainly following precedent for default options of scipy.interpolation.interp1d or MATLAB interp1d, i.e. imho 'linear' is the expected behavior.

I think the expected behavior would be linear if the user is expecting interpolation to happen. I don't think it is obvious that interpolation should happen for tabular data input.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 27, 2020

I think the expected behavior would be linear if the user is expecting interpolation to happen. I don't think it is obvious that interpolation should happen for tabular data input.

Hmm. Based on past experience (using MATLAB/Scipy/etc.) I would automatically assume that tabulated input involves interpolation. I guess I've never used CHEMKIN much, so I'm not used to their convention. I.e. there may be no clear answer here.

I am still partial to the physics argument: if given the choice of C1 vs C0 input, I'd stick with the more 'natural' choice. My intuition would be that the system is less stiff for C1, as the ODE solver is unable to predict the effect of discontinuous input for C0. (Ps: similar arguments hold for higher derivatives, but C0 is imho the worst-case scenario for ODE input.)

Considering the alternatives, I'm still advocating for linear.

@ischoegl ischoegl requested review from bryanwweber and removed request for speth February 13, 2020 04:04
@ischoegl
Copy link
Member Author

@speth ... while this is motivated by RCM work, would you mind having a quick look? Parts of this are reviving C++ Func1 objects where I’m not sure what their fate was intended to be.

- add new derived class to Func1.h that interpolates a tabulated function in C++
- the C++ defined Tabulated1 class is added as a variant of the Func1 object
  defined in Python
- this approach is compatible with existing interfaces of various Python
  methods with Func1 arguments
- replace Python lambda function defining constant Func1 variants by
  pre-existing Const1 class defined in C++
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have plenty of misgivings about the current implementation of the Func1 set of classes, but this barely touches on the problematic parts of that interface, so I think it should be okay and won't require too many changes if/when I get around to a more complete overhaul of Func1.

include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
include/cantera/numerics/Func1.h Outdated Show resolved Hide resolved
@ischoegl ischoegl requested a review from speth April 3, 2020 03:39
Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ischoegl! I have a few comments and questions about the Python interface here.

Comment on lines 140 to 144
if arr.shape[1] == 2:
time = arr[:, 0]
fval = arr[:, 1]
method = kwargs.get('interpolation', 'linear')
self._set_tables(time, fval, stringify(method))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to allow row-based input here as well? It seems as long as ndim is 2, it doesn't matter whether it is rows or columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the ambiguity of the case where a 2x2 array is provided? I sort of prefer just having the two-argument form. That way, you can use your input if it's either rows or columns. Assuming numpy arrays, that would just be one of

TabulatedFunction(x[0], x[1])
TabulatedFunction(x[:,0], x[:,1])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a valid point. I don’t have any objections to just requiring the two argument form. @bryanwweber ... any thoughts?

Copy link
Member Author

@ischoegl ischoegl Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to Cantera/enhancements#17, I believe there was the vel_array = np.genfromtxt('velocity-trace.txt') case, with the trace having two columns. I believe it would make sense to allow for a shorthand

TabulatedFunction(vel_array)

I.e. I'm tempted to just revert to the original implementation. I also think it may make sense to support something like [(0, 1), (1, 1.5), (2, 2)] where tuples are used to specify points.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer the two argument form that @speth suggested. A two column or row array is easy enough for the user to specify as two arguments.

Copy link
Member Author

@ischoegl ischoegl Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - two arguments it shall be ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user is coming into this with list of tuples, they can always transform it with something like TabulatedFunction(*zip(*vel_array)).

interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some copy editing changes this time around. Thanks for the quick turnaround!

interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/test/test_func1.py Outdated Show resolved Hide resolved
@ischoegl
Copy link
Member Author

ischoegl commented Apr 3, 2020

@bryanwweber / @speth ... thanks for the comments/recommendations!

Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one minor change, and a curiosity question

interfaces/cython/cantera/func1.pyx Show resolved Hide resolved
interfaces/cython/cantera/func1.pyx Outdated Show resolved Hide resolved
@speth speth merged commit 8a0ac7e into Cantera:master Apr 5, 2020
@ischoegl ischoegl deleted the tabulated-func1 branch July 13, 2020 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow arrays as input to time-varying properties
3 participants