Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso #7270
Conversation
Force-pushed from 4120352 to d93ff75
benchmarks-vectorized-2023-07-13-21.txt — results of the final run of today's benchmarks. Tomorrow I will analyze them and write up a summary.
- Comparing performance of a Java-to-Enso roundtrip vs staying in Enso, on a primitive value (…)
- Comparing a Boxed value (…)
- Comparing Enso vs Java vectorized operation on a Boxed value (…)
- Comparing Enso vs Java vectorized operation on a primitive value (…)
- The Enso workflow I'm using to analyze benchmarks
Revisiting the Boxed Map operation

The previous Boxed map operation was relying on a pretty heavy (…). I created a new benchmark, using (…).

Here we can see that the Java vectorized operation is fast: it is 12.5x faster than the fastest Enso implementation. So here we are getting results similar to the divide in primitive operations, even though here the Date is boxed. It may be the additional overhead of polyglot conversions between (…).

The comparisons also show us some guidelines for library development: the approach with a builder is faster than rebuilding from a vector, and curiously a presumably non-boxing (…)
I'm thinking it could be worth getting this PR merged to keep these benchmarks for future reference. The only blocking issue is some modifications in the Table Java helpers that should not be 'leaked' into production code just because some benchmarks wanted them. Next week I will try to refactor them into separate helpers placed in the Benchmarks project. This may slightly influence the results (as they will need the indirection of calling methods on Storage instead of direct array access). I will post the updated results then; they will show a more realistic scenario anyway.
Summary

This data gives us two main insights. I will summarize them shortly here and then link to tickets.

Improving current Enso libraries

It seems that for boxed types, (…). We should revisit places where we use non-vectorized operations and amend them to use the best tool for each. Some further benchmarks may be needed to judge, as we have combinations of (…).

Second, we modified (…).

Optimizing Enso

We are seeing that Enso is ~3x slower than Java on concatenating two string columns. That is a decent result, but it could possibly be better. Moreover, we can see that other boxed operations, like mapping (…).

Then there are operations on primitive values. There, many are roughly 12x slower than Java. Ideally we should get these results closer as well, recognizing it may be harder to achieve than for boxed values, as Java has more of an advantage in this case. An optimized primitive aggregation that does not allocate (but returns a scalar) is as much as 70x faster in Java!
This is interesting. Does (…)
Having the tests is good, but until there is an objective way to execute them reliably and daily, I would avoid drawing conclusions about what needs to be fixed. Also related to #5067. I assume I need some assisted guidance/walk-through to understand the individual tests better.
Having these benchmarks is interesting and important. In the end, most Enso users are going to operate on `Table` and `Column` objects, and we need to make sure the operations are fast.

On the other hand, I have to admit it is all a bit too fuzzy for me. It is a bunch of code and I don't even know where to start: how to run a single benchmark, how to warm it up, how to compare the results, etc.
How to run a single benchmark

To run a single benchmark, we just need to run the `main` function. I use the launcher: `enso run benchmark-file.enso`. The `Run_All.enso` is just a shortcut to run all of these.
How to warm it up

I run the benchmark on a 1-million-row table 100 times. We can then look at the graphs to see after how many iterations the performance improves.

When comparing the average run times, I dropped the first 40 iterations and worked with averages over the last 60, assuming that 40 iterations is enough for these 10^6-row loops to sufficiently warm up. I'm not sure how we can measure the warm-up more precisely; I'm open to receiving help in that regard.

On the other hand, I think this methodology, while rather simple, may actually be good for us (feel free to prove me wrong). It may not measure the highest peak performance if I do not warm it up enough. However, I assume that our users will most often use the Table library on data on the order of 10k to 100M rows, so we are likely most interested in performance at such magnitudes after 1-2 runs. If peak performance is achieved too late, it will not be noticed by the user; what we care most about is that Enso runs smoothly when working live. So this seemed like a good enough measurement of the execution times that users will 'feel'. A minimal sketch of this averaging is shown below.
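For reference, the averaging described above amounts to no more than this (a minimal sketch; the class and method names are mine, for illustration only):

```java
import java.util.Arrays;

public class WarmupAverage {
    /** Averages per-iteration runtimes after dropping the warmup iterations. */
    public static double steadyStateMean(double[] runtimesMs, int warmupIterations) {
        return Arrays.stream(runtimesMs)
                     .skip(warmupIterations)
                     .average()
                     .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] runtimes = new double[100];
        Arrays.fill(runtimes, 0, 40, 250.0);  // slow warmup iterations
        Arrays.fill(runtimes, 40, 100, 80.0); // steady-state iterations
        // Drops the first 40 samples, averages the remaining 60 -> prints 80.0
        System.out.println(steadyStateMean(runtimes, 40));
    }
}
```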
How to compare the results

I may need more clarification on what you are asking for here. Practically, I was using the workflow linked below; I can add it to the repository if you think it is worth it. In the workflow we load a file and can then choose which benchmark to display; we can view the scatter plot and compute average runtimes and relative differences between the averages.

In terms of 'what' we are comparing: I've created a separate Enso type for each kind of 'task'. Each type represents one 'thing we want to compute', like adding two columns, or executing some function over a column and returning a new column or a scalar. Then, each method inside of that type represents one 'approach', i.e. one particular implementation getting us that result. We can compare the performance of the various methods within a single 'task' to see which one is most efficient (see the sketch after this paragraph).
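As a rough analogy in Java terms (hypothetical names, not the actual benchmark code): one class per 'task', one method per 'approach', and the benchmark compares the methods within a class:

```java
import java.util.ArrayList;
import java.util.List;

// One 'task': map a function over an integer column.
// Each method is one 'approach' to computing the same result; a benchmark
// then compares the approaches against each other within this task.
class ColumnMapTask {
    // Approach 1: collect into a boxed list, then copy out to a primitive array.
    static long[] mapViaBoxedList(long[] column) {
        List<Long> out = new ArrayList<>(column.length);
        for (long x : column) out.add(x + 1);
        return out.stream().mapToLong(Long::longValue).toArray();
    }

    // Approach 2: write directly into a preallocated primitive array.
    static long[] mapViaPrimitiveArray(long[] column) {
        long[] out = new long[column.length];
        for (int i = 0; i < column.length; i++) out[i] = column[i] + 1;
        return out;
    }
}
```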
How to run a single benchmark?

> The `Run_All.enso` is just a shortcut to run all of these.

I am not sure what you mean by `Run_All`. `enso$ find . | grep Run_All` yields nothing.
> To run a single benchmark, we just need to run the `main` function. I use the launcher - `enso run benchmark-file.enso`.

Do you mean I should run:
enso$ ./built-distribution/enso-engine-0.0.0-dev-linux-amd64/enso-0.0.0-dev/bin/enso --run test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:12:1: error: The `project` keyword was used in an import statement, but the module does not belong to a project.
12 | import project.Table.Performance_Research.Common_Setup.Common_Setup
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:13:1: error: The `project` keyword was used in an import statement, but the module does not belong to a project.
13 | import project.Table.Performance_Research.Helpers
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:32:9: error: The name `Helpers` could not be found.
32 | Helpers.column_from_vector "result" mapped convert_polyglot_dates=convert_polyglot_dates
| ^~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:60:14: error: The name `Common_Setup` could not be found.
60 | main = spec (Common_Setup.Config)
| ^~~~~~~~~~~~
Aborting due to 4 errors and 0 warnings.
Execution finished with an error: Compilation aborted due to errors.
that doesn't seem to work.
Moreover, I guess we are facing a problem of different perspectives...

> To run a single benchmark, we just need to run the main function.

By running `enso --run test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso` I will not run a single benchmark, but five different `Bench.measure` calls!

I need to run just a single `Bench.measure`, in isolation, with arguments to dump IGV graphs. If I run five benchmarks (aka `Bench.measure` calls) at once, I will not even be able to find out which graph belongs to which `Bench.measure`.

We really need the ability to run just a single `Bench.measure`, and as such I'd like to advocate immediately switching to the #7101 `Bench` builder pattern.
How to warm it up?

> ... dropping the first 40 iterations and working with averages over the last 60 iterations

The JMH library has the concepts of warmup and measurement. You are basically saying `@Warmup(iterations=40)` and `@Measurement(iterations=60)`. That's a fair methodology.

The more JMH-like our harness is, the easier the integration with existing benchmarks will be. As such I'd like to advocate using the same warmup & measurement terminology as, for example, here. A minimal JMH sketch of this shape follows below.
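For illustration, the methodology above written as a JMH benchmark might look like this (a minimal sketch: the class name and benchmark body are placeholders of mine, not code from this PR):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 40)       // drop the first 40 iterations
@Measurement(iterations = 60)  // measure over the following 60
public class ColumnSumBench {
    long[] column;

    @Setup
    public void setup() {
        // 10^6 elements, mirroring the 1-million-row tables used above.
        column = new long[1_000_000];
        for (int i = 0; i < column.length; i++) column[i] = i;
    }

    @Benchmark
    public long sumPrimitive() {
        long acc = 0;
        for (long x : column) acc += x;
        return acc;
    }
}
```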
> assuming that 40 iterations was enough for these 10^6 loops to sufficiently warm up.

There is a way to detect that compilation kicked in unexpectedly and fail the benchmark. I proposed it in #6271, when I wanted to speed up the benchmark runs by lowering the number of warmup iterations as much as possible.
> that doesn't seem to work.

That's why I use the launcher 😉

If you are using the runner directly, you need to add `--in-project` with the path to the project root for this to work.
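For the invocation above, that would presumably be something like this (the project-root path is my assumption based on the repository layout):

`./built-distribution/enso-engine-0.0.0-dev-linux-amd64/enso-0.0.0-dev/bin/enso --in-project test/Exploratory_Benchmarks --run test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso`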
(Resolved review thread on test/Benchmarks/src/Table/Performance_Research/Column_Bi_Map.enso, now outdated.)
@@ -142,6 +142,20 @@ public static Column fromItems(String name, List<Value> items) {
    return new Column(name, storage);
  }

  /** Creates a column from an Enso array. No polyglot conversion happens. This is unsafe. */
  public static Column fromItemsRaw(String name, List<Object> items, StorageType expectedType) throws ClassCastException {
I do remember when `fromItems` introduced `List<Value>`... Deciding whether `List<Value>` is needed or whether `List<Object>` is enough is a delicate dance balancing GraalVM host interop, the GraalVM version, Enso's implementation of interop, and the needs of the libraries. A rough sketch of what the two signatures imply follows below.
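To make the trade-off concrete, here is a sketch of what the two signatures imply, using the GraalVM polyglot API (my illustration, not code from this PR):

```java
import java.util.List;
import org.graalvm.polyglot.Value;

class InteropSketch {
    // With List<Value>, every element arrives wrapped and must be converted
    // explicitly; the wrapping has a cost, but conversions stay under our control.
    static long sumValues(List<Value> items) {
        long acc = 0;
        for (Value v : items) acc += v.asLong();
        return acc;
    }

    // With List<Object>, we rely on host interop having already mapped the elements
    // to host types; cheaper when it works, but a ClassCastException away from breaking.
    static long sumObjects(List<Object> items) {
        long acc = 0;
        for (Object o : items) acc += (Long) o;
        return acc;
    }
}
```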
In any case, I believe the #1 thing we need is to get stable, independent benchmarks executed daily by the CI. Without that, we will just shift code back and forth without being sure the system has improved.
For now I will keep it as is, but I will be adding tickets to revisit this.
I don't necessarily need another "I wish the Enso interpreter would be lightning fast" ticket, but once there is a way to run an individual benchmark in isolation, feel free to report where a single benchmark lags behind another. There already is one, but it was reported the "old way". Such a colloquial report is unhelpful, as it requires extra time and work to turn the benchmark into a JMH benchmark. To avoid such useless work, I'd like us to fix the benchmarking infrastructure first, before we flood the issue tracker with more #5067-like reports, which are too fuzzy from an interpreter perspective.
> This workflow is prepared mostly to analyse the output of benchmarks from `test/Exploratory_Benchmarks`. See `test/Exploratory_Benchmarks/README.md` for more information.
I assume this works with the output of any of our current text-based benchmarks?
As long as they name the benchmarks using the pattern `<group>.<single-bench-name>`, then yes.

The idea of the workflow is that we display plots comparing each `<single-bench-name>` inside of a single `<group>`. If the benchmarks are also grouped like that and we want to compare them within groups, then yes. If not, we'd need to adapt it, but we can re-use at least the data-loading part (which is just the first few nodes).
Summarizing findings
I am not sure why my review is needed. Maybe because of the `build.sbt` changes? Anyway (as expressed in the comments), I don't like the (old) way benchmarking is done in Enso and I am not keen on seeing more benchmarks written in the old way. As such I've just started #7324 to give us a more reusable "builder API". I would personally prefer if #7324 were integrated first, this PR got adjusted to the new "builder API", and only then integrated. However, I am approving anyway, as I believe my approval shouldn't be needed for (almost) pure Enso code changes.
Designing a new `Bench` API to _collect benchmarks_ first and only execute them afterwards. This is a minimal change to allow the implementation of #7323, e.g. the ability to invoke a _single benchmark_ via the JMH harness.

Important Notes

This is just the basic API skeleton. It can be enhanced, as long as the basic properties (allowing integration with JMH) are kept. It is not the intent of this PR to make the API 100% perfect and usable. Neither is it a goal of this PR to update existing benchmarks to use it (74ac8d7 changes only one of them to demonstrate that _it all works_ somehow). It is however expected that once this PR is integrated, newly written benchmarks (like the ones from #7270) will use (or even enhance) the new API.
A thorough exploration of the space.

I'd like us to avoid changing the core type yet, but let's agree on the next steps and on what we should take into here as immediate changes. We should then work to fold this into the new benchmark framework being built up by @Akirathan.
@@ -142,6 +142,20 @@ public static Column fromItems(String name, List<Value> items) {
    return new Column(name, storage);
  }

  /** Creates a column from an Enso array. No polyglot conversion happens. This is unsafe. */
  public static Column fromItemsRaw(String name, List<Object> items, StorageType expectedType) throws ClassCastException {
At this point, I don't think we should add this to the core `Column` type. From my read it can be a static method anywhere, so I suggest it goes into the exploratory helper Java.
I wanted to keep it in the std-lib, because I actually plan to use it in #6111 - there, when we know the expected value type, and the value type is neither `Mixed` nor `Date` (etc.), we can guarantee that no dates will come in, so we do not need the conversion to happen. This will make the map operation and the construction of columns (like integer columns) much faster. A conceptual sketch of this shortcut is shown below.

I can make it static, but since I plan to use it in the main codebase soon, I did not want to move it to the helpers - since I'd have to move it back again when implementing #6111.
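A conceptual sketch of the shortcut, for an integer column (plain Java of my own for illustration; the real std-lib storage builders differ):

```java
import java.util.List;

class RawColumnSketch {
    /**
     * When the expected type rules out dates (not Mixed, not Date, ...), we can
     * skip the per-element polyglot conversion pass and trust the values to
     * already match the expected type.
     */
    static long[] fromItemsRaw(List<Object> items) throws ClassCastException {
        long[] storage = new long[items.size()];
        int i = 0;
        for (Object item : items) {
            // No conversion; throws ClassCastException on a mismatched element.
            storage[i++] = (Long) item;
        }
        return storage;
    }
}
```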
Fair - it felt like, for an experiment, we shouldn't be changing the core, but I'm happy given the next step will be to change it.
- Fixes #7231
- Cleans up vectorized operations to distinguish unary and binary operations.
- Introduces MixedStorage which may pretend to be a more specialized storage on demand.
- Ensures that operations request a more specialized storage on the right-hand side, to ensure compatibility with the reported inferred storage type.
- Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso.
- Tests for comparison of Mixed storages with each other and other types.
- Started using `Set` for `Filter_Condition.Is_In` for better performance.
- ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~
  - This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (the current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out: #7371.
- ~~Trying to avoid conversions when calling Enso functions from Java.~~
  - Needs extra care, as dataflow errors may not be handled right then. So this only works for simple functions that should not error.
  - Not sure how much it really helps. [Benchmarks](#7270 (comment)) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results.
  - Created #7378 to track this.
Pull Request Description
The added benchmark is a basis for a performance investigation. We compare the performance of the same operation run in Java vs Enso, to see what the overhead is, and try to get the Enso operations closer to pure-Java performance.
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
- The code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
- (…) `./run ide build`.