-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing issues in models.scanKey #4841
Conversation
I also plan on fixing the lint issues in the |
This parsing code has been pretty highly tuned because it's used in a lot of places (HTTP/UDP inputs, hinted handoff, WAL, cluster writes, exports, etc..). The performance decrease is fairly significant because it reduces throughput by 20-45% in common cases.
The I have not profiled this change, but I suspect the performance decrease is due to that closures and calling a func per char whereas before it would just increment an integer. The code doesn't seem much clearer to me either unfortunately, but I see where you were going with it. I'd suggest reworking this to have I'm 👎 on this current PR, but I think it could be reworked to address these concerns and fix the original parser bugs. |
Hi @jwilder, your points seem completely reasonable to me. This is my first contribution to the project and as you can imagine I have a shallow understanding of the entire codebase, and the impact of I will profile and see what I can do to get the benchmarks up to their previous levels 👍 If that's unsuccessful, then I have another branch waiting that will simply fix the outstanding issues. |
c113d99
to
e995fba
Compare
I've modified the approach to bring performance in line with expectation, while keeping separation of the different parts of the scanning process in different functions. And of course the bugs should still be fixed 😄 Results of six runs of the following benchmark command:
So performance is now on a par or slightly better than before. Thoughts? |
t.Errorf(`ParsePoints("%s") mismatch. got nil, exp error`, "cpu, value=1") | ||
} | ||
|
||
_, err = models.ParsePointsString("cpu,,, value=1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case appears to be missing. Assuming this is covered by the cpu,
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is covered by the cpu,
case, but I didn't intend to leave it out. I've added it back (you can never have enough test cases 😆).
I gave them all the once over and added a couple more.
@e-dard Nice work! Not only fixed the bugs, but code is simpler and faster. One question, but this looks great. 👍 |
@jwilder couple of things:
Consider I think the file could do with a review of all the functions that are incorrectly skipping |
c7f113e
to
04eea17
Compare
@@ -26,6 +26,7 @@ There are breaking changes in this release: | |||
- Scripts are now located in `/usr/lib/influxdb/scripts` (previously `/opt/influxdb`) | |||
|
|||
### Features | |||
- [#4841](https://github.com/influxdb/influxdb/pull/4841): Improve parsing of measurements and tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be a little bit more specific here? The user-visible improvement would be less CPU consumption?, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes. I wrote that before I made the performance improvements. Not added to the CHANGELOG
before, how about:
Improve point parsing speed
Also, I just realised I put it under the features
section when it started off as a bug fix. Should I move it or leave it where it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion, it's neither one or other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll leave it where it is. Are you happy with:
Improve point parsing speed
Any better suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me -- thanks. You can mention you linted the code as well.
@e-dard - have you signed the CLA? |
@otoolep I did when I contributed to |
As long as you signed https://influxdb.com/community/cla.html we're good. |
Fix parsing issues in models.scanKey
The PR refactors
models.scanKey
, which is used to parse the measurements and tags. It:scanKey
logic by broadly modelling it as a state machine;cpu,
#4770 (and some other non-documented incorrect error messages);scanKey
function.Initially I planned on simply fixing #3070 and #4770, but I felt that the
scanKey
function was pretty complicated, and had had extra conditionals and checks added over time, some of which were redundant, or unreachable. It seemed like it was worth refactoring this function into something easier to understand.I've ran benchmarks before and after and performance is broadly inline with the previous implementation, if not a little faster for some benchmarks.
(Updated)