-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pipeline Update #629
Pipeline Update #629
Conversation
…unecessary debug statements.
…nd single quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thank @LouisLu705 for good work on a feature with changing needs. I know this is only in a draft state but I wanted to get in some comments now to keep the process moving. @LouisLu705 and I have talked about moving the files in src/server/test/web/csvPipeline/ to be similar to the test/db/data/ directory and excluding them from the header check to avoid that issue. Some CI test issues are due to including the upcoming CSV upload work and will be dealt with in that upcoming PR. I note that I have only done very limited testing of the pipeline and need to do more types of CSV. Finally, this PR will not be merged until the CSV upload PR has cleared due to dependency on that code.
src/server/services/pipeline-in-progress/handleCumulativeReset.js
Outdated
Show resolved
Hide resolved
…larify complex logic.
- Also removed old mamac pipeline file and test - mamac readings generally a whole number but removed round since OED now deals with real values. - The loading of all mamac meters at one time on update was removed since the pipeline does one meter at a time. - Pipeline no longer specifies how moment should parse date/time. Testing found it worked fine without except day first but that was not handled anyway. - Removed updateMetersTests since we are not doing the rejection as done before. I hope this is not overlooking an issue where one update fails and another one does not work correctly.
- The data is sent through the pipeline rather than direct adding to DB. The readings are still done one at a time. More work needed but basic functioning. - Made await array load await as it should. - Change so always add last reading to meter instead of if success. Also, await so that fast next request for same meter works.
If the pipeline is used with a large file with lots of errors, the returned message could be quite large. This limits it to about 75k and 75 messages. The log still holds all the messages.
- If the pipeline is called many times quickly, it can be hard to tell which meter is involved. Thus, the meter name was added to all the messages. - Obvius log file tests failing because 1970-01-01 as the date and this is the same as moment(0) that is used on meter when there is no data. Also, removed overlapping test since this was when assumed each reading was 1 hour. Finally, removed blank line for synthetic CSV data because lead to null reading. - 1970 is used in a other places but none seem to involve adding readings to DB so probably okay. - Fixed the issue with multiple return values from appendMsgTotal so don't need different named variables. - Fixed some check/lint issues. - Updated expected output for new one.
- There were two functions to upload Obvius data. insertObviusData was not used anywhere so deleted. Anything needed was transferred to loadLogfileToReadings and is only one now. - Obvius upload now does each meter as a batch rather than each reading. - Obvius readings uploads now honor meter values for upload parameters. Changed meter creation for config and log so end only is true. - Changed all meter creations that were automatic to set the note to how done and a date/time stamp. - Fix bug where default meter value was ascending but pipeline has increasing
- Curl requests were getting the message and now ones from the web page are getting the same output. It is displayed as an alert so it stays on the screen and is easy to copy. - Moved curl message return until after done with deleting files and refresh. This should make it sooner to when ops actually done. - I had some issue using <string> instead of <void> so left it as void. Maybe it can be figured out at a later point.
Some automated calls ignore the values. Also in tests for now.
- Also fixed type of readingsCumulative
- Updates for meters now works with testing added for this. - The same alert style messages are shown for web page requests as were done for meter curl requests. - Note there is a meterName for curl meter requests but not on the meter web page. This may be fixed later by either adding edit to meter page and/or adding meter name to csv meter page.
We OED's generalization of adding meters/readings, we are beyond the meter types known. Thus, an "other" type has been added. It is used when a meter is automatically created but the type is not know. A site can also use this type. A few tests were changed to use this type to be sure it works.
Originally we planned to user Jan 1, 0001 as the default start/end time on a meter. The code uses moment(0) which is 1970-01-01 00:00:00. This makes the DB consistent with the code even though it is unlikely to ever see the DB default because the code sets a default value on meter creation.
- Almost all were either dealt with in this commit or were already okay. - Moment conversions were set to strict so unlikely a bad value will get by. - Invalid dates or readings (not a number) are now caught with appropriate message where all reading are rejected. - Tests were added for all the new checks.
I carefully read the new pipeline code. Mostly just added or edited comments to make it clearer. Did one small change that does not change any results but makes code safer.
- The CSV meter input page now accepts a meter name. This is used rather than value from CSV in same way a curl from previous code. - Changed a few strings to use MODE instead - Fixed expected output for actual reading value. - Removed stray character in pipe41 CSV. - Fixed pipeline README so descriptions correct.
First, I want to thank @LouisLu705 for all the work on the pipeline. I don't have any additional comments on their code. There are some CodeQL issues that will need to be addressed at some point. It might be after this PR is merged. I'm pushed a lot of changes/commits to the PR. I kept making changes to finish tasks and it got somewhat out of hand where changes in related code outside the original PR is now included. The positive is that the pipeline and related code should now be fully integrated and working with all options. Given the scope of the work, my changes should be reviewed before they are merged with the PR. The changes are discussed in each commit but here are the highlights:
|
- Add some tests for cumulative - Add test for start before end - minor fixes to processData.js
The --rm on the docker command caused running versions of OED to stop. As a compromise, this was removed but it means the container stays around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good. I'm very glad that mamac and obvius utilize the same pipeline code now. I don't see any obvious JS errors. I also like the addition of the enums TimeSortTypesJS and BooleanTypesJS in validateCSVUploadParams. In some places we checked if a variable is equal to string 'true', for example, line 150 of /src/server/routes/csv.js. I think this could be changed to utilize BooleanTypeJS as a later TODO.
Changes code to use TimeSortTypesJS & BooleanTypesJS instead of hardcoded values used before.
After my review of the work by @LouisLu705 and the review of my changes by @lindavin, I think this is ready to merge. The CodeQL issues will be addressed at a later time. |
Description
This pipeline change implements processing CSV files for meter and readings data in one place. This new process will check that the readings data does not violate expected constraints and outputs warnings as necessary if errors occur. Currently, this is tested on MAMAC data and works for CSV files which may come with several parameters to indicate how OED should process a certain CSV file such as:
onlyEndTime - CSV files with onlyEndTime values given
cumulative - CSV files with readings which are cumulative
Tgap - the allowed time gap in milliseconds that a gap may occur between two readings
Tlen - the allowed time variation in milliseconds that two readings may deviate from each other
cumulativeReset - true if the cumulative data can reset
resetStart - the start time of the allowed time range a cumulative CSV file may reset
resetEnd - the end time of the allowed time range a cumulative CSV file may reset
Fixes #513 and Fixes #546
See @huss comments below on changes that fixes #510, fixes #512, fixes #515, fixes #565 and fixes #641.
Type of change
(Check the ones that apply by placing an "x" instead of the space in the [ ] so it becomes [x])
Checklist
Limitations
This merges in the csvPipeline. branch creataed by @lindavin.
Currently, we expect that dates are passed in as one of two formats:
(YYYY-MM-DD H:mm) or as (MM-DD-YYYY H:mm) == (2020/4/5 0:30) or as (4/5/2020 0:30)
If the dates are in neither of these forms there may be errors. There is also a slight performance decrease due to checking both formats so if we can default to one specific moment format we may improve performance slightly. The difference I found was 11 seconds longer when testing on OneMinuteTestingData.
Also, we need to connect the user interface to the pipeline so that users can decide if their data has certain parameters such as specific cumulativeResetStart and cumulativeResetEnd times as well as onlyEndTime. Note that currently onlyEndTime is set to be false and if no cumulatveResetStart or cumulativeResetEnd times are given the values respectively default to 0:00:00.000 and 23:59:99.999 indicating that a cumulativeReset may occur from the beginning of any day to the millisecond before the day ends and the next day begins.
We may also consider counting how many errors occur and the client may use a parameter to create a maximum cap before too many errors occur and we drop the CSV file and do not insert anything into the database.
Also, we may make the error messages more descriptive for example letting the user know which reading causes the error including that particular reading value, meter id, start timestamp and end timestamp.
There is now also redundant code in the pipeline that will need to be cleared and testing will need to be done to see if we can use this new pipeline process to handle different types of meters besides Mamac such as Obvius.
Here are some statistics from running the generatedTestData on my local machine with 8 GB of RAM.
fourDayFreqTestData:
generateFourHourTestingData:
generateTwentyThreeMinuteTestingData:
generateFifteenMinuteTestingData:
generateCosineTestingData:
generateSineSquaredTestingData:
generateCosineSquaredTestingData:
generateVariableAmplitudeTestingData:
15Freq1AmpSineTestData:
15Freq2AmpSineTestData:
15Freq3AmpSineTestData:
15Freq4AmpSineTestData:
15Freq5AmpSineTestData:
15Freq6AmpSineTestData:
15Freq7AmpSineTestData:
generateOneMinuteTestingData: