Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline Update #629

Merged

Conversation

LouisLu705
Copy link
Contributor

@LouisLu705 LouisLu705 commented Apr 26, 2021

Description

This pipeline change implements processing CSV files for meter and readings data in one place. This new process will check that the readings data does not violate expected constraints and outputs warnings as necessary if errors occur. Currently, this is tested on MAMAC data and works for CSV files which may come with several parameters to indicate how OED should process a certain CSV file such as:

onlyEndTime - CSV files with onlyEndTime values given
cumulative - CSV files with readings which are cumulative
Tgap - the allowed time gap in milliseconds that a gap may occur between two readings
Tlen - the allowed time variation in milliseconds that two readings may deviate from each other
cumulativeReset - true if the cumulative data can reset
resetStart - the start time of the allowed time range a cumulative CSV file may reset
resetEnd - the end time of the allowed time range a cumulative CSV file may reset

Fixes #513 and Fixes #546

See @huss comments below on changes that fixes #510, fixes #512, fixes #515, fixes #565 and fixes #641.

Type of change

(Check the ones that apply by placing an "x" instead of the space in the [ ] so it becomes [x])

  • Note merging this changes the node modules
  • Note merging this changes the database configuration.
  • This change requires a documentation update

Checklist

  • I have followed the OED pull request ideas
  • I have removed text in ( ) from the issue request

Limitations

This merges in the csvPipeline. branch creataed by @lindavin.

Currently, we expect that dates are passed in as one of two formats:

(YYYY-MM-DD H:mm) or as (MM-DD-YYYY H:mm) == (2020/4/5 0:30) or as (4/5/2020 0:30)

If the dates are in neither of these forms there may be errors. There is also a slight performance decrease due to checking both formats so if we can default to one specific moment format we may improve performance slightly. The difference I found was 11 seconds longer when testing on OneMinuteTestingData.

Also, we need to connect the user interface to the pipeline so that users can decide if their data has certain parameters such as specific cumulativeResetStart and cumulativeResetEnd times as well as onlyEndTime. Note that currently onlyEndTime is set to be false and if no cumulatveResetStart or cumulativeResetEnd times are given the values respectively default to 0:00:00.000 and 23:59:99.999 indicating that a cumulativeReset may occur from the beginning of any day to the millisecond before the day ends and the next day begins.

We may also consider counting how many errors occur and the client may use a parameter to create a maximum cap before too many errors occur and we drop the CSV file and do not insert anything into the database.

Also, we may make the error messages more descriptive for example letting the user know which reading causes the error including that particular reading value, meter id, start timestamp and end timestamp.

There is now also redundant code in the pipeline that will need to be cleared and testing will need to be done to see if we can use this new pipeline process to handle different types of meters besides Mamac such as Obvius.

Here are some statistics from running the generatedTestData on my local machine with 8 GB of RAM.

fourDayFreqTestData:

real    0m0.395s
user    0m0.087s
sys     0m0.095s

generateFourHourTestingData:

real    0m0.922s
user    0m0.007s
sys     0m0.006s

generateTwentyThreeMinuteTestingData:

real    0m5.593s
user    0m0.000s
sys     0m0.017s

generateFifteenMinuteTestingData:

real    0m8.075s
user    0m0.010s
sys     0m0.011s

generateCosineTestingData:

real    0m5.260s
user    0m0.015s
sys     0m0.000s

generateSineSquaredTestingData:

real    0m0.181s
user    0m0.000s
sys     0m0.013s

generateCosineSquaredTestingData:

real    0m0.149s
user    0m0.000s
sys     0m0.011s

generateVariableAmplitudeTestingData:

  • 15Freq1AmpSineTestData:

     real    0m16.320s
     user    0m0.000s
     sys     0m0.043s
    
  • 15Freq2AmpSineTestData:

     real    0m16.065s
     user    0m0.001s
     sys     0m0.048s
    
  • 15Freq3AmpSineTestData:

     real    0m16.329s
     user    0m0.010s
     sys     0m0.036s
    
  • 15Freq4AmpSineTestData:

     real    0m15.876s
     user    0m0.012s
     sys     0m0.034s
    
  • 15Freq5AmpSineTestData:

     real    0m15.612s
     user    0m0.001s
     sys     0m0.046s
    
  • 15Freq6AmpSineTestData:

     real    0m16.351s
     user    0m0.011s
     sys     0m0.031s
    
  • 15Freq7AmpSineTestData:

     real    0m15.882s
     user    0m0.010s
     sys     0m0.035s
    

generateOneMinuteTestingData:

real    2m1.651s
user    0m0.020s
sys     0m0.222s

@LouisLu705 LouisLu705 marked this pull request as draft April 26, 2021 06:43
Copy link
Member

@huss huss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thank @LouisLu705 for good work on a feature with changing needs. I know this is only in a draft state but I wanted to get in some comments now to keep the process moving. @LouisLu705 and I have talked about moving the files in src/server/test/web/csvPipeline/ to be similar to the test/db/data/ directory and excluding them from the header check to avoid that issue. Some CI test issues are due to including the upcoming CSV upload work and will be dealt with in that upcoming PR. I note that I have only done very limited testing of the pipeline and need to do more types of CSV. Finally, this PR will not be merged until the CSV upload PR has cleared due to dependency on that code.

src/server/services/pipeline-in-progress/loadArrayInput.js Outdated Show resolved Hide resolved
src/server/test/web/csvPipeline/README Outdated Show resolved Hide resolved
src/server/services/csvPipeline/uploadReadings.js Outdated Show resolved Hide resolved
src/server/services/pipeline-in-progress/processData.js Outdated Show resolved Hide resolved
src/server/services/pipeline-in-progress/processData.js Outdated Show resolved Hide resolved
src/server/services/pipeline-in-progress/processData.js Outdated Show resolved Hide resolved
src/server/services/pipeline-in-progress/processData.js Outdated Show resolved Hide resolved
src/server/services/pipeline-in-progress/processData.js Outdated Show resolved Hide resolved
@lindavin lindavin mentioned this pull request May 2, 2021
5 tasks
huss added 15 commits June 29, 2021 18:14
- Also removed old mamac pipeline file and test
- mamac readings generally a whole number but removed round since
OED now deals with real values.
- The loading of all mamac meters at one time on update was removed
since the pipeline does one meter at a time.
- Pipeline no longer specifies how moment should parse date/time.
Testing found it worked fine without except day first but that was not
handled anyway.
- Removed updateMetersTests since we are not doing the rejection as done
before. I hope this is  not overlooking an issue where one update fails
and another one does not work correctly.
- The data is sent through the pipeline rather than direct adding to DB.
The readings are still done one at a time. More work needed but
basic functioning.
- Made await array load await as it should.
- Change so always add last reading to meter instead of if success.
Also, await so that fast next request for same meter works.
If the pipeline is used with a large file with lots of errors, the
returned message could be quite large. This limits it to about 75k
and 75 messages. The log still holds all the messages.
- If the pipeline is called many times quickly, it can be hard to tell
which meter is involved. Thus, the meter name was added to
all the messages.
- Obvius log file tests failing because 1970-01-01 as the date
and this is the same as moment(0) that is used on meter when
there is no data. Also, removed overlapping test since this was
when assumed each reading was 1 hour. Finally, removed blank
line for synthetic CSV data because lead to null reading.
- 1970 is used in a other places but none seem to involve adding
readings to DB so probably okay.
- Fixed the issue with multiple return values from appendMsgTotal
so don't need different named variables.
- Fixed some check/lint issues.
- Updated expected output for new one.
- There were two functions to upload Obvius data. insertObviusData
was not used anywhere so deleted. Anything needed was transferred
to loadLogfileToReadings and is only one now.
- Obvius upload now does each meter as a batch rather than each
reading.
- Obvius readings uploads now honor meter values for upload
parameters. Changed meter creation for config and log so
end only is true.
- Changed all meter creations that were automatic to set the note to
how done and a date/time stamp.
- Fix bug where default meter value was ascending but pipeline has
increasing
- Curl requests were getting the message and now ones from the web page
are getting the same output. It is displayed as an alert so it stays on
the screen and is easy to copy.
- Moved curl message return until after done with deleting files and
refresh. This should make it sooner to when ops actually done.
- I had some issue using <string> instead of <void> so left it as void.
Maybe it can be figured out at a later point.
Some automated calls ignore the values. Also in tests for now.
- Also fixed type of readingsCumulative
- Updates for meters now works with testing added for this.
- The same alert style messages are shown for web page requests as were
done for meter curl requests.
- Note there is a meterName for curl meter requests but not on the meter
web page. This may be fixed later by either adding edit to meter page
and/or adding meter name to csv meter page.
We OED's generalization of adding meters/readings, we are beyond the
meter types known. Thus, an "other" type has been added. It is used when
a meter is automatically created but the type is not know. A site can
also use this type.
A few tests were changed to use this type to be sure it works.
Originally we planned to user Jan 1, 0001 as the default start/end time
on a meter. The code uses moment(0) which is 1970-01-01 00:00:00.
This makes the DB consistent with the code even though it is unlikely
to ever see the DB default because the code sets a default value on
meter creation.
- Almost all were either dealt with in this commit or were already okay.
- Moment conversions were set to strict so unlikely a bad value
will get by.
- Invalid dates or readings (not a number) are now caught with
appropriate message where all reading are rejected.
- Tests were added for all the new checks.
I carefully read the new pipeline code. Mostly just added or edited
comments to make it clearer.
Did one small change that does not change any results but makes
code safer.
@huss huss mentioned this pull request Jul 5, 2021
- The CSV meter input page now accepts a meter name. This is used rather
than value from CSV in same way a curl from previous code.
- Changed a few strings to use MODE instead
- Fixed expected output for actual reading value.
- Removed stray character in pipe41 CSV.
- Fixed pipeline README so descriptions correct.
@huss
Copy link
Member

huss commented Jul 7, 2021

First, I want to thank @LouisLu705 for all the work on the pipeline. I don't have any additional comments on their code.

There are some CodeQL issues that will need to be addressed at some point. It might be after this PR is merged.

I'm pushed a lot of changes/commits to the PR. I kept making changes to finish tasks and it got somewhat out of hand where changes in related code outside the original PR is now included. The positive is that the pipeline and related code should now be fully integrated and working with all options. Given the scope of the work, my changes should be reviewed before they are merged with the PR. The changes are discussed in each commit but here are the highlights:

  • The curl & web page now have the same abilities. This required many changes including unspecified values now default to what is stored on the meter (when appropriate) so Fixes CSV upload should start with DB data #641; the database was modified so it has values for every value that makes sense in the pipeline; some new abilities to handle all the options; update option now correctly works with DB so functional.
  • Messages from the pipeline are now returned and displayed on a web page popup or as the return HTML for curl requests. The messages were enhanced to make them more useable. A maximum size for the returned message so won't report too many issues but all go to the log.
  • MAMAC and Obvius now go through the pipeline. The duplicate capability from previous work is now gone. Previous code for the MAMAC drop zone was removed. Note that the reading length assumption of the old Obvius code is now gone as pipeline can figure out reading length. Fixes redundate pipeline file #510. Partly addresses Continued Obvius work #564.
  • A new meter type of other was added so automatically created meters of unknown origin have a reasonable type. Fixes other meter type #565.
  • The stream input option has been removed. The new pipeline seems to handle reasonable size CSV files so stream was not worth upgrading. Note testing shows that large files can fail if you don't give JS enough memory. This seems more a developer rather than production server issue.
  • The old Metasys import code was removed. The new pipeline does everything it did and more. Fixes use of Metasys pipeline in OED code #512. Fixes missing meter on Metasys inport of data #515 as making it obsolete.
  • The developer script for inputting test data now goes through the pipeline.
  • Made DB have same default start/end date/time as moment(0) so they are consistent. Also changed some of the tests that happened to use this date/time and was causing failure. Many were left as not causing issue.
  • A script with test CSV files is included that runs many tests against the pipeline. Note the hope is that many can be converted to Mocha/Chai tests so they become part of the continuous integration.
  • Many other changes including removal of obsolete tests, few bug fixes, better error checking on some parameters, enhanced commenting, etc.

- Add some tests for cumulative
- Add test for start before end
- minor fixes to processData.js
The --rm on the docker command caused running versions of OED to stop.
As a compromise, this was removed but it means the container
stays around.
This was referenced Aug 13, 2021
Copy link
Contributor

@lindavin lindavin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. I'm very glad that mamac and obvius utilize the same pipeline code now. I don't see any obvious JS errors. I also like the addition of the enums TimeSortTypesJS and BooleanTypesJS in validateCSVUploadParams. In some places we checked if a variable is equal to string 'true', for example, line 150 of /src/server/routes/csv.js. I think this could be changed to utilize BooleanTypeJS as a later TODO.

Changes code to use TimeSortTypesJS & BooleanTypesJS instead of
hardcoded values used before.
@huss
Copy link
Member

huss commented Aug 16, 2021

After my review of the work by @LouisLu705 and the review of my changes by @lindavin, I think this is ready to merge. The CodeQL issues will be addressed at a later time.

@huss huss marked this pull request as ready for review August 16, 2021 14:27
@huss huss merged commit ae958c7 into OpenEnergyDashboard:development Aug 16, 2021
@huss huss modified the milestone: 0.7 release Aug 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants