When setting `min_history` larger than the incoming batch size, the expectation is that the stage will cache rows from incoming batches (per user) until `min_history` is met. However, we are seeing that only rows from the last batch are emitted to the next stage.
For example, the Azure training example reads in a total of 3239 rows. If we set `min_history` to 3000, the stage should emit at least 3000 rows for preprocessing/training. Instead, it emits only the 107 rows from the last batch.
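The expected caching behavior can be illustrated with a minimal sketch. This is not the Morpheus implementation; the `RollingWindow` class and `push` method are hypothetical, and the batch sizes are chosen only so the 3239-row total matches the example above:

```python
class RollingWindow:
    """Hypothetical sketch (not the actual DFPRollingWindowStage) of
    per-user row caching gated by a min_history threshold."""

    def __init__(self, min_history: int):
        self.min_history = min_history
        self._cache: dict[str, list] = {}  # per-user accumulated rows

    def push(self, user: str, rows: list):
        # Accumulate this user's rows across incoming batches.
        self._cache.setdefault(user, []).extend(rows)
        if len(self._cache[user]) >= self.min_history:
            # Emit everything accumulated so far, not just the incoming batch.
            return list(self._cache[user])
        return None  # keep caching until min_history is met


window = RollingWindow(min_history=3000)
batches = [list(range(n)) for n in (1000, 1000, 1000, 239)]  # 3239 rows total
emitted = [out for b in batches
           if (out := window.push("generic_user", b)) is not None]
print(len(emitted[0]), len(emitted[-1]))  # 3000 3239
```

Under these assumptions, the first emission already contains all 3000 accumulated rows; the bug reported here corresponds to only the final batch (107 rows) reaching the next stage.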
Follow the instructions to run the Azure training example.
Relevant log output
Rolling window complete for generic_user in 21.89 ms. Input: 107 rows from 2022-08-27 00:06:18.712616+00:00 to 2022-08-27 23:49:42.173263+00:00. Output: 107 rows from 2022-08-27 00:06:18.712616+00:00 to 2022-08-27 23:49:42.173263+00:00
Preprocessed 107 data for logs in 2022-08-27 00:06:18.712616+00:00 to 2022-08-27 23:49:42.173263+00:00 in 484.33971405029297 ms
Training AE model for user: 'generic_user'...
Training AE model for user: 'generic_user'... Complete.
ML Flow model upload complete: generic_user:DFP-azure-generic_user:1
Input data rate[Complete]: 3239 messages [00:00, 4587.13 messages/s]
Training rate[Complete]: 107 messages [00:06, 17.24 messages/s]
`DFPRollingWindowStage` was only emitting the last batch once `min_history` was met. This PR updates the stage to emit all accumulated rows that meet the configured window history requirements.
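The change described can be sketched as a before/after comparison. The function names and structure below are illustrative, not the actual Morpheus code; the row counts mirror the logs above:

```python
def emit_window_buggy(cached: list, incoming: list, min_history: int):
    combined = cached + incoming
    if len(combined) < min_history:
        return None
    return incoming  # bug: only the last incoming batch is emitted


def emit_window_fixed(cached: list, incoming: list, min_history: int):
    combined = cached + incoming
    if len(combined) < min_history:
        return None
    return combined  # fix: emit all accumulated rows meeting the window requirements


cached = list(range(3132))   # rows cached from earlier batches
incoming = list(range(107))  # final 107-row batch from the Azure example
print(len(emit_window_buggy(cached, incoming, 3000)))  # 107
print(len(emit_window_fixed(cached, incoming, 3000)))  # 3239
```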
Fixes #674
Authors:
- Eli Fajardo (https://github.com/efajardo-nv)
Approvers:
- Michael Demoret (https://github.com/mdemoret-nv)
URL: #683
jjacobelli pushed a commit to jjacobelli/Morpheus that referenced this issue on Mar 7, 2023.
Version
23.01
Which installation method(s) does this occur on?
Docker
Minimum reproducible example
Set `min_history=3000` for `DFPRollingWindowStage` in dfp_azure_pipeline.py. Follow the instructions to run the Azure training example.