Sampling architecture #81
LGTM. Possibly consider the changes now or on the next iteration.
## TODO: For future
#data_sampling:
## Possible methods: time, count, fraction
## starttime and endtime format is dd-mm-yyyy hh:mm:ss in UTC timezone
Would it be useful to be able to specify an array of ranges? That way, if you wanted a single range you could specify just that, and if you wanted a sequence of ranges you could provide that too.
Good point, will add that.
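For illustration, an array of time ranges could be handled roughly as in the sketch below. This is not the code from this PR: the list-of-dicts layout and the helper names `parse_utc`, `parse_ranges`, and `in_any_range` are assumptions. Only the `starttime`/`endtime` keys and the `dd-mm-yyyy hh:mm:ss` UTC format come from the config comments above.

```python
from datetime import datetime, timezone

# Format stated in the config comments: dd-mm-yyyy hh:mm:ss, UTC.
TIME_FORMAT = "%d-%m-%Y %H:%M:%S"


def parse_utc(text):
    """Parse a 'dd-mm-yyyy hh:mm:ss' string as a timezone-aware UTC datetime."""
    return datetime.strptime(text, TIME_FORMAT).replace(tzinfo=timezone.utc)


def parse_ranges(ranges):
    """Turn a list of {'starttime', 'endtime'} dicts (assumed layout) into tuples."""
    parsed = []
    for entry in ranges:
        start = parse_utc(entry["starttime"])
        end = parse_utc(entry["endtime"])
        if end < start:
            raise ValueError(f"endtime before starttime: {entry}")
        parsed.append((start, end))
    return parsed


def in_any_range(timestamp, parsed_ranges):
    """Return True if the timestamp falls inside at least one range (inclusive)."""
    return any(start <= timestamp <= end for start, end in parsed_ranges)
```

A single range is then just a one-element list, so both cases go through the same code path.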
# count: 2
#method: userid
#config:
# userids:
Do we want to standardise on subjectID or userID? For historical reasons, SubjectID is the name we chose for the main ID we use on the platform (with UserID potentially being introduced later, when we have the self-enrollment portal). It may not cause too much confusion, as this is more on the analysis side, but I'd point out that it is probably sensible to be consistent.
I could change it to subjectId. I used userid because the ID column in the output files is key.userID.
@afolarin I added the code for multiple time ranges. Please let me know if that looks good to you.
LGTM
Added user and data sampling mechanisms. The user sampling mechanism can choose users by fraction, by count, or by IDs. The data sampling mechanisms can choose data between time ranges, by count, or by fraction.
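As a rough illustration of the three user sampling options, here is a minimal sketch. It is not the implementation in this PR: the function name `sample_subjects`, the `seed` parameter, and the `fraction` config key are assumptions. The `count`, `userid`, and `userids` names follow the config snippet discussed in the review.

```python
import random


def sample_subjects(subject_ids, method, config, seed=None):
    """Select a subset of IDs by 'fraction', 'count', or 'userid' (hypothetical helper)."""
    ids = sorted(set(subject_ids))  # de-duplicate and fix the order for reproducibility
    rng = random.Random(seed)

    if method == "fraction":
        fraction = config["fraction"]  # assumed key name
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        return sorted(rng.sample(ids, int(len(ids) * fraction)))

    if method == "count":
        # Never ask for more IDs than are available.
        return sorted(rng.sample(ids, min(config["count"], len(ids))))

    if method == "userid":
        wanted = set(config["userids"])
        # Keep only requested IDs that actually exist in the data.
        return [i for i in ids if i in wanted]

    raise ValueError(f"unknown sampling method: {method}")
```

For example, `sample_subjects(all_ids, "count", {"count": 2}, seed=42)` would return two IDs, and passing the same seed again returns the same two.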