Add PI information for the Cloud realm #1286
Conversation
…g. adding group by for PI
Can the current ETL code handle CSV? I'm not sure we want to keep maintaining
```
"name": "pi_name",
"type": "varchar(225)",
"nullable": false,
"comment": "Unknown = -1 for global dimensions"
```
I don't understand this comment. Is the string "-1" being used here for unknown PIs?
Yeah, that's just a bad copy/paste. I've removed it as it's incorrect.
@jtpalmer I looked to see if we were importing CSVs with the new ETL code anywhere and I can't find any place we do that. Any importing done with the new ETL code seems to only use JSON.
I believe @jpwhite4 also looked and we do not currently have anything in the ETL code that handles CSV.
The cloud documentation in the docs directory needs to be updated also.
Also, what happens if a PI-to-project mapping CSV file is not provided? Does everything still work as it did before?
An update to the docs/cloud.md file is included in this PR. I don't think we have cloud documentation in any other files. Was there a specific file you were thinking of?
If a file isn't provided then the PI for the projects is -1 in the database and shows up as Unknown in the Usage and Metric Explorer tabs. This also happens for any projects that are not listed in the ingested CSV file.
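The fallback described above can be sketched as a lookup with a default. This is only an illustration (XDMoD's actual ETL is PHP and the names here are hypothetical); it shows the behavior, not the implementation:

```python
# Sketch of the "Unknown = -1" fallback described above.
# Function and variable names are illustrative, not XDMoD's actual code.

def pi_for_project(project, pi_map):
    """Return the PI mapped to a project, or -1 (rendered as
    'Unknown' in the Usage and Metric Explorer tabs) when the
    project is missing from the ingested CSV."""
    return pi_map.get(project, -1)

mapping = {"project_name": "pi", "project_name2": "pi2"}
print(pi_for_project("project_name", mapping))  # mapped PI
print(pi_for_project("new_project", mapping))   # falls back to -1
```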
In the docs you need to explain what needs to go into the CSV file. For example, do I put the full name of the person into the file, or their system username? How does the name in the file relate to the values in names.csv? Are they associated at all with values in the Jobs realm, or are they completely separate?
@jpwhite4
I had the docs wrong on the xdmod-ingestor flags to add. I changed this in the documentation.
```
pi,project_name,resource_name
pi2,project_name2,resource_name
```
The first column should be the username of the PI as seen in your resource's event log files. The second column is the name of the project
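Since the exact file layout is the main point of this docs change, a small validator makes the expected shape concrete. This is a sketch, not part of XDMoD; the three field names come from this PR's CSV format, everything else is hypothetical:

```python
import csv
import io

# Illustrative validator for the pi,project_name,resource_name CSV
# format described above. Field names are from this PR; the checks
# themselves are a sketch, not XDMoD code.

def parse_pi_csv(text):
    """Parse the project-to-PI mapping CSV, requiring exactly
    three columns per row."""
    rows = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != 3:
            raise ValueError(
                "line %d: expected 3 columns, got %d" % (lineno, len(row)))
        pi, project, resource = (field.strip() for field in row)
        rows.append({"pi": pi,
                     "project_name": project,
                     "resource_name": resource})
    return rows

sample = "pi,project_name,resource_name\npi2,project_name2,resource_name\n"
for entry in parse_pi_csv(sample):
    print(entry)
```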
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a concrete example of which field corresponds to the username in the 'resources event log files'? E.g. in the OpenStack data, which field is it?
```
xdmod-ingest-csv -t cloud-project-to-pi -i /path/to/file.csv
```
After importing this data you must ingest it for the date range of any data you have already shredded. |
Should point out that the date does not correspond to the date range of the data. Instead it is the timestamp of when the data were shredded.
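The distinction drawn above (the relevant date is when the data were shredded, not the date range the event data covers) can be illustrated with a toy filter. Names and record layout are hypothetical:

```python
from datetime import datetime

# Illustrative: re-ingestion selects records by *shred* timestamp,
# not by the time range the event data itself covers. Field names
# are hypothetical, not XDMoD's schema.
records = [
    {"event_date": "2018-01-05", "shredded_at": "2018-06-01"},
    {"event_date": "2018-01-06", "shredded_at": "2018-07-15"},
]

def to_ingest(records, last_modified_start):
    """Select records shredded on or after the given date."""
    cutoff = datetime.fromisoformat(last_modified_start)
    return [r for r in records
            if datetime.fromisoformat(r["shredded_at"]) >= cutoff]

# A cutoff of 2018-07-01 picks up the second record even though its
# event_date is later than the first record's by only one day and
# both events happened back in January.
print(to_ingest(records, "2018-07-01"))
```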
The instructions in the docs still do not work for me. I ran the script that was posted earlier, then ran:
The PI still shows as unknown.
I can get the PI stuff in there if I re-shred the data, but ideally we don't want to force everyone to re-shred everything if a new project is created. There will almost certainly be a lag between the new project showing in the OpenStack logs and the CSV file being updated to reflect the new project data. Can you look into what it would take to have it automatically update when the PI project CSV import is run? It shouldn't be too difficult; presumably you just need to update the account table when the import is run?
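The update-in-place approach requested above can be sketched with SQLite. The table and column names here are hypothetical stand-ins (XDMoD's real schema differs, and its ETL is PHP); the point is only that an import can touch existing rows without a re-shred:

```python
import sqlite3

# Sketch: update existing account rows in place when a new
# project-to-PI mapping is imported, instead of re-shredding.
# Table and column names are hypothetical, not XDMoD's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (project TEXT PRIMARY KEY, pi_id TEXT)")
# A project shredded before any mapping existed gets the -1 (Unknown) PI.
conn.execute("INSERT INTO account VALUES ('project_name', '-1')")

def apply_mapping(conn, mapping):
    """Overwrite the PI on rows that already exist; brand-new
    projects still arrive via the normal shred/ingest path."""
    for project, pi in mapping.items():
        conn.execute("UPDATE account SET pi_id = ? WHERE project = ?",
                     (pi, project))
    conn.commit()

apply_mapping(conn, {"project_name": "pi"})
row = conn.execute(
    "SELECT pi_id FROM account WHERE project = 'project_name'").fetchone()
print(row[0])
```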
…i information to be linked to cloud projects
This is fixed now so that you don't have to re-shred the data. You should just run the xdmod-csv-import and xdmod-ingestor commands and the data should import correctly.
The tests have log errors:
Since these are new tests, I would not expect them to be giving warnings about the numbers not matching up. Can you please check what is causing this?
Please check regression tests. I would not expect the new tests to be reporting warnings.
@jpwhite4 I just checked the regression test issue. When I added the PI information, it seems the regression tests for username have a text change. The PI = Tern text was added
where before it was
It seems it was just a text difference, not the numbers being off.
```
},{
    "name": "principalinvestigator_person_id",
    "type": "int(11)",
    "nullable": true,
```
Dimensions should not be nullable. You'll need nullable: false with a default of -1.
You should also add a comment property that describes this column (DIMENSION: ...).
This PR adds the PI group by to the Cloud realm by adding PI data to the cloud aggregation pipeline. Unlike the Jobs and Storage realms, PI information is not included in the data retrieved from the resource and must be gathered outside of the cloud system's event logs. To import PI data for the Cloud realm, a CSV file should be made with the following format:
This CSV will be imported using the xdmod-import-csv command with the `-t` flag set to `cloud-project-to-pi`. After importing the data, `xdmod-ingestor` should be run, which will add the PI to the appropriate table to make sure the PI is associated with a cloud project. A new pipeline called `jobs-cloud-ingest-pi` ingests the PI information into the same tables that the Storage and Jobs realms ingest their PI data into. Cloud documentation updates and regression tests for this new feature are also added in this PR.
Tests performed
Tested in docker with test data and with data from the lakeeffect cloud