
Add PI information for the Cloud realm #1286

Merged
merged 22 commits into ubccr:xdmod9.0 on May 11, 2020

Conversation

eiffel777
Contributor

This PR adds a PI "group by" to the Cloud realm by adding PI data to the cloud aggregation pipeline. Unlike the Jobs and Storage realms, PI information is not included in the data retrieved from the resource and must be gathered outside of the cloud system's event logs. To import PI data for the Cloud realm, a CSV file should be created with the following format:

pi_name,project_name,resource_name

This CSV is imported using the xdmod-import-csv command with the -t flag set to cloud-project-to-pi. After importing the data, xdmod-ingestor should be run, which adds the PI to the appropriate table to make sure the PI is associated with a cloud project. A new pipeline called jobs-cloud-ingest-pi ingests the PI information into the same tables that the Storage and Jobs realms use for their PI data.

Cloud documentation updates and regression tests for this new feature are also added in this PR.
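As a concrete sketch of the workflow described above (the file path and the PI/project/resource values here are made-up examples, and the xdmod-* commands require a working Open XDMoD install, so they are shown commented out):

```shell
# Create the PI-to-project mapping file (example values are hypothetical).
cat > /tmp/cloud-pi.csv <<'EOF'
jsmith,physics-cloud,openstack
mdoe,chem-cloud,openstack
EOF

# Sanity-check that every row has exactly three columns.
awk -F',' 'NF != 3 { bad=1 } END { print (bad ? "invalid" : "ok") }' /tmp/cloud-pi.csv

# On an XDMoD host, import the mapping and then re-run ingestion:
# sudo -u xdmod xdmod-import-csv -t cloud-project-to-pi -i /tmp/cloud-pi.csv
# sudo -u xdmod xdmod-ingestor --last-modified-start-date 2012-01-01
```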

Tests performed

Tested in Docker with test data and with data from the lakeeffect cloud

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@eiffel777 eiffel777 added Category:Cloud Cloud Realm new feature New functionality labels Apr 16, 2020
@eiffel777 eiffel777 added this to the 9.0.0 milestone Apr 16, 2020
@eiffel777 eiffel777 self-assigned this Apr 16, 2020
@jtpalmer
Contributor

Can the current ETL code handle CSV? I'm not sure we want to keep maintaining xdmod-import-csv if there are other methods that may be better.

bin/xdmod-import-csv (resolved)
"name": "pi_name",
"type": "varchar(225)",
"nullable": false,
"comment": "Unknown = -1 for global dimensions"
Contributor

I don't understand this comment. Is the string "-1" being used here for unknown PIs?

Contributor Author

Yeah, that's just a bad copy/paste. I've removed it, as it's incorrect.

@eiffel777
Contributor Author

@jtpalmer I looked to see if we are importing CSVs with the new ETL code anywhere, and I can't find any place where we do. Any importing done with the new ETL code seems to use only JSON.

jtpalmer
jtpalmer previously approved these changes Apr 16, 2020
@plessbd
Contributor

plessbd commented Apr 16, 2020

I believe @jpwhite4 also looked and we do not currently have anything in the etl code that handles CSV

@jpwhite4
Member

The cloud documentation in the docs directory needs to be updated also.

@jpwhite4
Member

jpwhite4 commented Apr 16, 2020

Also what happens if a pi to project mapping csv file is not provided? Does everything still work as it did before?

@eiffel777
Contributor Author

eiffel777 commented Apr 16, 2020

@jpwhite4

The cloud documentation in the docs directory needs to be updated also.

An update to the docs/cloud.md file is included in this PR. I don't think we have cloud documentation in any other files. Was there a specific file you were thinking of?

Also what happens if a pi to project mapping csv file is not provided? Does everything still work as it did before?

If a file isn't provided, the PI for the projects is -1 in the database and shows up as Unknown in the Usage and Metric Explorer tabs. The same happens for any projects that are not listed in the ingested CSV file.

@jpwhite4
Member

In the docs you need to explain what goes into the CSV file. For example, do I put the person's full name into the file or their system username? How does the name in the file relate to the values in names.csv? Are they associated at all with values in the Jobs realm, or are they completely separate?

@jpwhite4
Member

I also ran the commands listed in the docs and they didn't do anything:

sudo -u xdmod xdmod-import-csv -t cloud-project-to-pi -i /var/tmp/referencedata/cloud-pi-test.csv
sudo -u xdmod xdmod-ingestor --start-date 2012-01-01 --end-date 2020-12-31

The data still show as Unknown in the GUI.

[screenshot: the GUI showing the data as Unknown]

@jpwhite4
Member

BASEDIR=/root/xdmod/tests/ci
REF_SOURCE=`realpath $BASEDIR/../artifacts/xdmod/referencedata`
REPODIR=`realpath $BASEDIR/../../`
REF_DIR=/var/tmp/referencedata

set -e
set -o pipefail

#rm -rf /etc/xdmod

#~/bin/services stop
#rm -rf /var/lib/mysql && mkdir -p /var/lib/mysql


~/bin/services start
mysql -e "CREATE USER 'root'@'gateway' IDENTIFIED BY '';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'gateway' WITH GRANT OPTION;
FLUSH PRIVILEGES;"

expect $BASEDIR/scripts/xdmod-setup-start.tcl | col -b
expect $BASEDIR/scripts/xdmod-setup-jobs.tcl | col -b
expect $BASEDIR/scripts/xdmod-setup-storage.tcl | col -b
expect $BASEDIR/scripts/xdmod-setup-cloud.tcl | col -b
expect $BASEDIR/scripts/xdmod-setup-finish.tcl | col -b


xdmod-import-csv -t hierarchy -i $REF_DIR/hierarchy.csv
xdmod-import-csv -t group-to-hierarchy -i $REF_DIR/group-to-hierarchy.csv

for resource in $REF_DIR/*.log; do
   sudo -u xdmod xdmod-shredder -r `basename $resource .log` -f slurm -i $resource;
done

sudo -u xdmod xdmod-shredder -r openstack -d $REF_DIR/openstack -f openstack
sudo -u xdmod xdmod-shredder -r nutsetters -d $REF_DIR/nutsetters -f openstack
sudo -u xdmod xdmod-ingestor

for storage_dir in $REF_DIR/storage/*; do
    sudo -u xdmod xdmod-shredder -f storage -r $(basename $storage_dir) -d $storage_dir
done
last_modified_start_date=$(date +'%F %T')
sudo -u xdmod xdmod-ingestor --datatype storage
sudo -u xdmod xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"

sudo -u xdmod xdmod-import-csv -t names -i $REF_DIR/names.csv
sudo -u xdmod xdmod-ingestor
php $BASEDIR/scripts/create_xdmod_users.php

@eiffel777
Contributor Author

@jpwhite4
I added documentation updates. Let me know if what I have works or if I should add more.

Also ran the commands listed in the docs and they didn't do anything:

sudo -u xdmod xdmod-import-csv -t cloud-project-to-pi -i /var/tmp/referencedata/cloud-pi-test.csv
sudo -u xdmod xdmod-ingestor --start-date 2012-01-01 --end-date 2020-12-31

I had the docs wrong on the xdmod-ingestor flags. The --last-modified-start-date flag should be used instead of --start-date and --end-date. The command should be

xdmod-ingestor --last-modified-start-date 2012-01-01

I've changed this in the documentation.

pi,project_name,resource_name
pi2,project_name2,resource_name

The first column should be the username of the PI as seen in your resource's event log files. The second column is the name of the project, and the third column is the name of the resource.
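For example, given the rows above, the PI for a given project/resource pair can be looked up from the mapping file with standard tools (the file path here is illustrative):

```shell
# Recreate the example mapping file from the docs snippet above.
cat > /tmp/cloud-pi-map.csv <<'EOF'
pi,project_name,resource_name
pi2,project_name2,resource_name
EOF

# Print the PI username (column 1) for project "project_name2"
# on resource "resource_name".
awk -F',' '$2 == "project_name2" && $3 == "resource_name" { print $1 }' /tmp/cloud-pi-map.csv
```

This prints `pi2`.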
Member

Can you give a concrete example of which field corresponds to the username in the "resource's event log files"? E.g., in the OpenStack data, which field is it?


xdmod-import-csv -t cloud-project-to-pi -i /path/to/file.csv

After importing this data you must ingest it for the date range of any data you have already shredded.
Member

It should be pointed out that the date does not correspond to the date range of the data. Instead, it is the timestamp of when the data were shredded.
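A sketch of capturing that timestamp, matching the pattern used in the CI script posted earlier in this thread (the shredder/ingestor commands need a real XDMoD install and log directory, so they are commented out):

```shell
# Record the current time before shredding; the aggregation step will then
# pick up anything modified after this point.
last_modified_start_date=$(date +'%F %T')
echo "$last_modified_start_date"

# sudo -u xdmod xdmod-shredder -r openstack -d /path/to/logs -f openstack
# sudo -u xdmod xdmod-ingestor --last-modified-start-date "$last_modified_start_date"
```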

@jpwhite4
Member

The instructions in the docs still do not work for me. I ran the script that I posted earlier, then ran:

sudo -u xdmod xdmod-import-csv -t cloud-project-to-pi -i /var/tmp/referencedata/cloud-pi-test.csv
sudo -u xdmod xdmod-ingestor --last-modified-start-date 2012-01-01

The data still show as Unknown.

@jpwhite4
Member

I can get the PI stuff in there if I re-shred the data, but ideally we don't want to force everyone to re-shred everything if a new project is created. There will almost certainly be a lag between the new project showing up in the OpenStack logs and the CSV file being updated to reflect the new project data.

Can you look into what it would take to have it automatically update when the PI-to-project CSV import is run? It shouldn't be too difficult; presumably you just need to update the account table when the import is run?

@eiffel777
Contributor Author

I can get the PI stuff in there if I re-shred the data, but ideally we don't want to force everyone to re-shred everything if a new project is created. There will almost certainly be a lag between the new project showing up in the OpenStack logs and the CSV file being updated to reflect the new project data.

@jpwhite4

This is fixed now so that you don't have to re-shred the data. You should just run the xdmod-import-csv and xdmod-ingestor commands and the data should import correctly.

@jpwhite4
Member

jpwhite4 commented May 4, 2020

The tests have log errors:

Cloud/username/cloud_num_sessions_started/aggregate-Quarter-pi IS ONLY ==
Cloud/username/cloud_avg_cores_reserved/aggregate-Year-pi:
Raw Expected:
title
"Average Cores Reserved Weighted By Wall Hours: by System Username"
parameters

"*Restricted To: User = Tern, C"
start,end
2018-04-18,2018-04-30
---------
"System Username","Average Cores Reserved Weighted By Wall Hours"
---------


Raw Actual:
title
"Average Cores Reserved Weighted By Wall Hours: by System Username"
parameters

"*Restricted To: PI = Tern, C OR User = Tern, C"
start,end
2018-04-18,2018-04-30
---------
"System Username","Average Cores Reserved Weighted By Wall Hours"
---------

Since these are new tests, I would not expect them to be giving warnings about the numbers not matching up. Can you please check what is causing this?

Member

@jpwhite4 jpwhite4 left a comment


Please check regression tests. I would not expect the new tests to be reporting warnings.

@eiffel777
Contributor Author

@jpwhite4 I just checked the regression test issue. When I added the PI information, it seems the regression tests for username have a text change. The "PI = Tern" text was added:

"*Restricted To: PI = Tern, C OR User = Tern, C"

where before it was

"*Restricted To: User = Tern, C"

It seems it was just a text difference, not the numbers being off.

},{
"name": "principalinvestigator_person_id",
"type": "int(11)",
"nullable": true,
Member

Dimensions should not be nullable. You'll need nullable: false. Default -1.
You should also add a comment property that describes this column (DIMENSION: ...)
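Applying those suggestions, the column definition would presumably end up looking something like the following (the comment wording here is illustrative, not taken from the actual PR):

```json
{
    "name": "principalinvestigator_person_id",
    "type": "int(11)",
    "nullable": false,
    "default": -1,
    "comment": "DIMENSION: The PI associated with the cloud project."
}
```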

@eiffel777 eiffel777 merged commit 6e7abfa into ubccr:xdmod9.0 May 11, 2020