Add PI information for the Cloud realm #1286
Conversation
…g. adding group by for PI
Can the current ETL code handle CSV? I'm not sure we want to keep maintaining
```
"name": "pi_name",
"type": "varchar(225)",
"nullable": false,
"comment": "Unknown = -1 for global dimensions"
```
I don't understand this comment. Is the string "-1" being used here for unknown PIs?
Yeah, that's just a bad copy/paste. I've removed it as it's incorrect.
@jtpalmer I looked to see if we were importing CSVs with the new ETL code anywhere and I can't find any place we do that. Any importing done with the new ETL code seems to only use JSON.
I believe @jpwhite4 also looked and we do not currently have anything in the ETL code that handles CSV.
The cloud documentation in the docs directory needs to be updated also.
Also, what happens if a PI-to-project mapping CSV file is not provided? Does everything still work as it did before?
An update to the docs/cloud.md file is included in this PR. I don't think we have cloud documentation in any other files. Was there a specific file you were thinking of?
If a file isn't provided then the PI for the projects is -1 in the database and shows up as Unknown in the Usage and Metric Explorer tabs. This also happens for any projects that are not listed in the ingested CSV file.
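The fallback described above can be sketched as a lookup with a default. This is only an illustration (XDMoD's actual ETL is PHP and the names here are hypothetical); it shows the behavior, not the implementation:

```python
# Sketch of the "Unknown = -1" fallback described above.
# Function and variable names are illustrative, not XDMoD's actual code.

def pi_for_project(project, pi_map):
    """Return the PI mapped to a project, or -1 (rendered as
    'Unknown' in the Usage and Metric Explorer tabs) when the
    project is missing from the ingested CSV."""
    return pi_map.get(project, -1)

mapping = {"project_name": "pi", "project_name2": "pi2"}
print(pi_for_project("project_name", mapping))  # mapped PI
print(pi_for_project("new_project", mapping))   # falls back to -1
```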
In the docs you need to explain what needs to go into the CSV file. For example, do I put the full name of the person into the file, or their system username? How does the name in the file relate to the values in names.csv? Are they associated at all with values in the Jobs realm, or are they completely separate?
@jpwhite4
I had the docs wrong on the xdmod-ingestor flags to add. I changed this in the documentation.
```
pi,project_name,resource_name
pi2,project_name2,resource_name
```
The first column should be the username of the PI as seen in your resource's event log files. The second column is the name of the project
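Since the exact file layout is the main point of this docs change, a small validator makes the expected shape concrete. This is a sketch, not part of XDMoD; the three field names come from this PR's CSV format, everything else is hypothetical:

```python
import csv
import io

# Illustrative validator for the pi,project_name,resource_name CSV
# format described above. Field names are from this PR; the checks
# themselves are a sketch, not XDMoD code.

def parse_pi_csv(text):
    """Parse the project-to-PI mapping CSV, requiring exactly
    three columns per row."""
    rows = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != 3:
            raise ValueError(
                "line %d: expected 3 columns, got %d" % (lineno, len(row)))
        pi, project, resource = (field.strip() for field in row)
        rows.append({"pi": pi,
                     "project_name": project,
                     "resource_name": resource})
    return rows

sample = "pi,project_name,resource_name\npi2,project_name2,resource_name\n"
for entry in parse_pi_csv(sample):
    print(entry)
```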
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a concrete example of which field corresponds to the username in the 'resources event log files'? E.g. in the OpenStack data, which field is it?
```
xdmod-ingest-csv -t cloud-project-to-pi -i /path/to/file.csv
```
After importing this data you must ingest it for the date range of any data you have already shredded. |
Should point out that the date does not correspond to the date range of the data. Instead it is the timestamp of when the data were shredded.
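The distinction drawn above (the relevant date is when the data were shredded, not the date range the event data covers) can be illustrated with a toy filter. Names and record layout are hypothetical:

```python
from datetime import datetime

# Illustrative: re-ingestion selects records by *shred* timestamp,
# not by the time range the event data itself covers. Field names
# are hypothetical, not XDMoD's schema.
records = [
    {"event_date": "2018-01-05", "shredded_at": "2018-06-01"},
    {"event_date": "2018-01-06", "shredded_at": "2018-07-15"},
]

def to_ingest(records, last_modified_start):
    """Select records shredded on or after the given date."""
    cutoff = datetime.fromisoformat(last_modified_start)
    return [r for r in records
            if datetime.fromisoformat(r["shredded_at"]) >= cutoff]

# A cutoff of 2018-07-01 picks up the second record even though its
# event_date is later than the first record's by only one day and
# both events happened back in January.
print(to_ingest(records, "2018-07-01"))
```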
The instructions in the docs still do not work for me. I ran the script that was posted earlier, then ran:
The PI still shows as unknown.
I can get the PI stuff in there if I re-shred the data, but ideally we don't want to force everyone to re-shred everything if a new project is created. There will almost certainly be a lag between the new project showing in the OpenStack logs and the CSV file being updated to reflect the new project data. Can you look into what it would take to have it automatically update when the PI project CSV import is run? It shouldn't be too difficult; presumably you just need to update the account table when the import is run?
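The update-in-place approach requested above can be sketched with SQLite. The table and column names here are hypothetical stand-ins (XDMoD's real schema differs, and its ETL is PHP); the point is only that an import can touch existing rows without a re-shred:

```python
import sqlite3

# Sketch: update existing account rows in place when a new
# project-to-PI mapping is imported, instead of re-shredding.
# Table and column names are hypothetical, not XDMoD's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (project TEXT PRIMARY KEY, pi_id TEXT)")
# A project shredded before any mapping existed gets the -1 (Unknown) PI.
conn.execute("INSERT INTO account VALUES ('project_name', '-1')")

def apply_mapping(conn, mapping):
    """Overwrite the PI on rows that already exist; brand-new
    projects still arrive via the normal shred/ingest path."""
    for project, pi in mapping.items():
        conn.execute("UPDATE account SET pi_id = ? WHERE project = ?",
                     (pi, project))
    conn.commit()

apply_mapping(conn, {"project_name": "pi"})
row = conn.execute(
    "SELECT pi_id FROM account WHERE project = 'project_name'").fetchone()
print(row[0])
```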
…i information to be linked to cloud projects
This is fixed now so that you don't have to re-shred the data. You should just run the xdmod-csv-import and xdmod-ingestor commands and the data should import correctly.
The tests have log errors:
Since these are new tests, I would not expect them to be giving warnings about the numbers not matching up. Can you please check what is causing this?
Please check regression tests. I would not expect the new tests to be reporting warnings.
@jpwhite4 I just checked the regression test issue. When I added the PI information, it seems the regression tests for username have a text change. The PI = Tern text was added
where before it was
It seems it was just a text difference, not the numbers being off.
```
},{
    "name": "principalinvestigator_person_id",
    "type": "int(11)",
    "nullable": true,
```
Dimensions should not be nullable. You'll need nullable: false with a default of -1.
You should also add a comment property that describes this column (DIMENSION: ...).
This PR adds the PI group by to the Cloud realm by adding PI data to the cloud aggregation pipeline. Unlike the Jobs and Storage realms, PI information is not included in the data retrieved from the resource and must be gathered outside of the cloud system's event logs. To import PI data for the Cloud realm, a CSV file should be made with the following format:
This CSV will be imported using the xdmod-import-csv command with the `-t` flag set to `cloud-project-to-pi`. After importing the data, `xdmod-ingestor` should be run, which will add the PI to the appropriate table to make sure the PI is associated with a cloud project. A new pipeline called `jobs-cloud-ingest-pi` ingests the PI information into the same tables that the Storage and Jobs realms ingest their PI data into. Cloud documentation updates and regression tests for this new feature are also added in this PR.
Tests performed
Tested in docker with test data and with data from the lakeeffect cloud