Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable show raw data and Job Viewer for the Jobs realm #730

Closed
wants to merge 1 commit into from

Conversation

jpwhite4
Copy link
Member

@jpwhite4 jpwhite4 commented Nov 16, 2018

This pull request enables the Job Viewer functionality for the Jobs realm. This one is quite big, but mostly new code. I decided to include a minor internal API change (there will be a corresponding pull request for SUPREMM to update to use the new API). The API change is in classes/Rest/Controllers/WarehouseControllerProvider.php and in the JobSearchPanel.js

EDIT - actually two minor API changes - the job viewer is now enabled if a module defines a realm config file in rawstatistics.d/. This is needed because multiple modules can now enable the job viewer

integration tests have been added to test the new code.

Couple of things to think about:

  • what access controls are appropriate for the jobs realm raw data? My implementation uses the same access controls as the SUPReMM realm based on the rationale that while aggregated jobs data is public, the details are not.
  • In the current ETLv2 infrastucture forces me to create a seperate definition files for the _by_day stuff even though there is really not much differences. For example:
--- configuration/etl/etl_action_defs.d/jobs/hpc-aggregation-day.json	2018-11-16 11:08:43.000000000 -0500
+++ configuration/etl/etl_action_defs.d/jobs/hpc-aggregation.json	2018-11-16 10:42:34.000000000 -0500
@@ -1,7 +1,7 @@
 {
     "#": "Aggregation of HPC job records and tasks ingested from the XDCDB",
     "table_definition": {
-        "$ref": "${table_definition_dir}/jobs/xdw/jobfact_by_day.json#/table_definition"
+        "$ref": "${table_definition_dir}/jobs/xdw/jobfact_by_.json#/table_definition"
     },

     "#": "The aggregation period query determines which periods need to be aggregated based on added or modified",
@@ -77,8 +77,7 @@
             "node_time": "COALESCE(SUM(task.node_count * ${wallduration_case_statement}), 0)",
             "sum_node_time_squared": "COALESCE(SUM( CAST(POW(task.node_count * ${wallduration_case_statement}, 2) AS DECIMAL(36,4)) ), 0)",
             "sum_weighted_expansion_factor": "SUM( ((task.wallduration + task.waitduration) / task.wallduration) * task.node_count * COALESCE(${wallduration_case_statement}, 0))",
-            "sum_job_weights": "SUM(task.node_count * COALESCE(${wallduration_case_statement}, 0))",
-            "job_id_list": "GROUP_CONCAT(task.job_id)"
+            "sum_job_weights": "SUM(task.node_count * COALESCE(${wallduration_case_statement}, 0))"
         },
         "groupby": [
             "${AGGREGATION_UNIT}_id",

the $ref does not really improve the situation much because the bulk of the file is the column definintions.

@jpwhite4 jpwhite4 requested review from smgallo and plessbd and removed request for smgallo November 16, 2018 16:20
@smgallo
Copy link
Contributor

smgallo commented Nov 16, 2018

For the duplication of the "by-day" file, do you mean that the bulk is in the records section of the source_query? Something to think about is perhaps being able to reference a JSON object and then add/override additional items.

@jpwhite4
Copy link
Member Author

Yes. I'd like to be able to reference the base file and then just insert an extra couple of columns to the table definition and add one line to the aggregation query.

@@ -0,0 +1,66 @@
<?php

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs a block at the top describing what it does and what data it encapsulates.

);
}

public function getJobSummary($user, $jobid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do all of these methods return empty arrays? If there is a reason it should be documented in this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is no job summary data for the Jobs realm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for job timeseries data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, lets document that in the class.

@@ -725,6 +725,15 @@ public function getCountQueryString()
$data_query .= ") as a WHERE a.total IS NOT NULL";
return $data_query;
}

protected function nextPdoIndex($value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need documentation here as well.

* @param array $scriptOptions the options to pass to the etlv2 class
*/
protected function runEtlv2(array $scriptOptions) {
if (empty($scriptOptions['chunk-size-days'])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a task in Asana to work on putting code like this into a central location https://app.asana.com/0/806625283031961/917407032604343

@@ -71,6 +71,7 @@
"configuration/processor_buckets.json",
"configuration/rawstatistics.json",
"configuration/rawstatistics.d",
{"configuration/rawstatistics.d/z_jobs.json": "rawstatistics.d/z_jobs.json"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "z_jobs" stand for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config files are processed in alphabetical order. If the SUPREMM realm is installed this ensures that the jobs realm is listed after the SUPREMM realm in the realm list in the job viewer. This enforces the principle of least confusion since the default behavour of the job viewer will be to search supremm data (as it always was).

Copy link
Contributor

@plessbd plessbd Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that it matters, but the convention i usually see for this is\d+_\w

lets encrypt uses {0000 - 9999}_name
puppet apache modules uses {00 - 99}_name

@jpwhite4
Copy link
Member Author

Splitting this into two to make it easier to review. Please look at #733 first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants