Attributes from qstat command now found under native hash #198

Merged
johrstrom merged 6 commits into master from sge_native_attributes
Jul 9, 2020


matthu017
Contributor

Added cpu/mem_usage attributes following gridengine/qstat.xsd

@matthu017
Contributor Author

The XML schema for qstat can be found at the link below: https://github.com/gridengine/gridengine/blob/master/source/dist/util/resources/schemas/qstat/qstat.xsd
The vagrant-with-gridengine image in OSC's ood-images repo does not emit many of these XML tags, but the change should work for other HPC centers using SGE. As the Discourse topic below shows, these tags are present in other production instances:
https://discourse.osc.edu/t/remote-desktop-not-reflecting-correct-number-of-cores/640

@matthu017
Contributor Author

The unit tests need to be changed to read values from the native hash.
Dispatch/submission time also needs to change from UNIX time to a human-readable format.

@matthu017 matthu017 linked an issue Jun 9, 2020 that may be closed by this pull request
@ericfranz
Contributor

The conversion of these hashes into Info objects is done by:

    listener.parsed_jobs.map { |job_hash|
      OodCore::Job::Info.new(**post_process_qstat_job_hash(job_hash))
    }

The primary purpose of the info and info_all methods on the adapters is to return an array of objects that share the same interface, regardless of the adapter.

So id, job_owner, job_name, accounting_id, status, procs, queue_name, submission_time, dispatch_time, wallclock_limit etc. are all defined in https://github.com/OSC/ood_core/blob/63187f8874e62b752a73e39263f539ce0241495a/lib/ood_core/job/info.rb.

The original idea for native came up when building the Slurm and Torque adapters: the Info interface was small, the qstat/squeue calls returned a lot more information than our interface exposed, and Jeremy wanted to capture the original full data somewhere in that object for access.

For example, the actual state in Torque might be Exiting or Transitioning, but these exact states do not exist in every adapter. So the common interface to job status is OodCore::Job::Status, which Info returns. But native provides a way to get at the raw data produced by Torque, which would include the original T or E for the status. The native data would also use the original key names. For example, the value exposed as Info#queue_name might appear in the Slurm native attribute under the key name "partition".
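To make the distinction concrete, here is a minimal sketch in plain Ruby. `SketchInfo`, `TORQUE_STATES`, the status symbols, and the two `info_from_*` helpers are hypothetical stand-ins for illustration, not the actual OodCore classes or adapter code; only the idea (a normalized status and key next to the untouched raw data) comes from the description above.

```ruby
# Hypothetical stand-in for OodCore::Job::Info; not the real class.
SketchInfo = Struct.new(:id, :status, :queue_name, :native, keyword_init: true)

# Assumed mapping from raw Torque state letters to adapter-neutral statuses.
TORQUE_STATES = {
  "Q" => :queued,
  "R" => :running,
  "E" => :running, # Exiting has no common equivalent, so it is folded in
  "T" => :queued   # Transitioning, likewise
}.freeze

def info_from_torque(raw)
  SketchInfo.new(
    id: raw[:job_id],
    status: TORQUE_STATES.fetch(raw[:job_state], :undetermined),
    queue_name: raw[:queue],
    native: raw # raw data, original key names preserved
  )
end

def info_from_slurm(raw)
  SketchInfo.new(
    id: raw[:job_id],
    status: raw[:state] == "RUNNING" ? :running : :undetermined,
    queue_name: raw[:partition], # same concept, different native key
    native: raw
  )
end

torque = info_from_torque({ job_id: "123.host", job_state: "E", queue: "batch" })
slurm = info_from_slurm({ job_id: "456", state: "RUNNING", partition: "debug" })
```

Both objects answer `status` and `queue_name` the same way, while `torque.native[:job_state]` still holds the original "E" and `slurm.native[:partition]` keeps the scheduler's own key name.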

Here is an example of creating an Info object using a subset of the data from torque, and then setting native to the raw data at the end:

    Info.new(
      id: v[:job_id],
      status: get_state(v[:job_state]),
      allocated_nodes: allocated_nodes,
      submit_host: submit_host,
      job_name: v[:Job_Name],
      job_owner: job_owner,
      accounting_id: v[:Account_Name],
      procs: procs,
      queue_name: v[:queue],
      wallclock_time: duration_in_seconds(v.fetch(:resources_used, {})[:walltime]),
      wallclock_limit: duration_in_seconds(v.fetch(:Resource_List, {})[:walltime]),
      cpu_time: duration_in_seconds(v.fetch(:resources_used, {})[:cput]),
      submission_time: v[:ctime] ? Time.parse(v[:ctime]) : nil,
      dispatch_time: v[:stime] ? Time.parse(v[:stime]) : nil,
      native: v
    )

For the SGE adapter, when parsing the XML we build a hash whose keys match the Info object's attributes, which makes initializing a new Info object easy, but it leaves the native attribute empty.
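A rough sketch of that SGE situation, in plain Ruby. `SgeSketchInfo` and the sample hashes are made up for illustration and are not the real OodCore::Job::Info or real parser output; the point is only that a hash with matching keys can be splatted straight into the constructor, and that nothing ends up in native unless the parser puts it there.

```ruby
# Hypothetical stand-in for OodCore::Job::Info with a few of its attributes.
class SgeSketchInfo
  attr_reader :id, :job_name, :queue_name, :native

  def initialize(id:, job_name: nil, queue_name: nil, native: nil, **_ignored)
    @id = id
    @job_name = job_name
    @queue_name = queue_name
    @native = native
  end
end

# What a parser might hand back: keys already match the constructor keywords.
parsed_jobs = [
  { id: "101", job_name: "sleep", queue_name: "general.q" },
  { id: "102", job_name: "render", queue_name: "gpu.q" }
]

# The double splat turns each hash into keyword arguments.
infos = parsed_jobs.map { |job_hash| SgeSketchInfo.new(**job_hash) }
```

Here `infos.first.native` is nil because no `:native` key was ever built during parsing, which is the gap this PR is about.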

There are several approaches.

  1. Build the full native raw data from the XML, alongside the rest. So instead of replacing like this PR currently does:

      def end_JB_job_number
    -    @parsed_job[:id] = @current_text
    +    @parsed_job[:native][:id] = @current_text
      end

    it is done in addition:

      def end_JB_job_number
         @parsed_job[:id] = @current_text
    +    @parsed_job[:native][:job_number] = @current_text
      end

    or

      def end_JB_job_number
         @parsed_job[:id] = @current_text
    +    @parsed_job[:native][:JB_job_number] = @current_text
      end

    Notice that the native key is "job_number" or "JB_job_number", not "id", because in SGE parlance it is called a job number; that is the term you see in the SGE documentation and in the XSD schema https://github.com/gridengine/gridengine/blob/6a5407d56c85b39290ac2488fb6dec1a4404a974/source/dist/util/resources/schemas/qstat/qstat.xsd#L113

    With this approach, you would add code to capture other tags as you run across them, adding them to native. Examples: slots, tasks, master, tickets, deadline, priority, etc. The elements listed at https://github.com/gridengine/gridengine/blob/6a5407d56c85b39290ac2488fb6dec1a4404a974/source/dist/util/resources/schemas/qstat/qstat.xsd#L113-L165 are all obvious candidates. For the key names, you could probably use the short form if it matches what a user would see when running qstat from the command line, or else the full XML tag name such as JB_job_number, JAT_prio, etc.

There is likely a place in the SAX parser where every tag name and value can be added to native; then it is just a change in one location.

@ericfranz
Contributor

In fact, you can probably just add a snippet in the tag end method that captures every tag and puts its text content in the native hash. That might be problematic where we have a hierarchy of tags, though, and we might still want to cherry-pick the tags whose values we display to the user.
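Something along these lines might work as a starting point. This is only a sketch using REXML's stream listener: `NativeCapturingListener` is a hypothetical class, the XML is a made-up fragment rather than real qstat output, and the existing ood_core listeners may be organized differently. It sidesteps the hierarchy problem in the simplest way, by recording only tags that have non-blank text of their own, so container tags are skipped rather than nested.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener; not the actual ood_core implementation.
class NativeCapturingListener
  include REXML::StreamListener

  attr_reader :parsed_job

  def initialize
    @parsed_job = { native: {} }
    @current_text = +""
  end

  def tag_start(_name, _attrs)
    @current_text = +""
  end

  def text(value)
    @current_text << value
  end

  def tag_end(name)
    value = @current_text.strip
    # One place captures every leaf tag under its original XML name.
    # Container tags only hold whitespace at this point, so they are skipped.
    @parsed_job[:native][name.to_sym] = value unless value.empty?
    # Cherry-picked tags still feed the common Info keys.
    @parsed_job[:id] = value if name == "JB_job_number"
    @current_text = +""
  end
end

# Made-up fragment for illustration only.
xml = <<~XML
  <job_info>
    <job_list>
      <JB_job_number>42</JB_job_number>
      <JAT_prio>0.55500</JAT_prio>
    </job_list>
  </job_info>
XML

listener = NativeCapturingListener.new
REXML::Document.parse_stream(xml, listener)
```

After parsing, `listener.parsed_job[:native]` holds `JB_job_number` and `JAT_prio` under their original names and `listener.parsed_job[:id]` is still set for the common interface. Repeated tags would overwrite each other in this sketch, so real nested or repeated data would need more care.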

The XSD file above does not cover every tag that different versions of SGE emit; the Discourse topic you mentioned alone shows a number of tags that are not represented in the schema.

If we have access to the output file path, it can be used for the links to the job directory in the progressive disclosure on the Active Jobs app (Files and Shell links).

@matthu017
Contributor Author

I'm not seeing anything regarding output file path in the tags. Is there another way to get the path?

@matthu017
Contributor Author

NewAttributesForSGE.txt
This file details potential attributes we could support in the ood_core gem for grid engine

To address the differences between versions of Grid Engine, one could provide a .yaml or .json file under etc/ood/config/apps/activejobs.
This would allow sites to specify their own key-value pairs for attributes that show up in their qstat -j -r -xml output, in addition to the ones already supported by the gem.

JB_job_number: Job Number   //Or whatever key you want to show up in active jobs
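As a rough illustration of how such a mapping could be consumed, here is a sketch in plain Ruby. The YAML layout (native tag name mapped to display label), the sample native hash, and the filtering logic are all assumptions for discussion; nothing like this exists in the gem or in Active Jobs as far as this thread shows.

```ruby
require "yaml"

# Hypothetical site config: native tag name => label to show in Active Jobs.
config_yaml = <<~YAML
  JB_job_number: Job Number
  JAT_prio: Priority
YAML

labels = YAML.safe_load(config_yaml)

# Pretend native hash as a parser might produce it (string keys for simplicity).
native = {
  "JB_job_number" => "42",
  "JAT_prio" => "0.55500",
  "tickets" => "0" # not listed in the config, so it is not displayed
}

# Keep only the configured attributes, in config order, keyed by their labels.
display = labels.each_with_object({}) do |(key, label), out|
  out[label] = native[key] if native.key?(key)
end
```

With this input, `display` contains only the "Job Number" and "Priority" entries, so each site decides which of its version-specific tags are surfaced without changing the gem.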

@johrstrom
Contributor

I think we're good with this now, though I'd ask that you rebase onto master and use the uge test file to add a test case for native.ST_name. Even if we're not going to use it, we'll have some coverage for when/if we do.

@matthu017 matthu017 added the ready label Jul 9, 2020
@johrstrom johrstrom merged commit 2c4756c into master Jul 9, 2020
@johrstrom johrstrom deleted the sge_native_attributes branch July 9, 2020 14:49

Successfully merging this pull request may close these issues.

SGE provide native attributes
3 participants