Fixed #2. Correction to #2 as follows.
GPU jobs were handled correctly UNLESS they were running on the last node reported in qstat. The last node lacks the separator line the script searches for to delimit nodes in qstat's output, so it would absorb all of the pending jobs, break the formatting, and end up in the pending-jobs list, where it was effectively lost. To mend this, the script now checks whether a node appears within the pending-jobs block and parses it out so it can be added to the node list.
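
For context, the heart of the fix is how the pending-jobs block is peeled off the last node's chunk. Below is a minimal, self-contained sketch of that idea, not the script itself: the sample qstat text and node names are fabricated, and only the separator/pending constants match the real code.

# Sketch (not the script itself) of the fix: qstat's output is split into
# per-node chunks on the 81-dash separator, but the final chunk has no trailing
# separator, so it can hold both the last node AND the pending-jobs block that
# starts with a line of 79 '#'s. The sample text and node names are made up.
node_sep = '-'.center(81, '-')        # 81 dashes between node sections
pending_search = '#'.center(79, '#')  # 79 '#'s heading the pending-job list

sample_qstat = "\n".join([
    "long@d6copt001.crc.nd.edu      BIP   2/4/24   1.00  lx-amd64",
    node_sep,
    "gpu@qa-titanx-001.crc.nd.edu   BIP   1/2/8    0.50  lx-amd64",
    pending_search,
    "  1234567 0.50000 some_job  someuser  qw  09/15/2017 12:00:00  1",
])

node_list = []
pending_jobs = ''
for chunk in sample_qstat.split(node_sep):
    if pending_search in chunk:
        # Last chunk: keep only the node portion, route the rest to pending_jobs.
        node_part = chunk[:chunk.find(pending_search)].rstrip() + '\n'
        if node_part.strip():
            node_list.append(node_part)
        pending_jobs += chunk[chunk.find(pending_search):]
    else:
        node_list.append(chunk)

print(len(node_list), "node chunks;", "pending block found:", bool(pending_jobs))

The actual change in node_search.py additionally guards on ".crc.nd.edu" appearing in the chunk before splitting; see the diff below.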
CodyKank committed Sep 15, 2017
1 parent cf06278 commit df774cc
Showing 2 changed files with 22 additions and 10 deletions.
10 changes: 6 additions & 4 deletions README
@@ -10,10 +10,11 @@
 * node-search.py started in late July/Early August 2016.
 
 * node-search.py is a utility script, which can be seen as a swiss-army knife
-  type of program for finding relevant information for the Univa Grid Engine.
+  type of program for finding relevant information for the Univa Grid Engine(UGE) /
+  Sun Grid Engine (SGE).
 
 * How Does it work:
-    node-search.py is a python 3 script, which is placed in the afs space
+    node-search.py is a python 3 script, which is placed in the AFS space
     currently, and it uses the subprocess module to spawn new subprocesses
     which are usually running qstat or qconf and gather their output back into
     the python script which will then parse that output depending on what its
@@ -31,7 +32,8 @@
     pain to have to load the python 3 module everytime you wanted check the
     status of a job, so this bash script sets up the environment temporarily
     to run Python 3 scripts. The only configuration needed with this is to
-    properly specifiy the full path for the actual python 3 script.
+    properly specifiy the full path for the actual python 3 script and to edit the
+    paths for the useful module information to properly grab a version of Python 3.
 
 * If you are not a member of CRC and are using the UGE and want to use
   this, you may need to play around with trial and error, as there may be
@@ -46,7 +48,7 @@
 
 * This script is assuming (1) Python 3 is installed, tested with python 3.6.0 and 3.4.0,
   (2) the subprocess module is installed [should be by default], and (3) you
-  have qconf and qstat working and configured.
+  have qconf, qstat, and xymon working and configured.
 
 * To do:
     Add jobs to Host groups --details output
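The "How Does it work" paragraph above boils down to: spawn qstat or qconf with the subprocess module, capture the text they print, and parse it in Python. A rough, standalone sketch of that pattern follows; it is not code from this repository, the helper names are mine, and it assumes the usual SGE/UGE flags such as qconf -shgrpl and qstat -u are available.

# Standalone sketch (not from this repository) of the subprocess pattern the
# README describes. Helper names are illustrative; the qconf/qstat flags below
# assume a typical SGE/UGE installation.
import subprocess

def list_host_groups():
    """Host-group names reported by 'qconf -shgrpl' (e.g. '@somegroup')."""
    output = subprocess.getoutput("qconf -shgrpl")
    return [line.strip() for line in output.splitlines() if line.strip()]

def count_pending_jobs(user_name):
    """Count 'qw' (waiting) lines in 'qstat -u <user>' output."""
    output = subprocess.getoutput("qstat -u {0}".format(user_name))
    # First two lines are the column header and its dashed underline.
    return sum(1 for line in output.splitlines()[2:] if " qw " in line)

if __name__ == "__main__":
    print(list_host_groups())
    print(count_pending_jobs("someuser"))
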
22 changes: 16 additions & 6 deletions node_search.py
@@ -675,15 +675,23 @@ def process_user(user_name):
         print('Error: User, ' + user_name + ' is currently not logged on or does not exist.')
         sys.exit(25)
 
-    qstat = subprocess.getoutput("qstat -F").split('-'.center(81, '-')) #81 -'s
+    qstat = subprocess.getoutput("qstat -f").split('-'.center(81, '-')) #81 -'s
 
     node_list = []
     pending_jobs = ''
-    pending_search = '#'.center(79, '#') #denotes pending jobs in qstats 79 #'s
+    pending_search = '#'.center(79, '#') #denotes pending jobs in qstat 79 #'s
     #Weeding out nonessential nodes
     for node in qstat:
+        #if 'gpu' in node:
+        #    if 'qa-titanx-001' in node:
+        #        blah = 0
         if user_name in (node.split()):
-            if pending_search in node.split(): #Taking pending jobs out
+            if pending_search in node: #Taking pending jobs out
+                if ".crc.nd.edu" in node:
+                    # This means its the last node. We must only accept up tp the pending jobs ONLY. Below we are doing that and taking out an
+                    # Additional newline by stripping it but adding one back in to keep formatting correct. (there were two instead of one).
+                    tempNode = (node[:node.find(pending_search)].rstrip())+'\n'
+                    node_list.append(tempNode)
                 pending_jobs += (node[node.find(pending_search):]) #reaping pending jobs
             else:
                 node_list.append(node)
@@ -708,15 +716,15 @@ def process_user(user_name):
         temp_node.set_cores(host_total_cores, host_used_cores)
         # Reaping the info we want from qstat -F divided up by lines with each node.
         # so [25] is line 25 down from the start of that node which contains total_mem
-        total_mem = host.split('\n')[25]
+        """total_mem = host.split('\n')[25]
         total_mem = total_mem[total_mem.find('=') +1 :]
         used_mem = host.split('\n')[26]
         used_mem = used_mem[used_mem.find('=') +1 :]
         free_mem = host.split('\n')[27]
         free_mem = free_mem[free_mem.find('=') + 1 :]
         temp_node.set_total_mem(total_mem)
         temp_node.set_used_mem(used_mem)
-        temp_node.set_free_mem(free_mem)
+        temp_node.set_free_mem(free_mem)"""
         # Obtaining machines's memory information from Xymon's page for this particular node.
         #full_page = urllib.request.urlopen("https://mon.crc.nd.edu/xymon-cgi/svcstatus.sh?HOST={0}.crc.nd.edu&SERVICE=memory".format(temp_node))
         #mybytes = full_page.read()
@@ -731,6 +739,8 @@ def process_user(user_name):
         # 28 is how many char's that string is (don't want it)
         node_stat= host[host.find('qf:min_cpu_interval=00:05:00') + 28\
             :host.find('\n---------------------------------------------------------------------------------\n')]
+        """Possibly do a host.split('\n') and join the rest of 2 - end"""
+
         # There is always an extra '\n' in here, so subtract 1 to get rid of it
         num_jobs = len(node_stat.split('\n')) -1
         # If there are any jobs, parse them and gather info
@@ -787,7 +797,7 @@ def print_detailed_user(node_list, pending_list, user_name, user_jobs, num_cores
     # Getting every process of the user to print
     for node in node_list:
         user_proc_list = []
-        cleanName = str(node).replace('long@','').replace('debug@','').replace('.crc.nd.edu','')
+        cleanName = str(node).replace('long@','').replace('debug@','').replace('.crc.nd.edu','').replace('gpu','').replace('gpu-debug','')
         full_page = urllib.request.urlopen("https://mon.crc.nd.edu/xymon-cgi/svcstatus.sh?HOST={0}.crc.nd.edu&SERVICE=cpu".format(cleanName))
         mybytes = full_page.read() # getting all html into a byte-list
         pageStr = mybytes.decode("utf8") # Now the html is in a string
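
The cleanName change in the last hunk exists because Xymon's svcstatus.sh wants a bare host name, while qstat reports queue instances such as gpu@qa-titanx-001.crc.nd.edu. The small illustration below (not repository code) expresses the same intent with a split on '@' instead of the chain of str.replace() calls, just to make the transformation explicit; the example instance name reuses qa-titanx-001 from the commented-out line in the diff above.

# Illustration only: derive the bare host name Xymon expects from a
# queue-instance string, then build the CPU-status URL the script scrapes.
def bare_host(queue_instance):
    """'gpu@qa-titanx-001.crc.nd.edu' -> 'qa-titanx-001'."""
    host = queue_instance.split('@')[-1]      # drop 'long@', 'debug@', 'gpu@', ...
    return host.replace('.crc.nd.edu', '')    # drop the domain suffix

def xymon_cpu_url(queue_instance):
    return ("https://mon.crc.nd.edu/xymon-cgi/svcstatus.sh"
            "?HOST={0}.crc.nd.edu&SERVICE=cpu".format(bare_host(queue_instance)))

print(xymon_cpu_url("gpu@qa-titanx-001.crc.nd.edu"))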
