Create long list files of ALL data sets (to individual versions) for all projects #61

agstephens · 2020-06-18T15:32:44Z

Use the find command to create one large file listing per project:

badc: cmip5, cmip6, cordex
c3s: cmip5, cmip6, cordex

NOTE: list down to the individual NetCDF files under the version directories (not "latest").

Store the files in the "smf" GWS.

Write some Python to parse them into pandas DataFrames that we can interrogate.
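A minimal sketch of what that parsing could look like, assuming each listing has one path per line and that version directories are named "v" followed by digits (e.g. v20200101). The function names, column names, and sample paths below are hypothetical, chosen only for illustration:

```python
import re

import pandas as pd

# Assumption: a version directory is a path component such as "v20200101".
VERSION_RE = re.compile(r"^v\d+$")


def split_path(path):
    """Split a file path into (dataset_dir, version, filename).

    Returns (None, None, filename) when no version-like directory is found.
    """
    parts = path.strip().split("/")
    # Only directories can be versions, so skip the final (file name) part.
    for i, part in enumerate(parts[:-1]):
        if VERSION_RE.match(part):
            return "/".join(parts[:i]), part, parts[-1]
    return None, None, parts[-1]


def load_file_list(lines):
    """Build a DataFrame from an iterable of path lines (e.g. an open file)."""
    paths = [line.strip() for line in lines if line.strip()]
    rows = [(p, *split_path(p)) for p in paths]
    return pd.DataFrame(rows, columns=["path", "dataset_dir", "version", "filename"])


# Placeholder paths, not real archive entries.
sample = [
    "/x/A/B/v20200101/f1.nc\n",
    "/x/A/B/v20210101/f2.nc\n",
]
df = load_file_list(sample)
# Count how many versions exist for each dataset directory.
versions_per_dataset = df.groupby("dataset_dir")["version"].nunique()
```

For a real listing, an open file handle can be passed in place of `sample`, for example `load_file_list(open("file-list-cmip6.txt"))`.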

agstephens · 2020-06-18T15:34:08Z

$ cat make-file-lists.sh
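#!/bin/bash

# Write the listings into the working directory; stop if it is missing.
cd /home/users/astephen/roocs/proto-lib-34e/ || exit 1

# List every NetCDF file (-L follows symlinks), dropping any path that contains "latest".
find -L /badc/cordex/data/cordex/output -iname "*.nc" | grep -v latest > file-list-cordex.txt
find -L /badc/cmip5/data/cmip5/output*/*/*/* -iname "*.nc" | grep -v latest > file-list-cmip5.txt
find -L /badc/cmip6/data/CMIP6/*/*/*/* -iname "*.nc" | grep -v latest > file-list-cmip6.txt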

[astephen@sci2 proto-lib-34e]$ cat batch-make-file-lists.sh
# Submit make-file-lists.sh as a batch job from the current directory.
D=$PWD

bsub -q short-serial \
     -o "$D/lotus.out" \
     -e "$D/lotus.err" \
     -W 23:59 \
     "$D/make-file-lists.sh"

agstephens · 2020-06-18T15:35:01Z

Add some code to ingest them into a DataFrame, then save it in a format that loads back easily into a DataFrame (maybe zipped CSV).
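One possible shape for the save step, sketched with gzip-compressed CSV via pandas. The helper names, the output file name, and the sample paths are hypothetical, and other formats would work equally well:

```python
import os
import tempfile

import pandas as pd


def save_file_list(df, out_path):
    """Write the DataFrame as a gzip-compressed CSV without the index column."""
    df.to_csv(out_path, index=False, compression="gzip")


def load_saved_file_list(in_path):
    """Read a gzip-compressed CSV back, keeping every column as a string."""
    return pd.read_csv(in_path, compression="gzip", dtype=str)


def roundtrip_demo():
    """Save and reload a tiny DataFrame of placeholder paths."""
    df = pd.DataFrame(
        {"path": ["/x/A/B/v20200101/f1.nc", "/x/A/B/v20210101/f2.nc"]}
    )
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "file-list-demo.csv.gz")
        save_file_list(df, out_path)
        return load_saved_file_list(out_path)


reloaded = roundtrip_demo()
```

Compressed CSV stays readable with standard tools such as zcat, which may matter more here than the faster load times of a binary format.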

agstephens assigned ellesmith88 Jun 18, 2020
