PH5 Creating Validating and Archiving

Nick Falco edited this page Oct 29, 2018 · 17 revisions

Creating a PH5 Experiment

Refer to the documentation below to learn how to create a PH5 experiment. If you have not yet installed PH5, you must first follow the Installation Instructions. Always update to the latest release of PH5 if you have an older version installed.

  • PH5 Long Documentation - This complete guide walks you through generating a PH5 archive for both active and mixed mode experiments.
  • PH5 in a Nutshell - This abbreviated guide walks you through creating the directories and adding data to the PH5 archive, formatting and loading metadata, calculating derived tables, loading responses, and validating.

Validating a PH5 Experiment

Please validate your PH5 experiment before submitting it to the IRIS DMC for archiving. Failure to submit a complete PH5 experiment will delay the archiving of your experiment.

1.) Check your experiment for completeness using the PH5 validation command (ph5_validate):

  • ph5_validate - Runs a suite of tests to check for metadata accuracy and completeness. ph5_validate is available in PH5 v4.0.5 and later. Always update to the latest release of PH5.
    • Example: ph5_validate -p <path/to/ph5/dataset/directory/> -n master.ph5 --level info

2.) Run the following extraction commands and confirm that the output metadata and waveform data are accurate:

  • ph5tostationxml - extracts station metadata from PH5
    • Example: ph5tostationxml -p <path/to/ph5/dataset/directory/> -n master.ph5 --level response

    • Please run the ph5tostationxml command at the “response” level to ensure that both sensor and digitizer responses can be extracted.

  • ph5toms - extracts timeseries data from PH5
    • Example: ph5toms -p <path/to/ph5/dataset/directory/> -n master.ph5 --starttime <YYYY-mm-ddTHH:MM:SS.ss> --stoptime <YYYY-mm-ddTHH:MM:SS.ss>
  • ph5toexml (if applicable) - extracts event (shot) metadata from PH5
    • Example: ph5toexml -p <path/to/ph5/dataset/directory/> -n master.ph5
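
The checks in steps 1 and 2 can be collected into one reviewable list. The sketch below only prints the four commands shown above for a given dataset directory and time window; it does not run them. The function name ph5_check_cmds and its argument order are hypothetical, and the sketch assumes the directory path contains no spaces.

```shell
# Print the validation and extraction commands from steps 1 and 2, in order.
# ph5_check_cmds is a hypothetical helper name; only the printed commands
# and their flags come from the examples above.
# $1 = PH5 dataset directory, $2 = start time, $3 = stop time
ph5_check_cmds() {
    ph5_dir=$1
    start=$2
    stop=$3
    printf '%s\n' \
        "ph5_validate -p $ph5_dir -n master.ph5 --level info" \
        "ph5tostationxml -p $ph5_dir -n master.ph5 --level response" \
        "ph5toms -p $ph5_dir -n master.ph5 --starttime $start --stoptime $stop" \
        "ph5toexml -p $ph5_dir -n master.ph5"
}
```

For example, ph5_check_cmds /data/my_experiment 2018-01-01T00:00:00.00 2018-01-01T01:00:00.00 prints the four command lines so they can be reviewed and run one at a time. Drop the ph5toexml line if the experiment has no shots.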

Archiving a PH5 Experiment

Please refer to the instructions below for archiving PH5 experiments at the IRIS DMC. Before archiving, please review PASSCAL's guidelines for archiving active source datasets.

Archiving a new PH5 Experiment at the IRIS DMC

1.) Validate your experiment by following these steps. Failure to submit a complete PH5 experiment will delay the archiving of your experiment.

2.) Contact the IRIS DMC Data Group at engine_room@iris.washington.edu to coordinate how you are going to send your data to the DMC. Large experiments (>100GB) are normally mailed to the DMC on a disk or uploaded using BBCP. Smaller experiments may be uploaded using FTP.

3.) Once your experiment arrives at the DMC, it will be further tested for compatibility with the PH5 Web Services. If your experiment requires additional changes by DMC staff, you may be asked to verify the accuracy of those changes. Once your experiment is deemed archive-ready, it will be made available through the PH5 Web Services.
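
Before contacting the Data Group in step 2, it can help to know which side of the 100GB guideline your experiment falls on. The sketch below is one way to check, assuming that "100GB" means 100 × 1024 × 1024 blocks of 1024 bytes; the function names and the labels it prints are hypothetical, and the actual transfer method is still agreed with the DMC.

```shell
# Size guideline from step 2, expressed in 1024-byte blocks.
# Assumption: "100GB" is read as binary gigabytes.
LIMIT_KB=$((100 * 1024 * 1024))

# $1 = dataset directory; prints its total size in 1024-byte blocks
dataset_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# $1 = size in 1024-byte blocks; prints a hypothetical label for the
# transfer route that step 2 describes for that size
suggest_transfer() {
    if [ "$1" -gt "$LIMIT_KB" ]; then
        echo "disk-or-bbcp"
    else
        echo "ftp"
    fi
}
```

For example, suggest_transfer "$(dataset_size_kb <path/to/ph5/dataset/directory/>)" prints which of the two routes the size guideline points to.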

Archiving Data Service Runs

Data sometimes needs to be collected from the field incrementally due to the limited storage capacity of nodal instruments. This process is often referred to as a service run.

For each service run, users may send the IRIS DMC an updated master.ph5 file that includes any metadata changes, as well as the new time-series data (miniPH5_#####.ph5 files) associated with the coverage of the updated metadata. Please coordinate this by contacting the IRIS DMC Data Group at engine_room@iris.washington.edu.
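
One way to find the miniPH5 files added since the previous delivery is to compare modification times against a marker file. This is only a sketch: the marker file (an empty file you touch after each delivery) and the function name list_new_miniph5 are hypothetical conventions, not part of PH5, and the sketch assumes the ##### in the file name is exactly five digits.

```shell
# List miniPH5_#####.ph5 files modified more recently than a marker file.
# $1 = PH5 directory (searched recursively)
# $2 = marker file last touched at the previous delivery (hypothetical)
list_new_miniph5() {
    find "$1" -type f \
        -name 'miniPH5_[0-9][0-9][0-9][0-9][0-9].ph5' \
        -newer "$2" | sort
}
```

The updated master.ph5 is sent along with whatever files this prints. Modification times can change when files are copied, so treat the output as a starting point and confirm it against your field notes.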

The process for updating metadata and adding new timeseries data to an existing experiment is outlined below:

Updating metadata to include new data from a service run:

Below is the process for updating the array tables, since they are the most likely to change during a service run. Please note that the Experiment_t table may also need to be updated using this procedure if the network start and end times need to be adjusted.

1.) Dump the arrays table to a KEF file

ph5tokef -p <ph5-directory-path> -n master.ph5 --all_arrays > all_arrays.kef

2.) Make any edits to the network and/or station metadata.

It is common to extend station start and end times to cover new data collected during the latest service run. You may also need to create new epochs if other instrument information changed, such as a station's latitude or longitude. The metadata should reflect in full detail how the instruments were deployed in the field.

Users may edit the KEF file directly, or convert it to a CSV file and make edits in Microsoft Excel or another spreadsheet editor (recommended).

To convert the KEF file to CSV for editing, use:

keftocsv -f all_arrays.kef -o all_arrays.csv

To convert the updated metadata in the CSV file back to a KEF file, use:

csvtokef -f all_arrays.csv -o all_arrays_new.kef

3.) Use the nuke_table command (an alias of delete_table) to remove the old metadata.

nuke_table -p <ph5-directory-path> -n master.ph5 -A 1   # deletes array 1 metadata from PH5
nuke_table -p <ph5-directory-path> -n master.ph5 -A 2   # deletes array 2 metadata from PH5
…etc for all arrays

4.) Load the updated metadata into the master.ph5 file

keftoph5 -p <ph5-directory-path> -n master.ph5 -k all_arrays_new.kef
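
When an experiment has many arrays, typing the commands for steps 3 and 4 by hand is error-prone. The sketch below prints one nuke_table line per array number followed by the keftoph5 line, so the full sequence can be reviewed before anything is deleted. The function name reload_arrays_cmds and its argument order are hypothetical, and it assumes paths without spaces. It prints the commands and does not execute them.

```shell
# Print the commands for steps 3 and 4: delete each listed array table,
# then load the edited KEF file. Nothing is executed.
# reload_arrays_cmds is a hypothetical helper name.
# $1 = PH5 directory, $2 = edited KEF file, remaining args = array numbers
reload_arrays_cmds() {
    ph5_dir=$1
    kef_file=$2
    shift 2
    for n in "$@"; do
        echo "nuke_table -p $ph5_dir -n master.ph5 -A $n"
    done
    echo "keftoph5 -p $ph5_dir -n master.ph5 -k $kef_file"
}
```

For example, reload_arrays_cmds <ph5-directory-path> all_arrays_new.kef 1 2 3 prints three nuke_table lines and one keftoph5 line. Keeping a copy of the original all_arrays.kef from step 1 lets you reload the old metadata if the edited file turns out to be wrong.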

Adding timeseries data from a service run:

New timeseries data for service runs should be added to an existing master.ph5 file using the following processes.

Loading Fairfield SEG-D files

segdtoph5 -f <file_list> -n master.ph5 -U <UTM_zone>

The -U flag specifies the UTM zone in which the data were collected so that the converted latitude/longitude will be correct. Give only the zone number (no N or S). The <file_list> should contain the complete absolute path to each of the .rg16 files created after you run unsimpleton (page 5 of the PH5 Long Documentation).
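
The file list can be generated with find rather than typed by hand. The sketch below writes the absolute path of every .rg16 file under a directory to a list file, one path per line. The one-path-per-line layout and the function name make_rg16_list are assumptions; check the PH5 Long Documentation for the exact list format segdtoph5 expects.

```shell
# Write the absolute paths of all .rg16 files under a directory to a list file.
# make_rg16_list is a hypothetical helper name.
# $1 = directory holding the .rg16 files (searched recursively)
# $2 = output list file
make_rg16_list() {
    # Resolve the directory to an absolute path so find prints absolute paths
    src=$(cd "$1" && pwd) || return 1
    find "$src" -type f -name '*.rg16' | sort > "$2"
}
```

For example, make_rg16_list ./rg16_output file_list.txt produces a file that can then be passed to segdtoph5 with -f. Here ./rg16_output is a placeholder for wherever unsimpleton wrote its files.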

The segdtoph5 command loads the raw waveform data into miniPH5_*****.ph5 files and creates references in the master.ph5 that associate the metadata to the timeseries waveform data.