Releases: eqcorrscan/EQcorrscan
EQcorrscan 0.5.0
This release represents a significant increase in efficiency in large-scale matched-filters in EQcorrscan. Lots of work has gone into reducing memory usage in the non-correlation components of the matched-filter workflow, streamlining the code, making better use of shared memory multi-threaded parallelism and increasing CPU loads. In our testing we can now achieve and maintain >190% CPU efficiency (e.g. >95% hyperthreaded performance). We can also better load GPUs by making use of concurrent CPU and GPU processing of workflow steps. You should not need to change your code to make use of most of these speed-ups. Hopefully you will notice that you can run larger datasets faster than ever!
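As a rough illustration of the shared-memory threading pattern described above, the sketch below computes one detection threshold per correlation sum in a thread pool. It is not EQcorrscan's implementation: the function names, the multiplier of 8 and the toy data are assumptions for illustration, and the threshold is taken here as a multiple of the median absolute value. In pure Python the GIL limits the gain; real speed-ups of this kind depend on compiled routines that release the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def mad_threshold(cccsum, multiplier=8.0):
    """Hypothetical helper: multiplier * median(|cccsum|) for one correlation sum."""
    return multiplier * median(abs(value) for value in cccsum)


def mad_thresholds(cccsums, multiplier=8.0, max_workers=4):
    """Compute one threshold per correlation sum, one task per sum, in threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so thresholds line up with cccsums.
        return list(pool.map(lambda c: mad_threshold(c, multiplier), cccsums))


# Toy correlation sums for two templates (illustrative values only).
cccsums = [
    [0.1, -0.2, 0.3, -0.4, 5.0],
    [1.0, -1.0, 2.0, -2.0, 0.5],
]
thresholds = mad_thresholds(cccsums)
print(thresholds)
```

Threads share the input lists without copying them, which is the memory advantage over process-based parallelism.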
Changelog
- core.match_filter.tribe
- Significant re-write of detect logic to take advantage of parallel steps (see #544)
- Significant re-structure of hidden functions.
- core.match_filter.matched_filter
- 5x speed up for MAD threshold calculation with parallel (threaded) MAD
calculation (#531).
- core.match_filter.detect
- 1000x speedup for retrieving unique detections for all templates.
- 30x speedup in handling detections (50x speedup in selecting detections,
4x speedup in adding prepick time)
- core.match_filter.template
- new quick_group_templates function for 50x quicker template grouping.
- Templates with nan channels will be considered equal to other templates with shared
nan channels.
- New grouping strategy to minimise nan-channels - templates are grouped by
similar seed-ids. This should speed up both correlations and
prep_data_for_correlation. See PR #457.
- utils.pre_processing
- _prep_data_for_correlation: 3x speedup for filling NaN-traces in templates
- New function quick_trace_select for a very efficient selection of traces
by seed ID without wildcards (4x speedup).
- process, dayproc and shortproc replaced by multi_process. Deprecation
warning added.
- multi_process implements multithreaded GIL-releasing parallelism of slow
sections (detrending, resampling and filtering) of the processing workflow.
Multiprocessing is no longer supported or needed for processing. See PR #540
for benchmarks. New approach is slightly faster overall, and significantly
more memory efficient (uses c. 6x less memory than old multiprocessing approach
on a 12 core machine)
- utils.correlate
- 25 % speedup for _get_array_dicts with quicker access to properties.
- utils.catalog_to_dd
- _prepare_stream
- Now more consistently slices templates to length = extract_len * samp_rate
so that user receives fewer warnings about insufficient data.
- write_correlations
- New option use_shared_memory to speed up correlation of many events by
ca. 20 % by moving trace data into shared memory.
- Add ability to weight correlations by raw correlation rather than just
correlation squared.
- utils.cluster.decluster_distance_time
- Bug-fix: fix segmentation fault when declustering more than 46340 detections
with hypocentral_separation.
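The figure of 46340 in the declustering fix above is the largest integer whose square still fits in a signed 32-bit integer. One plausible reading, which is an assumption and not stated in these notes, is that an index over all detection pairs overflowed 32 bits. The arithmetic can be checked as follows (the helper name is hypothetical):

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def pair_count_fits_int32(n_detections):
    """Return True if n_detections squared fits in a signed 32-bit integer."""
    return n_detections * n_detections <= INT32_MAX


# Largest n for which n * n does not exceed INT32_MAX.
largest_safe = math.isqrt(INT32_MAX)
print(largest_safe)
print(pair_count_fits_int32(46340), pair_count_fits_int32(46341))
```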
EQcorrscan 0.5.0rc0
Release candidate for version 0.5.0
EQcorrscan 0.4.4
Changelog
- core.match_filter
- Bug-fix: peak-cores could be defined twice in _group_detect through kwargs.
Fix: only update peak_cores if it isn't there already.
- core.match_filter.tribe
- Detect now allows passing of pre-processed data
- core.match_filter.template
- Remove duplicate detections from overlapping windows using ._uniq()
- core.lag_calc._xcorr_interp
- CC-interpolation replaced with resampling (more robust), old method
deprecated. Use new method with use_new_resamp_method=True as **kwarg.
- core.lag_calc:
- Fixed bug where minimum CC defined via min_cc_from_mean_cc_factor was not
set correctly for negative correlation sums.
- utils.correlate
- Fast Matched Filter now supported natively for version >= 1.4.0
- Only full correlation stacks are returned now (e.g. where fewer than
the full number of channels are in the stack at the end of the stack, zeros
are returned).
- utils.mag_calc.relative_magnitude
- fixed bug where S-picks / traces were used for relative-magnitude calculation
against user's choice.
- implemented full magnitude bias-correction for CC and SNR
- utils.mag_calc.relative_amplitude:
- returns dicts for SNR measurements
- utils.catalog_to_dd.write_correlations
- Fixed bug in parallel execution.
- Added parallel-options for catalog-dt measurements and for stream-preparation
before cross correlation-dt measurements.
- Default parallelization of dt-computation is now across events (loads CPUs
more efficiently), and there is a new option max_trace_workers to use
the old parallelization strategy across traces.
- Now includes all_horiz-option that will correlate all matching horizontal
channels no matter to which of these the S-pick is linking.
- utils.clustering
- Allow handling of indirect comparison of event-waveforms (i.e., events
without matching traces which can be compared indirectly via a third event).
- Allow setting the clustering method, metric, and sort_order from
scipy.cluster.hierarchy.linkage.
- tribe, template, template_gen, archive_read, clustering: remove option to read
from seishub (deprecated in obspy).
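The duplicate-removal entry above concerns detections that are found twice when processing windows overlap. A minimal sketch of the idea, assuming hypothetical (template_name, detect_time) tuples rather than EQcorrscan's Detection objects, and not reflecting how ._uniq() is actually implemented:

```python
def unique_detections(detections):
    """Drop exact repeats, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for detection in detections:
        if detection not in seen:
            seen.add(detection)
            unique.append(detection)
    return unique


# Two overlapping windows both report the detection at 55.0 s.
window_1 = [("template_a", 10.0), ("template_a", 55.0)]
window_2 = [("template_a", 55.0), ("template_b", 70.0)]
detections = unique_detections(window_1 + window_2)
print(detections)
```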
EQcorrscan 0.4.4 Release Candidate 0
Release candidate 0 for release 0.4.4.
EQcorrscan Version 0.4.3
Changelog
- core.match_filter
- match_filter:
- Provide option of exporting the cross-correlation sums for additional later analysis.
- core.match_filter.party.write
- BUG-FIX: When format='tar' is selected, added a check for .tgz-file suffix before checking the filename against an existing file. Previously, when a filename without '.tgz'-suffix was supplied, then the file was overwritten against the function's intention.
- Add option overwrite=True to allow overwriting of existing files.
- core.match_filter.party.read
- BUG-FIX: Ensure wildcard reading works as expected: #453
- core.match_filter.party.rethreshold:
- added option to rethreshold based on absolute values to keep relevant detections with large negative detect_val.
- core.lag_calc:
- Added option to set minimum CC threshold individually for detections based on: min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc).
- Added the ability to save correlation data from lag_calc.
- utils.mag_calc.calc_b_value:
- Added useful information to doc-string regarding method and meaning of residuals
- Changed the number of magnitudes used to an int (from a string!?)
- utils.mag_calc.relative_magnitude:
- Refactor so that min_cc is used regardless of whether weight_by_correlation is set. See issue #455.
- utils.archive_read
- Add support for wildcard-comparisons in the list of requested stations and channels.
- New option arctype='SDS' to read from a SeisComp Data Structure (SDS). This option is also available in utils.clustering.extract_detections and in utils.archive_read._check_available_data.
- utils.catalog_to_dd
- Bug-fixes in #424:
- only P and S phases are used now (previously spurious amplitude picks were included in correlations);
- Checks for length are done prior to correlations and more helpful error outputs are provided.
- Progress is not reported within dt.cc computation
- write_station now supports writing elevations: #424.
- utils.clustering
- For cluster, distance_matrix and cross_chan_correlation, implemented full support for shift_len != 0. The latter two functions now return, in addition to the distance-matrix, a shift-matrix (both functions) and a shift-dictionary (for distance_matrix). New option for shifting streams as a whole or letting traces shift individually (allow_individual_trace_shifts=True).
- utils.plotting
- Function added (twoD_seismplot) for plotting seismicity (#365).
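The per-detection minimum CC threshold listed under core.lag_calc above, min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc), can be worked through with a small sketch. The function name and the numbers are hypothetical; only the formula comes from the changelog entry.

```python
def detection_min_cc(detect_val, n_chans, min_cc, min_cc_from_mean_cc_factor):
    """Evaluate min(detect_val / n_chans * factor, min_cc) for one detection."""
    mean_cc = detect_val / n_chans  # average correlation per channel
    return min(mean_cc * min_cc_from_mean_cc_factor, min_cc)


# Weak detection: mean CC is 0.4, so the scaled value (0.24) is used.
weak = detection_min_cc(detect_val=4.0, n_chans=10, min_cc=0.4,
                        min_cc_from_mean_cc_factor=0.6)
# Strong detection: mean CC is 0.9, so the scaled value (0.54) exceeds
# min_cc and the result is capped at min_cc.
strong = detection_min_cc(detect_val=9.0, n_chans=10, min_cc=0.4,
                          min_cc_from_mean_cc_factor=0.6)
print(weak, strong)
```

The effect is that weaker detections get a lower per-channel threshold, while min_cc remains the upper limit.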
EQcorrscan 0.4.3 Release Candidate 0
EQcorrscan Version 0.4.2
A Python package for the detection and analysis of repeating and near-repeating seismicity.
Changelog
- Add seed-ids to the _spike_test's message.
- utils.correlation
- Cross-correlation normalisation errors no longer raise an error
- When "out-of-range" correlations occur a warning is given by the C-function
with details of what channel, what template and where in the data vector
the issue occurred for the user to check their data.
- Out-of-range correlations are set to 0.0
- After extensive testing these errors have always been related to data issues
within regions where correlations should not be computed (spikes, step
artifacts due to incorrectly padding data gaps).
- USERS SHOULD BE CAREFUL TO CHECK THEIR DATA IF THEY SEE THESE WARNINGS
- utils.mag_calc.amp_pick_event
- Added option to output IASPEI standard amplitudes, with static amplification
of 1 (rather than 2080 as per Wood Anderson specs).
- Added filter_id and method_id to amplitudes to make these methods more
traceable.
- core.match_filter
- Bug-fix - cope with data that are too short with ignore_bad_data=True.
This flag is generally not advised, but when used, may attempt to trim all
data to zero length. The expected behaviour is to remove bad data and run
with the remaining data.
- Party:
- decluster now accepts a hypocentral_separation argument. This allows
the inclusion of detections that occur close in time, but not in space.
This is underwritten by a new findpeaks.decluster_dist_time function
based on a new C-function.
- Tribe:
- Add monkey-patching for clients that do not have a get_waveforms_bulk
method for use in .client_detect. See issue #394.
- utils.pre_processing
- Only templates that need to be reshaped are reshaped now - this can be a lot
faster.
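The hypocentral_separation entry above keeps detections that are close in time as long as they are far apart in space. A minimal sketch of that rule, assuming detections are hypothetical (time_s, (x_km, y_km, z_km), detect_val) tuples in Cartesian coordinates; it is a simple greedy Python loop, not the C-function used by EQcorrscan:

```python
import math


def decluster_dist_time(detections, trig_int, hypocentral_separation):
    """Keep the strongest detections; drop weaker ones only if they are
    within trig_int seconds AND within hypocentral_separation km of a
    detection that has already been kept."""
    kept = []
    # Visit detections from strongest to weakest absolute detection value.
    for det in sorted(detections, key=lambda d: abs(d[2]), reverse=True):
        time, location, _ = det
        clash = any(
            abs(time - other[0]) < trig_int
            and math.dist(location, other[1]) < hypocentral_separation
            for other in kept
        )
        if not clash:
            kept.append(det)
    return sorted(kept, key=lambda d: d[0])  # return in time order


detections = [
    (0.0, (0.0, 0.0, 0.0), 10.0),
    (1.0, (0.0, 0.0, 1.0), 8.0),    # close in time and space: removed
    (1.5, (50.0, 0.0, 0.0), 7.0),   # close in time, far in space: kept
    (100.0, (0.0, 0.0, 0.0), 6.0),  # far in time: kept
]
declustered = decluster_dist_time(detections, trig_int=5.0,
                                  hypocentral_separation=10.0)
print([det[0] for det in declustered])
```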
Version 0.4.2 Release Candidate 0
Pre-release for 0.4.2 for testing on conda-forge
EQcorrscan Version 0.4.1
Changelog
- core.match_filter
- BUG-FIX: Empty families are no longer run through lag-calc when using
Party.lag_calc(). Previously this resulted in a "No matching data" error,
see #341.
- core.template_gen
- utils.catalog_to_dd
- Added ability to turn off parallel processing (this is turned off by
default now) for write_correlations - parallel processing for moderate
to large datasets was copying far too much data and using lots of memory.
This is a short-term fix - ideally we will move filtering and resampling to
C functions with shared-memory parallelism and GIL releasing.
See PR #374.
- Moved parallelism for _compute_dt_correlations to the C functions to
reduce memory overhead. Using a generator to construct sub-catalogs rather
than making a list of lists in memory. See issue #361.
- utils.mag_calc:
- amp_pick_event now works on a copy of the data by default
- amp_pick_event uses the appropriate digital filter gain to correct the
applied filter. See issue #376.
- amp_pick_event rewritten for simplicity.
- amp_pick_event now has simple synthetic tests for accuracy.
- _sim_wa uses the full response information to correct to velocity;
this includes FIR filters (previously not used), and ensures that the
wood-anderson poles (with a single zero) are correctly applied to velocity
waveforms.
- calc_max_curv is now computed using the non-cumulative distribution.
- Fixed a problem in _match_filter_plot: it now shows all new detections.
- Add plotdir to eqcorrscan.core.lag_calc.lag_calc function to save the images.
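The calc_max_curv entry above refers to estimating the magnitude of completeness from the non-cumulative frequency-magnitude distribution. A minimal sketch of the maximum-curvature idea, assuming the estimate is simply the most populated magnitude bin; the function name, bin size and toy magnitudes are hypothetical and this is not EQcorrscan's code:

```python
from collections import Counter


def max_curvature_magnitude(magnitudes, bin_size=0.1):
    """Return the centre of the most populated magnitude bin
    (non-cumulative counts). Ties go to the smaller magnitude."""
    # Integer bin indices avoid comparing floating-point bin centres.
    bins = Counter(round(mag / bin_size) for mag in magnitudes)
    best_bin, _ = max(bins.items(), key=lambda item: (item[1], -item[0]))
    return best_bin * bin_size


magnitudes = [0.8, 1.0, 1.0, 1.1, 1.1, 1.1, 1.2, 1.2, 1.5, 2.0]
completeness = max_curvature_magnitude(magnitudes)
print(round(completeness, 1))
```

A cumulative count always peaks at the smallest magnitude, so it carries no information about where the catalogue starts to thin out; the non-cumulative counts do.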
Version 0.4.1 Release Candidate 0
Pre-release for 0.4.1