
[pycbc live] Allowing the use of psd variation in the ranking statistic for pycbc live #4533

Merged: 33 commits merged on Nov 10, 2023 (changes shown from 10 commits)

Commits:
1ab6879
Modifying files to include psd variation in single detector statistic…
ArthurTolley Oct 3, 2023
5bec56e
ending variation.py with a blank line
ArthurTolley Oct 3, 2023
7e696c6
Changing to an increment agnostic solution
ArthurTolley Oct 13, 2023
de73440
Merge branch 'gwastro:master' into psd_var
ArthurTolley Oct 16, 2023
3a2ffe0
removing change already fixed
ArthurTolley Oct 16, 2023
cceb6e6
Merge branch 'psd_var' of github.com:ArthurTolley/pycbc into psd_var
ArthurTolley Oct 16, 2023
ceaa0d0
Updating function names and docstrings
ArthurTolley Oct 18, 2023
6be1307
removing ToDos and adding more helpful comments
ArthurTolley Oct 18, 2023
1e370c9
Removing unused import
ArthurTolley Oct 18, 2023
5764623
Codeclimate fixes
ArthurTolley Oct 19, 2023
37781d9
Removing excess logging and whitespace mistakes
ArthurTolley Oct 20, 2023
d1d7877
Removing unused objects + codeclimate fixes
ArthurTolley Oct 20, 2023
0e6473a
Updating comments and docstrings, removing matchedfilter changes
ArthurTolley Oct 23, 2023
a8900e8
Revert "Updating comments and docstrings, removing matchedfilter chan…
ArthurTolley Oct 23, 2023
8ba940b
Removing matchedfilter changes, updating comments and docstrings
ArthurTolley Oct 23, 2023
ed2dec9
Move --verbose to the end of the commands
ArthurTolley Oct 23, 2023
ac432db
more comment updates
ArthurTolley Oct 23, 2023
0c45583
Repositioning filter recreation
ArthurTolley Oct 23, 2023
49213b4
Changes to comments and removing whitespace
ArthurTolley Oct 30, 2023
d729357
removing refchecks
ArthurTolley Oct 30, 2023
571c0f6
Adding option verification for psd variation
ArthurTolley Oct 30, 2023
880f091
Apply suggestions from code review
ArthurTolley Oct 31, 2023
e48b611
fixing EOL error
ArthurTolley Oct 31, 2023
8329ac7
Refactoring the filter creation function
ArthurTolley Oct 31, 2023
14cb7d6
codeclimate fixes
ArthurTolley Oct 31, 2023
29a866e
undo
ArthurTolley Oct 31, 2023
073e639
full_filt func
ArthurTolley Nov 2, 2023
b1a2948
removing indentation
ArthurTolley Nov 2, 2023
68d9d0f
code climate
ArthurTolley Nov 2, 2023
2a89928
code climate
ArthurTolley Nov 2, 2023
7ed15a2
try to quiet codeclimate
tdent Nov 2, 2023
97c8a3f
codeclimate doesn't know PEP8
tdent Nov 2, 2023
9fd1218
brackets obviate line continuation
tdent Nov 2, 2023
47 changes: 46 additions & 1 deletion bin/pycbc_live
@@ -42,6 +42,7 @@ from pycbc import mchirp_area
from pycbc.detector import ppdets
from pycbc.filter import resample
from pycbc.psd import estimate
from pycbc.psd import variation
from pycbc.live import snr_optimizer

# Use cached class-based FFTs in the resample and estimate module
@@ -589,7 +590,7 @@ class LiveEventManager(object):
upload_checks, optimize_snr_checks
)
gdb_upload_thread = threading.Thread(target=self.upload_in_thread,
args=thread_args)
args=thread_args)
gdb_upload_thread.start()

def check_singles(self, results, psds):
@@ -996,6 +997,11 @@ parser.add_argument('--embright-massgap-max', type=float, default=5.0, metavar='
'HasMassGap probability.')
parser.add_argument('--skymap-only-ifos', nargs='+',
help="Detectors that only contribute in sky localization")
parser.add_argument('--psd-variation', action='store_true',
help="Run the psd variation code to produce psd variation "
"values for each single-detector trigger found by "
"the search. Required when using a single-detector "
"ranking statistic that includes psd variation.")

scheme.insert_processing_option_group(parser)
LiveSingle.insert_args(parser)
@@ -1169,13 +1175,21 @@ with ctx:
results = {}
evnt.live_detectors = set()

# Create objects to track whether the psd has been recalculated
if args.psd_variation:
psd_recalculated = {ifo: False for ifo in ifos}

for ifo in ifos:
results[ifo] = False
status = data_reader[ifo].advance(
valid_pad,
timeout=args.frame_read_timeout
)
if status and psd_count[ifo] == 0:
# If the psd has been recalculated then we need a new
# filter for psd variation calculation
if args.psd_variation:
psd_recalculated[ifo] = False
status = data_reader[ifo].recalculate_psd()
psd_count[ifo] = args.psd_recompute_length - 1
elif not status:
@@ -1242,6 +1256,37 @@ with ctx:
if len(results[ifo][key]):
results[ifo][key] = results[ifo][key][idx]

# Calculate and add the psd variation for the results
if args.psd_variation:
logging.info('Setting all ifo filters to none')
filt = {ifo: None for ifo in ifos}

for ifo in results:
logging.info(f"Calculating PSD Variation Statistic for {ifo}")

if filt[ifo] is None:
logging.info(f"Creating {ifo} filter for PSD Variation.")
filt[ifo] = variation.live_create_filter(data_reader[ifo].psd,
args.psd_segment_length,
int(args.sample_rate))

psd_var_ts = variation.live_calc_psd_variation(data_reader[ifo].strain,
filt[ifo],
args.increment)

psd_var_vals = variation.live_find_var_value(results[ifo],
psd_var_ts)

results[ifo]['psd_var_val'] = psd_var_vals

# psd_recalculated is set to False when the psd of an ifo
# has been recalculated, so a new filter needs to be
# created from the new psd.
if psd_recalculated[ifo] is False:
logging.info(f'Setting {ifo} filter to none')
filt[ifo] = None
psd_recalculated[ifo] = True

# Look for coincident triggers and do background estimation
if args.enable_background_estimation:
coinc_results = coinc_pool.broadcast(get_coinc, results)
5 changes: 3 additions & 2 deletions examples/live/run.sh
@@ -177,7 +177,7 @@ python -m mpi4py `which pycbc_live` \
--max-batch-size 16777216 \
--output-path output \
--day-hour-output-prefix \
--sngl-ranking newsnr_sgveto \
--sngl-ranking newsnr_sgveto_psdvar_threshold \
--ranking-statistic phasetd \
--statistic-files statHL.hdf statHV.hdf statLV.hdf \
--sgchisq-snr-threshold 4 \
@@ -204,7 +204,8 @@ python -m mpi4py `which pycbc_live` \
--single-duration-threshold 7 \
--single-reduced-chisq-threshold 2 \
--single-fit-file single_trigger_fits.hdf \
--verbose
--verbose \
--psd-variation

# If you would like to use the pso optimizer, change --optimizer to pso
# and include these arguments while removing other optimizer args.
6 changes: 6 additions & 0 deletions pycbc/filter/matchedfilter.py
@@ -1662,6 +1662,10 @@ def process_all(self):
veto_info = [tmp[i] for i in sort]

result = self._process_vetoes(result, veto_info)
# TODO: process_vetoes contains 'signal based vetoes'
# this should probably also include psdvar vetoes??
# maybe a bit of a change for process vetoes to contain
# process signal based + process psd var
return result

def _process_vetoes(self, results, veto_info):
@@ -1670,6 +1674,8 @@ def _process_vetoes(self, results, veto_info):
dof = numpy.array(numpy.zeros(len(veto_info)), numpy.uint32, ndmin=1)
sg_chisq = numpy.array(numpy.zeros(len(veto_info)), numpy.float32,
ndmin=1)
psd_var_val = numpy.array(numpy.zeros(len(veto_info)), numpy.float32,
ndmin=1)
results['chisq'] = chisq
results['chisq_dof'] = dof
results['sg_chisq'] = sg_chisq
188 changes: 186 additions & 2 deletions pycbc/psd/variation.py
@@ -3,6 +3,7 @@
import numpy
from numpy.fft import rfft, irfft
import scipy.signal as sig
import scipy.interpolate


import pycbc.psd
@@ -131,7 +132,7 @@ def calc_filt_psd_variation(strain, segment, short_segment, psd_long_segment,
# Create a bandpass filter between low_freq and high_freq
filt = sig.firwin(4 * srate, [low_freq, high_freq], pass_zero=False,
window='hann', nyq=srate / 2)
filt.resize(int(psd_duration * srate))
filt.resize(int(psd_duration * srate), refcheck=False)
# Fourier transform the filter and take the absolute value to get
# rid of the phase.
filt = abs(rfft(filt))
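The `refcheck=False` change above can be illustrated in isolation. The snippet below is a toy sketch (small stand-in sample rate, and scipy's `fs` argument in place of the deprecated `nyq` used in the diff): `ndarray.resize` zero-pads the filter in place, and `refcheck=False` skips numpy's reference-count check, which can otherwise raise "cannot resize an array that references or is referenced by another array" even when the resize is harmless.

```python
import numpy as np
import scipy.signal as sig

# Toy version of the bandpass built in calc_filt_psd_variation.
srate = 512
filt = sig.firwin(4 * srate, [20, 100], pass_zero=False,
                  window='hann', fs=srate)
ntaps = len(filt)

# In-place resize zero-pads the array out to the psd duration;
# refcheck=False avoids the reference-count check that can fail
# under debuggers or interactive interpreters.
filt.resize(8 * srate, refcheck=False)
```

The padding leaves the original taps untouched and appends zeros, so the subsequent `rfft` sees the same filter on a finer frequency grid.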
@@ -210,10 +211,193 @@ def find_trigger_value(psd_var, idx, start, sample_rate):
# Extract the PSD variation at trigger time through linear
# interpolation
if not hasattr(psd_var, 'cached_psd_var_interpolant'):
from scipy import interpolate
psd_var.cached_psd_var_interpolant = \
interpolate.interp1d(psd_var.sample_times.numpy(), psd_var.numpy(),
fill_value=1.0, bounds_error=False)
vals = psd_var.cached_psd_var_interpolant(time)

return vals

# Functions used by the PyCBC Live search


def live_create_filter(psd_estimated,
psd_duration,
sample_rate,
low_freq=20,
high_freq=480):
"""
Create a filter to be used in the calculation of the psd variation for the
PyCBC Live search. This filter combines a bandpass between a lower and
upper frequency and an estimated signal response so that the variance
will be 1 when the filter is applied to white noise.

Within the PyCBC Live search this filter needs to be recreated every time
the estimated psd is updated and needs to be unique for each detector.

Parameters
----------
psd_estimated : pycbc.frequencyseries
The estimated psd currently being used by the PyCBC Live search,
whose variation we want to measure.
psd_duration : float
The duration of the estimation of the psd, in seconds.
sample_rate : int
The sample rate of the strain data being searched over.
low_freq : int (default = 20)
The lower frequency to apply in the bandpass filter.
high_freq : int (default = 480)
The upper frequency to apply in the bandpass filter.

Returns
-------
full_filt : numpy.ndarray
The complete filter to be convolved with the strain data to
find the psd variation value.

"""

# Create a bandpass filter between low_freq and high_freq once
filt = sig.firwin(4 * sample_rate,
[low_freq, high_freq],
pass_zero=False,
window='hann',
nyq=sample_rate / 2)
filt.resize(int(psd_duration * sample_rate), refcheck=False)

# Fourier transform the filter and take the absolute value to get
# rid of the phase.
filt = abs(rfft(filt))

# Extract the psd frequencies to create a representative filter.
freqs = numpy.array(psd_estimated.sample_frequencies, dtype=numpy.float32)
plong = psd_estimated.numpy()

# Create the filter - bandpass, which weight by f^-7/6,
# and whiten. The normalization is chosen so that the variance
# will be one if this filter is applied to white noise which
# already has a variance of one.
fweight = freqs ** (-7./6.) * filt / numpy.sqrt(plong)
fweight[0] = 0.
norm = (sum(abs(fweight) ** 2) / (len(fweight) - 1.)) ** -0.5
fweight *= norm
fwhiten = numpy.sqrt(2. / sample_rate) / numpy.sqrt(plong)
fwhiten[0] = 0.
full_filt = sig.hann(int(psd_duration * sample_rate)) * numpy.roll(
irfft(fwhiten * fweight), int(psd_duration / 2) * sample_rate)

return full_filt
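The `numpy.roll` in `live_create_filter` re-centres the zero-phase impulse response before the Hann window is applied. A zero-phase (real, non-negative) spectrum, like `abs(rfft(filt))` above, transforms to an impulse response peaked at sample 0 that wraps around the end of the array; rolling by half the length centres it so the window does not truncate it. The spectrum below is a toy stand-in, not the search's actual filter:

```python
import numpy as np
from numpy.fft import irfft

# A toy zero-phase "bandpass": a real, non-negative Gaussian bump in
# frequency, analogous to abs(rfft(filt)) in live_create_filter.
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0)
response = np.exp(-((freqs - 0.1) ** 2) / (2 * 0.02 ** 2))

# The impulse response of a zero-phase spectrum peaks at sample 0 and
# wraps around the array end ...
h = irfft(response, n)

# ... so rolling by n // 2 centres it, ready for a Hann window.
h_centred = np.roll(h, n // 2)
```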


def live_calc_psd_variation(strain,
full_filt,
increment,
data_trim=2.0,
short_stride=0.25):
"""
Calculate the psd variation in the PyCBC Live search.

The Live strain data is convolved with the filter to produce a
timeseries containing the PSD variation value for each sample. The
mean square of this timeseries is then taken over each short_stride to
suppress the effect of short-duration glitches, and outliers from the
mean are replaced within the array. This array is then further
averaged every second to produce a timeseries containing a number of
values equal to the increment.

Parameters
----------
strain : pycbc.timeseries
Live data being searched through by the PyCBC Live search.
full_filt : numpy.ndarray
A filter created by `live_create_filter`.
increment : float
The number of seconds in each increment in the PyCBC Live search.
data_trim : float
The number of seconds to be trimmed from either end of the convolved
timeseries to prevent artefacts.
short_stride : float
The number of seconds to average the PSD variation timeseries over to
remove the effects of short duration glitches.

Returns
-------
psd_var : pycbc.timeseries
A timeseries containing one PSD variation value for each of the
last `increment` seconds of the input strain data.

"""
sample_rate = int(strain.sample_rate)

# Grab the last increment's worth of data, plus extra to account for
# the trimming needed to remove edge effects.
astrain = strain.time_slice(strain.end_time - increment - (data_trim*3),
strain.end_time)

# Convolve the data with the filter to produce the PSD variation
# timeseries, then trim the beginning and end to prevent edge effects.
wstrain = sig.fftconvolve(astrain, full_filt, mode='same')
wstrain = wstrain[int(data_trim * sample_rate):-int(data_trim * sample_rate)]

# Create a PSD variation array by taking the mean square of the PSD
# variation timeseries every short_stride
short_ms = numpy.mean(wstrain.reshape(-1, int(sample_rate * short_stride)) ** 2,
axis=1)

# Create an array of averages to substitute for outliers in the PSD
# variation array
ave = 0.5 * (short_ms[2:] + short_ms[:-2])
outliers = short_ms[1:-1] > (2. * ave)
short_ms[1:-1][outliers] = ave[outliers]

# Calculate the average of the PSD variation array for every second
m_s = []
for idx in range(int(len(short_ms)/(1/short_stride))):
m_s.append(numpy.mean(short_ms[int((1/short_stride) * idx):int((1/short_stride) * (idx + 1))]))

# Convert m_s to a numpy array
m_s = numpy.array(m_s, dtype=wstrain.dtype)

# Convert the m_s numpy array to a pycbc timeseries which now contains the
# psd variation value every second.
psd_var = TimeSeries(m_s,
[Review discussion on this line's epoch, resolved before merge]

tdent (Contributor): However, I am not sure why the 'epoch' given here is correct. Why does this look significantly different from the offline code?

ArthurTolley (Author): From looking at the offline code, it is generating PSD var values for the whole 512 seconds of data, so it is logical to start from strain.start_time when defining the new timeseries. In the live code we only want to find PSD var values for the newest increment of strain data, so taking the end time of the data and subtracting the increment and the extra data from trimming is how I found the epoch of the psd var timeseries. Example: start_time = 0, end_time = 512, increment = 8, data_trim = 2. We take the latest data from 504 to 512, pad the start by 6 seconds and trim 2 s from both sides. We calculate the mean square and average each second, so we have 10 data points to put into our timeseries, and the epoch lines up at end_time - increment - 2*data_trim.

tdent (Contributor): OK, so the calculation is done starting with data from 498 through 512, 2 s are lost to filter artefacts giving 500 through 510, and the epoch is 1 'trim segment' later than the start of the original data, i.e. 2 rather than 3 'trims' plus the analysis stride ('increment').

ArthurTolley (Author): The epoch is 1 trim segment later than the start of the slice of data we are using to calculate PSD variation, not the start of the original data, because that varies depending on how much data PyCBC Live has in the buffer right now. I could change the epoch to epoch=astrain.start_time + data_trim if you would prefer? The change is mainly strain -> astrain, where astrain is the sliced strain data.

tdent (Contributor, Oct 30, 2023): I resolved the conversation as I didn't think there was a problem in the end. But maybe it would be good to add a short comment to explain the time accounting, e.g.: the original data segment started at the strain end time - 'increment' - 3 * 'data_trim'; the variation time series starts 1 * data_trim later.
delta_t=1.0,
epoch=strain.end_time - increment - (data_trim * 3) + data_trim)

return psd_var
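The time accounting debated in the review thread, together with the short-stride averaging in the function body, can be sketched with the toy numbers from that discussion (all values illustrative; the random data stands in for the already-trimmed convolved strain):

```python
import numpy as np

# Toy numbers from the review discussion of the epoch.
end_time = 512.0
increment = 8.0
data_trim = 2.0
short_stride = 0.25
sample_rate = 16                      # small stand-in sample rate

# The function slices the last increment + 3 * data_trim seconds ...
slice_start = end_time - increment - 3 * data_trim          # 498.0
# ... and trims data_trim from each edge of the convolved output, so
# valid values cover [500.0, 510.0) and the epoch sits one data_trim
# after the start of the slice.
epoch = slice_start + data_trim

# Mean square every short_stride, outlier replacement, then one-second
# averages, mirroring the function body above.
rng = np.random.default_rng(0)
n_valid = int((increment + data_trim) * sample_rate)        # post-trim
wstrain = rng.normal(size=n_valid)
short_ms = np.mean(
    wstrain.reshape(-1, int(sample_rate * short_stride)) ** 2, axis=1)
ave = 0.5 * (short_ms[2:] + short_ms[:-2])
outliers = short_ms[1:-1] > 2. * ave
short_ms[1:-1][outliers] = ave[outliers]
per_second = short_ms.reshape(-1, int(1 / short_stride)).mean(axis=1)
```

With these numbers the 10 seconds of valid output line up with the 10 data points and the epoch of 500 described in the thread.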


def live_find_var_value(triggers,
psd_var_timeseries):
"""
Interpolate between PSD variation values to find the PSD variation value
associated with a specific trigger.

Parameters
----------
triggers : dict
A dictionary containing the trigger values for which to find the
PSD variation.
psd_var_timeseries : pycbc.timeseries
A timeseries containing the PSD variation value for each second of the
latest increment in PyCBC Live. Created by live_calc_psd_variation.

Returns
-------
psd_var_vals : numpy.ndarray
A numpy array containing the PSD variation values associated with the
triggers.
"""

# Find gps time of the trigger
trigger_times = triggers['end_time']

# Interpolate between values
interpolator = scipy.interpolate.interp1d(psd_var_timeseries.sample_times.numpy(),
psd_var_timeseries.numpy(),
fill_value=1.0,
bounds_error=False)
psd_var_vals = interpolator(trigger_times)

return psd_var_vals
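The `fill_value=1.0, bounds_error=False` combination above means triggers falling outside the covered span silently receive a neutral variation value. A small sketch with hypothetical times and values:

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical per-second variation values at toy GPS times.
times = np.array([500., 501., 502., 503., 504.])
vals = np.array([1.0, 3.0, 1.0, 1.0, 1.0])

interpolator = interp1d(times, vals, fill_value=1.0, bounds_error=False)

inside = interpolator(500.5)    # linear interpolation between samples
outside = interpolator(498.0)   # out of range -> neutral value 1.0
```

Returning 1.0 out of range keeps `newsnr_sgveto_psdvar_threshold`-style rankings unchanged for triggers the variation timeseries does not cover.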