GSpyNetTree task

GSpyNetTree, the Gravity Spy Convolutional Neural Network Decision Tree, is a data quality report task that uses machine learning to determine whether a glitch is present at the time of a candidate event. GSpyNetTree uses a decision tree of multilabel CNN classifiers, selected according to the estimated total mass of the gravitational-wave (GW) candidate and trained with morphologically similar glitches. This task is based on Alvarez-Lopez et al. 2023, and a new paper on the O4 version of GSpyNetTree is in preparation.

Requirements

This task requires the following packages that may not be included in the requirements list for dqrtasks:

  • gwpy

  • gwdetchar

  • tensorflow

Description

GSpyNetTree uses the InceptionV3 architecture to classify GW event candidates. It first reads the total mass of the candidate, as reported in the preferred event of the associated superevent on GraceDB. Because the total mass \(M\) of an event affects its morphological appearance, GSpyNetTree has three different classifiers: the low-mass (LM) classifier (i.e., events with \(M < 50 M_\odot\)), the high-mass (HM) classifier (i.e., \(50 M_\odot \leq M < 250 M_\odot\)), and the extremely high-mass (EHM) classifier (i.e., \(M \geq 250 M_\odot\)). In addition to the most common glitches in all detectors (namely, light scattering, fast scattering, and low-frequency lines), each classifier is trained with morphologically similar glitches, which vary by classifier and are listed below:

  • Low-mass classifier: Low-frequency blip, blip, scratchy, and koi fish. (Though not morphologically similar, the koi fish class is included to account for very loud glitches that might overlap with a low-mass GW signal).

  • High-mass classifier: Low-frequency blip, blip, tomte, and koi fish.

  • Extremely high-mass classifier: Low-frequency blip and blip.
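The mass-based selection described above can be sketched in a few lines of Python. This is an illustration only, not the task's actual code: the function names, constant names, and the lowercase class strings are assumptions made for the example, while the mass boundaries and glitch lists are taken from the description above.

```python
# Mass boundaries in solar masses, as stated in the description.
LOW_MASS_MAX = 50.0    # LM classifier: M < 50
HIGH_MASS_MAX = 250.0  # HM classifier: 50 <= M < 250; EHM: M >= 250

# Glitch classes shared by all three classifiers.
COMMON_GLITCHES = ("light scattering", "fast scattering", "low-frequency lines")

# Morphologically similar glitches specific to each classifier.
# The dictionary keys and class strings are illustrative names only.
CLASSIFIER_GLITCHES = {
    "LM": ("low-frequency blip", "blip", "scratchy", "koi fish"),
    "HM": ("low-frequency blip", "blip", "tomte", "koi fish"),
    "EHM": ("low-frequency blip", "blip"),
}


def select_classifier(mtotal):
    """Return the classifier name for a candidate total mass (solar masses)."""
    if mtotal < LOW_MASS_MAX:
        return "LM"
    if mtotal < HIGH_MASS_MAX:
        return "HM"
    return "EHM"


def glitch_classes(mtotal):
    """Return every glitch class considered for a given total mass."""
    return COMMON_GLITCHES + CLASSIFIER_GLITCHES[select_classifier(mtotal)]
```

For example, a candidate with a total mass of exactly 50 solar masses falls in the high-mass branch, because the lower bound of that interval is inclusive.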

For the GW signals, GSpyNetTree generates GW simulations using LALSuite’s inspiral injection function and the waveform model IMRPhenomPv2. GSpyNetTree’s GW examples are uniformly drawn from a total merger mass range of \(5 M_\odot\) to \(350 M_\odot\), with individual masses ranging from \(2 M_\odot\) to \(175 M_\odot\), a signal-to-noise ratio (SNR) range of 8 to 35, and individual component spins ranging from 0.05 to 0.95. In addition to these simulations, GSpyNetTree also considers a “No_Glitch” class in all classifiers. No_Glitch examples are clean detector times in which no data quality issues were identified. These clean times are similar to low SNR signals (particularly for low-mass GW events) in a Q-transform, a time-frequency spectrogram used for classification. If a given superevent is classified as a GW or No_Glitch, no data quality issue is flagged.

GSpyNetTree uses a multilabel architecture for its CNNs, which means it also considers cases where a GW candidate and a glitch overlap in time (and frequency). With a multilabel architecture, GSpyNetTree can predict zero or more labels for each candidate by returning a probability ranging from 0 to 1 for each considered class. As a result, the probabilities of all labels need not sum to 1 (as they do for multiclass classifiers, where the classes are mutually exclusive). Instead, the probability of each label can take any value from 0 to 1, and a label is said to be predicted by GSpyNetTree if its probability is greater than or equal to 0.5. In the case where no label reaches the 50% threshold, no labels are predicted and a “human input needed” message is displayed.
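The thresholding rule can be illustrated with a short Python sketch. The function names and the example probabilities are hypothetical and are not taken from the task's code; only the 0.5 threshold, the GW and No_Glitch label names, and the “human input needed” message come from the description above.

```python
PREDICTION_THRESHOLD = 0.5
HUMAN_INPUT_MESSAGE = "human input needed"


def predicted_labels(probabilities, threshold=PREDICTION_THRESHOLD):
    """Return every label whose probability is at or above the threshold."""
    return [label for label, p in probabilities.items() if p >= threshold]


def summarize(probabilities):
    """Join the predicted labels, or fall back to the human-input message."""
    labels = predicted_labels(probabilities)
    return ", ".join(labels) if labels else HUMAN_INPUT_MESSAGE


# Hypothetical classifier outputs: the values are independent per label,
# so they do not have to sum to 1.
overlap = {"GW": 0.91, "Blip": 0.78, "No_Glitch": 0.03}
uncertain = {"GW": 0.40, "Blip": 0.30, "No_Glitch": 0.45}

print(summarize(overlap))
print(summarize(uncertain))
```

In the first example two labels are predicted at once, which is the overlap case a multiclass classifier could not express. In the second, no label reaches the threshold, so the fallback message is returned.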

If GSpyNetTree predicts that a glitch is present (including the case where a GW and/or No_Glitch label is simultaneously predicted with a glitch), GSpyNetTree needs to determine whether a data quality issue should be flagged. To do this, GSpyNetTree uses the glitch p-value, which ranges from 0 (data quality issue identified) to 1 (no data quality issue identified). A data quality issue is flagged whenever the p-value is below 0.05.

The glitch p-value is calculated as 1 - max(all glitch probabilities), such that if the probability of the glitch is very high, the p-value will be near zero and a data quality issue will be flagged. Similarly, in cases where GSpyNetTree is very confident about a GW/No_Glitch prediction, the glitch probabilities are generally very low and the glitch p-value will be almost 1. Note that the GW/No_Glitch probability is not used to calculate the glitch p-value.
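As a minimal sketch of this calculation, the following Python snippet computes the glitch p-value from a dictionary of label probabilities and applies the 0.05 threshold. The function names, the glitch label strings, and the example values are assumptions made for illustration; the formula, the threshold, and the exclusion of the GW and No_Glitch labels follow the description above.

```python
# Labels that do not count as glitches when computing the p-value.
NON_GLITCH_LABELS = {"GW", "No_Glitch"}
P_VALUE_THRESHOLD = 0.05


def glitch_p_value(probabilities):
    """Return 1 - max(glitch probabilities), ignoring GW and No_Glitch."""
    glitch_probs = [
        p for label, p in probabilities.items()
        if label not in NON_GLITCH_LABELS
    ]
    return 1.0 - max(glitch_probs)


def is_dq_issue(probabilities, threshold=P_VALUE_THRESHOLD):
    """Flag a data quality issue when the glitch p-value is below threshold."""
    return glitch_p_value(probabilities) < threshold


# Hypothetical outputs: a confident glitch overlapping a GW, and a clean GW.
loud_glitch = {"GW": 0.90, "Blip": 0.97, "Fast_Scattering": 0.10}
clean_signal = {"GW": 0.99, "Blip": 0.02, "Fast_Scattering": 0.01}

print(is_dq_issue(loud_glitch))
print(is_dq_issue(clean_signal))
```

With the default threshold, only glitch probabilities above 0.95 lead to a flag, so a glitch label can be predicted (probability of at least 0.5) without a data quality issue being raised.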

Example command-line

This is the help message shown when running GSpyNetTree:

$ dqr-gspynettree --help
usage: dqr-gspynettree [-h]
                       [--log-level {DEBUG,INFO,WARNING,ERROR}]
                       [--log-file LOG_FILE] --output-dir OUTPUT_DIR
                       [--id ID] --ifo {H1,L1,V1} --channel CHANNEL --gps GPS
                       --start START --end END --mtotal MTOTAL
                       [--frametype FRAMETYPE]
                       [--p-value-threshold P_VALUE_THRESHOLD]
                       [-V] [--lm-model LM_MODEL] [--hm-model HM_MODEL]
                       [--ehm-model EHM_MODEL]

GSpyNetTree: Gravity Spy Convolutional Neural Network Decision Tree

optional arguments:
  -h, --help            show this help message and exit
  --log-level {DEBUG,INFO,WARNING,ERROR} log level
  --log-file LOG_FILE   write logs to file (default: log file in output-dir)
  --output-dir OUTPUT_DIR output directory
  --id ID               identifier for event of interest
  --ifo {H1,L1,V1}      target detector for event of interest (ex. H1)
  --channel CHANNEL     target channel for analysis (ex. H1:GDS-CALIB_STRAIN)
  --gps GPS             GPS time for event of interest
  --start START         GPS start time for event of interest
  --end END             GPS end time for event of interest
  --mtotal MTOTAL       CBC total mass for event of interest
  --frametype FRAMETYPE Data frametype
  --p-value-threshold P_VALUE_THRESHOLD
                        Defined threshold for the p-value (Default: 0.05)
  -V, --version         show the program version number and exit
  --lm-model LM_MODEL   saved TensorFlow model for low mass classifier
  --hm-model HM_MODEL   saved TensorFlow model for high mass classifier
  --ehm-model EHM_MODEL saved TensorFlow model for extreme high mass classifier

Classifier for glitch-GW discrimination based on strain data qscans.

Example config

[gspynettree]
description = Gravity Spy Convolutional Neural Network Decision Tree
librarian = sofia.alvarez@ligo.org
tier = 1
question = Is the superevent a glitch or does it overlap with a glitch?
iterate = l1 h1 v1
executable = gspynettree
arguments = "--output-dir ${outdir} --id ${graceid} --ifo ${ifo} --channel ${channel} --gps ${t_0} --start ${t_start} --end ${t_end} --mtotal ${mtotal}"
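To see what command line the arguments template produces, the placeholders can be filled in with a short Python sketch. How the DQR framework actually performs this substitution is not described here; `string.Template` is used only as a stand-in because it shares the `${name}` syntax. All values below are hypothetical placeholders, except the channel name, which is the example given in the help message.

```python
import shlex
from string import Template

# The arguments template from the example config, without the outer quotes.
ARGUMENTS = Template(
    "--output-dir ${outdir} --id ${graceid} --ifo ${ifo} "
    "--channel ${channel} --gps ${t_0} --start ${t_start} "
    "--end ${t_end} --mtotal ${mtotal}"
)

# Hypothetical values for one detector of one event.
values = {
    "outdir": "/tmp/dqr-example",
    "graceid": "S000000xx",
    "ifo": "H1",
    "channel": "H1:GDS-CALIB_STRAIN",
    "t_0": "1000000010.5",
    "t_start": "1000000000",
    "t_end": "1000000020",
    "mtotal": "65.0",
}

# substitute() raises KeyError if any placeholder has no value.
command_line = ARGUMENTS.substitute(values)

# Split into an argument list, as a shell would.
args = shlex.split(command_line)
print(args)
```

Every option that the help message lists without square brackets (`--output-dir`, `--ifo`, `--channel`, `--gps`, `--start`, `--end`, `--mtotal`) appears in the template, along with the optional `--id`.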

Example results page for a GW with no data quality issues recognized by the task

Full webpage can be accessed here.

Example results page for a fast scattering glitch

Full webpage can be accessed here.