A PySpark-based Locality-Sensitive Hashing (LSH) model applied to the Dark Energy Survey Y3 Gold coadd catalog to perform an approximate similarity search for Low Surface Brightness Galaxies.
A paper and a poster detailing the techniques were presented at the LatinX in AI (LXAI) Research Workshop at NeurIPS 2022! (paper)
Low Surface Brightness Galaxies (LSBGs) constitute an important segment of the galaxy population; however, due to their diffuse nature, searching for them is challenging. The detection of LSBGs is usually done with a combination of parametric methods and visual inspection, which becomes infeasible for future astronomical surveys that will collect petabytes of data. Thus, in this work we explore the use of Locality-Sensitive Hashing for the approximate similarity search of LSBGs in large astronomical catalogs. We use 11,670,190 objects from the Dark Energy Survey Y3 Gold coadd catalog to create an approximate
Our LSH pipeline is available in lsh.py. It contains the code we used to load and process the data, as well as the code for our model and our tests. In it, we search for both LSBGs and artifacts (objects that are similar to LSBGs but are not considered LSBGs).
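To make the idea of an approximate similarity search concrete, here is a minimal, standard-library-only sketch of one common LSH family for Euclidean distance (bucketed random projection). It is not the code in lsh.py and does not use PySpark. The function names, the three toy feature values per object, the bucket length, and the number of hash tables are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def random_unit_vector(dim, rng):
    """Draw a random direction (unit-length Gaussian vector)."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def bucket_of(features, direction, bucket_length):
    """Project onto a direction and cut the line into fixed-width buckets."""
    projection = sum(f * d for f, d in zip(features, direction))
    return math.floor(projection / bucket_length)


def build_tables(catalog, directions, bucket_length):
    """One hash table per direction: bucket id -> set of object ids."""
    tables = []
    for direction in directions:
        table = defaultdict(set)
        for object_id, features in catalog.items():
            table[bucket_of(features, direction, bucket_length)].add(object_id)
        tables.append(table)
    return tables


def approx_nearest(key, catalog, tables, directions, bucket_length, k):
    """Collect objects sharing a bucket with the key in any table,
    then rank only those candidates by exact Euclidean distance."""
    candidates = set()
    for table, direction in zip(tables, directions):
        candidates |= table.get(bucket_of(key, direction, bucket_length), set())
    return sorted(candidates, key=lambda oid: math.dist(key, catalog[oid]))[:k]


# Hypothetical toy catalog: ids and feature values are made up.
rng = random.Random(0)
catalog = {
    "diffuse_1": [24.8, 5.1, 0.30],
    "diffuse_2": [24.9, 5.3, 0.28],
    "compact_1": [20.1, 1.2, 0.90],
    "compact_2": [20.3, 1.0, 0.95],
}
directions = [random_unit_vector(3, rng) for _ in range(4)]
tables = build_tables(catalog, directions, bucket_length=2.0)

# Use a known object as the search key, as one would with an LSBG key.
matches = approx_nearest(catalog["diffuse_1"], catalog, tables, directions, 2.0, k=2)
print(matches)
```

Because the key here is itself a catalog object, it always falls in its own bucket and is returned first. Other neighbors are only found if they share a bucket with the key in at least one table, which is the trade-off that makes the search approximate but cheap on large catalogs.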
For our data, we use the Dark Energy Survey Y3 Gold coadd catalog to train our model and perform our searches. The query we used to gather the data on DESaccess is available in query.sql.
We used Tanoglidis et al.'s LSBG catalog for our LSBG keys and Tanoglidis et al.'s LSBG artifact catalog for our artifact keys.
We obtain our results by performing searches with random keys (in lsh.py). We used the code available in results_viz.py to create our visualizations. The figs/ directory contains the visualizations for our other keys.