This repository contains the code associated with the paper "Optimized sampling of SDSS-IV MaStar spectra for stellar classification using supervised models". The project focuses on the application of active learning methodologies to enhance the efficiency and accuracy of classifying stellar spectra. Traditional machine learning approaches for spectral classification often require vast amounts of labeled data, which is labor-intensive to obtain. Active learning, by contrast, strategically selects the most informative samples for labeling, thus minimizing the required labeled dataset while maximizing model performance.
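To make the idea of "selecting the most informative samples" concrete, here is a minimal sketch of pool-based uncertainty sampling. It is not the code used in the paper: it assumes a toy nearest-centroid classifier and synthetic 2-D points rather than MaStar spectra, and every name in it (`centroids`, `class_probs`, `least_confident`, `run_active_learning`) is hypothetical.

```python
import math
import random


def centroids(X, y):
    """Mean feature vector of each class in the labeled set."""
    sums = {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi) + [0])  # feature sums + count
        for j, v in enumerate(xi):
            s[j] += v
        s[-1] += 1
    return {c: [v / s[-1] for v in s[:-1]] for c, s in sums.items()}


def class_probs(x, cents):
    """Pseudo-probabilities: normalized exp(-distance) to each centroid."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in cents.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def least_confident(pool, cents):
    """Index of the pool sample whose top class probability is lowest."""
    return min(
        range(len(pool)),
        key=lambda i: max(class_probs(pool[i], cents).values()),
    )


def run_active_learning(X, y, n_queries=10):
    """Return the indices labeled after n_queries uncertainty-based queries."""
    # Seed the labeled set with the first example of each class.
    labeled = [y.index(c) for c in sorted(set(y))]
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(n_queries):
        cents = centroids([X[i] for i in labeled], [y[i] for i in labeled])
        k = least_confident([X[i] for i in pool], cents)
        labeled.append(pool.pop(k))  # stands in for asking an expert for a label
    return labeled


# Synthetic stand-in data: two Gaussian blobs, 50 points per class.
rng = random.Random(0)
X = [[rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)]
     for cx in (-2.0, 2.0) for _ in range(50)]
y = [c for c in (0, 1) for _ in range(50)]

labeled_indices = run_active_learning(X, y, n_queries=10)
```

A random-sampling baseline would replace `least_confident` with a random choice from the pool; comparing the two is the kind of experiment the notebooks in this repository carry out on real spectra.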
- Clone the repository:
git clone https://github.com/rehamelkholy/StellarAL.git
- Navigate to the project directory:
cd StellarAL
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On Windows:
.\venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- To reproduce the results of the paper, you can run the ipynb files in the same order as in Files.
- The original SDSS data files used in this study can be downloaded from the following links:
- README.md: Documentation file
- requirements.txt: List of Python dependencies
- utils.py: Pre-defined functions collected in one script to be imported at the beginning of each ipynb file
- sec2_data.ipynb: Initial data preparation and exploration
- sec3_1_preprocessing.ipynb: Data pre-processing before applying AL & ML methods
- sec3_rand_vs_modal.ipynb: Testing different AL sampling strategies against a random-sampling baseline
- sec3_n_instances.ipynb: Testing performance improvement with increasing number of instances using the highest-performing AL sampling strategy
- sec3_4_metrics.ipynb: Plotting an illustration of the AUC for different example models
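The notebooks listed above can also be executed in order from a script instead of being opened one by one. The sketch below is one possible way to do that, not part of the repository: it assumes the `jupyter nbconvert` command is available in the active environment and that the SDSS data files are already in place. `build_command` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Execution order taken from the file list above.
NOTEBOOKS = [
    "sec2_data.ipynb",
    "sec3_1_preprocessing.ipynb",
    "sec3_rand_vs_modal.ipynb",
    "sec3_n_instances.ipynb",
    "sec3_4_metrics.ipynb",
]


def build_command(notebook):
    """Command line that executes one notebook in place via nbconvert."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS, repo_dir="."):
    """Run each notebook in order, stopping at the first failure."""
    for name in notebooks:
        path = Path(repo_dir) / name
        if not path.exists():
            raise FileNotFoundError(path)
        # check=True raises CalledProcessError if a notebook fails to execute.
        subprocess.run(build_command(str(path)), check=True)


# From the repository root, with the virtual environment activated:
# run_all()
```

Stopping at the first failure is deliberate here, since the order of the list suggests that later notebooks build on the output of earlier ones.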
This project is licensed under the GPL License.
If you use this code in your projects, you can cite it as:
@misc{elkholy2024,
title={Optimized sampling of SDSS-IV MaStar spectra for stellar classification using supervised models},
author={R. El-Kholy and Z. M. Hayman},
year={2024},
eprint={2406.18366},
archivePrefix={arXiv},
primaryClass={astro-ph.SR},
url={https://arxiv.org/abs/2406.18366},
}
Dr. Reham El-Kholy holds a PhD from Cairo University, where she works as a Lecturer of Astronomy. If you have any questions, requests, or suggestions, you can contact her at relkholy@sci.cu.edu.eg. We hope you will find this useful!