This repo explores and models the exoplanet dataset provided by Kaggle, with the goal of finding stars that have at least one exoplanet in orbit. The data describe the change in flux (light intensity) of several thousand stars over time. Each star has a binary label of 2 or 1: 2 indicates that the star is confirmed to have at least one exoplanet in orbit (some observations are in fact multi-planet systems), and 1 indicates that it is not.
Planets themselves do not emit light, but the stars that they orbit do. If a star is observed over several months or years, its flux (light intensity) may show a regular 'dimming'. This is evidence that a body may be orbiting the star.
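To make the 'dimming' idea concrete, here is a minimal sketch that builds a synthetic light curve with periodic dips and compares the mean flux inside and outside the dips. The function name `simulate_light_curve` and all parameter values (period, dip depth, noise level) are illustrative assumptions and are not taken from the Kaggle data.

```python
import random


def simulate_light_curve(n_points=300, period=50, transit_len=3,
                         depth=0.01, noise=0.001, seed=0):
    """Return a synthetic, normalised flux series with periodic dips.

    Every parameter here is an illustrative assumption, not a dataset value.
    """
    rng = random.Random(seed)
    flux = []
    for t in range(n_points):
        # The hypothetical planet blocks a little light at the start of each period.
        in_transit = (t % period) < transit_len
        baseline = 1.0 - depth if in_transit else 1.0
        flux.append(baseline + rng.gauss(0.0, noise))
    return flux


flux = simulate_light_curve()

# Split the samples using the same (assumed) period and transit length.
in_transit = [f for t, f in enumerate(flux) if t % 50 < 3]
out_transit = [f for t, f in enumerate(flux) if t % 50 >= 3]
mean_in = sum(in_transit) / len(in_transit)
mean_out = sum(out_transit) / len(out_transit)

print(f"mean flux in transit:     {mean_in:.4f}")
print(f"mean flux out of transit: {mean_out:.4f}")
```

In this toy series the in-transit mean sits below the out-of-transit mean by roughly the chosen depth; real light curves are noisier and the period is not known in advance.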
The repository contains two main notebooks:
data_explotation.ipynb: The first notebook explores the data and extracts some key insights about the dataset. In particular, plotting the time series shows that stars with exoplanets and stars without exoplanets often have different ranges of flux fluctuation over time. An ML/DL model can leverage this property to distinguish the two classes.
model_train.ipynb: The second notebook contains all the code used to model the data: an SVC (Support Vector Classifier), used as the baseline model, and a CNN written in TensorFlow. As reported in the final notes (at the bottom of the notebook), the CNN performs better, since it can better exploit the features provided.
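The range of flux fluctuation mentioned above can be computed per star in a few lines. The sketch below is only an illustration: the helper `flux_range` and the two toy series are hypothetical and do not come from the notebook or the dataset.

```python
import statistics


def flux_range(series):
    """Peak-to-peak spread of one star's flux series."""
    return max(series) - min(series)


# Hypothetical toy series, invented for illustration only.
stars = {
    "star_a": [0.2, -0.1, 0.3, 0.0, -0.2],
    "star_b": [0.1, -0.2, -15.0, 0.3, -14.5],
}

# One summary value per star; the standard deviation is a second spread measure.
ranges = {name: flux_range(s) for name, s in stars.items()}
spreads = {name: statistics.pstdev(s) for name, s in stars.items()}

for name in stars:
    print(f"{name}: range={ranges[name]:.2f}, std={spreads[name]:.2f}")
```

Summary statistics like these could serve as simple input features for a classifier, or as a quick sanity check before plotting.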
The following folder contains an implementation of a small pipeline with xgboost.
To run the models, a few basic libraries are needed. Run the following commands to create a new conda environment and install the required libraries.
conda create -n exo_hunting python=3.6 -y
conda activate exo_hunting
pip install numpy tensorflow-gpu seaborn scikit-learn
The data used in the repo is provided by Kaggle: Exoplanet Hunting in Deep Space.
It can be downloaded with the Kaggle API using the following command:
kaggle datasets download -d keplersmachines/kepler-labelled-time-series-data
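Once downloaded, the data can be parsed into labels and flux series. The sketch below assumes a CSV layout with a `LABEL` column followed by `FLUX.*` columns; these column names, the helper `load_light_curves`, and the tiny in-memory sample are assumptions for illustration, so check them against the actual files. It also remaps the labels 2/1 to 1/0, which most classifiers find more convenient.

```python
import csv
import io

# Hypothetical two-row sample in the assumed layout (LABEL, then FLUX.* columns).
SAMPLE = """LABEL,FLUX.1,FLUX.2,FLUX.3
2,93.85,83.81,20.10
1,-38.88,-33.83,-58.54
"""


def load_light_curves(handle):
    """Parse rows into (labels, series), mapping label 2 -> 1 and label 1 -> 0."""
    labels, series = [], []
    for row in csv.DictReader(handle):
        # 2 means "confirmed exoplanet" in the source data; use 1 for that class.
        labels.append(1 if int(row["LABEL"]) == 2 else 0)
        flux_cols = [key for key in row if key.startswith("FLUX")]
        series.append([float(row[key]) for key in flux_cols])
    return labels, series


labels, series = load_light_curves(io.StringIO(SAMPLE))
print(labels)
print(series[0])
```

To read a real file instead of the sample, pass an open file handle, for example `open(path, newline="")`, where `path` points to the extracted CSV.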
- DNN algorithm over the dataset.
- Run model training on GCP ML engine.