The current codebase will be updated soon to reflect the results found in the following paper:
Deep object detection for waterbird monitoring using aerial imagery
Krish Kabra*,1,
Alexander Xiong*,1,
Wenbin Li*,1,
Minxuan Luo1,
William Lu1,
Raul Garcia1,
Dhananjay Vijay1,
Jiahui Yu1,
Maojie Tang1,
Tianjiao Yu1,
Hank Arnold2,
Anna Vallery2,
Richard Gibbons3,
Arko Barman1
* equal contribution
1Rice University, Houston, TX 77005, USA
2Houston Audubon Society, Houston, TX 77079, USA
3American Bird Conservancy, The Plains, VA 20198, USA
Stay tuned!
To improve both the accuracy and the speed of bird counts, Houston Audubon and students from the D2K capstone course at Rice University are developing machine learning and computer vision algorithms that detect birds in images taken from UAVs. The specific goals are to:
- Count and survey the number of birds.
- Identify different species of detected birds.
The following open source packages are used in this project:
- Numpy
- Pandas
- Matplotlib
- OpenCV
- Detectron2
- WAndB
code
.
├── configs
├────── (useful sweep config files for WAndB)
├── scripts
├────── data_exploration.py
├── utils
├────── config.py
├────── cropping.py
├────── dataloader.py
├────── evaluation.py
├────── plotting.py
├────── trainer.py
├── README.md
├── requirements.txt
├── data_exploration.py
├── Audubon-Bird-Detection-Tutorial.ipynb
├── train_net.py
├── wandb_train_net.py
- Clone the repository
- Install PyTorch (see the official PyTorch installation instructions)
- Install Detectron2 (see the official Detectron2 installation instructions)
- Install other dependencies
- Execute the scripts as required.
git clone https://github.com/RiceD2KLab/Audubon_F21.git
Requirements: Linux or macOS with Python ≥ 3.6
pip3 install torch==1.10.0+cu102 torchvision==0.11.1+cu102 -f https://download.pytorch.org/whl/cu102/torch_stable.html
Requirements: Linux or macOS with Python ≥ 3.6
For Windows: Detectron2 is continuously built on Windows with CircleCI. However, official support for it is not provided.
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# (add --user if you don't have permission)
# Or, to install it from a local clone:
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
# On macOS, you may need to prepend the above commands with a few environment variables:
CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ...
pip install -r requirements.txt
See train_net.py, wandb_train_net.py, or Colab Notebook for usage of code.
Houston Audubon has provided a 52 GB dataset of images captured with a DJI M300RTK UAV and a P1 camera attachment. The images are high resolution, typically 8192 × 5460 pixels. About 3 GB of the images are annotated, each with a corresponding CSV file specifying species labels and bounding box locations. The annotated subset contains 19,276 birds of 15 species; the remaining 50.5 GB are raw images without annotations. Each CSV file contains:
- species id: unique integer ID of the species
- species label: name of the species
- x: x min of a bounding box
- y: y min of a bounding box
- width: width of a bounding box
- height: height of a bounding box
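As a minimal sketch of how one annotation file might be read, the snippet below parses rows with the fields listed above and converts each box from (x, y, width, height) to corner format. The exact column header names (`species_id`, `species_label`, and so on) and the sample rows are assumptions made for illustration; the real files may name or order the columns differently.

```python
import csv
import io

# Hypothetical file contents: header names and rows are assumed for illustration.
SAMPLE_CSV = """species_id,species_label,x,y,width,height
1,Brown Pelican,100,200,50,40
2,Laughing Gull,300,120,20,25
"""


def load_boxes(csv_text):
    """Parse annotation rows into dicts holding corner-format boxes."""
    boxes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x_min, y_min = int(row["x"]), int(row["y"])
        boxes.append({
            "species_id": int(row["species_id"]),
            "species_label": row["species_label"],
            # Convert (x, y, width, height) to (x_min, y_min, x_max, y_max).
            "bbox": (x_min, y_min,
                     x_min + int(row["width"]),
                     y_min + int(row["height"])),
        })
    return boxes


boxes = load_boxes(SAMPLE_CSV)
```

Corner-format boxes are convenient for the clipping and IoU computations used later in the pipeline.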
The data wrangling module of the pipeline largely involves preparing the data to be fed into deep learning models used to detect objects, namely birds. Our data wrangling process includes:
- Tiling
- Data Augmentation
In general, deep learning models train faster and perform better on smaller images. For instance, 600 × 600 pixels is a common input size for object detection models. Our first step was therefore to split the 8192 × 5460 images into tiles. The tile size is configurable and defaults to 600 × 600.
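The tiling grid can be sketched as follows. This is an illustration only, not the code in utils/cropping.py: the function name `tile_origins` is hypothetical, and letting edge tiles be smaller than the requested size is an assumption (the repository may pad or overlap them instead).

```python
def tile_origins(image_w, image_h, tile_w=600, tile_h=600):
    """Return (x, y, w, h) for each tile in row-major order.

    Assumption: tiles on the right and bottom edges are truncated
    rather than padded.
    """
    tiles = []
    for y in range(0, image_h, tile_h):
        for x in range(0, image_w, tile_w):
            tiles.append((x, y,
                          min(tile_w, image_w - x),
                          min(tile_h, image_h - y)))
    return tiles


tiles = tile_origins(8192, 5460)
print(len(tiles))   # 14 columns x 10 rows
print(tiles[-1])    # the truncated bottom-right tile
```

With these defaults, an 8192 × 5460 image yields 14 × 10 = 140 tiles, and the last row and column are narrower than 600 pixels.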
A caveat of this approach is that some birds are unavoidably cut in two and appear in neighboring tiles, as seen in Figure 2. Because counting birds is one of the objectives, the same problem must also be handled in the detection phase. During tiling, only the tile that contains more than 50% of the bird's bounding box keeps the (cropped) box; the remaining fraction of the box in the other tile is discarded. As a result, the model is trained to detect both complete and partial birds.
In the detection stage, if double counting across neighboring tiles turns out to be common, we also plan to develop a mechanism that merges partial detections in neighboring tiles and counts them as one bird.
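The 50% rule can be sketched as below. The function name `assign_box_to_tile`, the corner-format boxes, and measuring the fraction by area are assumptions for illustration; the repository's own cropping code may compute the fraction differently.

```python
def assign_box_to_tile(box, tile, min_fraction=0.5):
    """Clip a box to a tile and keep it only if enough of it is visible.

    box and tile are (x_min, y_min, x_max, y_max) in full-image pixels.
    Returns the clipped box in tile-local coordinates, or None when the
    visible area fraction is not greater than min_fraction.
    """
    bx0, by0, bx1, by1 = box
    tx0, ty0, tx1, ty1 = tile
    # Intersection of the box with the tile.
    ix0, iy0 = max(bx0, tx0), max(by0, ty0)
    ix1, iy1 = min(bx1, tx1), min(by1, ty1)
    box_area = (bx1 - bx0) * (by1 - by0)
    if box_area <= 0 or ix1 <= ix0 or iy1 <= iy0:
        return None
    visible = (ix1 - ix0) * (iy1 - iy0) / box_area
    if visible <= min_fraction:
        return None
    # Shift into the tile's own coordinate frame.
    return (ix0 - tx0, iy0 - ty0, ix1 - tx0, iy1 - ty0)


bird = (570, 100, 610, 140)          # straddles the boundary at x = 600
left_tile = (0, 0, 600, 600)
right_tile = (600, 0, 1200, 600)
print(assign_box_to_tile(bird, left_tile))    # 75% visible: kept, clipped
print(assign_box_to_tile(bird, right_tile))   # 25% visible: dropped
```

A box split exactly in half is dropped from both tiles under this strict comparison; whether that edge case matters depends on how the real pipeline handles ties.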
Deep learning models typically need on the order of 1,000 images per class to be effective, but some bird species do not have that many training samples in our dataset. Our team plans to make the models more robust via data augmentation, which means training models with synthetically modified data:
- rotation: Orthogonal or non-orthogonal rotations. Rotation is a natural data augmentation step for our data at hand because the bird images are taken from different angles by drones.
- random crop: Randomly sample a section from the image and resize it to the original image size.
These data augmentation steps help models adapt to different orientations, locations, and scales of the same object class, and will boost the performance of the models.
We utilized the imgaug library to generate modified images. We have tried several types of augmentations: flipping, blurring, adding Gaussian noise and changing color contrasts.
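Geometric augmentations must transform the bounding boxes together with the pixels. The sketch below shows the box arithmetic for a horizontal flip and a 90° counter-clockwise rotation in plain Python. It does not use imgaug and is not the project's augmentation code; the function names are hypothetical.

```python
def hflip_box(box, image_w):
    """Mirror a (x_min, y_min, x_max, y_max) box across the vertical axis."""
    x0, y0, x1, y1 = box
    return (image_w - x1, y0, image_w - x0, y1)


def rot90_ccw_box(box, image_w):
    """Rotate a box 90 degrees counter-clockwise with its image.

    A point (x, y) in a W x H image maps to (y, W - x) in the rotated
    H x W image, so the box corners swap roles accordingly.
    """
    x0, y0, x1, y1 = box
    return (y0, image_w - x1, y1, image_w - x0)


box = (10, 20, 110, 70)              # in a 600 x 400 (width x height) image
print(hflip_box(box, 600))
print(rot90_ccw_box(box, 600))       # rotated image is 400 x 600
```

Applying `rot90_ccw_box` four times, alternating the image width between 600 and 400, returns the original box, which is a quick sanity check on the mapping.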
For the time being, our model is only trained on original data. We plan to retrain our model on the augmented dataset and compare performances. We are generating a larger training set using the augmentation methods mentioned above. Specifically, both the original images and the transformed images will be fed to the model in the training phase, but only original images will be used for evaluation and testing purposes.
We use RetinaNet and Faster R-CNN models, both with a ResNet-50-FPN backbone. We first train a model to perform the simpler task of detecting birds with no distinction of species. We then train a model to identify bird species, namely Brown Pelicans, Laughing Gulls, Mixed Terns, Great Blue Herons, and Great Egrets/White Morphs.
Due to the lack of annotated data available for other bird species, we re-label all other bird species under the "Other/Unknown" category.
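The re-labeling step can be sketched as a simple lookup. The exact label strings stored in the CSV files are an assumption (the names below follow the results table), and "Roseate Spoonbill" is only a hypothetical example of a species outside the five target classes.

```python
# Target classes, spelled as in the results table (assumed to match the CSV labels).
TARGET_SPECIES = {
    "Brown Pelican",
    "Laughing Gull",
    "Mixed Tern",
    "Great Blue Heron",
    "Great Egret/White Morph",
}


def relabel(species_label):
    """Keep the five target species; map everything else to one class."""
    if species_label in TARGET_SPECIES:
        return species_label
    return "Other/Unknown"


labels = ["Brown Pelican", "Roseate Spoonbill", "Mixed Tern"]
print([relabel(name) for name in labels])
```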
Note: The model weights used to initialize both the bird-only and bird-species detector come from a pre-trained model on the MS COCO dataset.
- Bird-only detector (RetinaNet ResNet-50 FPN)
- Bird species detector (Faster R-CNN ResNet-50 FPN)
Metric | Birds |
---|---|
AP (IoU = 0.5) | 93.7% |
AP (IoU = 0.75) | 26.4% |
mAP | 43.7% |
The AP of 93.7% at an IoU threshold of 0.50 is very promising.
The mAP of 43.7% is comparable to state-of-the-art results on challenging object detection benchmarks such as the COCO dataset.
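The gap between AP at IoU = 0.5 and AP at IoU = 0.75 reflects how strict the matching criterion is. As an illustration (not the project's evaluation code), the sketch below computes intersection over union for two corner-format boxes; the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 100)
prediction = (20, 0, 120, 100)       # same size, shifted right by 20 px
score = iou(ground_truth, prediction)
print(round(score, 3))
print(score >= 0.5, score >= 0.75)   # a match at 0.5 but not at 0.75
```

A prediction shifted by only a fifth of the box width already falls below the 0.75 threshold, which is one reason small, tightly packed objects such as birds score much lower under the stricter criterion.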
Metric | Brown Pelican | Laughing Gull | Mixed Tern | Great Blue Heron | Great Egret/White Morph | Other/Unknown | Overall |
---|---|---|---|---|---|---|---|
AP (IoU = 0.5) | 98.8% | 100.0% | 97.6% | 98.5% | 96.9% | 0.0% | 82.0% |
At an IoU threshold of 0.50, the species detector achieves a higher AP than the bird-only detector for every named species, which is an excellent result. The exception is the "Other/Unknown" category, where the model fails completely (0.0% AP). Nevertheless, combining the results of the bird-only detector and the bird-species detector can compensate for the poor performance on the "Other/Unknown" category.
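The "Overall" figure in the table is consistent with an unweighted mean of the six per-class AP values. That is an observation from the numbers rather than something stated explicitly, and the short check below shows how much the single failing class pulls the average down.

```python
# Per-class AP (IoU = 0.5) values copied from the table above.
ap_per_class = {
    "Brown Pelican": 98.8,
    "Laughing Gull": 100.0,
    "Mixed Tern": 97.6,
    "Great Blue Heron": 98.5,
    "Great Egret/White Morph": 96.9,
    "Other/Unknown": 0.0,
}

overall = sum(ap_per_class.values()) / len(ap_per_class)
named = [ap for name, ap in ap_per_class.items() if name != "Other/Unknown"]
named_only = sum(named) / len(named)

print(round(overall, 1))      # mean over all six classes
print(round(named_only, 1))   # mean over the five named species
```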
Krish Kabra
Email: krish.kabra@rice.edu
GitHub: @krishk97
Minxuan Luo
Email: ml122@rice.edu
GitHub: @minxuanluo
Alexander Xiong
Email: xionga27@rice.edu
GitHub: @awx1
William Lu
Email: wyl1@rice.edu
GitHub: @
Anna Vallery
Email: avallery@houstonaudubon.org
Richard Gibbons
Email: rgibbons@houstonaudubon.org
Hank Arnold
Email: hmarnold@msn.com
✤ This was the project for the course COMP 449/549 - Machine Learning and Data Science Projects (Fall 2021), at Rice University