Image captioning is the task of generating a textual description of an image. It combines Natural Language Processing and Computer Vision: a common deep-learning approach pairs a CNN (to encode the image into a feature vector) with an LSTM (to generate the caption word by word). Popular datasets for this task include:
- Flickr 8k (containing 8k images),
- Flickr 30k (containing 30k images),
- MS COCO (containing 180k images), etc.
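To make the CNN-LSTM idea above concrete, here is a minimal sketch of the greedy decoding loop used at inference time: the encoded image features and the partial caption are fed to the decoder, which scores the next word until an end token is produced. The `stub_predict` function, the tiny vocabulary, and all names here are hypothetical stand-ins for a real trained Keras model, purely for illustration.

```python
# Toy vocabulary with start/end tokens, as used in caption generation.
vocab = ["<start>", "a", "dog", "runs", "<end>"]

def stub_predict(image_features, partial_caption):
    # Stand-in for model.predict(): returns one score per vocabulary word.
    # A real model would combine the CNN image features with the LSTM's
    # state over the partial caption; this stub just follows a fixed chain.
    order = {"<start>": "a", "a": "dog", "dog": "runs", "runs": "<end>"}
    next_word = order.get(partial_caption[-1], "<end>")
    return [1.0 if w == next_word else 0.0 for w in vocab]

def greedy_caption(image_features, max_len=10):
    # Greedy decoding: repeatedly pick the highest-scoring next word
    # until the end token appears or the length limit is reached.
    caption = ["<start>"]
    for _ in range(max_len):
        scores = stub_predict(image_features, caption)
        word = vocab[max(range(len(vocab)), key=scores.__getitem__)]
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption[1:])  # drop the <start> token

print(greedy_caption(image_features=None))  # → "a dog runs"
```

In a real pipeline the stub would be replaced by a trained model, and beam search is often used instead of greedy decoding for better captions.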
Point to Note:
I have used the Flickr8k dataset here because of its modest computational requirements: it fits comfortably in 8 GB of RAM and trains in roughly 25 minutes per epoch on a CPU. Flickr30k and MS COCO may need about 32-64 GB of RAM depending on how they are processed. For the fastest results, consider an AWS EC2 instance (it's a paid service, though 😞!).
References:
- https://towardsdatascience.com/image-captioning-with-keras-teaching-computers-to-describe-pictures-c88a46a311b8
- https://towardsdatascience.com/image-captioning-in-deep-learning-9cd23fb4d8d2
- https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
- https://www.youtube.com/watch?v=NmoW_AYWkb4
- https://www.kaggle.com/shadabhussain/automated-image-captioning-flickr8