The program takes an image from the Flickr8k image set and outputs a description of the picture it "sees". Currently it only works with Flickr8k images, for the following reasons:
- The image feature vectors, extracted with InceptionV3 (see the architecture below), are pre-computed for all Flickr8k images and stored in this repo. On a more powerful computer, this could easily be converted to real-time extraction for new images.
- The image descriptions (text generation) were trained on the Flickr8k captions. This could be extended to a larger dataset, such as Flickr30k; the steps are the same as in this repo.
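Extracting features for a new image in real time could look roughly like the sketch below. It assumes the common setup of InceptionV3 with ImageNet weights, the classification head removed, and global average pooling, which yields a 2048-dimensional vector per image. The repo's pre-computed vectors may come from a different layer or preprocessing, and the helper names (`scale_pixels`, `build_encoder`, `extract_features`) are hypothetical.

```python
import numpy as np

# InceptionV3's default ImageNet input size.
IMAGE_SIZE = (299, 299)


def scale_pixels(pixels):
    """Map RGB values from [0, 255] to [-1, 1], the input range InceptionV3 expects."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0


def build_encoder():
    """InceptionV3 without its classifier; pooling='avg' gives one 2048-d vector per image."""
    import tensorflow as tf  # imported lazily so scale_pixels works without TensorFlow

    return tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, pooling="avg"
    )


def extract_features(encoder, image_path):
    """Load one image from disk, resize it, scale it, and return its feature vector."""
    import tensorflow as tf

    image = tf.keras.utils.load_img(image_path, target_size=IMAGE_SIZE)
    batch = scale_pixels(tf.keras.utils.img_to_array(image))[np.newaxis, ...]
    return encoder.predict(batch, verbose=0)[0]  # shape (2048,)
```

Building the encoder once and reusing it for every upload avoids reloading the weights on each request, which is the slow part on a weak machine.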
I used the model architecture depicted in the picture below.
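The architecture picture is not reproduced in this text. Assuming a common merge-style design, in which the InceptionV3 vector and an LSTM over the caption so far are combined by Dense layers to predict the next word, each training caption is split into (prefix, next word) pairs, and each pair is fed to the model together with its image's feature vector. The sketch below shows that caption preprocessing; the `startseq`/`endseq` markers and the helpers `clean_caption`, `build_vocab` and `make_training_pairs` are illustrative assumptions, not code from this repo.

```python
import re
from collections import Counter

START, END = "startseq", "endseq"  # hypothetical sentence boundary tokens


def clean_caption(caption):
    """Lowercase, keep alphabetic words only, and wrap with start/end markers."""
    words = re.findall(r"[a-z]+", caption.lower())
    return " ".join([START, *words, END])


def build_vocab(captions, min_count=1):
    """Map each frequent-enough word to an index; 0 is reserved for padding."""
    counts = Counter(word for caption in captions for word in caption.split())
    kept = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: index for index, word in enumerate(kept, start=1)}


def make_training_pairs(caption, vocab, max_len):
    """Turn one cleaned caption into (left-padded prefix, next word index) pairs."""
    ids = [vocab[word] for word in caption.split() if word in vocab]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_len:]  # keep at most the last max_len tokens
        padded = [0] * (max_len - len(prefix)) + prefix
        pairs.append((padded, ids[i]))
    return pairs
```

For example, `"startseq a dog endseq"` yields three pairs: `startseq` predicts `a`, `startseq a` predicts `dog`, and `startseq a dog` predicts `endseq`.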
The interface is simple and built in Flask (see gif).
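When a photo is submitted, a caption model of this kind usually generates text one word at a time. The sketch below shows greedy decoding under that assumption: `predict_next` is a hypothetical stand-in for the trained model (it takes the image features and the token indices so far, and returns a probability for every vocabulary index), and the vocabulary reserves index 0 for padding. Beam search is a common alternative; whether this repo uses greedy or beam decoding is not stated here.

```python
START, END = "startseq", "endseq"  # hypothetical sentence boundary tokens


def greedy_caption(predict_next, features, vocab, max_len=20):
    """Repeatedly pick the most probable next word until END or max_len words."""
    index_to_word = {index: word for word, index in vocab.items()}
    tokens = [vocab[START]]
    words = []
    for _ in range(max_len):
        probs = predict_next(features, tokens)
        next_id = max(range(len(probs)), key=probs.__getitem__)
        word = index_to_word.get(next_id)  # None for the padding index 0
        if word is None or word == END:
            break
        words.append(word)
        tokens.append(next_id)
    return " ".join(words)
```

The `max_len` default of 20 is an arbitrary cap that stops generation if the model never predicts the end marker.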
For this purpose, I used the following tools:
- Python
- Keras: InceptionV3, LSTM and Dense layers
- word2vec, for pre-training word embeddings
- HTML, CSS, Flask
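Pre-trained word2vec vectors are typically handed to Keras as an embedding matrix whose row `i` holds the vector for the word with index `i`. The sketch below builds such a matrix with NumPy, assuming a word-to-index vocabulary that reserves 0 for padding and a `word_vectors` mapping that supports `in` and `[]` (a plain dict, or for example a gensim `KeyedVectors` object). Words without a pre-trained vector, and the padding row, stay at zero. The function name `build_embedding_matrix` is hypothetical.

```python
import numpy as np


def build_embedding_matrix(vocab, word_vectors, dim):
    """Row i holds the pre-trained vector for the word with index i; row 0 is padding."""
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    for word, index in vocab.items():
        if word in word_vectors:  # unknown words keep an all-zero row
            matrix[index] = word_vectors[word]
    return matrix
```

The resulting matrix can then initialize the Keras Embedding layer in front of the LSTM, optionally frozen so the pre-trained vectors are not changed during caption training.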
In a terminal:
- Clone this repo:
git clone
- Install the necessary libraries:
pip install -r requirements.txt
- Make sure you are in the main directory, Generate_image_captions.
- Download the Flickr8k image set.
- Run the following commands:
export FLASK_DEBUG=1
flask run
- Open the listed localhost in a browser.
- Try out some photos!
The project was tested on Chrome, Firefox and Opera!