Show and tell

This project implements the Show and Tell paper. In simple words, it takes an image and produces an English sentence that describes the image.

Model Short Summary

ResNet: takes an image and produces a 300-dimensional image embedding vector.

LSTM: takes the image embedding vector above and a start word token (STK), and produces a prediction for the next word as a probability distribution over the vocabulary.

Beam Search: finds a high-probability sequence of words from the probability distributions produced at each time step. The result is the best sequence among the candidates kept under a given beam size; it is not guaranteed to be optimal in general.
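To make the beam search step concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the `next_word_probs` callback is a hypothetical stand-in for the LSTM step (given the words so far, it returns a probability for each possible next word), and the `<STK>` and `<END>` token spellings, the toy vocabulary, and the probabilities are all assumptions made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search(
    next_word_probs: Callable[[Sequence[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    beam_size: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Return the highest-scoring word sequence found under the given beam size."""
    # Each entry is (cumulative log-probability, words so far).
    beams: List[Tuple[float, List[str]]] = [(0.0, [start_token])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        # Extend every live sequence by every possible next word.
        candidates: List[Tuple[float, List[str]]] = []
        for score, words in beams:
            for word, p in next_word_probs(words).items():
                if p > 0.0:
                    candidates.append((score + math.log(p), words + [word]))
        if not candidates:
            break

        # Keep only the beam_size best candidates; everything else is pruned,
        # which is why the result is not optimal in general.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_size]:
            if words[-1] == end_token:
                finished.append((score, words))
            else:
                beams.append((score, words))
        if not beams:
            break

    pool = finished if finished else beams
    return max(pool, key=lambda c: c[0])[1]


def toy_model(words: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the LSTM: depends only on the last word."""
    table = {
        "<STK>": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.55, "cat": 0.45},
        "the": {"dog": 0.9, "cat": 0.1},
        "dog": {"<END>": 1.0},
        "cat": {"<END>": 1.0},
    }
    return table[words[-1]]


# Beam size 1 is greedy decoding: it commits to "a" (0.6) and ends at 0.6 * 0.55 = 0.33.
print(beam_search(toy_model, "<STK>", "<END>", beam_size=1))
# ['<STK>', 'a', 'dog', '<END>']

# Beam size 2 also keeps "the" alive and finds the better 0.4 * 0.9 = 0.36.
print(beam_search(toy_model, "<STK>", "<END>", beam_size=2))
# ['<STK>', 'the', 'dog', '<END>']
```

The two calls show why the beam size matters: with a beam of 1 the search settles on a sequence that looks best at the first step, while a beam of 2 recovers a sequence with higher overall probability. Scores are summed in log space to avoid numerical underflow on long sentences.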

Some Screenshots
