- Train on documents labelled with tags.
- Predict both the tag and the embedding vector of each test document, to classify it and to find the nearest document in the training set.
- Documents whose Doc2Vec embeddings are most similar are treated as similar documents.
- Crop the heading part of images
- Find a pretrained feature-vector model on tfhub.dev or another source.
- Run the pretrained model on all the templates (training data) and store the resulting feature vectors.
- Take any input from the input folder (test set) and compute its feature vector.
- Use a distance metric such as Euclidean or Manhattan to find which template image is nearest to the input.
- Add tags while training, return tags during prediction # Completed
- Predict multiple files at the same time, return a dictionary of outputs
- Find out whether gensim has a built-in function for prediction, instead of manually calculating distances
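
The nearest-template lookup and the batch prediction item above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the helper names (`nearest_template`, `predict_many`), the file names, and the three-dimensional toy vectors are all hypothetical, and real feature vectors would come from the pretrained model rather than being typed in by hand.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_template(vector, templates, metric=euclidean):
    """Return (name, distance) of the stored template closest to `vector`.

    `templates` maps a template name to its stored feature vector.
    """
    name = min(templates, key=lambda n: metric(vector, templates[n]))
    return name, metric(vector, templates[name])


def predict_many(inputs, templates, metric=euclidean):
    """Classify several inputs at once; returns {input name: template name}."""
    return {
        fname: nearest_template(vec, templates, metric)[0]
        for fname, vec in inputs.items()
    }


# Hypothetical toy data standing in for stored template features
# and for features computed from files in the input folder.
templates = {"invoice": [1.0, 0.0, 0.0], "receipt": [0.0, 1.0, 0.0]}
inputs = {"scan_a.png": [0.9, 0.1, 0.0], "scan_b.png": [0.2, 0.7, 0.1]}

predictions = predict_many(inputs, templates)
print(predictions)
```

Passing `metric=manhattan` switches the distance without changing anything else, which makes it easy to compare the two metrics on the same stored vectors.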