reidenong/VisuallySalientText

Visually Salient Text (VST)

Visual saliency is the distinct quality that makes some items stand out from others and grab our attention.

This project uses deep learning to extract salient text from an image using a state-of-the-art Vision Transformer architecture.


image


The model used is the Visual Saliency Transformer, trained on a synthetically generated dataset that focuses on textual saliency. The dataset consists of images in the style of news articles, memes, advertisements and other commonly found internet images. Uses of text saliency models include filtering out noise in text-rich environments and improving OCR quality in the wild.


Examples of Text Saliency used with EasyOCR

image

Raw Text: TAROT PREDICTS HUNG HOUSE INDIA TODAY IN UTTAR PRADESH 540 INDIA EXCLUSIVE TODAY MAN WHO SPOKE T0 SAIFULLAH DURING ENCOUNTER JALu Msn B PM Iop SheeLA BaJaJ, TarOT CarD READER Mt indiatoday-in NeWS LUCKNOW ENCOUNTER FLASH Saifullah died in exchange 0f fire Pm

Salient Text: TAROT PREDICTS HUNG HOUSE INDIA IN UTTAR PRADESH INDIA TODAY NeWS LUCKNOW ENCOUNTER FLASH Saifullah died in exchange of fire


image

Raw text: NEED TO LOSE 30 POUNDS? TRY SENSA FREEI SENSA" is clinically proven to help you lose 30 Ibs without dieting or spending all your time working out: Just sprinkle on your food; eat and lose weight! GET A GYM BODY WIthout GOING TO THE GYM NO COUNTING CALORIES NO STIMULANTS NO PILLS for Doesn t taste of the your foodl Try SENSA'FREEI Mfll SensaOftercom /OKer (8001750-6971 VoI | CLINICALLY PROVEN: 100% SATISFACTION GUARANTEED: SENSA eocicgdodged Ca Cla npmnd nn nandtan GNCLVWcll Oeoi S S ne nite Deantat #op hanehroloadceoh A A dtatd CCdedoDado nolcamnatndndniot enn GPECIa< OKI 6 change ncadans SENSA CL

Salient Text: 30 POUNDS? TRY SENSA FREEI GET A GYM BODY Try SENSA'FREEI


Usage of VST

Directory Structure of Key Components

VisuallySalientText
├── VST_DEMO.ipynb
├── Models
│   ├── PretrainedModels
│   │   └── 80.7_T2T_ViT_t_14.pth.tar***
│   ├── Checkpoints
│   │   └── RGB_VST.pth***
│   └── Decoder.py, Transformer.py, ...
├── Data
│   ├── OCSD
│   │   ├── OCSD-TR     (training set)
│   │   │   ├── OCSD-TR-Image
│   │   │   │   └── img0.jpg, img1.jpg, ...
│   │   │   ├── OCSD-TR-Mask
│   │   │   │   └── img0.png, img1.png, ...
│   │   │   └── OCSD-TR-Contour
│   │   │       └── img0.png, img1.png, ...
│   │   ├── OCSD-TE     (testing set)
│   │   │   ├── images
│   │   │   │   └── img0.jpg, img1.jpg...
...

(***) Create the PretrainedModels and Checkpoints directories and download the respective model weights into them.
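
Before running inference or training, it can help to confirm that both weight files are where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and the two relative paths are simply copied from the directory structure above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_WEIGHTS = (
    Path("Models/PretrainedModels/80.7_T2T_ViT_t_14.pth.tar"),
    Path("Models/Checkpoints/RGB_VST.pth"),
)


def missing_weights(repo_root="."):
    """Create the weight directories if needed and list weight files not yet present.

    Hypothetical helper for illustration; it does not download anything.
    """
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_WEIGHTS:
        target = root / rel
        # Create Models/PretrainedModels and Models/Checkpoints if absent.
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.is_file():
            missing.append(rel)
    return missing
```

Calling `missing_weights()` from the repository root returns an empty list once both files have been downloaded into place.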

  • The directory structure shown here is for the Optical Character Saliency Dataset (OCSD), but any dataset with Image-Mask-Contour formatted directories will also work
  • Because optical characters are small and intricate, the contour masks are largely unnecessary for text saliency and can be replaced with a copy of the saliency masks
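
The substitution described in the second bullet can be done with a plain file copy. This is a minimal sketch under stated assumptions: the function name `masks_as_contours` is hypothetical, and it assumes the masks are PNG files, as in the tree above.

```python
import shutil
from pathlib import Path


def masks_as_contours(mask_dir, contour_dir):
    """Copy every PNG saliency mask into contour_dir under the same file name.

    Hypothetical helper for illustration; returns the number of files copied.
    """
    mask_dir, contour_dir = Path(mask_dir), Path(contour_dir)
    contour_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for mask in sorted(mask_dir.glob("*.png")):
        # Same file name, so image/mask/contour triples stay aligned.
        shutil.copy2(mask, contour_dir / mask.name)
        copied += 1
    return copied
```

For the layout above this would be called as `masks_as_contours("Data/OCSD/OCSD-TR/OCSD-TR-Mask", "Data/OCSD/OCSD-TR/OCSD-TR-Contour")`.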

Saliency Inference / Testing

For images located in Data/Dataset/images/ (for example, Data/Dataset/images/image0.jpg):

$ python VST.py --test_paths Dataset/


Saliency Mask Visualization (Overlay)

  • Refer to PredictionHeatmapVisualization.ipynb

Saliency-OCR Integration with EasyOCR

For images in Data/Dataset/images/ (for example, Data/Dataset/images/image0.jpg) and predicted masks in Predictions/Dataset/RGB_VST/:

$ python SalOCR.py --imagefilepath Data/Dataset/images/ --maskfilepath Predictions/Dataset/RGB_VST/
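
How SalOCR.py combines the mask with the OCR output is not described here. One plausible approach, shown purely as an assumption and not as the script's actual logic, is to keep only the text boxes that fall on salient regions of the mask. The sketch below assumes OCR results in the `(bbox, text, confidence)` form that EasyOCR's `readtext` returns, where `bbox` is a list of four `[x, y]` corner points, and a 2D mask with values in [0, 1] at the same resolution as the image. The names `mean_saliency` and `salient_text` and the 0.5 threshold are hypothetical.

```python
def mean_saliency(mask, bbox):
    """Average mask value inside the axis-aligned rectangle enclosing bbox.

    mask is any 2D structure indexable as mask[y][x] (nested lists or a
    NumPy array); bbox is a sequence of (x, y) corner points.
    """
    height, width = len(mask), len(mask[0])
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    # Clamp the enclosing rectangle to the mask bounds.
    x0, x1 = max(0, int(min(xs))), min(width, int(max(xs)) + 1)
    y0, y1 = max(0, int(min(ys))), min(height, int(max(ys)) + 1)
    if x0 >= x1 or y0 >= y1:
        return 0.0
    total = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return total / ((x1 - x0) * (y1 - y0))


def salient_text(ocr_results, mask, threshold=0.5):
    """Keep the text of boxes whose mean saliency reaches the threshold (assumed value)."""
    return [
        text
        for bbox, text, _confidence in ocr_results
        if mean_saliency(mask, bbox) >= threshold
    ]
```

Averaging over the enclosing rectangle keeps the sketch simple; rotated text boxes would need a polygon test instead.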

Text output is written to TextOutput/ in JSON format.
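
The JSON schema is not documented here, so the following sketch only loads whatever each file contains without assuming any keys. The function name `load_text_outputs` is hypothetical, and the `.json` file extension is an assumption.

```python
import json
from pathlib import Path


def load_text_outputs(output_dir="TextOutput"):
    """Return {file stem: parsed JSON} for every .json file in output_dir.

    Hypothetical helper; the structure of each parsed value depends on
    what SalOCR.py writes and is not assumed here.
    """
    outputs = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            outputs[path.stem] = json.load(handle)
    return outputs
```

Inspecting one parsed value, for example with `print(next(iter(load_text_outputs().values())))`, is a quick way to see the actual structure.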


Training

$ python VST.py --Training True --Testing False
