WebQnA/WebQA

News

Oct 15 Update: We have released the output files of our baseline models (in the baseline_output_files folder) in case they are helpful for future investigations. Feel free to check them out!

Oct 9 Update: The demo notebook now reads images with PIL instead of cv2. Setting ImageFile.LOAD_TRUNCATED_IMAGES = True is the key to avoiding the "Image NoneType error".
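A minimal sketch of PIL-based decoding with this flag set. It assumes the image is already available as raw bytes (for example, after base64-decoding an entry from imgs.tsv). decode_image is a hypothetical helper for illustration, not the demo notebook's own code, and it requires Pillow.

```python
from io import BytesIO

from PIL import Image, ImageFile

# Let PIL load images whose byte stream ends early instead of failing;
# this is the flag referred to in the Oct 9 update.
ImageFile.LOAD_TRUNCATED_IMAGES = True


def decode_image(image_bytes: bytes) -> Image.Image:
    """Decode raw image bytes with PIL and return an RGB image (hypothetical helper)."""
    with Image.open(BytesIO(image_bytes)) as im:
        # convert() forces a full load and returns a new image object,
        # so the result stays usable after the file handle is closed.
        return im.convert("RGB")
```

Converting to RGB up front gives downstream code a consistent three-channel image regardless of whether the source was grayscale, palette-based, or RGBA.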

Download Data

Main Data

The main data is split into two files: WebQA_train_val.json for train+val (36,766 + 4,966 samples) and WebQA_test.json for test (7,540 samples).
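To sanity-check the train/val counts after downloading, a small sketch like the following may help. It assumes, without confirmation from this README, that WebQA_train_val.json maps each guid to a record carrying a "split" field (such as "train" or "val"); count_splits is a hypothetical helper.

```python
import json
from collections import Counter


def count_splits(train_val_path: str) -> Counter:
    """Count samples per split in a WebQA-style JSON file (hypothetical helper).

    Assumption: the file maps each guid to a record with a "split" field.
    Records without that field are counted under "unknown".
    """
    with open(train_val_path, encoding="utf-8") as f:
        data = json.load(f)
    return Counter(record.get("split", "unknown") for record in data.values())
```

If the assumption about the "split" field holds, count_splits("WebQA_train_val.json") should report 36,766 train and 4,966 val samples.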

Images

The image file is large, so it is compressed and split into 51 chunks of 1 GB each. Download all chunks before moving on to the next step.

To extract and merge all chunks, place them in the same directory and run: 7z x imgs.7z.001
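Because extraction needs every volume, it can be worth checking for missing chunks before calling 7z. The sketch below assumes the standard 7-Zip multi-volume naming imgs.7z.001 through imgs.7z.051 and that a 7z executable is on PATH; missing_chunks and extract_images are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

NUM_CHUNKS = 51  # the README states the images are split into 51 chunks


def missing_chunks(directory: str, num_chunks: int = NUM_CHUNKS) -> List[str]:
    """Return expected volume names (assumed imgs.7z.001, ...) not found on disk."""
    expected = [f"imgs.7z.{i:03d}" for i in range(1, num_chunks + 1)]
    return [name for name in expected if not (Path(directory) / name).is_file()]


def extract_images(directory: str) -> None:
    """Run `7z x imgs.7z.001` inside `directory` once every volume is present."""
    missing = missing_chunks(directory)
    if missing:
        raise FileNotFoundError(f"{len(missing)} chunk(s) missing, e.g. {missing[0]}")
    if shutil.which("7z") is None:
        raise RuntimeError("7z executable not found on PATH")
    # 7-Zip reads the remaining volumes from the same directory as the first one.
    subprocess.run(["7z", "x", "imgs.7z.001"], cwd=directory, check=True)
```

Failing early on a missing volume avoids discovering the gap only after 7z has partially extracted the archive.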

We also provide Google Drive download links.

You are ready to go once you have WebQA_train_val.json, WebQA_test.json, imgs.lineidx, and imgs.tsv.
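The file names suggest that imgs.lineidx indexes rows of imgs.tsv so that single images can be read without scanning the whole file. The sketch below is written under that assumption: it treats imgs.lineidx as one byte offset per line and each TSV row as an image id, a tab, and a base64-encoded image. Neither format is confirmed by this README, and how an image_id maps to a row is not specified here, so the reader takes an offset directly. load_line_offsets and read_image_bytes are hypothetical helpers.

```python
import base64
from typing import List, Tuple


def load_line_offsets(lineidx_path: str) -> List[int]:
    """Read imgs.lineidx, assumed to hold one byte offset per line of imgs.tsv."""
    with open(lineidx_path, encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


def read_image_bytes(tsv_path: str, offset: int) -> Tuple[str, bytes]:
    """Seek to `offset` in imgs.tsv and return (image_id, raw image bytes).

    Assumption: each row is an image id, a tab, then a base64-encoded image.
    """
    with open(tsv_path, "rb") as f:  # binary mode so seek() uses byte offsets
        f.seek(offset)
        image_id, b64 = f.readline().rstrip(b"\r\n").split(b"\t", 1)
    return image_id.decode("utf-8"), base64.b64decode(b64)
```

Seeking by byte offset keeps each lookup cheap even though imgs.tsv is tens of gigabytes; the returned bytes can then be decoded with PIL.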

Explore Data

Output Format (a JSON file with guids as keys)

{<guid>: {"sources": [<image_id>/<snippet_id>, ...],
          "answer": "xxxxxxx"},
 <guid>: {...},
 <guid>: {...},
 ...
}
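A short sketch of producing a file in this format from in-memory predictions. It only enforces the structure given in the format above (a list of sources and a string answer per guid) and leaves the source ids as they are; build_submission and write_submission are hypothetical helpers, and the official evaluation may impose further requirements not described here.

```python
import json
from typing import Any, Dict


def build_submission(predictions: Dict[Any, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Shape predictions as {guid: {"sources": [...], "answer": str}} (hypothetical helper)."""
    submission = {}
    for guid, pred in predictions.items():
        sources = pred.get("sources", [])
        answer = pred.get("answer", "")
        if not isinstance(sources, list) or not isinstance(answer, str):
            raise ValueError(f"bad prediction for guid {guid!r}")
        # Each source is an image_id or a snippet_id; ids are kept unchanged.
        # JSON object keys must be strings, so the guid is stored as str.
        submission[str(guid)] = {"sources": list(sources), "answer": answer}
    return submission


def write_submission(predictions: Dict[Any, Dict[str, Any]], path: str) -> None:
    """Validate predictions and write them as a single JSON object keyed by guid."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_submission(predictions), f, ensure_ascii=False)
```

ensure_ascii=False keeps non-ASCII answers readable in the output file; json.load returns the same strings either way.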
