Scripts that were used to scrape and process data from Yandex.Q. The resulting dataset can be found here.
Some scripts are messy, but they get the job done.
parse_questions_search.py
- to parse questions by searching all 4-letter combinations, because of the 1000-item limit per search
parse_question_ids.py
- to parse question ids by using the question recommendation endpoint
get_ids.py
- to extract ids from questions that were retrieved from search
parse_qa.py
- to parse all question info from ids collected
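
The search step in parse_questions_search.py relies on enumerating every 4-letter combination, so that each query stays under the 1000-item limit. A minimal sketch of that enumeration is below. It is not taken from the script itself: the alphabet (lowercase Russian letters) and the names `RUSSIAN_LOWERCASE`, `iter_queries`, and `count_queries` are assumptions made for illustration, and no request to Yandex.Q is made.

```python
from itertools import product

# Assumption: queries are built from the 33 lowercase Russian letters.
# The actual script may use a different alphabet.
RUSSIAN_LOWERCASE = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def iter_queries(alphabet=RUSSIAN_LOWERCASE, length=4):
    """Yield every string of `length` characters drawn from `alphabet`."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


def count_queries(alphabet=RUSSIAN_LOWERCASE, length=4):
    """Return how many queries iter_queries() will produce."""
    return len(alphabet) ** length


if __name__ == "__main__":
    # Show the first few queries and the total number of searches needed.
    for i, query in enumerate(iter_queries()):
        if i >= 3:
            break
        print(query)
    print(count_queries(), "queries in total")
```

Using a generator keeps memory use flat, since the full list of queries is never held at once. Each yielded string would then be passed to whatever search call the scraper uses.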