DEPRECATED: This project is out of support due to several changes in the Twitter (and hence, `tweepy`) API.

Spritzer Tweets

Download and process tweets from the "Spritzer" Twitter archive

Requires

python >= 3.8 and poetry

What to do

Install requirements

pip install poetry
git submodule init
git submodule update
poetry install

Download data

python bin/get_data.py

Warning: the download takes hours.

Unzip data

cd data
find . -name "*.tar" | xargs -n1 tar xf
# -P4 runs 4 parallel processes
find . -name "*.bz2" | xargs -n1 -P4 bzip2 -d
cd ..
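
For environments without `xargs` or `bzip2`, the same two steps can be approximated in Python. This is a minimal sketch using only the standard library, not part of this repository: the function names `extract_tars`, `decompress_bz2` and `decompress_all` are hypothetical, and it assumes the archives sit under a single directory such as `data`.

```python
import bz2
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_tars(data_dir: Path) -> None:
    """Extract every .tar archive found under data_dir into data_dir."""
    for tar_path in sorted(data_dir.rglob("*.tar")):
        with tarfile.open(tar_path) as archive:
            archive.extractall(data_dir)


def decompress_bz2(bz2_path: Path) -> Path:
    """Decompress one .bz2 file next to itself, then delete it (like `bzip2 -d`)."""
    out_path = bz2_path.with_suffix("")  # foo.json.bz2 -> foo.json
    with bz2.open(bz2_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    bz2_path.unlink()
    return out_path


def decompress_all(data_dir: Path, workers: int = 4) -> list:
    """Decompress all .bz2 files under data_dir with a pool of worker threads."""
    files = sorted(data_dir.rglob("*.bz2"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decompress_bz2, files))
```

Calling `extract_tars(Path("data"))` followed by `decompress_all(Path("data"), workers=4)` mirrors the shell commands above, with `workers` playing the role of `-P4`.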

Generate plain text file

python bin/generate_text_file tweets.txt

or...

Save to mongo database

python bin/save_to_mongo.py <mongo_db> <lang>

Note that bin/save_to_mongo.py also deletes the input files as it processes them, so keep a copy of the data if you need it afterwards.
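
To illustrate the filter-insert-delete pattern described above, here is a rough sketch. It is not the code of `bin/save_to_mongo.py`: the names `iter_tweets` and `save_and_delete` are hypothetical, and it assumes each decompressed file holds one JSON object per line with a `lang` field, as in Twitter streaming API output. The `collection` argument can be any object with an `insert_many` method, such as a `pymongo` collection.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_tweets(json_path: Path, lang: str) -> Iterator[Dict[str, Any]]:
    """Yield tweets in the given language from a file with one JSON object per line."""
    with open(json_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Records without a matching "lang" (e.g. deletion notices) are skipped.
            if record.get("lang") == lang:
                yield record


def save_and_delete(json_path: Path, collection: Any, lang: str) -> int:
    """Insert matching tweets into the collection, then delete the input file."""
    tweets = list(iter_tweets(json_path, lang))
    if tweets:  # pymongo's insert_many rejects an empty list
        collection.insert_many(tweets)
    json_path.unlink()  # the file is gone after this point
    return len(tweets)
```

With `pymongo` installed, `collection` could be something like `MongoClient()["<mongo_db>"]["tweets"]`, where the collection name `tweets` is an assumption made for this example.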

Scraping more data from users

After saving to mongo, we can expand the database by fetching more tweets from the users collected in the previous stage.

To do so, run

python get_tweets_from_users.py <mongo_db> <app_files>

Beware that this will take days, so start it and go do something else.

Generating txt dumps

To dump everything

python bin/async_generate_txts_line_by_line.py spritzer-tweets dumps/spanish_tweets --num_workers 100
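
As an illustration of what a line-by-line dump involves, the sketch below writes one tweet per line by collapsing embedded newlines. It is a simplified, synchronous approximation and not the script's actual implementation: `tweet_to_line` and `dump_lines` are hypothetical names, and the `text` field is assumed to hold the tweet body.

```python
from typing import Any, Dict, Iterable, TextIO


def tweet_to_line(text: str) -> str:
    """Collapse all whitespace runs (including newlines) into single spaces."""
    return " ".join(text.split())


def dump_lines(tweets: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write one non-empty tweet text per line and return the number written."""
    written = 0
    for tweet in tweets:
        line = tweet_to_line(tweet.get("text", ""))
        if line:
            out.write(line + "\n")
            written += 1
    return written
```

Keeping each tweet on a single line means the dump can later be processed with line-oriented tools without splitting a tweet in two.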
