GitHub - bitscott1102/twitter-2020

Make sure to read the Twitter Data Dictionary document to get familiar with what's inside the file. For example:

Attribute	Type	Description
created_at	String	UTC time when this Tweet was created. Example: `"created_at": "Wed Oct 10 20:19:24 +0000 2018"`
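If you later need to sort or filter by time, the created_at string can be turned into a datetime. The snippet below is a minimal sketch based only on the example value shown above; the helper name parse_created_at and the format constant are illustrative and are not part of convert_file.py.

```python
from datetime import datetime

# Format of the created_at example above: "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime (hypothetical helper)."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


created = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(created.isoformat())  # 2018-10-10T20:19:24+00:00
```

The day and month names are matched using the current locale, so this assumes an English (or default C) locale.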

Please install the following tools (json2csv requires Node v10 or higher):

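# Global, so it can be called from anywhere
npm install -g json2csv
# or as a dependency of a project
npm install json2csv --save
# use sudo if the global install fails with a permissions error
sudo npm install -g json2csv
# check that the command is available
json2csv -h
npm install flat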

pip3 install flatten_json
pip3 install pandas
pip install -U click

Open a terminal in the folder that contains your JSONL file. Run the command below, then enter the file name without the .jsonl extension and the number of tweets you want to sample. When a JSONL file is huge, it is tricky to check everything in it, so the script also writes a sample CSV that contains only the first N tweets.

python3 convert_file.py
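To illustrate the sampling idea, here is a rough sketch of reading only the first N tweets from a JSONL source, where each line is one JSON object. It is not the code in convert_file.py; the function name read_first_tweets and the in-memory sample data are made up for the example.

```python
import io
import json
from itertools import islice


def read_first_tweets(lines, n):
    """Parse at most n non-empty JSONL lines into dicts (hypothetical helper)."""
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, n)]


# Stand-in for an open .jsonl file: one JSON object per line
sample_jsonl = io.StringIO(
    '{"id": 1, "text": "first"}\n'
    '{"id": 2, "text": "second"}\n'
    '{"id": 3, "text": "third"}\n'
)

first_two = read_first_tweets(sample_jsonl, 2)
print([tweet["id"] for tweet in first_two])  # [1, 2]
```

With a real file you would pass the open file object instead, for example `with open("1.jsonl", encoding="utf-8") as f: tweets = read_first_tweets(f, 3)`. Because the lines are read lazily, the rest of a huge file is never loaded.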

After the script finishes, three files will have been created (the names below appear to correspond to an input file named 1 and a sample size of 3):

1_flatten_file.json: A new JSON file in which all the nested objects from the .jsonl file have been flattened.
1_all_.csv: This file contains all the information from the flattened file, such as text, user_name, etc.
1_first_3_tweets.csv: This file is a sample of 1_all_.csv.
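To make "flattened" concrete, the sketch below shows one way nested objects can be collapsed into a single level, which is how a column name like user_friends_count comes from the friends_count field inside user. This is a standard-library illustration, not the repository's code; the flatten_json package installed above provides a similar flatten function, and the "_" separator here is an assumption based on the column names in this README.

```python
def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts and lists into one level, joining keys with sep."""
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)  # list items are keyed by their index
    else:
        return {parent_key: obj}  # leaf value: keep it under the joined key

    items = {}
    for key, value in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten(value, new_key, sep))
    return items


# Made-up tweet with a nested user object
tweet = {
    "id": 1,
    "text": "hello",
    "user": {"name": "example", "friends_count": 10, "followers_count": 20},
}
flat = flatten(tweet)
print(flat["user_friends_count"])  # 10
```

Note that in this sketch empty objects and empty lists produce no columns at all.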

Check the sampled CSV and decide which columns you need. For example, if you are interested in created_at, id, text, user_friends_count, and user_followers_count, go back to the terminal and run:

json2csv -i 1_flatten_file.json -f created_at,id,text,user_friends_count,user_followers_count -o 1_out.csv

A new CSV called 1_out.csv is created. You can name the output file any way you like, but the input file has to be the flattened JSON file. Also be aware that each field name must exactly match the corresponding column name in the sample CSV.
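If you would rather stay in Python for this step, the same column selection can be sketched with the standard csv module. This is only an illustration of the idea behind the -f option, assuming the flattened tweets are already loaded as a list of dicts; the function name write_selected_columns and the sample records are made up.

```python
import csv
import io


def write_selected_columns(records, fields, out):
    """Write only the given fields of each flattened record as CSV (hypothetical helper)."""
    # extrasaction="ignore" drops columns that were not requested;
    # restval="" leaves a cell empty when a record lacks a requested field.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(records)


# Made-up flattened records
records = [
    {"id": 1, "text": "hello", "user_friends_count": 10, "lang": "en"},
    {"id": 2, "text": "world", "user_friends_count": 5},
]

buffer = io.StringIO()
write_selected_columns(records, ["id", "text", "user_friends_count"], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("1_out.csv", "w", newline="", encoding="utf-8")`.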

Make sure to visit json2csv for more info.
