polar

Content detection of the Polar Dataset

Due March 3, 2016

Instructions: pip install tika
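
As a rough sketch of what the installed package can be used for, the snippet below tallies detected MIME types for a list of files. It assumes the tika package's `detector.from_file` call, which needs Java and starts a local Tika server on first use. The helper names `tika_detect` and `count_mime_types` are hypothetical and are not taken from this repository's scripts.

```python
from collections import Counter


def tika_detect(path):
    """Detect a file's MIME type with Tika (imported lazily; needs Java)."""
    from tika import detector  # provided by `pip install tika`
    return detector.from_file(path)


def count_mime_types(paths, detect=tika_detect):
    """Return a Counter mapping each detected MIME type to its file count.

    `detect` is any callable that maps a path to a MIME type string, so the
    tally can be exercised without a running Tika server.
    """
    return Counter(detect(path) for path in paths)


# Example usage (illustrative path, requires Tika and Java):
#   print(count_mime_types(["samplefile.txt"]))
```

Passing the detector in as a parameter keeps the counting logic separate from Tika itself, which makes it easy to try the tally on a small sample first.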

#Instructions for EMR Spark-Zeppelin-Python

Start an EMR cluster with Spark 1.6.0, Hadoop and Zeppelin installed (you have to use the advanced settings page)
Be sure to allow incoming and outgoing TCP traffic in both directions: SSH on port 22 and Zeppelin on port 8890.
Install boto3 via sudo pip install boto3
Check out Zeppelin here (note that you will have to put in your own Amazon IP): http://52.90.101.143:8890/
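
The port settings above can also be applied with boto3 instead of the console. This is a minimal sketch, not the setup we actually used: it assumes a security group ID you already know, a CIDR range of your choosing, and a default region of `us-east-1`. The function names are hypothetical. Only the inbound rules are shown, since a default security group typically already allows all outbound traffic.

```python
def zeppelin_ingress_rules(cidr):
    """Build inbound TCP rules for SSH (22) and Zeppelin (8890)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in (22, 8890)
    ]


def open_ports(group_id, cidr, region="us-east-1"):
    """Apply the inbound rules to an existing security group.

    group_id, cidr and region are placeholders supplied by the caller;
    the region default is an assumption.
    """
    import boto3  # installed via `sudo pip install boto3`

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=zeppelin_ingress_rules(cidr),
    )
```

Restricting `cidr` to your own address (for example a single `/32`) is safer than opening the ports to everyone.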

#How to see our visualizations: run python -m http.server in the main folder after cloning the repository to your local computer.

#Mime Types: http://localhost:8000/mime_types_we_chose/MIME_types.html

#BFA: http://localhost:8000/BFA_Dhruv/d3_histograms/

#BFD: http://localhost:8000/bfd_json/atomXML.html, as well as

atomXML.html
difXML.html
octet_stream.html
pdf.html
rdfxml.html
rssxml.html
xhtmlXML.html

#Mime Types that we chose, in a barplot: http://localhost:8000/mime_types_we_chose/
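
If you would rather start the local server from a script than from the command line, the standard library can do the same job as python -m http.server. The sketch below is one possible way to do it. The function name `serve_directory` is hypothetical, and unlike the command-line default it binds only to 127.0.0.1.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP in a background thread.

    Returns the server object; call shutdown() and server_close() on it
    when you are done. Passing port=0 lets the OS pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example usage from the root of the cloned repository:
#   server = serve_directory(".")
#   ... open http://localhost:8000/mime_types_we_chose/ in a browser ...
#   server.shutdown(); server.server_close()
```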

We spent a lot of time trying to get this set up with Spark, but ran into many versioning and dependency issues. You can see the code from our attempt here: http://localhost:8000/zeppelin-spark/

