Capstone Project 2
- https://www.dataquest.io/blog/build-a-data-science-portfolio/
- https://www.dataquest.io/blog/data-science-portfolio-project/
- https://www.analyticsvidhya.com/blog/2017/10/essential-nlp-guide-data-scientists-top-10-nlp-tasks/#
- https://www.dataquest.io/blog/data-science-portfolio-machine-learning/
- https://www.dataquest.io/blog/how-to-setup-a-data-science-blog/
- https://www.kdnuggets.com/2018/09/5-resources-inspire-data-science-project.html
- http://www.omdbapi.com/
- https://www.imdb.com/interfaces/
- https://datasets.imdbws.com/
- https://stackoverflow.com/questions/1966503/does-imdb-provide-an-api
- Interesting project: https://blog.dataiku.com/whats-in-an-emoji-decoding-millennial-speak-with-data-science
- https://monkeylearn.com/sentiment-analysis/
Passion Project:
- http://lilt.ics.hawaii.edu/papers/2014/Ibanez-Suthers-HICSS-2014.pdf
- http://uu.diva-portal.org/smash/get/diva2:846981/FULLTEXT01.pdf
- https://faculty.ist.psu.edu/xu/papers/Chen_etal_SocialCom_2012.pdf
- https://ro.ecu.edu.au/cgi/viewcontent.cgi?article=1044&context=asi
- https://www.glassdoor.com/Job/nlp-data-scientist-jobs-SRCH_KO0,18.htm
Interesting projects:
- https://nycdatascience.com/blog/student-works/majority-illusion-effect-on-news-through-social-media/
- https://nycdatascience.com/blog/student-works/capstone/fashion-rec/
- https://nycdatascience.com/blog/student-works/spatial-data-science-applied-arcpy-scikit-learn-for-predicting-hotel-room-prices/
- https://nycdatascience.com/blog/student-works/employee-retention-analysis-capstone/
- https://nycdatascience.com/blog/student-works/web-scraping/build-near-real-time-twitter-streaming-analytical-pipeline-scratch-using-spark-aws/
- https://nycdatascience.com/blog/student-works/knowhere-app-automatically-profiling-commute-using-smart-phone-sensor-data/
- https://nycdatascience.com/blog/student-works/book-rating-prediction-recommendation-engine/
- https://nycdatascience.com/blog/student-works/15212/
- https://www.datasciencecentral.com/profiles/blogs/sample-projects-for-data-scientists-in-training
- https://cds.nyu.edu/text-data-speaker-series/
NLP Projects with Cases:
- Pull the emails attached to cases with the "Cases with Emails Report" in SFDC, then narrow the analysis to the first email on each case, i.e. the customer query that opened it.
- https://emerj.com/ai-podcast-interviews/nlp-for-customer-service-how-does-it-work/
- https://www.answeriq.com/blog/ai-101-the-basics-of-automation-for-customer-support
- https://blog.aimultiple.com/customer-service-ai/
- https://callminer.com/blog/leveraging-natural-language-processing-to-its-fullest/
- https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/63661
- https://www.kdnuggets.com/2018/08/practitioners-guide-processing-understanding-text-2.html
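A minimal sketch of the "first email per case" narrowing from the first bullet of this list. The field names (`case_id`, `sent_at`, `body`) and the sample records are assumptions for illustration; the real SFDC report export will have its own column names.

```python
from datetime import datetime


def first_email_per_case(emails):
    """Return {case_id: email}, keeping only the earliest email of each case."""
    first = {}
    for email in emails:
        case_id = email["case_id"]
        # Keep this email if the case is new or it predates the one stored so far.
        if case_id not in first or email["sent_at"] < first[case_id]["sent_at"]:
            first[case_id] = email
    return first


# Hypothetical records standing in for rows of the exported report.
emails = [
    {"case_id": "C1", "sent_at": datetime(2019, 1, 2, 9, 0), "body": "Re: still broken"},
    {"case_id": "C1", "sent_at": datetime(2019, 1, 1, 8, 30), "body": "My login fails"},
    {"case_id": "C2", "sent_at": datetime(2019, 1, 3, 14, 0), "body": "How do I export data?"},
]

queries = first_email_per_case(emails)
```

Here `queries` holds one entry per case, which is the text an NLP model would then see as the customer query.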
Chatbots:
- https://chatbotslife.com/ultimate-guide-to-leveraging-nlp-machine-learning-for-you-chatbot-531ff2dd870c
Resources:
- https://courses.lumenlearning.com/boundless-psychology/chapter/introduction-to-language/
- https://realpython.com/python-web-scraping-practical-introduction/
- https://realpython.com/modern-web-automation-with-python-and-selenium/
- https://nycdatascience.com/blog/student-works/web-scraping/scraping-gofundme/
- https://stackoverflow.com/questions/33503993/read-in-all-csv-files-from-a-directory-using-python
Reading in a bunch of files + OS Walk:
- https://docs.python.org/3/library/os.html
- https://automatetheboringstuff.com/chapter8/
- https://stackoverflow.com/questions/14183930/os-walk-get-directory-names
- https://stackoverflow.com/questions/16597265/appending-to-an-empty-data-frame-in-pandas
- https://www.pythonforbeginners.com/concatenation/string-concatenation-and-formatting-in-python
- https://stackoverflow.com/questions/28799353/python-giving-filenotfounderror-for-file-name-returned-by-os-listdir
- https://realpython.com/python-csv/
- https://stackoverflow.com/questions/43981725/data-argument-cant-be-an-iterator?rq=1
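A rough sketch of what the links above add up to: walk a directory tree with `os.walk`, read every CSV found, and collect the rows in one list. The function name `read_all_csvs`, the `source_file` key, and the demo file names are made up for this example.

```python
import csv
import os
import tempfile


def read_all_csvs(root_dir):
    """Collect the rows of every .csv file under root_dir as dictionaries."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.lower().endswith(".csv"):
                continue
            # os.walk yields bare file names; join them with dirpath,
            # otherwise open() looks in the current working directory.
            path = os.path.join(dirpath, name)
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    row["source_file"] = path  # remember where the row came from
                    rows.append(row)
    return rows


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2019", "jan"))
    with open(os.path.join(root, "a.csv"), "w", newline="", encoding="utf-8") as f:
        f.write("case_id,subject\n1,Login error\n")
    with open(os.path.join(root, "2019", "jan", "b.csv"), "w", newline="", encoding="utf-8") as f:
        f.write("case_id,subject\n2,Billing question\n3,Export data\n")
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not a csv\n")
    rows = read_all_csvs(root)
```

Building a plain list first and converting it once (for example with `pandas.DataFrame(rows)`) avoids growing a DataFrame inside the loop.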
DataFrame.plot():
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
- https://stackoverflow.com/questions/21654635/scatter-plots-in-pandas-pyplot-how-to-plot-by-category
- https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html
- https://stackoverflow.com/questions/25830588/r-lattice-like-plots-with-python-pandas-and-matplotlib
- http://seaborn.pydata.org/tutorial/axis_grids.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.hist.html
- https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
Subset rows based on column conditions:
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
- https://stackoverflow.com/questions/15235703/typeerror-unsupported-operand-types-for-str-and-str
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
- https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
- https://pythonprogramming.net/concatenate-append-data-analysis-python-pandas-tutorial/
- https://stackoverflow.com/questions/47074958/python-function-partial-string-match
- https://stackoverflow.com/questions/10406130/check-if-something-is-not-in-a-list-in-python
- https://stackoverflow.com/questions/20968823/in-python-iterate-over-each-string-in-a-list
- https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method
- https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.append.html
- https://en.wikipedia.org/wiki/Hierarchical_clustering#Divisive_clustering
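A small sketch of subsetting rows on column conditions with `Series.str.contains` and `DataFrame.loc`, as covered by the links above. The column names and values are invented for the example, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical case data; one subject is missing on purpose.
df = pd.DataFrame({
    "subject": ["Login error", "Billing question", None, "login timeout"],
    "priority": ["high", "low", "low", "medium"],
})

# Partial, case-insensitive string match; na=False treats missing text as "no match".
mask = df["subject"].str.contains("login", case=False, na=False)
login_rows = df.loc[mask]

# "Not in a list" condition: negate isin() with ~.
not_low = df.loc[~df["priority"].isin(["low"])]

# Combine conditions with & and wrap each comparison in parentheses.
urgent_login = df.loc[mask & (df["priority"] == "high")]

# Stack two subsets; pd.concat replaces DataFrame.append, which pandas 2.0 removed.
combined = pd.concat([login_rows, not_low], ignore_index=True)
```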
Using Python + SQLite:
- https://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly
- http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html
- https://medium.com/@mahmudahsan/how-to-use-python-sqlite3-using-sqlalchemy-158f9c54eb32
- https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91
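For comparison with the SQLAlchemy tutorials above, here is a minimal sketch that talks to SQLite through the standard-library `sqlite3` module instead. The `cases` table and its rows are made up, and the database lives in memory so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, subject TEXT, priority TEXT)"
)

rows = [
    ("Login error", "high"),
    ("Billing question", "low"),
    ("Export data", "medium"),
]

# "with conn" wraps the inserts in one transaction and commits on success.
with conn:
    conn.executemany(
        "INSERT INTO cases (subject, priority) VALUES (?, ?)", rows
    )

# Placeholders (?) keep values out of the SQL string itself.
high = conn.execute(
    "SELECT subject FROM cases WHERE priority = ? ORDER BY id", ("high",)
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]

conn.close()
```

Batching the inserts with `executemany` inside a single transaction is usually the first thing to try when row-by-row inserts feel slow.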
Reading in CSVs using Python:
- https://www.alexkras.com/how-to-read-csv-file-in-python/ <= creates tuples that can be easily inserted
- reading csv into dictionary: https://developer.rhino3d.com/guides/rhinopython/python-csv-file/
- https://stackabuse.com/reading-and-writing-csv-files-in-python/ <= more about reading csv in as a dictionary
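A short sketch of the two reading styles mentioned in the notes above: rows as tuples (handy for database inserts) and rows as dictionaries. The sample data is invented, and `io.StringIO` stands in for an open file.

```python
import csv
import io

sample = "name,score\nada,90\ngrace,95\n"

# Style 1: plain reader, skip the header, turn each row into a tuple.
reader = csv.reader(io.StringIO(sample))
header = next(reader)
as_tuples = [tuple(row) for row in reader]

# Style 2: DictReader uses the header row as keys for every record.
as_dicts = list(csv.DictReader(io.StringIO(sample)))
```

Every value comes back as a string, so numeric columns such as `score` need an explicit `int()` or `float()` conversion before any arithmetic.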
Tutorials involving SQLite and Python:
- Text Analytics Series: https://tutorials.datasciencedojo.com/text-analytics-with-r-overview/
- https://stackoverflow.com/questions/38287772/cbow-v-s-skip-gram-why-invert-context-and-target-words
- https://www.quora.com/What-are-the-continuous-bag-of-words-and-skip-gram-architectures
- https://stanford.edu/~kartiks2/kickstarter.pdf
- https://towardsdatascience.com/should-i-use-kickstarter-to-fund-my-idea-2a56b40c9d44
- https://nlp.stanford.edu/courses/cs224n/2015/reports/19.pdf
Unrelated: Model Interpretability:
- https://twitter.com/srikanth_vikas/status/1092585613314772992
- https://stackoverflow.com/questions/37917615/assign-the-result-of-a-loop-to-a-variable-in-python
- https://www.datacamp.com/community/tutorials/scope-of-variables-python#LEGB
- https://machinelearningmastery.com/introduction-to-expected-value-variance-and-covariance/
- https://stackoverflow.com/questions/22611446/perform-2-sample-t-test
- https://nlp.johnsnowlabs.com/quickstart.html
- https://www.oreilly.com/library/view/apache-spark-for/9781785880100/ch06.html
BigQuery:
- https://www.kaggle.com/rtatman/sql-scavenger-hunt-handbook/?utm_medium=blog&utm_source=medium&utm_campaign=fcc
- https://medium.freecodecamp.org/the-four-data-science-skills-i-didnt-learn-in-grad-school-and-how-to-learn-them-f2b039fc0f59
- https://www.kaggle.com/sohier/beyond-queries-exploring-the-bigquery-api
- https://cloud.google.com/blog/products/gcp/busting-12-myths-about-bigquery
- https://cloud.google.com/bigquery/docs/bigqueryml-scientist-start
- https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start
NLP: