A reading list of research papers in the data science field. Updated continuously!
Inspired by Andrew Ng's advice on building a machine learning career, I feel that learning through reading is really necessary in data science. I have also found it super useful when preparing paper presentations for teams at the company where I'm doing my co-op. I've learned a lot through reading and summarizing, and the whole process could have been more enjoyable and less painful if I had not considered it too difficult and high-level. So I decided to do it on a regular basis.
- I do not want to end up reading just for the sake of reading. After finishing a paper, I should at least be able to fill in the following template:
  - What did the author(s) try to accomplish?
  - What were the key elements of the approach(es)?
  - List the concepts/techniques that are new to me and highlight any that I feel I should add to my skill set.
  - Thoughts and questions
- Pick one paper and write more detailed comments/review notes (like a tech post) at least biweekly (unfortunately, this has to drop to monthly until I find a new job).
If an entry simply answers the questions in the first bullet, mark it as [summary]. If it also covers the second case, mark it as [note]. Either way, just create a new folder with a readme file under it for better formatting.
- 1. Describing like Humans: on Diversity in Image Captioning (presented internally to company teams)
- 2. Distributed Representations of Words and Phrases and their Compositionality
- 3. The Structural Topic Model and Applied Social Science
- 4. Sentiment Analysis of Movie Review Comments
- 5. Going Deeper with Convolutions
- 6. Class-Balanced Loss Based on Effective Number of Samples
- 7. SocialStories: Segmenting Stories within Trending Twitter Topics (Slides)
- 8. Responsible Team Players Wanted: an Analysis of Soft Skill Requirements in Job Advertisements
- 9. (Google AI Blog) Learning Cross-Modal Temporal Representations from Unlabeled Videos
- 10. A Balanced Perspective on Prediction and Inference for Data Science in Industry
- 11. Anxious Depression Prediction in Real-time Social Data
- 12. NGBoost: Natural Gradient Boosting for Probabilistic Prediction
- 13. (Blog) Are All Kernels Cursed?
- 14. Clust-LDA: Joint Model for Text Mining and Author Group Inference
- 15. XGBoost: A Scalable Tree Boosting System
- 16. LightGBM: A Highly Efficient Gradient Boosting Decision Tree
- 17. SSD: Single Shot MultiBox Detector
- 18. (Blog) NLP's ImageNet Moment has Arrived
- 19. Evading Real-Time Person Detectors by Adversarial T-shirt
- 20. Snapshot Ensembles: Train 1, Get M for Free
- 21. Hierarchical Attention Networks for Document Classification
- 22. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 23. The Annotated Transformer
- 24. Causality in Machine Learning
- 25. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- 26. DiffTaichi: Differentiable Programming for Physical Simulation
- 27. Delayed Impact of Fair Machine Learning
- 28. The Dark Secrets of BERT
- 29. Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models
- 30. MapReduce: Simplified Data Processing on Large Clusters
- 31. On Random Sampling over Joins
- 32. Model-powered Conditional Independence Test
- 33. BERT, ELMo, & GPT-2: How contextual are contextualized word representations?
- 34. The Google File System
- 35. Lung Infection Quantification of COVID-19 in CT Images with Deep Learning
- 36. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList