Stack Overflow Data Analysis

My take on extracting insights from the data dumps that Stack Overflow publishes periodically. Using machine learning models, I try to estimate how long it should take for a question to get answered.

Getting Started

Download Anaconda and open the notebook (stackoverflow-da.ipynb). Download the dataset from https://www.kaggle.com/stackoverflow/stacksample and place it in the directory that contains the directory holding the code file. Once the setup is complete, just hit run and you'll see the output of each block.
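As a rough illustration of the quantity this project tries to estimate, the sketch below computes the minutes between a question and its earliest answer. It is not taken from the notebook. It assumes the dataset's Questions.csv and Answers.csv expose `Id`, `CreationDate`, and `ParentId` columns with ISO-style timestamps, and it uses small in-memory samples (hypothetical rows) so it runs without the download.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; the column names are an assumption about the
# layout of the dataset's Questions.csv and Answers.csv files.
QUESTIONS_CSV = """Id,CreationDate,Title
1,2016-01-01T10:00:00Z,How do I parse JSON?
2,2016-01-01T11:00:00Z,Why is my loop slow?
3,2016-01-02T09:00:00Z,Unanswered question
"""

ANSWERS_CSV = """Id,ParentId,CreationDate
10,1,2016-01-01T10:30:00Z
11,1,2016-01-01T10:12:00Z
12,2,2016-01-01T14:00:00Z
"""


def parse_ts(text):
    """Parse a timestamp such as 2016-01-01T10:00:00Z (assumed format)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def minutes_to_first_answer(questions_file, answers_file):
    """Map each answered question Id to the delay before its earliest answer."""
    asked = {
        row["Id"]: parse_ts(row["CreationDate"])
        for row in csv.DictReader(questions_file)
    }
    first = {}
    for row in csv.DictReader(answers_file):
        qid = row["ParentId"]
        ts = parse_ts(row["CreationDate"])
        # Keep only the earliest answer seen for each question.
        if qid not in first or ts < first[qid]:
            first[qid] = ts
    return {
        qid: (first[qid] - asked[qid]).total_seconds() / 60
        for qid in first
        if qid in asked
    }


delays = minutes_to_first_answer(
    io.StringIO(QUESTIONS_CSV), io.StringIO(ANSWERS_CSV)
)
print(delays)
```

To try this on the real files, replace the `io.StringIO` objects with opened file handles. Questions with no answers (Id 3 in the sample) are simply absent from the result.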

Methodology

To first understand how questions are rated on Stack Overflow, I used two references: the advice written by the highest-rated Stack Overflow user, Jon Skeet, and the paper Evidence-based Guidelines for Writing Questions on Stack Overflow (https://arxiv.org/abs/1710.04692). With these in hand, I went on to check whether a question is good enough or not.

The idea was to give each question a score; any question whose score is above a chosen threshold has a reasonable chance of being answered.
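A minimal sketch of this score-and-threshold idea follows. The checks, weights, and threshold are hypothetical placeholders chosen for illustration; they are not the notebook's actual rules and are not quoted from either reference. The sketch also assumes question bodies are HTML, so a `<code>` tag signals a code snippet.

```python
import re

# Hypothetical weights and threshold, for illustration only.
WEIGHTS = {
    "has_code": 2.0,
    "short_title": 1.0,
    "concise_body": 1.0,
    "not_shouting": 1.0,
}
THRESHOLD = 3.0


def question_score(title, body):
    """Sum the weights of the checks that a question passes."""
    letters = [c for c in title if c.isalpha()]
    upper_ratio = (
        sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    )
    # Strip HTML tags before counting words in the body.
    body_words = re.sub(r"<[^>]+>", " ", body).split()
    checks = {
        "has_code": "<code>" in body,
        "short_title": len(title.split()) <= 15,
        "concise_body": len(body_words) <= 200,
        "not_shouting": upper_ratio < 0.5,
    }
    return sum(WEIGHTS[name] for name, passed in checks.items() if passed)


def likely_answered(title, body):
    """Apply the threshold to decide whether a question looks answerable."""
    return question_score(title, body) >= THRESHOLD


good = ("How do I parse JSON in Python?",
        "<p>I tried:</p><pre><code>json.loads(s)</code></pre>")
poor = ("HELP URGENT PLEASE", "<p>it does not work</p>")
print(question_score(*good), likely_answered(*good))
print(question_score(*poor), likely_answered(*poor))
```

Keeping the checks in a dictionary makes it easy to add or reweight a rule without touching the scoring logic.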

We could also use other characteristics of the question, such as the asker's reputation, the question's tags, and the currently trending tags, to try to predict a timeline of responses to such a question.
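One simple way to turn tags into a response-time estimate is a per-tag median baseline, sketched below. This is a stand-in for illustration, not the machine learning model used in the notebook. The history rows are made up, and taking the fastest tag's median as the prediction is a modelling assumption; other features such as reputation could be bucketed and handled the same way.

```python
from collections import defaultdict
from statistics import median

# Hypothetical history: (tags on the question, minutes to first answer).
HISTORY = [
    ({"python", "pandas"}, 12.0),
    ({"python"}, 20.0),
    ({"python", "json"}, 30.0),
    ({"haskell"}, 240.0),
    ({"haskell", "monads"}, 360.0),
]


def fit_tag_medians(history):
    """Compute the median minutes-to-first-answer for every tag."""
    by_tag = defaultdict(list)
    for tags, minutes in history:
        for tag in tags:
            by_tag[tag].append(minutes)
    return {tag: median(values) for tag, values in by_tag.items()}


def predict_minutes(tags, tag_medians, fallback):
    """Predict using the fastest known tag, or the fallback if none is known."""
    known = [tag_medians[t] for t in tags if t in tag_medians]
    return min(known) if known else fallback


tag_medians = fit_tag_medians(HISTORY)
overall = median(minutes for _, minutes in HISTORY)

print(predict_minutes({"python", "json"}, tag_medians, overall))
print(predict_minutes({"haskell"}, tag_medians, overall))
print(predict_minutes({"rust"}, tag_medians, overall))
```

A baseline like this is mainly useful as a reference point: a learned model is only worth its complexity if it beats the per-tag median on held-out questions.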

License

This project is licensed under the MIT License. See the LICENSE.md file for details.
