A Statistical Overview of the Support Tickets Data at GCDO, IBM
The code was developed with Python 3 (3.7.0) and the following libraries: pandas 1.0.1, NumPy 1.17.4, and scikit-learn 0.20.0. It should mostly run on other versions of these libraries too.
The IBM Global Chief Data Office (GCDO) is a worldwide team of highly skilled engineers who develop and maintain the key initiatives that drive IBM's Data and AI transformation.
Within the GCDO, the Cognitive Enterprise Data Platform (CEDP) serves as the backbone for data and AI processes across the IBM enterprise. Previously siloed data converges onto one platform, which provides a reliable data source.
CEDP's offerings consist of various tools and frameworks, and a dedicated support team addresses concerns and issues across all of them. The CEDP Support team uses Jira for ticket tracking and resolution.
For this project, I explored a few trends in CEDP Support tickets over a period of 30 months since the initiation of the project. The data is proprietary to IBM, so I have uploaded only an anonymised sample of 10 rows.
- Average resolution time for a ticket. How has it changed over time? How is it distributed?
- Busiest hours of the day and week. Monthly patterns in the tickets raised.
- Correlation of the resolution time with other factors.
- Prediction of ticket labels from the ticket summary.
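The first two questions can be approached with a few pandas operations. The snippet below is only a minimal sketch on made-up rows, not the notebook's actual code: the column names `created` and `resolved` are assumptions, and the real jira.csv schema may differ.

```python
import pandas as pd

# Made-up tickets for illustration; `created` and `resolved` are
# hypothetical column names, not necessarily those used in jira.csv.
tickets = pd.DataFrame({
    "created": pd.to_datetime([
        "2020-01-06 09:15", "2020-01-06 14:40",
        "2020-01-07 09:05", "2020-02-03 16:20",
    ]),
    "resolved": pd.to_datetime([
        "2020-01-06 17:15", "2020-01-08 14:40",
        "2020-01-07 12:05", "2020-02-04 16:20",
    ]),
})

# Resolution time in hours for each ticket.
delta = tickets["resolved"] - tickets["created"]
tickets["resolution_hours"] = delta.dt.total_seconds() / 3600

# Overall average, and the average per month of creation (trend over time).
mean_resolution = tickets["resolution_hours"].mean()
monthly_mean = tickets.groupby(
    tickets["created"].dt.to_period("M")
)["resolution_hours"].mean()

# Number of tickets raised per hour of the day and per weekday.
tickets_per_hour = tickets["created"].dt.hour.value_counts().sort_index()
tickets_per_weekday = tickets["created"].dt.day_name().value_counts()

print(mean_resolution)
print(monthly_mean)
print(tickets_per_hour)
print(tickets_per_weekday)
```

A histogram of `resolution_hours` would then give a first look at the distribution.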
There is one notebook file, jira_data_analysis.ipynb. To test it, execute the last three cells, which load the pre-trained model and run it on test data.
The trained model is stored in classifier.pkl, and jira.csv contains the anonymised sample data.
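For readers unfamiliar with the save-and-reload pattern, here is a rough sketch of how a text classifier can be pickled and later restored to predict labels from ticket summaries. It is an illustration under assumptions, not the model in classifier.pkl: the TF-IDF plus logistic regression pipeline, the example summaries, and the labels are all invented for this sketch.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data; real summaries and labels are proprietary.
summaries = [
    "cannot login to dashboard",
    "access request for new user",
    "pipeline job failed overnight",
    "data load job stuck in queue",
]
labels = ["access", "access", "pipeline", "pipeline"]

# Assumed model choice: TF-IDF features fed into logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(summaries, labels)

# Serialise the fitted pipeline; writing these bytes to a file would
# produce something analogous to classifier.pkl.
blob = pickle.dumps(model)

# Later: restore the pipeline and predict labels for unseen summaries.
restored = pickle.loads(blob)
test_summaries = ["user cannot access dashboard", "nightly job failed"]
predictions = restored.predict(test_summaries)
print(list(predictions))
```

Pickling the whole pipeline keeps the vectoriser and the classifier together, so the restored object accepts raw summary strings directly. A pickle should be loaded with a scikit-learn version close to the one it was saved with.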
The main findings of this study can be found here. The jira_data_analysis.ipynb file has cells explaining the answers and trends for each of the questions above.
Credit goes to the CEDP Support team for giving me access to the data. The data is proprietary to IBM, and the sole purpose of this project is to demonstrate a few trends in the ticket data. The code and model should not be used elsewhere without proper authorisation.