Fix a critical bug of Splunk results reader, lack of pagination #657

Tatsuya-hasegawa · 2023-04-18T04:28:16Z

Hello msticpy team,

I found a critical bug which cannot retrieve only first 100 records from Splunk with SplunkGeneral.get_events_parameterized function in spite of add_query_items='', count=0.

msticpy version = 2.4.0 (latest)

Evidence file

Diff Out[7] and Out[8] cells of msticpy_splunk_reader_bug.ipynb.
- Out[7] was only 100 records.

It happens when oneshot=False and I found that the pagination was not implemented in normal search (i.e. kwargs_normalsearch = {"exec_mode": "normal"}).
That is why Splunk SDK's resultReader never fetch except first 100 records.

This PR is to implement the pagination and add paginate_width implicit option to the function.

In addition, I suggest the replacement from ResultReader to JSONResultReader to avoiding the deprecation warning.

The warning was below on the previous code.

/site-packages/msticpy/data/drivers/splunk_driver.py:234: DeprecationWarning: ResultsReader is a deprecated function. Use the JSONResultsReader function instead in conjuction with the 'output_mode' query param set to 'json'
  reader = sp_results.ResultsReader(query_job.results())

I replaced as following.

reader = sp_results.JSONResultsReader(search_results) #due to DeprecationWarning of normal ResultsReader

Thank you.
I'm happy this will be merged.
Best regards,

update to latest

Tatsuya-hasegawa · 2023-04-18T04:47:12Z

@microsoft-github-policy-service agree

Tatsuya-hasegawa · 2023-04-18T05:00:47Z

I prepared my PoC notebook file and my test Splunk csv data for this bug debugging.
They are stored in the following locations.

https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_qp_splunk.ipynb
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_splunk_reader_paging-test.csv

Thank you.
Best regards,

Tatsuya-hasegawa · 2023-04-18T06:29:56Z

On the other hands, add_query_items='', count=0 options in normal search mode worked fine in the past official notebook
https://github.com/microsoft/msticpy/blob/main/docs/notebooks/Splunk-DataConnector.ipynb

I'm thinking the diff was caused by Splunk python SDK updates.

FYI: My Splunk Versions is Splunk Enterprise 8.2.3.

ianhelle · 2023-04-18T16:35:43Z

Hey thanks @Tatsuya-hasegawa - Assuming the tests complete OK, I will complete this.
We do have someone working on an update for the Splunk driver but they seems to have gone dark on us. This is great!

Tatsuya-hasegawa · 2023-04-18T21:33:14Z

Thank you for the reply, @ianhelle

This CI test looks throwing this error, however is_ready() function certainly exists on splunk client and also work fine on my test splunk instance.

https://docs.splunk.com/DocumentationStatic/PythonSDK/1.7.2/client.html#splunklib.client.Job.is_ready

>               while not query_job.is_ready():
154
E               AttributeError: '_MockAsyncResponse' object has no attribute 'is_ready'

I feel the Mock response may not be enough comparing to the real splunk client's function.

Please feel free to teach me if there is an additional anything I had better improve in order for this PR to be merged.
I like MSTICpy so I would like to try to make PRs from now on. :)

Sincerely,

Updated failing tests for new code.

ianhelle · 2023-04-20T23:19:43Z

Yeah, the tests are mocked because we can't run against a live splunk instance.
I've pushed some changes that fix the tests (with added mock methods).
Also re-factored the code a little to keep the methods a little more modular and re-formatted long lines.
I also changed the "pagination_width" parameter to page_size, which I think is a bit more intuitive.
And I added a timeout parameter (default 60s) since we have while loop that would otherwise run forever, if something went wrong.
I haven't been able to test the code against a live splunk instance though so would appreciate if you could do that.

ianhelle · 2023-04-20T23:20:01Z

/azpipelines run

azure-pipelines · 2023-04-20T23:20:10Z

Azure Pipelines successfully started running 1 pipeline(s).

ianhelle

Don't really need changes - if you could check that my code changes look OK and test them in your environment it would be super helpful.

Tatsuya-hasegawa · 2023-04-21T20:21:04Z

Hello, I will check your changed code right now, however I am in a location without my dev PC now, so I will spend some time more than usual.
I just wonder even if the code you changed is reflected, will I be reflected as a contributor ?

ianhelle · 2023-04-21T21:09:17Z

No hurry at all - I will on vacation for the next 2 weeks, so whenever you get a chance.
You will be the primary author - no danger of me taking credit for your work. I think I might be recorded as co-author but mostly all I did was move your code around. :-D

Tatsuya-hasegawa · 2023-04-21T22:03:46Z

@ianhelle

Thanks !
It doesn't seem like it's going to end anytime soon on this pc, so I'll test it next Monday on my usual dev PC.
Also I have checked your code and much appreciate your refactoring work!
I could learn this project's code style. :-D

Tatsuya-hasegawa · 2023-04-24T21:09:16Z

@ianhelle

I have tested your committed code in my test Splunk.
That worked well as I expected.
This is the evidence notebook file.
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_splunk_9_0_4_reader_merged.ipynb

I'm looking forward to seeing this fix on version 2.5,0.

Thank you very much for this handling !
Best regards,

ianhelle · 2023-05-08T19:23:38Z

/azpipelines run

azure-pipelines · 2023-05-08T19:23:48Z

Azure Pipelines successfully started running 1 pipeline(s).

ianhelle · 2023-05-10T22:14:52Z

/azpipelines run

azure-pipelines · 2023-05-10T22:15:00Z

Azure Pipelines successfully started running 1 pipeline(s).

Tatsuya-hasegawa · 2023-05-15T08:09:52Z

Thank you for merging.
I appreciate your work very much.

Tatsuya-hasegawa added 5 commits March 18, 2023 05:55

Merge pull request #1 from microsoft/main

a2109e0

update to latest

Merge branch 'microsoft:main' into main

170bd0e

Merge branch 'microsoft:main' into main

40a3520

fix a critical bug of splunk result reader

843ac7c

typo pagenate -> paginate

73aa9a9

Tatsuya-hasegawa mentioned this pull request Apr 18, 2023

for splnk query provider pagination poc Tatsuya-hasegawa/MSTICPy_utils#2

Merged

Merge branch 'main' into fixbug_splunk_results_reader_pagenate

8a990b4

Refactored code and reformatted long lines.

7b9fdc3

Updated failing tests for new code.

ianhelle requested changes Apr 21, 2023

View reviewed changes

ianhelle self-assigned this Apr 21, 2023

ianhelle added this to the Release 2.5.0 milestone Apr 21, 2023

ianhelle approved these changes May 8, 2023

View reviewed changes

Merge branch 'main' into fixbug_splunk_results_reader_pagenate

76689e7

Merge branch 'main' into fixbug_splunk_results_reader_pagenate

ff11a8e

ianhelle merged commit 62cfaa5 into microsoft:main May 10, 2023

Tatsuya-hasegawa deleted the fixbug_splunk_results_reader_pagenate branch June 5, 2023 06:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a critical bug of Splunk results reader, lack of pagination #657

Fix a critical bug of Splunk results reader, lack of pagination #657

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 18, 2023

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

ianhelle commented Apr 18, 2023

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

ianhelle commented Apr 20, 2023

ianhelle commented Apr 20, 2023

azure-pipelines bot commented Apr 20, 2023

ianhelle left a comment

Tatsuya-hasegawa commented Apr 21, 2023 •

edited

Loading

ianhelle commented Apr 21, 2023

Tatsuya-hasegawa commented Apr 21, 2023

Tatsuya-hasegawa commented Apr 24, 2023 •

edited

Loading

ianhelle commented May 8, 2023

azure-pipelines bot commented May 8, 2023

ianhelle commented May 10, 2023

azure-pipelines bot commented May 10, 2023

Tatsuya-hasegawa commented May 15, 2023

Fix a critical bug of Splunk results reader, lack of pagination #657

Fix a critical bug of Splunk results reader, lack of pagination #657

Conversation

Tatsuya-hasegawa commented Apr 18, 2023 • edited Loading

Tatsuya-hasegawa commented Apr 18, 2023

Tatsuya-hasegawa commented Apr 18, 2023 • edited Loading

Tatsuya-hasegawa commented Apr 18, 2023 • edited Loading

ianhelle commented Apr 18, 2023

Tatsuya-hasegawa commented Apr 18, 2023 • edited Loading

ianhelle commented Apr 20, 2023

ianhelle commented Apr 20, 2023

azure-pipelines bot commented Apr 20, 2023

ianhelle left a comment

Choose a reason for hiding this comment

Tatsuya-hasegawa commented Apr 21, 2023 • edited Loading

ianhelle commented Apr 21, 2023

Tatsuya-hasegawa commented Apr 21, 2023

Tatsuya-hasegawa commented Apr 24, 2023 • edited Loading

ianhelle commented May 8, 2023

azure-pipelines bot commented May 8, 2023

ianhelle commented May 10, 2023

azure-pipelines bot commented May 10, 2023

Tatsuya-hasegawa commented May 15, 2023

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 18, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 21, 2023 •

edited

Loading

Tatsuya-hasegawa commented Apr 24, 2023 •

edited

Loading