Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a critical bug of Splunk results reader, lack of pagination #657

Merged

Conversation

Tatsuya-hasegawa
Copy link
Contributor

@Tatsuya-hasegawa Tatsuya-hasegawa commented Apr 18, 2023

Hello msticpy team,

I found a critical bug which cannot retrieve only first 100 records from Splunk with SplunkGeneral.get_events_parameterized function in spite of add_query_items='', count=0.

msticpy version = 2.4.0 (latest)

Evidence file

It happens when oneshot=False and I found that the pagination was not implemented in normal search (i.e. kwargs_normalsearch = {"exec_mode": "normal"}).
That is why Splunk SDK's resultReader never fetch except first 100 records.

This PR is to implement the pagination and add paginate_width implicit option to the function.

In addition, I suggest the replacement from ResultReader to JSONResultReader to avoiding the deprecation warning.

The warning was below on the previous code.

/site-packages/msticpy/data/drivers/splunk_driver.py:234: DeprecationWarning: ResultsReader is a deprecated function. Use the JSONResultsReader function instead in conjuction with the 'output_mode' query param set to 'json'
  reader = sp_results.ResultsReader(query_job.results())

I replaced as following.

reader = sp_results.JSONResultsReader(search_results) #due to DeprecationWarning of normal ResultsReader

Thank you.
I'm happy this will be merged.
Best regards,

@Tatsuya-hasegawa
Copy link
Contributor Author

@microsoft-github-policy-service agree

@Tatsuya-hasegawa
Copy link
Contributor Author

Tatsuya-hasegawa commented Apr 18, 2023

I prepared my PoC notebook file and my test Splunk csv data for this bug debugging.
They are stored in the following locations.

https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_qp_splunk.ipynb
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_splunk_reader_paging-test.csv

Thank you.
Best regards,

@Tatsuya-hasegawa
Copy link
Contributor Author

Tatsuya-hasegawa commented Apr 18, 2023

On the other hands, add_query_items='', count=0 options in normal search mode worked fine in the past official notebook
https://github.com/microsoft/msticpy/blob/main/docs/notebooks/Splunk-DataConnector.ipynb

I'm thinking the diff was caused by Splunk python SDK updates.

FYI: My Splunk Versions is Splunk Enterprise 8.2.3.

@ianhelle
Copy link
Contributor

Hey thanks @Tatsuya-hasegawa - Assuming the tests complete OK, I will complete this.
We do have someone working on an update for the Splunk driver but they seems to have gone dark on us. This is great!

@Tatsuya-hasegawa
Copy link
Contributor Author

Tatsuya-hasegawa commented Apr 18, 2023

Thank you for the reply, @ianhelle

This CI test looks throwing this error, however is_ready() function certainly exists on splunk client and also work fine on my test splunk instance.

https://docs.splunk.com/DocumentationStatic/PythonSDK/1.7.2/client.html#splunklib.client.Job.is_ready

>               while not query_job.is_ready():
154
E               AttributeError: '_MockAsyncResponse' object has no attribute 'is_ready'

I feel the Mock response may not be enough comparing to the real splunk client's function.

Please feel free to teach me if there is an additional anything I had better improve in order for this PR to be merged.
I like MSTICpy so I would like to try to make PRs from now on. :)

Sincerely,

Updated failing tests for new code.
@ianhelle
Copy link
Contributor

Yeah, the tests are mocked because we can't run against a live splunk instance.
I've pushed some changes that fix the tests (with added mock methods).
Also re-factored the code a little to keep the methods a little more modular and re-formatted long lines.
I also changed the "pagination_width" parameter to page_size, which I think is a bit more intuitive.
And I added a timeout parameter (default 60s) since we have while loop that would otherwise run forever, if something went wrong.
I haven't been able to test the code against a live splunk instance though so would appreciate if you could do that.

@ianhelle
Copy link
Contributor

/azpipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@ianhelle ianhelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't really need changes - if you could check that my code changes look OK and test them in your environment it would be super helpful.

@Tatsuya-hasegawa
Copy link
Contributor Author

Tatsuya-hasegawa commented Apr 21, 2023

Hello, I will check your changed code right now, however I am in a location without my dev PC now, so I will spend some time more than usual.
I just wonder even if the code you changed is reflected, will I be reflected as a contributor ?

@ianhelle
Copy link
Contributor

No hurry at all - I will on vacation for the next 2 weeks, so whenever you get a chance.
You will be the primary author - no danger of me taking credit for your work. I think I might be recorded as co-author but mostly all I did was move your code around. :-D

@ianhelle ianhelle self-assigned this Apr 21, 2023
@ianhelle ianhelle added this to the Release 2.5.0 milestone Apr 21, 2023
@Tatsuya-hasegawa
Copy link
Contributor Author

@ianhelle

Thanks !
It doesn't seem like it's going to end anytime soon on this pc, so I'll test it next Monday on my usual dev PC.
Also I have checked your code and much appreciate your refactoring work!
I could learn this project's code style. :-D

@Tatsuya-hasegawa
Copy link
Contributor Author

Tatsuya-hasegawa commented Apr 24, 2023

@ianhelle

I have tested your committed code in my test Splunk.
That worked well as I expected.
This is the evidence notebook file.
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/qp_splunk_poc_bugfix/msticpy_splunk_9_0_4_reader_merged.ipynb

I'm looking forward to seeing this fix on version 2.5,0.

Thank you very much for this handling !
Best regards,

@ianhelle
Copy link
Contributor

ianhelle commented May 8, 2023

/azpipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ianhelle
Copy link
Contributor

/azpipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ianhelle ianhelle merged commit 62cfaa5 into microsoft:main May 10, 2023
@Tatsuya-hasegawa
Copy link
Contributor Author

Thank you for merging.
I appreciate your work very much.

@Tatsuya-hasegawa Tatsuya-hasegawa deleted the fixbug_splunk_results_reader_pagenate branch June 5, 2023 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants