This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Multiprocessing support for SplunkDF #12

Merged
DavidJBianco merged 22 commits into master from splunkresultsapi on Jun 19, 2020


DavidJBianco
Contributor

Because the Splunk API is so incredibly slow, this branch ditches its oneshot() function in favor of the lower-level Splunk Jobs API. Since we had to write our own results retrieval code, we used Python's built-in multiprocessing module to retrieve results in parallel. The default is now to retrieve results with a single worker, which decreased total search time by about 45% when retrieving 1 million rows in testing.

This PR also includes some code cleanups to improve the flow between SplunkDF.search_df() and SplunkDF.search(), and adds both uniprocessing and multiprocessing versions of the same unit tests.
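To illustrate the parallel retrieval described above, here is a minimal sketch of paging through a finished job's results with a `multiprocessing` pool. It is not the code from this branch: `plan_chunks`, `fetch_chunk`, `fetch_all`, and the `FAKE_RESULTS` list are hypothetical names, and `fetch_chunk` slices an in-memory list where a real worker would request one page of results from the search job.

```python
from multiprocessing import Pool

# Stand-in for the rows of a finished search job (hypothetical data).
FAKE_RESULTS = [{"_serial": i, "host": f"host{i % 3}"} for i in range(25)]


def plan_chunks(total, chunk_size):
    """Split `total` rows into (offset, count) pages of at most `chunk_size`."""
    return [(offset, min(chunk_size, total - offset))
            for offset in range(0, total, chunk_size)]


def fetch_chunk(page):
    """Fetch one page of results.

    Hypothetical stand-in: a real worker would ask the search job for
    `count` results starting at `offset` and parse the response.
    """
    offset, count = page
    return FAKE_RESULTS[offset:offset + count]


def fetch_all(total, chunk_size=10, workers=1):
    """Retrieve every page with a pool of `workers` processes, in order."""
    pages = plan_chunks(total, chunk_size)
    with Pool(processes=workers) as pool:
        # map() preserves page order, so rows come back in offset order.
        chunks = pool.map(fetch_chunk, pages)
    return [row for chunk in chunks for row in chunk]
```

Calling `fetch_all(25, workers=4)` should return the same rows as a serial loop over the pages. On platforms that start workers by spawning a fresh interpreter, the call belongs under an `if __name__ == "__main__":` guard.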

The Splunk oneshot() API was super slow (seemingly CPU bound). Removed
it entirely and now use Splunk's Jobs API to retrieve results when the
'limit' parameter is specified. Further, we always use multiprocessing
in these cases, even if just with the default worker count of 1.
Limit searches are *much* faster now. Searches with no limit
specified still use the Splunk export() API, which is slower, but
ensures that all results are retrieved.
…g for all search types.

The search() and search_df() functions now use 'args' and 'kwargs' to handle their arguments, which
makes it MUCH easier to pass the searches through. Changes to the search() code also provide
basic sanity checking on the args, including enforcing the previous default values.
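A rough sketch of that pass-through pattern follows, under stated assumptions. Only the `limit` parameter and the default worker count of 1 come from the commit messages; the `workers` keyword name, the `DEFAULTS` table, `check_search_args`, and the returned "plan" dictionary are hypothetical. Here `search()` only describes which retrieval path it would take rather than contacting Splunk.

```python
# Hypothetical defaults; only `limit` and a worker default of 1 are
# mentioned in the PR text.
DEFAULTS = {"limit": None, "workers": 1}


def check_search_args(*args, **kwargs):
    """Validate the arguments and fill in defaults for anything omitted."""
    if len(args) != 1 or not isinstance(args[0], str):
        raise TypeError("expected exactly one positional search string")
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected search arguments: {sorted(unknown)}")
    params = {**DEFAULTS, **kwargs}
    limit, workers = params["limit"], params["workers"]
    if limit is not None and (not isinstance(limit, int) or limit < 1):
        raise ValueError("limit must be a positive integer or None")
    if not isinstance(workers, int) or workers < 1:
        raise ValueError("workers must be a positive integer")
    return args[0], params


def search(*args, **kwargs):
    """Return a description of the search that would be run."""
    spl, params = check_search_args(*args, **kwargs)
    # Limited searches go through the Jobs API; unlimited ones use export().
    method = "jobs" if params["limit"] is not None else "export"
    return {"spl": spl, "method": method, **params}


def search_df(*args, **kwargs):
    """Pass everything straight through to search()."""
    return search(*args, **kwargs)
```

Because `search_df()` forwards `*args` and `**kwargs` untouched, validation and defaults live in one place, and a new keyword only needs to be added to `search()`.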

Both parallel and export() searches now use common code to process their results.  This was not
previously the case, leading to bad things like Splunk messages not being handled consistently,
internal fields not being filtered correctly, etc.
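One way to picture the shared post-processing step is sketched below. It is an assumption-laden illustration, not the code in this commit: the `Message` class is a stand-in for whatever diagnostic objects arrive alongside result rows, and `KEEP_INTERNAL` is a hypothetical choice of which underscore-prefixed (internal) fields to retain.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for a diagnostic message mixed into the result stream."""
    type: str
    text: str


# Hypothetical choice of internal fields worth keeping.
KEEP_INTERNAL = frozenset({"_time", "_raw"})


def process_results(items, keep_internal=KEEP_INTERNAL):
    """Separate messages from rows and drop unwanted internal fields.

    Both retrieval paths would feed their raw items through this one
    function so they are handled identically.
    """
    rows, messages = [], []
    for item in items:
        if isinstance(item, Message):
            messages.append(item)
            continue
        rows.append({
            key: value for key, value in item.items()
            if not key.startswith("_") or key in keep_internal
        })
    return rows, messages
```

Routing the parallel and export() paths through a single function like this means a fix to message handling or field filtering applies to every search type at once.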
@DavidJBianco DavidJBianco merged commit d3a029c into master Jun 19, 2020
@DavidJBianco DavidJBianco deleted the splunkresultsapi branch June 19, 2020 13:08