This repository has been archived by the owner on Jul 16, 2024. It is now read-only.
The Splunk oneshot() API was super slow (seemingly CPU bound). Removed it entirely; we now use Splunk's Jobs API to retrieve results when the 'limit' parameter is specified. Further, we always use multiprocessing in these cases, even if just with the default worker count of 1. Limit searches are *much* faster now. Searches with no limit specified still use the Splunk export() API, which is slower but ensures that all results are retrieved.
…g for all search types. The search() and search_df() functions now use 'args' and 'kwargs' to handle their arguments, which makes it MUCH easier to pass the searches through. The search() code also now provides basic sanity checking on the args, including enforcing the previous default values. Both parallel and export() searches now use common code to process their results. This was not previously the case, which led to bad things like Splunk messages not being handled consistently, internal fields not being filtered correctly, etc.
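As a rough illustration of the kind of argument sanity checking described above, here is a minimal sketch. It is not the code from this branch: the helper name `normalize_search_args`, the `DEFAULTS` table, and every parameter name other than `limit` are hypothetical stand-ins chosen for the example.

```python
# Hypothetical defaults; the real default values in the library are not
# shown in this conversation, so these are placeholders.
DEFAULTS = {
    "earliest": "-1d",
    "latest": "now",
    "limit": None,       # None means "no limit"
    "num_workers": 1,
}


def normalize_search_args(**kwargs):
    """Merge caller kwargs over the defaults and reject obviously bad values."""
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected search arguments: {sorted(unknown)}")

    # Caller-supplied values win; anything omitted falls back to its default.
    params = {**DEFAULTS, **kwargs}

    limit = params["limit"]
    if limit is not None and (not isinstance(limit, int) or limit <= 0):
        raise ValueError("limit must be a positive integer or None")

    workers = params["num_workers"]
    if not isinstance(workers, int) or workers < 1:
        raise ValueError("num_workers must be an integer >= 1")

    return params
```

A wrapper such as search_df() could then forward its own `**kwargs` straight through a helper like this one, so both entry points see the same validated values.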
Because the Splunk API is so incredibly slow, this branch ditches its `oneshot()` function in favor of the lower-level Splunk Jobs API. Since we had to write our own results retrieval code, we used Python's built-in `multiprocessing` module to retrieve results in parallel. The default is now to retrieve results with a single worker, which decreased total search time by about 45% while retrieving 1 million rows in testing.

This PR also includes some code cleanups to improve the flow between `SplunkDF.search_df()` and `SplunkDF.search()`, as well as adding both uniprocessing and multiprocessing versions of the same unit tests.
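For readers unfamiliar with the approach, the general pattern of paging through a finished job's results with a worker pool might look roughly like the sketch below. This is an assumption-laden illustration, not the code in this PR: `page_offsets`, `fetch_page`, and `fetch_all` are hypothetical names, and `fetch_page` fabricates rows locally instead of calling Splunk.

```python
from multiprocessing import Pool


def page_offsets(total, page_size):
    """Split `total` results into (offset, count) pages of at most `page_size`."""
    return [
        (offset, min(page_size, total - offset))
        for offset in range(0, total, page_size)
    ]


def fetch_page(page):
    """Stand-in for one results request against a completed search job.

    A real implementation would ask the job for `count` results starting
    at `offset`; here we just generate placeholder rows.
    """
    offset, count = page
    return [{"row": i} for i in range(offset, offset + count)]


def fetch_all(total, page_size, workers=1):
    """Fetch every page using a pool of `workers` processes."""
    pages = page_offsets(total, page_size)
    with Pool(processes=workers) as pool:
        # Pool.map returns results in the same order as `pages`,
        # so the rows come back in offset order.
        chunks = pool.map(fetch_page, pages)
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    rows = fetch_all(total=10, page_size=4, workers=2)
    print(len(rows))
```

Keeping the page-splitting logic separate from the pool makes it easy to run the same tests with one worker or several, which is the spirit of the uniprocessing and multiprocessing test variants mentioned above.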