Data Explorer: Search across all columns for keywords #3302

Open
jmcphers opened this issue May 29, 2024 · 6 comments
Labels: area: data explorer, enhancement, support
Milestone

Comments

@jmcphers
Collaborator

jmcphers commented May 29, 2024

Discussed in https://github.com/posit-dev/positron-beta/discussions/203

Originally posted by mikemahoney218 May 29, 2024
We've got a number of workflows that involve telling users to open a data frame in RStudio's View()er and searching for keywords in the data frame:
image

I don't see an obvious way to do this in Positron. Is there one?

Also requested by another beta user: https://github.com/posit-dev/positron-beta/discussions/203#discussioncomment-9596158

@jmcphers jmcphers transferred this issue from another repository May 29, 2024
@jmcphers jmcphers changed the title from "Text search in View(df)?" to "Data Explorer: Search across all columns for keywords" May 29, 2024
@jmcphers
Collaborator Author

Probably a better example from the discussion:

image

Nothing in the existing protocol lets us do this, since filtering is currently a per-column construct. The user is looking for something that searches across all the columns at once, like RStudio has.

@jmcphers jmcphers added the enhancement and area: data explorer labels May 29, 2024
@jthomasmock jthomasmock modified the milestones: Release Candidate, Future Jun 4, 2024
@jthomasmock
Contributor

I agree this is a nice "quick filter", but quite different from our existing implementation.

Taking some notes for the future on this.

In RStudio this is:

  • Global search
  • Immediately applies a filter operation

@wesm
Contributor

wesm commented Jul 22, 2024

This would be straightforward to implement at the data explorer protocol level, but there are a couple of challenges:

  • Since the search is text-based, columns that aren't already strings have to be converted to strings first, which increases the cost
  • For very large data frames, globally searching every column in the table could be arbitrarily expensive
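To make the two costs above concrete, here is a minimal pandas sketch of what a global search could look like. It is only an illustration, not the data explorer protocol or Positron's backend: the function name `global_search_mask` and its parameters are hypothetical. Every column is converted to text and then scanned, so the work grows with the total number of cells.

```python
import numpy as np
import pandas as pd


def global_search_mask(df: pd.DataFrame, keyword: str, case: bool = False) -> np.ndarray:
    """Return a boolean array marking rows where any cell contains `keyword`.

    Hypothetical helper for illustration only.
    """
    mask = np.zeros(len(df), dtype=bool)
    for _, col in df.items():
        # Non-string columns are converted to text first; this conversion is
        # the extra cost mentioned in the first bullet.
        text = col.astype(str)
        # Plain substring match (no regex); missing values never match.
        hits = text.str.contains(keyword, case=case, regex=False, na=False)
        mask |= hits.to_numpy(dtype=bool)
    return mask


df = pd.DataFrame(
    {"a": [1.44, 2.0, 3.5], "b": ["foo", "bar", "baz"], "c": [10, 440, 7]}
)
matches = df[global_search_mask(df, "44")]  # rows 0 and 1 match
```

Because numbers are matched against their string form, the result depends on how values are formatted (here `1.44` and `440` both contain `44`), which is something a real implementation would have to decide on.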

I think the way to deal with this would be to cap the number of "cells" that this feature is willing to search. Consider searching a column of numbers for values that contain the pattern 44 (on my x86 laptop):

In [16]: s = pd.Series(np.random.randn(1000000))

In [17]: %time s.map(str).str.contains('44')
CPU times: user 428 ms, sys: 29.9 ms, total: 458 ms
Wall time: 458 ms

So searching on the order of 1 million "cells" would cost roughly half a second, more on slower computers. If a dataset has more than 1M cells (or whatever the upper limit is), we could do one of the following:

  • Provide a UI option that lets the user opt in to an exhaustive search, and refuse to search (with an error) when the dataset is large and the option is not selected. Exhaustive search is useful when that's definitely what you want, but as the default it could freeze the UI while an expensive global search runs
  • For large datasets, only display matches in the first X rows of the dataset (where X is the overall cell limit L divided by the number of columns)
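A rough sketch of how the cap could combine both options, again in pandas and purely illustrative. The name `capped_search`, the `exhaustive` flag, and the default `MAX_CELLS` are assumptions, not part of any existing API. It also assumes the row cutoff X is the cell budget divided by the number of columns, so that X rows times the column count stays within the budget.

```python
import numpy as np
import pandas as pd

MAX_CELLS = 1_000_000  # hypothetical cap; the actual limit is left open here


def capped_search(df, keyword, max_cells=MAX_CELLS, exhaustive=False):
    """Return (row_positions, truncated) for rows with any cell containing `keyword`.

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = df.shape
    limit = n_rows
    if not exhaustive and n_cols > 0 and n_rows * n_cols > max_cells:
        # Spend the cell budget on the leading rows: X = L // number of columns.
        limit = max_cells // n_cols
    head = df.iloc[:limit]
    mask = np.zeros(len(head), dtype=bool)
    for _, col in head.items():
        # Convert each column to text, then do a plain substring match.
        hits = col.astype(str).str.contains(keyword, regex=False, na=False)
        mask |= hits.to_numpy(dtype=bool)
    # `truncated` tells the caller (e.g. the UI) that only a prefix was searched.
    return np.flatnonzero(mask), limit < n_rows
```

A frontend could use the `truncated` flag to show something like "showing matches from the first X rows" and offer the exhaustive search as an explicit opt-in.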

@jmcphers
Collaborator Author

Requested again on the discussion forum here: #4661

@wesm
Contributor

wesm commented Sep 12, 2024

This shouldn't be too difficult to implement in the backend, so whenever there is enough bandwidth to add the frontend UX for it, we could definitely make it happen.

@jthomasmock
Contributor

Also requested in: #4746

4 participants