DOC indexing: Google search for 'pandas greater than' returns pandas.DataFrame.ge as #1, and pandas.DataFrame.gt/pandas.Series.gt aren't even found #32491

Closed
smcinerney opened this issue Mar 6, 2020 · 3 comments

Comments

@smcinerney

Problem description

(I'm aware that Google SEO is not under pandas' control. However, this one's quite important, and nowhere do the pandas docs actually tell us where or how to report SEO failures, if at all. So please tell us how and where to report things like this. Are all pandas doc pages automatically submitted to be crawled by Google?)

The indexing of pandas DataFrame/Series operators seems to have some holes:

A Google search for 'pandas greater than' returns 'pandas.DataFrame.ge' as #1, and pandas.DataFrame.gt/pandas.Series.gt aren't even found, let alone in the top 10.

  1. The #1 hit should be pandas.DataFrame.gt
  2. pandas.Series.gt should also be found, as #2, but it isn't. Similarly for #3: pandas.Series.between, pandas.Series.ge etc.

A Google search for 'pandas less than' is slightly better:

  1. #1 hit is pandas.DataFrame.le (pandas 1.0.1 doc)
  2. #2 hit is pandas.Series.le (pandas 1.0.1 doc)
  3. #3 hit is pandas.Series.between (pandas 0.23.1 doc(?), not 1.0.x)
  4. The #1 and #2 hits should be pandas.DataFrame.lt/pandas.Series.lt
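
For context on why landing on the wrong page matters, here is a minimal sketch of the difference between the strict and inclusive comparison methods. The sample values are made up for illustration; only `gt`, `ge`, `lt` and `le` come from the pages discussed above.

```python
import pandas as pd

# Made-up sample data, purely for illustration.
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2, 3]})

# gt is strictly "greater than"; ge is "greater than or equal to".
strict_gt = s.gt(2).tolist()      # equivalent to (s > 2)
inclusive_ge = s.ge(2).tolist()   # equivalent to (s >= 2)

# lt is strictly "less than"; le is "less than or equal to".
strict_lt = df.lt(2)["a"].tolist()     # equivalent to (df < 2)
inclusive_le = df.le(2)["a"].tolist()  # equivalent to (df <= 2)

print(strict_gt)     # [False, False, True]
print(inclusive_ge)  # [False, True, True]
print(strict_lt)     # [True, False, False]
print(inclusive_le)  # [True, True, False]
```

Someone who searches for 'greater than' and copies the example from the `ge` page gets different results at the boundary value.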

Expected Output

as above

@TomAugspurger
Contributor

> Are all pandas doc pages automatically submitted to be crawled by Google

We don't submit anything. We don't have a robots.txt to disallow crawling.

What do you suggest we do?
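
As a rough illustration of the robots.txt point, the sketch below uses Python's standard `urllib.robotparser` to compare an empty rule set with a blanket `Disallow`. The empty list standing in for the missing file, the `Googlebot` agent string and the `allowed` helper are assumptions made for this example; it runs offline and does not fetch anything from pandas.pydata.org.

```python
from urllib.robotparser import RobotFileParser

# URL taken from this thread; nothing is actually fetched.
URL = "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html"


def allowed(robots_lines, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    `allowed` is a hypothetical helper for this sketch, not part of pandas.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines instead of downloading
    return parser.can_fetch(agent, url)


# No rules at all (assumed stand-in for "no robots.txt"): crawling is permitted.
no_rules = allowed([], URL)

# A site that wanted to block crawlers would need an explicit rule like this.
blocked = allowed(["User-agent: *", "Disallow: /"], URL)

print(no_rules)  # True
print(blocked)   # False
```

Under these assumptions, nothing on the pandas side prevents the pages from being crawled, so the ranking is decided by the search engine.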

@smcinerney
Author

smcinerney commented Mar 6, 2020

Is the cause a) selective crawling, or b) the crawler not understanding the doc page text itself, so that the wrong page gets indexed as #1 and the right pages never get indexed at all? The solution depends on which it is. Does anyone here know about crawling and SEO? (I know almost nothing, so I can't comment.)

Also, I would have expected that as long as https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html and https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html are crawled, the operators should be indexed, although not necessarily #1 hits for their keywords.

@jbrockmendel
Member

Closing as nothing-we-can-do-about-this
