DOC: Split api.rst in sections #24451

Closed
datapythonista opened this issue Dec 27, 2018 · 18 comments · Fixed by #24462 or #24909
Labels
Blocker Blocking issue or pull request for an upcoming release Docs Needs Discussion Requires discussion from core team before further action

@datapythonista
Member

datapythonista commented Dec 27, 2018

The pandas API reference page is huge, which makes it difficult for users to find information and for us to edit.

Given the size of the page, I think it'd make more sense to split the current page api.rst into different pages, one per section:

  • api/series.rst
  • api/functions.rst
  • ...

The document is already divided into sections, and I think the current division would make sense with a few changes:

  • Input/Output
  • General functions
  • Series
  • DataFrame
  • Panel
  • Index (will include all the Index classes/section in the top level of api.rst)
  • Scalars
  • Date Offsets and Frequencies
  • Window
  • GroupBy
  • Resampling
  • Style
  • Plotting
  • General utility functions
  • Extensions

Does anybody think a single page is better? Any other division you can think of?
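
To make the proposal concrete, here is a minimal sketch of how the split could be scripted. It is only an illustration under stated assumptions: that top-level sections in api.rst are plain titles underlined with a single repeated character (`-` here, which may not match the real file), and that each page is named after a slug of its title. `split_sections` and `section_filename` are hypothetical helpers, not part of pandas or Sphinx.

```python
import re


def split_sections(text, underline="-"):
    """Split reST text into {title: body}, keyed by underlined section titles.

    Assumes every top-level section is a title line followed by a line made
    only of `underline` characters, at least as long as the title.
    """
    lines = text.splitlines()
    sections = {}
    title = None
    skip_next = False
    for i, line in enumerate(lines):
        if skip_next:  # this line is the underline of the title just found
            skip_next = False
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        is_title = (
            bool(line.strip())
            and bool(nxt)
            and set(nxt) == {underline}
            and len(nxt) >= len(line)
        )
        if is_title:
            title = line.strip()
            sections[title] = []
            skip_next = True
        elif title is not None:
            sections[title].append(line)
    return {t: "\n".join(body).strip() for t, body in sections.items()}


def section_filename(title):
    """Hypothetical naming scheme: 'Date Offsets' -> 'api/date_offsets.rst'."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"api/{slug}.rst"
```

Each value returned by `split_sections` would then be written to the file given by `section_filename`, with a short index page linking to all of them. The actual file names may well differ from these slugs (for example api/functions.rst for "General functions").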

@datapythonista datapythonista added Docs Needs Discussion Requires discussion from core team before further action labels Dec 27, 2018
@benjaminr
Contributor

I agree. I think this would definitely be more approachable split up into pages; it's far too monolithic as it stands. It makes sense to follow the sections already defined.

@mroeschke
Member

Makes sense to me. One small nit is that I would combine Date Offsets with Frequencies since they are essentially the same thing and Frequencies only has the to_offset function under that section.

@jorisvandenbossche
Member

jorisvandenbossche commented Jan 2, 2019

@datapythonista I agree with your rationale ... but, this completely breaks all existing links to the api docs.

There might be some easy fixes (adding a (temporary) redirect, or keeping the generated pages in /generated/.., or ...). But at least we need to do something about it for 0.24.0 I think.

This will be in general a problem when reorganizing the docs, so it would be good to find some remedies for this.

@jorisvandenbossche
Member

So there are two issues:

The pandas API reference page is huge. Which makes it difficult for users to find information

This depends a bit on how the page is used. I personally use it to find (the docstring page of) a certain function/method, typically with ctrl-f. Having it split over many pages actually makes this harder (although I am probably an atypical user of the page).

@datapythonista
Member Author

The path of the generated files is decided by sphinx based on where the autosummary directives are. Before, they were in the root, in api.rst, so the pages were generated in the generated/ directory in the root; now they are generated in api, as the autosummary directives are in files there. We could have :toctree: ../generated/ I guess, but I don't think we want that.
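
As a rough illustration of that rule (a sketch, not Sphinx's actual implementation), the output directory can be thought of as the `:toctree:` value resolved relative to the directory of the file that contains the directive. `stub_output_dir` is a hypothetical helper, and api/series.rst is used only as an example of a split page.

```python
import posixpath


def stub_output_dir(docname, toctree):
    """Directory, relative to the docs root, where stub pages would land.

    docname: path of the .rst file holding the autosummary directive,
             relative to the docs root (e.g. "api/series.rst").
    toctree: value of the directive's :toctree: option.
    """
    base = posixpath.dirname(docname)
    return posixpath.normpath(posixpath.join(base, toctree))


# Single page in the root, as before the split.
print(stub_output_dir("api.rst", "generated/"))
# Split pages under api/, same :toctree: value.
print(stub_output_dir("api/series.rst", "generated/"))
# Split pages under api/, pointing back at the old location.
print(stub_output_dir("api/series.rst", "../generated/"))
```

Under this reading, the third variant keeps the stub pages in the root-level generated/ directory even though the directives live in api/.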

Regarding the redirects, this was briefly discussed in #23708 (comment).

And as said there, in my opinion, redirecting everything (also consider what I'm proposing in #24499) is not worth the effort and complexity. It'd surely be nice to have a better 404 page that lets users continue navigating and find the new page, but I wouldn't build a system to add redirects to possibly every single page of the docs.

I see the convenience of a ctrl+f in a single page, but I'd leave that for the pdf version of the docs, and move in the opposite direction in the html, so that nobody has to ctrl+f anything because everything is easy to navigate.

@jorisvandenbossche
Member

We could have :toctree: ../generated/ I guess, but I don't think we want that.

Why wouldn't we want that?

And as said there, in my opinion, redirecting everything (also consider what I'm proposing in #24499) is not worth the effort and complexity

I certainly agree that at some point when properly reorganising the full doc structure, it is going to be impossible (and not worth the effort) to keep urls working.
But, here (speaking about the generated pages for a moment) it is about 1) some of the most visited pages, and 2) a case where it is relatively easy to do redirects or to avoid the need for redirects (by keeping the original url).

@datapythonista
Member Author

I'm waiting to see if anyone else wants to give feedback in #24499 before working on it. But I don't see the reorganisation of the full docs happening at "some point", but in the near future. :)

I think that, not long after that is merged, it would be a good time to change the sphinx style, and we could switch to pandas.io (and dev.pandas.io) at the same time.

While I totally understand your point, my opinion is that it's better to forget about redirects (including deleting the redirects we already have). Search engines will take some time to index everything again. But I'd just make sure that any url under pandas.pydata.org explains that we moved and links to the new page, and IMO that should be enough for the time search engines take to be updated.

An advantage I see besides the simplicity, is that if we redirect users to the new home pages, they'll see the value in the new navigation and get used to it (instead of using search engines).

@jorisvandenbossche
Member

It's not only about search engines, but also all the links to docstring pages in blogs, StackOverflow answers, ...

When we move to pandas.io, I assume we will also have redirects from pandas.pydata.org/pandas-docs/ to the new site, no?

An advantage I see besides the simplicity, is that if we redirect users to the new home pages, they'll see the value in the new navigation and get used to it (instead of using search engines).

Certainly true for all the user guide docs. But even with a fantastic navigation, I will probably still keep using google to go to a specific docstring page :)

Anyway, I still feel there is no need to break all the API page urls. (If we do change the url, I think I would actually rather go with /reference or simply api/ instead of /generated or /api/generated. I don't think generated adds much to the long url.)

@datapythonista
Member Author

I see your point; there is obviously a tradeoff between not breaking existing urls and keeping things simple.

I'm biased towards keeping things simple. But if there is agreement that the impact of breaking the old links would be bad enough that it's worth adding the complexity of the redirects to the docs, that's ok with me.

@datapythonista
Member Author

About the generated/, I agree. I didn't come up with a better name and that's why I left it. api/ is taken by the API sections, and I wouldn't have the generated pages at the same level. But something else sounds good. And I'd also consider having different directories for the generated pages, there are 2,700 html files in the generated/ directory.

@jorisvandenbossche
Member

Personally I think it is worth it (but happy to hear other people's thoughts as well), and I think the redirect (for simple old->new cases) is not that complex.
I want to note that for the API pages, we also have the option of not needing any redirect, as we can keep the current urls by using ../generated/ instead of generated/ in the API autosummaries?

And I'd also consider having different directories for the generated pages, there are 2,700 html files in the generated/ directory.

Is that from a developer perspective? (Opening that folder in a file browser can indeed be a bit annoying.)
Because as a user, I don't care about categorizing them (it only makes the urls even longer).

BTW, what do you think of using 'reference' instead of 'api' ? (although that also makes the url longer .. :-))

@datapythonista
Member Author

Assuming we all agree that the generated files should be separate from the API section pages, we need two names: one for the top directory (now api) and one for the generated pages (now generated).

sklearn uses modules instead of api, and also generated. I don't have a preference; reference sounds good (not sure for which of the two cases) and generated doesn't sound great. I can't think of anything better.

You've seen the issue for the redirects, there is a PR open already. I think that should be good enough.

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Jan 21, 2019
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 0.24.0 Jan 23, 2019
@jorisvandenbossche jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label Jan 23, 2019
@TomAugspurger
Contributor

This can be closed, right, as the redirects are now in place?

@datapythonista
Member Author

Not yet, we need to add the list of pages to redirect to redirects.csv first. I'll probably create a different ticket, but I'd leave this open until I do.
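
For reference, a minimal sketch of how such a list could be turned into redirect pages. The layout of redirects.csv is not shown in this thread, so this assumes two-column rows of `old,new` page paths relative to the docs root, without the .html extension; the sample row, the template, and the helper names are all hypothetical.

```python
import csv
import io
import posixpath

# Hypothetical stub page: an HTML meta refresh plus a visible link.
REDIRECT_TEMPLATE = """<html>
  <head><meta http-equiv="refresh" content="0;URL={url}"/></head>
  <body><p>This page has moved to <a href="{url}">{title}</a>.</p></body>
</html>
"""


def read_redirects(csv_file):
    """Yield (old, new) pairs, skipping blank rows and '#' comment rows."""
    for row in csv.reader(csv_file):
        if not row or row[0].startswith("#"):
            continue
        old, new = (cell.strip() for cell in row[:2])
        yield old, new


def redirect_html(old, new):
    """Return the HTML to write at `old`.html so that it forwards to `new`.html."""
    start = posixpath.dirname(old) or "."
    target = posixpath.relpath(new, start=start) + ".html"
    return REDIRECT_TEMPLATE.format(url=target, title=posixpath.basename(new))


# Illustrative input only; the real list of pages is not part of this thread.
sample = io.StringIO(
    "# old,new\n"
    "generated/pandas.Series.sum,api/generated/pandas.Series.sum\n"
)
for old, new in read_redirects(sample):
    print(old + ".html")
    print(redirect_html(old, new))
```

Using a relative target keeps the stubs working wherever the built docs are hosted, at the cost of one small HTML file per old url.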

@jorisvandenbossche
Member

If we choose a new name, I think I would go for '/reference/api'.

Or '/api/reference', or leave it as '/api/generated/'.

@jorisvandenbossche
Member

@datapythonista do you have time to do the redirects now? Otherwise I can also look at it, as this one is blocking the release.

@datapythonista
Member Author

I'll have a look now, but since you want to have it now, I won't move anything; I'll just create the redirects.

As I mentioned in another issue, can you take a look at #24890? I think it should go into 0.24. It's the last piece of the restructuring, and the way it is now doesn't make much sense (having all the comparison_with... in the top level, and other things).

@datapythonista
Member Author

@jorisvandenbossche opened #24909. Any thoughts on merging #24890 before the release?

6 participants