Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for formatting Python Markdown code blocks. #2721

Closed
wants to merge 21 commits into from
Closed

Support for formatting Python Markdown code blocks. #2721

wants to merge 21 commits into from

Conversation

na-stewart
Copy link

@na-stewart na-stewart commented Dec 21, 2021

Description

What?

I've added support for formatting Python Markdown code blocks.

Why?

These changes allow the user to format markdown files and strings. In one instance of my own (https://github.com/sunset-developer/sanic-security), I have a plethora of Python examples in my project's README.md. However, none of these examples are formatted with Black and doing so would entail me going through the pain staking process of replacing each code block example. With these changes, black can be used to automatically format all markdown code blocks instantly, and conveniently.

How?

When Black is ran, the user would add the --docs flag in order to format any markdown files. Once ran, each code block in the Markdown is retrieved via regex and added to a list. Then, each one of these code blocks is formatted. Once each code block is formatted, the original non-formatted code block is replaced by the formatted code block and the new formatted Markdown string is returned. If no changes are made or the Markdown is empty, a NothingChanged error is thrown.

Testing?

Unit tests have been written to make sure all desired functionality works as intended. Test data has also been created to be used as reference.

Functionality tested:

  • Empty Markdown raises NothingChanged error.
  • Markdown without Python code blocks raises NothingChanges error.
  • Markdown code block formatting cross-referenced with already formatted Markdown code blocks.
  • --docs flag.
  • Formatting Markdown strings and files.

I am very open to feedback and any changes you would like me to make. Thank you for your time.

If you like these changes, I plan on adding support for .rst and .tex files as well.

Checklist - did you ...

  • Add a CHANGELOG entry if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation?

@na-stewart na-stewart changed the title Markdown support. Support for formatting Python Markdown code blocks. Dec 21, 2021
@isidentical
Copy link
Collaborator

Just as a FYI, there is also blacken-docs which pretty much does this (though in an extended scope with additional support for rst/latex etc.)

@isidentical
Copy link
Collaborator

Also the previous issue on the same idea (but in an extended scope, with rST) #294

@felix-hilden
Copy link
Collaborator

felix-hilden commented Dec 28, 2021

Thanks for submitting this! Personally though, I'm still on the fence because blacken-docs exists, although I haven't tried using it. And if we were to expand this to RST, as we should, that domain could be a bit more complex since there are lots of blocks that are essentially Python but use a different lexer or an entirely different directive, like matplotlib's .. plot:: which still contains Python code. Are there similar edge cases for markdown?

This is not a thorough review, but in fairness the implementation is neat. I'd prefer the option be called --md if we are only dealing with markdown, and let's not restrict ourselves to only process documentation MD.

And if we're formatting lots of extra file types like MD and IPython, maybe the options should be something entirely different. Listing black . --ipynb --md --rst --xml --txt seems suboptimal.

@na-stewart
Copy link
Author

Thanks for submitting this! Personally though, I'm still on the fence because blacken-docs exists, although I haven't tried using it. And if we were to expand this to RST, as we should, that domain could be a bit more complex since there are lots of blocks that are essentially Python but use a different lexer or an entirely different directive, like matplotlib's .. plot:: which still contains Python code. Are there similar edge cases for markdown?

This is not a thorough review, but in fairness the implementation is neat. I'd prefer the option be called --md if we are only dealing with markdown, and let's not restrict ourselves to only process documentation MD.

And if we're formatting lots of extra file types like MD and IPython, maybe the options should be something entirely different. Listing black . --ipynb --md --rst --xml --txt seems suboptimal.

I did quite a bit of research on the topic and I don't believe that markdown has really any edge cases when it comes to code blocks. That's why the implementation of markdown formatting was pretty simple.

@na-stewart
Copy link
Author

This is not a thorough review, but in fairness the implementation is neat. I'd prefer the option be called --md if we are only dealing with markdown, and let's not restrict ourselves to only process documentation MD.

if we're formatting lots of extra file types like MD and IPython, maybe the options should be something entirely different. Listing black . --ipynb --md --rst --xml --txt seems suboptimal.

The biggest reason I wanted to have a single tag (--docs) for this is because in my current setup, we would have to create fields in Mode for allow_md, allow_rst along with is_md, and is_rst unless there are better ways of implementing this. It is intended to not strictly format documentation unlike setting black to Juypter mode, which would strictly format Juypter Notebook files.

@felix-hilden
Copy link
Collaborator

I get that, but having "docs" for only md when there's no implementation for rst is a bit confusing, especially when the md files aren't necessarily documentation.

Richard had a good point on discord: maybe it would be better to have the option force all inputs to be md, like ipynb currently does.

But we still have to figure out if we want this at all. It could be better to define a public Python API and let other packages do extras like this. Although I'm not sure if there are many use cases beyond notebooks and documentation. Console scripts are one (>>> prefixed), but in the best case our work could be almost done with this and RST.

@Oisov
Copy link

Oisov commented Dec 29, 2021

Am I missing something or is this outside of the scope of the project? Even the project as is, is hard enough to maintain (For the awesome people who have the time and knowledge!)

From what I can tell Python should format .py files and that is it. If you want to format markdown files, one should ideally be using a markdown tool which then hooks into black

Ideally my guess is that the optimal would be to parse the file using tree-sitter

https://github.com/ikatyang/tree-sitter-markdown

To get all the Python codeblocks (regex feels shaky at best for me) and then pipe those through black and back.

If this passes, what about jupiter files, rst was mentioned, and there exists several other formats which are python flavored. While I truly appreciate the effort, my personal opinion (which does not hold much weight, just an avid user) is that this is outside the scope of the project currently.

@felix-hilden
Copy link
Collaborator

It's not so simple, because we already have notebook support. But could you elaborate on the additional formats? It could be helpful when deciding if we want to go down the route of supporting different formats ourselves.

@isidentical
Copy link
Collaborator

There is also mdformat-black which is not a tool directly, but a plugin for an existing markdown formatter to format code blocks with black.

@hauntsaninja
Copy link
Collaborator

Thank you! Given the existence of tools like https://github.com/adamchainz/blacken-docs and https://github.com/hukkin/mdformat-black, I think we don't need to include this in Black proper. The long tail of formats is best dealt with by the ecosystem and we should do things like #779 to help support the ecosystem build on Black.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants