
argparse: Hybrid help text formatter #57015

Open
GraylinKim mannequin opened this issue Aug 22, 2011 · 23 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@GraylinKim
Mannequin

GraylinKim mannequin commented Aug 22, 2011

BPO 12806
Nosy @ssokolow, @jwilk, @merwok, @davesteele, @perette
PRs
  • bpo-12806: Add argparse FlexiHelpFormatter #22129
Files
  • argparse_formatter.py: The HelpFormatter subclass in a runnable example script.
  • paraformatter.py: ParagraphFormatter derived from GraylinKim's version.
  • wrap_sample.py
  • try_12806_4.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-08-22.00:46:17.782>
    labels = ['type-feature', 'library']
    title = 'argparse: Hybrid help text formatter'
    updated_at = <Date 2021-01-04.15:14:25.540>
    user = 'https://bugs.python.org/GraylinKim'

    bugs.python.org fields:

    activity = <Date 2021-01-04.15:14:25.540>
    actor = 'daves'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2011-08-22.00:46:17.782>
    creator = 'GraylinKim'
    dependencies = []
    files = ['22977', '28091', '35306', '36168']
    hgrepos = []
    issue_num = 12806
    keywords = ['patch']
    message_count = 23.0
    messages = ['142651', '142652', '143018', '144353', '144510', '144513', '144518', '144522', '144533', '149532', '153962', '153965', '176258', '176259', '218395', '218397', '223210', '224345', '260259', '326376', '355184', '376495', '384327']
    nosy_count = 11.0
    nosy_names = ['bethard', 'ssokolow', 'jwilk', 'eric.araujo', 'zbysz', 'denilsonsa', 'rurpy2', 'GraylinKim', 'paul.j3', 'daves', 'perette']
    pr_nums = ['22129']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue12806'
    versions = ['Python 3.3']

    @GraylinKim
    Mannequin Author

    GraylinKim mannequin commented Aug 22, 2011

    When using argparse I frequently run into situations where my help text is a mix of prose and bullets or options. I need the RawTextHelpFormatter for the bullets, and I need the default formatter for the prose (so the lines wrap intelligently).

    Only the names of the current HelpFormatter classes are public API (their methods are implementation details), so subclassing them with overrides to get the desired functionality isn't great unless the change gets pushed upstream. To that end, I've attached a subclass implementation that I've been using for the following effect:

    Example:
        >>> parser = argparse.ArgumentParser(formatter_class=FlexiFormatter)
        >>> parser.add_argument('--example', help='''\
        ...     This argument's help text will have this first long line\
        ...     wrapped to fit the target window size so that your text\
        ...     remains flexible.
        ...
        ...         1. This option list
        ...         2. is still persisted
        ...         3. and the option strings get wrapped like this with an\
        ...            indent for readability.
        ...
        ...     You must use backslashes at the end of lines to indicate that\
        ...     you want the text to wrap instead of preserving the newline.
        ...    
        ...     As with docstrings, the leading space to the text block is\
        ...     ignored.
        ... ''')
        >>> parser.parse_args(['-h'])
    usage: argparse_formatter.py [-h] [--example EXAMPLE]
    
    optional arguments:
      -h, --help         show this help message and exit
      --example EXAMPLE  This argument's help text will have this first
                         long line wrapped to fit the target window size
                         so that your text remains flexible.
    
                             1. This option list
                             2. is still persisted
                             3. and the option strings get wrapped like
                                this with an indent for readability.
    
                         You must use backslashes at the end of lines to
                         indicate that you want the text to wrap instead
                         of preserving the newline.
    
                         As with docstrings, the leading space to the
                         text block is ignored.
    
    
                                 1. This option list
                                 2. is still persisted
                                 3. and the option strings get wrapped like
                                    this with an indent for readability.
    
                             You must use backslashes at the end of lines to
                             indicate that you want the text to wrap instead
                             of preserving the newline.
    
                             As with docstrings, the leading space to the
                             text block is ignored.
    

    If there is interest in this sort of thing I'd be happy to fix it up for inclusion.

    @GraylinKim GraylinKim mannequin added the stdlib Python modules in the Lib dir label Aug 22, 2011
    @GraylinKim
    Mannequin Author

    GraylinKim mannequin commented Aug 22, 2011

    I just noticed that the example output above repeats with a different indent. The attached formatter isn't broken, I just messed up the editing on my post. The repeated text isn't part of the output (and shouldn't be there).

    While I'm certainly at fault here, a feature to preview your post before final submission would likely help people like me to catch these sorts of errors before spamming the world with them. :)

    Apologies for the double post.

    @merwok
    Member

    merwok commented Aug 26, 2011

    Steven: What do you think?

    GraylinKim: You can open a feature request for message preview on the metatracker (see “Report Tracker Problem” in the sidebar).

    @merwok merwok added the type-feature A feature request or enhancement label Aug 26, 2011
    @denilsonsa
    Mannequin

    denilsonsa mannequin commented Sep 20, 2011

    I was about to suggest this feature. I had the exact same need: a formatter that preserves newlines (and maybe whitespace), but that also automatically wraps the lines.

    In other words, the behavior would be similar to CSS property white-space: pre-wrap;
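    A minimal sketch of that pre-wrap behavior using only the standard textwrap module. pre_wrap is a hypothetical helper, not something argparse provides: it keeps every newline, wraps each line on its own, and (an assumption that goes slightly beyond plain CSS pre-wrap) indents continuation lines to match the leading whitespace of the line they came from.

```python
import textwrap


def pre_wrap(text, width=70):
    """Approximate CSS `white-space: pre-wrap` (hypothetical helper).

    Every newline is kept, each line is wrapped separately, and continuation
    lines reuse the leading whitespace of the line they were split from.
    """
    out = []
    for line in text.splitlines():
        if not line.strip():
            out.append('')  # keep blank lines as paragraph separators
            continue
        indent = line[:len(line) - len(line.lstrip())]
        out.append(textwrap.fill(line.strip(), width=width,
                                 initial_indent=indent,
                                 subsequent_indent=indent))
    return '\n'.join(out)


sample = ("A long opening sentence that should be wrapped to fit.\n"
          "\n"
          "    an indented line keeps its indentation when it wraps")
print(pre_wrap(sample, width=30))
```

    Blank lines survive as paragraph breaks, so prose and indented lists can share one help string.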

    @zbysz
    Mannequin

    zbysz mannequin commented Sep 24, 2011

    This is a great idea! I think that this formatter is much more useful than the default one. I was pretty surprised the first time I added a multi-paragraph epilog to a program and it got all jumbled into a single line. I would vote to make this (or a variant, see the comments below) the default formatter for the description and epilog fields.

    Notes on the interface (a bit of bike-shedding :)):

    Continuation backslashes are best avoided. Other means, like parentheses around expressions, are encouraged instead of backslashes. Continuation symbols also interfere with paragraph reflowing in text editors.
    Using backslash continuations wouldn't be necessary if the rules were slightly changed to be more LaTeX-like: just line-wrap paragraphs, and paragraphs are separated by a blank line. This would make it impossible to have two line-wrapped parts not separated by an empty line, but I think that this is less common and less important than having a "natural" representation of paragraphs.

    In the current implementation, lines that are not wrapped (IIUC) are those which start with *, +, >, <anything>., or <something>). This seems error-prone. Maybe it would be better to just detect lines which are indented at least one space in comparison to the first line. This would work for examples and lists:
    """
    this is a text that is subject to line-wrapping
    and this line not
    and this line would again be wrapped:

    • and a line, not merged with the one above or below
    • and a second point, not merged with the one above
      """

    Review of argparse_formatter.py:

    list_match = re.match(r'( *)(([*-+>]+|\w+\)|\w+\.) +)',line)
    A dash '-' has special meaning in brackets:
    [*-+>] means (characters in range from '*' to '+', or '>').
    Because star and plus are adjacent in ASCII, this is equivalent to
    [*+>]
    but quite unclear.

    if(list_match):
    Parenthesis unnecessary.

    lines = list()
    Why not just 'lines = []'?

    On a side note: due to bpo-13041 the terminal width is normally stuck
    at 80 chars.

    @GraylinKim
    Mannequin Author

    GraylinKim mannequin commented Sep 24, 2011

    I fully support taking the blank-line-based line-wrapping approach and agree with Zbyszek's suggested indentation approach as well. I am not sure why they didn't occur to me at the time, but they are certainly more effective and more widely adopted approaches to the structured text problem.

    I suppose here is where I should volunteer to update the patch file...

    Re: Bike-shedding

    > dash '-' has special meaning in brackets:

    Good catch, I had intended on '-' being a valid list item character. It clearly needs to be escaped. Not that it would matter given your proposed alternative.

    > if(list_match):
    > Parenthesis unnecessary.

    In my defense I have the sadistic pleasure of coding in PHP where they are necessary for 8 hours a day for my day job. I can only apologize profusely for my offense and beg for forgiveness :)

    > lines = list()
    > Why not just 'lines = []'?

    Not to get off topic, but I happen to like list() and dict() instead of [] and {} for empty collections. If there are non-religious reasons for avoiding this practice I'll consider it. I don't want to invoke a holy war here, just wondering if there are practical reasons.

    > On a side note: due to bpo-13041 the terminal width is normally stuck
    > at 80 chars.

    Not a good reason to remove the flexibility from the implementation I don't think.

    @denilsonsa
    Mannequin

    denilsonsa mannequin commented Sep 25, 2011

    > Good catch, I had intended on '-' being a valid list item character.
    > It clearly needs to be escaped.

    Either escaped, or it can be the first character in the set.
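    The difference is easy to check with the re module. This sketch compares the class from the review, [*-+>], with the two fixes suggested here (escaping the dash, or putting it first in the set).

```python
import re

# '*-+' is a range from '*' (0x2A) to '+' (0x2B), so '-' itself is NOT matched.
ranged = re.compile(r'[*-+>]')
# Either fix makes '-' a literal member of the set.
escaped = re.compile(r'[*\-+>]')
leading = re.compile(r'[-*+>]')

for pattern in (ranged, escaped, leading):
    matched = [c for c in '*+->' if pattern.fullmatch(c)]
    print(pattern.pattern, matched)
```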

    > but I happen to like list() and dict() instead of [] and {} for
    > empty collections.

    I just checked PEP 8, and unfortunately this is not mentioned there. Maybe you could open a new issue (or post to a mailing list) to ask which style should be recommended, and then add the conclusion to PEP 8.

    > On a side note: due to bpo-13041 the terminal width is normally stuck
    > at 80 chars.

    Getting the console width (and height) is something so common that I believe there should be a built-in Python library for doing that. And it's also something hard to do correctly (get COLUMNS variable, or use ioctl, or trap SIGWINCH signal, or do something completely different on non-unix, or fallback to hardcoded default or user-supplied default)
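    For reference, Python 3.3 later added shutil.get_terminal_size(), which covers much of that list: it checks the COLUMNS and LINES environment variables, then queries the terminal attached to stdout, and finally returns a caller-supplied fallback. A minimal sketch of deriving a wrap width from it; help_width and its clamping bounds are illustrative assumptions, not argparse behavior.

```python
import shutil


def help_width(default=80, minimum=40, maximum=100):
    """Pick a wrap width from the terminal size (hypothetical helper).

    shutil.get_terminal_size() consults COLUMNS/LINES first, then the
    terminal connected to stdout, then the given fallback. The result is
    clamped so help text stays readable on very narrow or wide terminals.
    """
    columns = shutil.get_terminal_size(fallback=(default, 24)).columns
    return max(minimum, min(columns, maximum))


print(help_width())
```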

    What's more, since the built-in argparse module needs this feature, that is another good reason to get it into the standard library. It has been proposed before, but the issue was closed: bpo-8408

    Anyway, although I believe this is important, it is off-topic in this issue.

    @zbysz
    Mannequin

    zbysz mannequin commented Sep 25, 2011

    On 09/25/2011 01:50 AM, Graylin Kim wrote:
    >
    > Graylin Kim<graylin.kim@gmail.com>  added the comment:
    >
    > I fully support taking blank line based line-wrapping approach and agree with Zbyszek's suggested indentation approach as well. I am not sure why they didn't occur to me at the time but they are certainly a more effective and widely adopted approaches to the structured text problem.
    >
    > I suppose here is where I should volunteer to update the patch file...
    >
    >
    > Re: Bike-shedding
    >
    >> dash '-' has special meaning in brackets:
    >
    > Good catch, I had intended on '-' being a valid list item character. It clearly needs to be escaped. Not that it would matter given your proposed alternative.
    >
    >>>   if(list_match):
    >> Parenthesis unnecessary.
    >
    > In my defense I have the sadistic pleasure of coding in PHP where they are necessary for 8 hours a day for my day job. I can only apologize profusely for my offense and beg for forgiveness :)
    >

    :)

    >> lines = list()
    >> Why not just 'lines = []'?
    >
    > Not to get off topic, but I happen to like list() and dict() instead of [] and {} for empty collections. If there are non-religious reasons for avoiding this practice I'll consider it. I don't want to invoke a holy war here, just wondering if there are practical reasons.

    In general brevity is good, but I agree that this is just a style
    question, and not very important here.

    This wasn't my intention, I was only saying that due to this bug the
    wrapping uses fixed width, but I'm hoping that bpo-13041 will be
    successfully resolved.

    @zbysz
    Mannequin

    zbysz mannequin commented Sep 25, 2011

    [I now see that roundup ate half of my reply. I have no idea why,
    because the e-mail is formatted correctly. Maybe I'll have more
    luck this time, but since there's no preview, I must try to see.]

    On 09/25/2011 01:50 AM, Graylin Kim wrote:
    >>>   if(list_match):
    >> Parenthesis unnecessary.
    >
    > In my defense I have the sadistic pleasure of coding in PHP where
    > they are necessary for 8 hours a day for my day job. I can only
    > apologize profusely for my offense and beg for forgiveness

    :)

    >> lines = list()
    >> Why not just 'lines = []'?
    >
    > Not to get off topic, but I happen to like list() and dict() instead
    > of [] and {} for empty collections. If there are non-religious
    > reasons for avoiding this practice I'll consider it. I don't want to
    > invoke a holy war here, just wondering if there are practical reasons.

    In general brevity is good, but I agree that this is just a style question, and not very important here.

    >> On a side note: due to bpo-13041 the terminal width is normally stuck
    >> at 80 chars.
    > Not a good reason to remove the flexibility from the implementation
    > I don't think.

    This wasn't my intention, I was only saying that due to this bug the wrapping uses fixed width, but I'm hoping that bpo-13041 will be successfully resolved.

    @bethard
    Mannequin

    bethard mannequin commented Dec 15, 2011

    As I understand it the current proposal is:

    • Wrap each paragraph separately
    • Don't wrap any lines indented by at least one additional space

    This sounds like a useful formatter. I would probably call it "ParagraphWrappingFormatter" or something like that, which is more descriptive than FlexiFormatter.

    Sadly, it can't be the default, since that would break backwards compatibility, but I'd certainly agree to an obvious note somewhere in the docs recommending the use of this formatter instead of the current default.
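    A minimal sketch of the formatter summarized above, assuming it is built by overriding HelpFormatter._split_lines (argument help) and HelpFormatter._fill_text (description and epilog). Those methods are implementation details rather than public API, so this may need adjusting between Python versions; the class name is just the one suggested in this comment, and the initial textwrap.dedent() follows feedback given later in this thread. Indentation is measured after dedenting, so any line that still has leading whitespace is kept as written.

```python
import argparse
import textwrap


class ParagraphWrappingFormatter(argparse.HelpFormatter):
    """Sketch: wrap each blank-line-separated paragraph on its own and
    keep indented lines (bullets, examples) exactly as written."""

    def _reflow(self, text, width):
        # Dedent first so triple-quoted strings behave like docstrings.
        lines = textwrap.dedent(text).strip('\n').splitlines()
        out, para = [], []

        def flush():
            # Wrap the words collected so far as a single paragraph.
            if para:
                out.extend(textwrap.wrap(' '.join(para), width))
                para.clear()

        for line in lines:
            if not line.strip():
                flush()
                out.append('')             # keep the paragraph break
            elif line[0].isspace():
                flush()
                out.append(line.rstrip())  # extra indentation: no wrapping
            else:
                para.append(line.strip())
        flush()
        return out

    def _split_lines(self, text, width):
        # Called for argument help.
        return self._reflow(text, width)

    def _fill_text(self, text, width, indent):
        # Called for description and epilog; like the default
        # implementation, treat `width` as including the indent.
        lines = self._reflow(text, width - len(indent))
        return '\n'.join(indent + line if line else '' for line in lines)


parser = argparse.ArgumentParser(
    prog='demo',
    formatter_class=ParagraphWrappingFormatter,
    description="""
        This first paragraph is long enough that it will be wrapped to
        the terminal width like ordinary argparse help text.

          * this indented bullet is printed exactly as written
        """)
parser.add_argument('--mode', help="""
        Pick a mode. Argument help is wrapped paragraph by paragraph too.

          fast - skip the checks
          safe - run every check
        """)
print(parser.format_help())
```

    Because of the dedent, continuation lines have to line up with the first line; the easiest way is to start each triple-quoted string with a newline, as above. A help string such as 'First line\n    second' would treat the second line as preformatted.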

    @zbysz
    Mannequin

    zbysz mannequin commented Feb 22, 2012

    > I suppose here is where I should volunteer to update the patch file...
    @GraylinKim: do you still intend to work on this?

    @GraylinKim
    Mannequin Author

    GraylinKim mannequin commented Feb 22, 2012

    I'd be willing to at some point but I cannot see myself getting around to
    it in the near future.

    If someone else wants to offer an implementation that would be great.


    @rurpy2
    Mannequin

    rurpy2 mannequin commented Nov 23, 2012

    I happened upon this issue while Googling for a formatter with the behavior described here.

    I put together a formatter derived from the code submitted by GraylinKim (2011-08-22) and offer it for consideration (though it is missing some things like docstrings and hasn't been tested very thoroughly).

    As per other comments, it uses additional indentation rather than leading special characters to start a new block. Unlike GraylinKim's code, additional indentation suppresses wrapping and any other formatting. However, it would be easy to change that, as I hope is obvious from the code.

    There are two common ways of denoting a block of text (a block being text that should be reformatted as a single unit, a.k.a. a paragraph):

    1. A group of text lines ending with newlines followed by a blank line to denote the end of the block.

    2. A single (long) text line where the terminating newline denotes the end of the block (i.e. one line == one block).

    Both occur in the context of argparse help text:

    Example of #1:
    p.add_argument (....,
    help='''block1 block1 block1 block1
    block1 block1 block1 block1
    block1 block1 block1 block1

           block2 block2 block2 block2
           block2 block2 block2 block2''')
    

    Examples of #2:
    p.add_argument (....,
    help='block1 block1 block1 block1 '
    'block1 block1 block1 block1 '
    'block1 block1 block1 block1 \n'
    ''
    'block2 block2 block2 block2 '
    'block2 block2 block2 block2 ')

    p.add_argument (....,
    help='''block1 block1 block1 block1 \
    block1 block1 block1 block1 \
    block1 block1 block1 block1 \

           block2 block2 block2 block2 \
           block2 block2 block2 block2 ''')
    

    There is no way, when reading lines of text, to tell whether one is reading text in the form of #1 or #2, when one sees a newline. So a formatter really needs to be able to be told which form it is being given. This seems to require two separate formatter classes (though they call common code.)

    The first form (call it multiline blocked text) is formatted by ParagraphFormatterML. The second form (call it single-line blocked text; I often use form #2a) by ParagraphFormatter.
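    To make the ambiguity concrete, here is a minimal sketch in which the same string is split into different blocks depending on a mode flag. split_blocks is a hypothetical helper, not the code in paraformatter.py; each resulting block would then be wrapped separately, for example with textwrap.fill.

```python
def split_blocks(text, multiline):
    """Split help text into blocks, each of which is later wrapped as a unit.

    multiline=True:  form #1, a blank line ends a block and single
                     newlines inside a block are treated as spaces.
    multiline=False: form #2, every newline ends a block.
    """
    blocks, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line always closes the block being collected.
            if current:
                blocks.append(' '.join(current))
                current = []
        elif multiline:
            current.append(line.strip())
        else:
            blocks.append(line.strip())
    if current:
        blocks.append(' '.join(current))
    return blocks


text = "first line\nsecond line\n\nthird line"
print(split_blocks(text, multiline=True))   # ['first line second line', 'third line']
print(split_blocks(text, multiline=False))  # ['first line', 'second line', 'third line']
```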

    @rurpy2
    Mannequin

    rurpy2 mannequin commented Nov 23, 2012

    Additional comment loosely related to the ParagraphFormatter offered in previous comment...

    [If this is not the right venue -- perhaps a new issue or one of the python mail lists would be better -- please tell me.]

    I notice that argparse.ArgumentParser requires a class (as opposed to instance) for the formatter_class parameter. A cursory look at argparse gives me the impression that this is only so that ArgumentParser can instantiate the instance with a 'prog' argument.

    If ArgumentParser accepted a HelpFormatter object (rather than a class), then a user could instantiate a custom formatter class with arguments that would customize its behavior. For example, I would be able to write a single ParagraphFormatter class that was instantiated like

      formatter = ParagraphFormatter (multiline=False)

    or

      formatter = ParagraphFormatter (multiline=True)

    If one has other requirements, perhaps apply one kind of formatting to description/epilogue text and another to the arguments text, then there is an even greater multiplicity of classes that could be avoided by instantiating a single formatter class with arguments.

    So why can't ArgumentParser look at the formatter_class value: if it's a class, proceed as now, but if it's a class instance, simply set its ._prog attribute and use it as is:

        def _get_formatter(self):
            if isinstance(self.formatter_class, type):
                # a class: instantiate it, as argparse does now
                return self.formatter_class(prog=self.prog)
            else:
                # an instance: just tell it the program name and reuse it
                self.formatter_class._prog = self.prog
                return self.formatter_class

    Of course the "formatter_class" parameter name would then require a little explanation but that's what documentation is for.

    @paulj3
    Mannequin

    paulj3 mannequin commented May 13, 2014

    An alternative to passing a Formatter instance to the parser is to use a wrapper function. HelpFormatter.__init__ takes several keyword args. '_get_formatter' does not use those. However we could define:

        import argparse

        def format_wrapper(**kwargs):
            # class 'factory' used to give extra parameters
            def fnc(prog):
                cls = argparse.HelpFormatter
                return cls(prog, **kwargs)
            return fnc

    and use that to set the 'width' of the formatter object.

        parser = argparse.ArgumentParser(formatter_class=format_wrapper(width=40))
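    For what it's worth, functools.partial gives the same kind of factory without a nested function. This sketch assumes, as in the argparse versions I know, that the parser only ever calls formatter_class(prog=...); width and max_help_position are existing keyword arguments of HelpFormatter.__init__.

```python
import argparse
import functools

# Bind constructor keyword arguments now; argparse supplies 'prog' later.
narrow = functools.partial(argparse.HelpFormatter,
                           width=40, max_help_position=20)

parser = argparse.ArgumentParser(prog='demo', formatter_class=narrow)
parser.add_argument('--name', help='the name to use when greeting somebody '
                                   'in a long and rambling way')
help_text = parser.format_help()
print(help_text)
```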

    @paulj3
    Mannequin

    paulj3 mannequin commented May 13, 2014

    An alternative to adding a 'ParagraphFormatter' class to 'argparse', is to format the individual text blocks PRIOR to passing them to the 'parser', and use the 'RawTextHelpFormatter'.

    In the attached script I use a simple function that applies 'textwrap' to each 'line' of the text. Description, epilog, and argument help are formatted in roughly the same manner as in paraformatter.py, but without as many bells and whistles.

        import argparse
        import textwrap

        def mywrap(text, **kwargs):
            # apply textwrap to each line individually, keeping the newlines
            lines = text.splitlines()
            lines = [textwrap.fill(line, **kwargs) for line in lines]
            return '\n'.join(lines)

        # description and epilog are the help strings to be pre-formatted
        parser = argparse.ArgumentParser(
            formatter_class=argparse.RawTextHelpFormatter,
            description=mywrap(description),
            epilog=mywrap(epilog, width=40))

    I suspect there are tools for doing similar formatting, starting with 'markdown' or 'reST' paragraphs (though HTML is the usual output). As the formatting becomes more complex, it is better to use existing tools than to write something new for 'argparse'.

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 16, 2014

    Apparently bpo-13923 is related to this.

    @paulj3
    Mannequin

    paulj3 mannequin commented Jul 30, 2014

    In http://bugs.python.org/issue22029 (argparse: CSS white-space: like control for individual text blocks):

    I propose a set of str subclasses that can be used to define the wrapping style of individual text blocks. The idea is adapted from the HTML '<pre>' tag and the CSS white-space: option.

    argparse.WhitespaceStyle is a cover class that defines various utility methods, including _str_format which handles all of the % formatting. The individual subclasses implement their own version of _split_lines and _fill_text. I chose a standard set of classes based on the CSS white-space options:

    Normal() - full white space compression and wrapping. This is the default style for text in argparse.

    Pre() - preformatting, the same as the Raw formatters

    NoWrap() - Pre plus whitespace compression

    PreLine()
    
    PreWrap()

    In HelpFormatter, _split_lines, _fill_text and _str_format delegate the action to the text's own methods. Plain text is handled as Normal().

    I also defined a WSList class. This is a list of Style class objects. It has the same API as the Style classes, iterating over the items.

    Where possible these methods try to return an object of the same type as self.
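    A very small sketch of the core idea: a str subclass marks a block as preformatted, and a formatter checks the type before wrapping. Pre and StyledFormatter are hypothetical names (this is not the code attached to issue 22029), and the sketch only covers description and epilog text. In the argparse versions I have looked at, argument help always goes through %-formatting, which returns a plain str and drops the subclass; that is consistent with the proposal routing % formatting through its own _str_format.

```python
import argparse


class Pre(str):
    """Marker type: a text block that must be printed exactly as written."""


class StyledFormatter(argparse.HelpFormatter):
    """Wrap ordinary text as usual, but pass Pre blocks through untouched."""

    def _fill_text(self, text, width, indent):
        if isinstance(text, Pre):
            # Behave like RawDescriptionHelpFormatter for this block only.
            return ''.join(indent + line
                           for line in text.splitlines(keepends=True))
        return super()._fill_text(text, width, indent)


parser = argparse.ArgumentParser(
    prog='demo',
    formatter_class=StyledFormatter,
    description='This description is a plain str, so it is wrapped as usual.',
    epilog=Pre('Examples:\n  demo --fast\n  demo --safe'))
print(parser.format_help())
```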
    ------------------

    Here I demonstrate two ways that these classes could be used to implement a hybrid formatter.

    The first is a simple adaptation of the PareML formatter from paraformatter.py. It shows how a custom style class could be defined.

    The second defines a preformat function, which converts the text block into a WSList, a list of style text objects. The wrappable paragraphs are Normal(), the preformatted indented lines are Pre(). Blank lines are Pre(' ').

    I've explored writing a Hanging class, which performs a hanging indent on list item sentences.
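    A hanging indent for list items can be sketched with the initial_indent and subsequent_indent parameters of textwrap.fill. hang and its marker regex are hypothetical (this is not the Hanging class mentioned above): the leading bullet or number marker is split off, and continuation lines are aligned with the text after it.

```python
import re
import textwrap

# Assumed marker syntax: "*", "-" or "+", or a word followed by "." or ")",
# then at least one space (e.g. "* ", "3. ", "b) ").
MARKER = re.compile(r'\s*(?:[-*+]|\w+[.)])\s+')


def hang(item, width=40):
    """Wrap one list item so continuation lines align after its marker."""
    match = MARKER.match(item)
    if not match:
        return textwrap.fill(item, width)
    prefix = match.group(0)  # leading indent + marker + spaces
    return textwrap.fill(item[match.end():], width,
                         initial_indent=prefix,
                         subsequent_indent=' ' * len(prefix))


print(hang('3. and the option strings get wrapped like this '
           'with an indent for readability.'))
```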

    @ssokolow
    Mannequin

    ssokolow mannequin commented Feb 14, 2016

    @GraylinKim:

    In the interest of people like myself who wander in here via Google, would you mind stating, for the record, what license argparse_formatter.py is under?

    @perette
    Mannequin

    perette mannequin commented Sep 25, 2018

    I would find this a useful feature.

    @davesteele
    Mannequin

    davesteele mannequin commented Oct 23, 2019

    I came across this thread after making a simple argparse formatter for preserving paragraphs. The submissions here look better than that effort. Here is a quick, hacky look at the patches from one perspective.

    I wanted to prefer ParagraphFormatterML, but didn't like that it doesn't appear to wrap bullet lines, and it wrapped help and epilogs to different lengths. For all options I found an initial textwrap.dedent() was needed to get the results I expected. When I did the dedent with ParagraphFormatter*, a subsequent textwrap.indent(" ") hack was needed to restore spaces at the wrap point. FlexiFormatter was incomplete - epilogs weren't affected.

    Ultimately, I settled on reworking FlexiFormatter. My version has the following changes:

    • Refactor the formatting code out a la ParagraphFormatter, and add to _fill_text() as well, so formatting is available for both epilogs and option help
    • Add a leading textwrap.dedent(), to get it to feel more like HelpFormatter.

    Note

    • The result requires line feed escapes within paragraphs.
    • I'm not using the "indent" argument for _fill_text(), with no apparent consequences.
    • Automated tests show that FlexiFormatter adds a space to each blank line. I decided that was not a problem

    Code is at:
    https://github.com/davesteele/argparse_formatter
    https://pypi.org/project/argparse-formatter/

    Regarding licensing, my contributions (and presumably the others') are covered by the CLA.

    I'd very much like to see something from this thread merged. This looks to me to be good enough. Any objections to a pull request?

    @davesteele
    Mannequin

    davesteele mannequin commented Sep 7, 2020

    I've submitted FlexiHelpFormatter as PR22129.

    This adds the FlexiHelpFormatter class to argparse.

    It supports wrapping text, while preserving paragraphs. Bullet lists are supported.

    There are a number of differences, relative to the latest patch in the issue report:

    • single line feeds in a paragraph are allowed
    • the code is refactored to avoid duplication
    • test failure fixes (mostly whitespace)
    • Tests and documentation are included.
       >>> parser = argparse.ArgumentParser(
       ...     prog='PROG',
       ...     formatter_class=argparse.FlexiHelpFormatter,
       ...     description="""
       ...         The FlexiHelpFormatter will wrap text within paragraphs
       ...         when required to in order to make the text fit.
       ...
       ...         Paragraphs are preserved.
       ...
       ...         It also supports bulleted lists in a number of formats:
       ...           * stars
       ...           1. numbers
       ...           - ... and so on
       ...         """)
       >>> parser.add_argument(
       ...     "argument",
       ...     help="""
       ...         Argument help text also supports flexible formatting,
       ...         with word wrap:
       ...             * See?
       ...         """)
       >>> parser.print_help()
       usage: PROG [-h] argument

       The FlexiHelpFormatter will wrap text within paragraphs when required to in
       order to make the text fit.

       Paragraphs are preserved.

       It also supports bulleted lists in a number of formats:
         * stars
         1. numbers
         - ... and so on

       positional arguments:
         argument    Argument help text also supports flexible formatting, with word
                     wrap:
                         * See?

       optional arguments:
         -h, --help  show this help message and exit

    @davesteele
    Mannequin

    davesteele mannequin commented Jan 4, 2021

    For those looking for a solution now, see https://pypi.org/project/argparse-formatter/

    Projects
    Status: Features