Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Csplit: Add missing precision syntax #5794

Closed
wants to merge 4 commits into from

Conversation

spineki
Copy link
Contributor

@spineki spineki commented Jan 5, 2024

Summary

This pull request adds support for the precision syntax in Csplit filename-suffix patterns (-b flag).

Example: to create files where the suffix is 10-elements long, padded to the left (-), with 3-digit precision (.3), in lower hex "x"

csplit -b '%-10.3x' in 1

Additionally, it allows user to use #, 0 flags at the same time.
Csplit allows split files to be named with fixed precision, independent from outer width.

Initial Issue

The initial example in the linked issue is now fixed. You can check the expected behaviour with the following

$ echo > in
$ /usr/bin/csplit -b '%0#6.3x' in 1
0
1
$ ls
...
xx   000
xx 0x001
$ echo > in
$ cargo run csplit -b '%0#6.3x' in 1
0
1
$ ls
...
xx   000
xx 0x001

Remarks

C and Rust format syntaxes differ when dealing with zero precision, octal prefix or precision in case of alternative mode (C doesn't take the extra 0x or o into account while Rust does).

To avoid redundant code, I split the extraction in two formatting steps. First, creating the inner formatted number, with a length of precision, then, adding additional padding, filling and direction for a total length of width.

There are a lot of corner cases, don't hesitate to point something I may have missed !

Related Issue

#5709

Tests

I added additional tests to check patterns involving multiple flags or single-dot-precision syntax.
I modified some tests because octal notation is not formatted the same way in Rust or in C. (0 in front of n in C, o in front of n in Rust)

@spineki spineki force-pushed the fix-csplit-precision-flags branch from 039d298 to b257c8f Compare January 5, 2024 23:01
@spineki spineki force-pushed the fix-csplit-precision-flags branch from b257c8f to a481519 Compare January 5, 2024 23:30
Copy link

github-actions bot commented Jan 6, 2024

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/timeout/timeout is no longer failing!

@sylvestre
Copy link
Contributor

Could you please add an integration test in tests/by-util/test_csplit.rs ?
thanks

Copy link

github-actions bot commented Jan 6, 2024

GNU testsuite comparison:

Congrats! The gnu test tests/timeout/timeout is no longer failing!

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! This looks pretty good, but unfortunately duplicates a lot of the printf style formatting that is also present in other utils. If we keep separate implementations, we end up with incompatibilities and code that is hard to maintain.

The GNU docs say:

Use format as the output file name suffix. When this option is specified, the suffix string must include exactly one printf(3)-style conversion specification, possibly including format specification flags, a field width, a precision specification, or all of these kinds of modifiers. The format letter must convert a binary unsigned integer argument to readable form. The format letters ‘d’ and ‘i’ are aliases for ‘u’, and the ‘u’, ‘o’, ‘x’, and ‘X’ conversions are allowed. The entire format is given (with the current output file number) to sprintf(3) to form the file name suffixes for each of the individual output files in turn. If this option is used, the --digits option is ignored.

Source: https://www.gnu.org/software/coreutils/manual/html_node/csplit-invocation.html

So instead of reimplementing it, we should probably use our printf code. In particular, you could look how the printing code in seq is implemented. I think that is very similar, apart from the fact that we're printing integers instead of floats here.

See the seq code here:

let format = match options.format {
Some(f) => {
let f = Format::<num_format::Float>::parse(f)?;
Some(f)
}
None => None,
};

And the corresponding code from our formatting code:

pub struct Format<F: Formatter> {

Sorry, I wish we could have alerted you sooner about that functionality! Nevertheless, the tests you've made will be very useful to see if the printf version actually works!

@spineki
Copy link
Contributor Author

spineki commented Jan 8, 2024

edit: fixed bad copy-paste. c printf gives indeed before 000052 after. Sorry...

Thanks for the insights !

I tested the existing printf features to see if I could rely on it.

However

#include <stdio.h>

int main()
{
    printf("before %#15.6o after", 42);
    return 0;
}

gives:
before          000052 after
cargo run printf "before %#15.6o after", 42
before          0o52   after
gives

current formatting

  • doesn't handle precision (.3) (makes sense for the current issue)
  • doesn't use correct octal prefix (o instead 0)
  • something is wrong with the spacing (three spaces after 52 instead of 1)

I may have missed something, but from what I see, the uucore/src/lib/features/format would need deeper changes to correct both printf and csplit implementation.

@tertsdiepraam
Copy link
Member

Nice find! You're right about those incompatibilities, but we should fix those in printf!

By the way, I get this from both GNU printf in the terminal and printf in C:

before          000052 after

Maybe we could open separate issues for each of these problems.

@tertsdiepraam
Copy link
Member

@spineki, I'm gonna close this since I think we agree this should be handled by the printf mechanisms we already have. Feel free to open a new PR for that. Thanks for helping us find the issues with printf!

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Feb 7, 2024

Where were we with fixing the printf issues?

We've had:

What remains to be done?

Edit: Looks like this is still broken:

printf 'before %#15.6o after' 42

Expected:

before          000052 after

uutils:

before          052    after

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants