-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Csplit: Add missing precision syntax #5794
Conversation
039d298
to
b257c8f
Compare
b257c8f
to
a481519
Compare
GNU testsuite comparison:
|
Could you please add an integration test in tests/by-util/test_csplit.rs ? |
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! This looks pretty good, but unfortunately duplicates a lot of the printf
style formatting that is also present in other utils. If we keep separate implementations, we end up with incompatibilities and code that is hard to maintain.
The GNU docs say:
Use format as the output file name suffix. When this option is specified, the suffix string must include exactly one printf(3)-style conversion specification, possibly including format specification flags, a field width, a precision specification, or all of these kinds of modifiers. The format letter must convert a binary unsigned integer argument to readable form. The format letters ‘d’ and ‘i’ are aliases for ‘u’, and the ‘u’, ‘o’, ‘x’, and ‘X’ conversions are allowed. The entire format is given (with the current output file number) to sprintf(3) to form the file name suffixes for each of the individual output files in turn. If this option is used, the --digits option is ignored.
Source: https://www.gnu.org/software/coreutils/manual/html_node/csplit-invocation.html
So instead of reimplementing it, we should probably use our printf
code. In particular, you could look how the printing code in seq
is implemented. I think that is very similar, apart from the fact that we're printing integers instead of floats here.
See the seq
code here:
coreutils/src/uu/seq/src/seq.rs
Lines 111 to 117 in 8e83b34
let format = match options.format { | |
Some(f) => { | |
let f = Format::<num_format::Float>::parse(f)?; | |
Some(f) | |
} | |
None => None, | |
}; |
And the corresponding code from our formatting code:
pub struct Format<F: Formatter> { |
Sorry, I wish we could have alerted you sooner about that functionality! Nevertheless, the tests you've made will be very useful to see if the printf
version actually works!
edit: fixed bad copy-paste. c printf gives indeed Thanks for the insights ! I tested the existing However #include <stdio.h>
int main()
{
printf("before %#15.6o after", 42);
return 0;
}
gives:
before 000052 after cargo run printf "before %#15.6o after", 42
before 0o52 after
gives current formatting
I may have missed something, but from what I see, the uucore/src/lib/features/format would need deeper changes to correct both printf and csplit implementation. |
Nice find! You're right about those incompatibilities, but we should fix those in By the way, I get this from both GNU
Maybe we could open separate issues for each of these problems. |
@spineki, I'm gonna close this since I think we agree this should be handled by the |
Where were we with fixing the We've had:
What remains to be done? Edit: Looks like this is still broken:
Expected:
uutils:
|
Summary
This pull request adds support for the
precision
syntax in Csplit filename-suffix patterns (-b
flag).Example: to create files where the suffix is 10-elements long, padded to the left (-), with 3-digit precision (.3), in lower hex "x"
Additionally, it allows user to use
#
,0
flags at the same time.Csplit allows split files to be named with fixed precision, independent from outer width.
Initial Issue
The initial example in the linked issue is now fixed. You can check the expected behaviour with the following
Remarks
C and Rust format syntaxes differ when dealing with zero precision, octal prefix or precision in case of alternative mode (C doesn't take the extra
0x
oro
into account while Rust does).To avoid redundant code, I split the extraction in two formatting steps. First, creating the inner formatted number, with a length of
precision
, then, adding additional padding, filling and direction for a total length ofwidth
.There are a lot of corner cases, don't hesitate to point something I may have missed !
Related Issue
#5709
Tests
I added additional tests to check patterns involving multiple flags or single-dot-precision syntax.
I modified some tests because octal notation is not formatted the same way in Rust or in C. (
0
in front ofn
in C,o
in front ofn
in Rust)