Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk-full errors are not correctly reported #280

Open
nh2 opened this issue Nov 2, 2023 · 5 comments
Open

Disk-full errors are not correctly reported #280

nh2 opened this issue Nov 2, 2023 · 5 comments

Comments

@nh2
Copy link

nh2 commented Nov 2, 2023

Throughout the code, PoissonRecon does not check the return values of write()/fwrite() correctly.

While in some parts of the code there's certainly awareness that writes can fail, e.g. when the disk is full, such as here with a bool being returned:

bool write( const std::vector< std::vector< C > > &c )
{
size_t sz = c.size();
if( !write( sz ) ) return false;
bool ret = true;
for( size_t i=0 ; i<sz && ret ; i++ ) ret &= write( c[i] );
return ret;
}

much of the code simply calls write(), ignoring the returned bool and continuing:

stream.write( _index );
stream.write( _range );


Because of this, you can get the client failing with:

[RECEIVE 1]  288   B: 0.0 (s)
[PROCESS 1]         : 1652.8 (s), 132842.0 (MB) / 132842.0 (MB)
[SEND    1]   96   B: 0.0 (s)
[CACHE   1]         : 490.9 (s) , 214887 (MB)
[ERROR] Src/FEMTree.inl (Line 372)
        FEMTree
        Failed to read transform

and the server saying:

[RECEIVE 2]   96   B: 2066.2 (s)
[PROCESS 2]         : 0.0 (s), 5.2 (MB) / 5.2 (MB)
[SEND    2]   32   B: 0.0 (s)
[ERROR] Src/Socket.h (Line 59)
        socket_receive
        Failed to read from socket

when in fact e.g. the --tempDir is full and the error should really be no space left on device.

@nh2
Copy link
Author

nh2 commented Nov 2, 2023

I recommend the following approach:

  • If there are no cases where you actually want to handle write failing, throw an exception.
  • In the long run, additionally move off fwrite() (C stdlib) and use write() (POSIX) instead, because the FILE based functions do not report what the error was (errno), see e.g. here.
    C++ IO streams are also notoriously bad for not reporting errno, so POSIX functions are the simplest solution to get good error messages like no space left on device (with perror() or strerror()).

@nh2
Copy link
Author

nh2 commented Nov 2, 2023

Same issue for read() functions, which also do not do failure checks (errors can happen e.g. if the disk is bad, the network disk is disconnected, and so on).

nh2 added a commit to nh2/PoissonRecon that referenced this issue Nov 2, 2023
Avoids silent continuing after failed writes, because
currently callers do not check the returned bool.
@nh2
Copy link
Author

nh2 commented Nov 2, 2023

Some improvements for this in my branch: master...nh2:PoissonRecon:nh2fixes-v15.03

nh2 added a commit to nh2/PoissonRecon that referenced this issue Nov 2, 2023
Avoids silent continuing after failed writes, because
currently callers do not check the returned bool.
@nh2
Copy link
Author

nh2 commented Nov 3, 2023

  • move off fwrite() (C stdlib)

Another motivation for that:

https://stackoverflow.com/questions/2329842/checking-for-success-of-fwrite-in-c-perror#comment5694186_2329882

fwrite cannot handle EAGAIN or EINTR at all. They will set the error indicator for the stream, and there's no way to resume writing without possibly getting lost or duplicate data (due to not knowing the state of the buffer). If you want to use interrupting signals or want to setup a file as nonblocking, stdio is generally not usable.

nh2 added a commit to nh2/PoissonRecon that referenced this issue Nov 3, 2023
In general, the code's IO design has a problem:
It uses RAII for file resource handling.
This is not robust, because only `fclose()` (or `fflush()`) is
guaranteed to actually pass data to the OS, and thus surface potential errors.

"RAII-for-files" requires that `fclose()` be put in C++ destructors.
But destructors are not allowed to `throw`.
Thus "RAII-for-files" cannot do proper error handling of failed writes.

**C++ RAII cannot be used for file management.**

Or anything where the cleanup action can fail.
(Unless you're happy to swallow errors silently, which nobody should be.)

This commit replaces all calls to `fwrite()` and `fclose()` by

    throwing_fwrite()
    throwing_fclose()
    fwrite_from_destructor()
    fclose_from_destructor()

All variants try to report good error reasons, such as
"no space left on device"
(currently only imlemented for POSIX, not for Windows, where
no error details are produced).

The `*_from_destructor()` versions use non-allocating writes to
`std::cerr` and terminate the program.
@nh2
Copy link
Author

nh2 commented Nov 3, 2023

Some improvements for this in my branch: master...nh2:PoissonRecon:nh2fixes-v15.03

I'm furhter improving this situation in that branch with commit nh2@e155f45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant