Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core dumped while testing mutation_log::reset_from #1207

Closed
empiredan opened this issue Oct 27, 2022 · 0 comments · Fixed by #1208
Closed

Core dumped while testing mutation_log::reset_from #1207

empiredan opened this issue Oct 27, 2022 · 0 comments · Fixed by #1208
Labels
type/bug This issue reports a bug.

Comments

@empiredan
Copy link
Contributor

Both Test Release (dsn.replica.test) and Test ASAN (dsn.replica.test) for dsn.replica.test have failed in CI workflows of #1205, with reported errors as follows:

[ RUN      ] mutation_log_test.reset_from
E2022-10-26 11:56:44.166 (1666785404166907303 282) replica.default0.0000010100010001: replica.cpp:569:init_disk_tag(): [1.1@] get disk tag of ./test-log failed: ERR_OBJECT_NOT_FOUND, init it to empty 
W2022-10-26 11:56:44.172 (1666785404172717263 282) replica.default0.0000010100010001: filesystem.cpp:381:rename_path(): rename from './test-log.1666785404172467160' to './test-log' failed, err = Directory not empty
F2022-10-26 11:56:44.172 (1666785404172723363 282) replica.default0.0000010100010001: mutation_log.cpp:970:operator()(): assertion expression: false
F2022-10-26 11:56:44.172 (1666785404172728763 282) replica.default0.0000010100010001: mutation_log.cpp:970:operator()(): rollback ./test-log.1666785404172[467](https://github.com/apache/incubator-pegasus/actions/runs/3326102384/jobs/5505382914#step:7:468)160 to ./test-log failed
Aborted (core dumped)
Error: Process completed with exit code 134.

Review the code of mutation_log::reset_from with the reported errors, some problems could be found.

Firstly, the reason for core dump is that err in capture list of the lambda function for dsn::defer is passed as value rather than a reference. As a result, err for lambda function will always be ERR_FILE_OPERATION_FAILED; also, utils::filesystem::rename_path(temp_dir, _dir) will be called mistakenly. See the following code for details:

    // define `defer` for rollback temp_dir when failed or remove temp_dir when success
    auto temp_dir_resolve = dsn::defer([this, err, temp_dir]() {
        if (err != ERR_OK) {
            if (!utils::filesystem::rename_path(temp_dir, _dir)) {
                // rollback failed means old log files are not be recovered, it may be lost if only
                // LOG_ERROR,  dassert for manual resolve it
                dassert_f("rollback {} to {} failed", temp_dir, _dir);
            }
        } else {
            if (!dsn::utils::filesystem::remove_path(temp_dir)) {
                // temp dir allow delete failed, it's only garbage
                LOG_ERROR_F("remove temp dir {} failed", temp_dir);
            }
        }
    });

Secondly, from the above code, another error can be found for dassert_f("rollback {} to {} failed", temp_dir, _dir);. The first parameter for dassert_f should have been a condition; however, "rollback {} to {} failed" is passed as the first parameter by mistake rather than the condition. Therefore, this assertion will never failed since this string will always be converted implicitly as true.

Thirdly, the following code also renames a directory, however there is not any rollback process for this:

    // move source dir to target dir
    if (!utils::filesystem::rename_path(dir, _dir)) {
        LOG_ERROR_F("rename {} to {} failed", dir, _dir);
        return err;
    }
    LOG_INFO_F("move {} to {} as our new log directory", dir, _dir);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug This issue reports a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant