Shared storage across workers can result in sealing sectors being deleted #7484

Closed
8 of 18 tasks
neondragon opened this issue Oct 10, 2021 · 0 comments · Fixed by #7494

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

1.13.0-rc2

Describe the Bug

When two workers on different machines share the same storage, sectors can be partially deleted during the remote copy process.

Scenario
Two seal storage locations exist:

  • seal1
  • seal2

Three machines exist, each running one worker process:

  • PC1 Worker 1 (has access to seal1)
  • PC1 Worker 2 (has access to seal2)
  • PC2/C2 Worker (has access to BOTH seal1 and seal2)

A sector exists with components located on both storage locations:

  • Unsealed sector is on seal1
  • Sealed sector and cache directory are on seal2

This is how the sector gets deleted:

  1. Lotus wants to move ownership of the sector from PC2/C2 Worker to PC1 Worker 2
  2. Lotus performs a remote copy of the Unsealed sector from PC2/C2 Worker to PC1 Worker 2. All parts of the sector are now on seal2.
  3. Lotus sends a command to PC2/C2 Worker to delete the unsealed sector
  4. PC2/C2 Worker has access to both seal1 and seal2, so it deletes the unsealed sector from seal1 and also from seal2
  5. All copies of the unsealed sector are gone
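The sequence above can be replayed with a small in-memory model. This is only a sketch of the reported behavior, not Lotus code: the `Storage` and `Worker` types, the `removeEverywhere` method, and the sector file names are all hypothetical and chosen for illustration.

```go
package main

import "fmt"

// Storage models one shared storage location as a set of file names.
// Hypothetical type, not part of Lotus.
type Storage map[string]bool

// Worker models a worker process and the storage locations it can reach.
// Hypothetical type, not part of Lotus.
type Worker struct {
	Name  string
	Paths map[string]Storage
}

// removeEverywhere mimics the behavior described in this issue: the worker
// deletes the file from every storage location it has access to.
func (w Worker) removeEverywhere(file string) {
	for _, s := range w.Paths {
		delete(s, file)
	}
}

// copiesLeft counts how many storage locations still hold the file.
func copiesLeft(file string, stores ...Storage) int {
	n := 0
	for _, s := range stores {
		if s[file] {
			n++
		}
	}
	return n
}

// simulate replays steps 1-5 and returns the number of surviving copies.
func simulate() int {
	const unsealed = "unsealed/s-t01000-1" // hypothetical file name
	seal1 := Storage{unsealed: true}
	seal2 := Storage{"sealed/s-t01000-1": true, "cache/s-t01000-1": true}

	c2 := Worker{
		Name:  "PC2/C2 Worker",
		Paths: map[string]Storage{"seal1": seal1, "seal2": seal2},
	}

	// Step 2: the unsealed sector is copied onto seal2.
	seal2[unsealed] = true
	// Steps 3-4: PC2/C2 Worker is told to delete the unsealed sector.
	c2.removeEverywhere(unsealed)
	// Step 5: count what is left across both locations.
	return copiesLeft(unsealed, seal1, seal2)
}

func main() {
	fmt.Println("unsealed copies remaining:", simulate())
}
```

In this model the count drops to zero because the worker that receives the delete command can also see the destination of the copy.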

Logging Information

I will update this with logs once I can capture it clearly happening. From reading the code, I think the remote delete command assumes that the worker has no access to the destination storage location, and so it deletes the file from every storage location the worker can reach. The remote worker should instead delete the file only from the storage location that was the source of the copy, leaving the copy on the destination storage location in place.
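One way to picture the suggested behavior is a delete that is scoped to a named source location. The sketch below is not the actual Lotus fix; `StoragePath`, `SharedWorker`, `removeFrom`, and the file name are hypothetical names used only to illustrate the idea.

```go
package main

import "fmt"

// StoragePath models one shared storage location as a set of file names.
// Hypothetical type, not part of Lotus.
type StoragePath map[string]bool

// SharedWorker models a worker and the storage locations it can reach.
// Hypothetical type, not part of Lotus.
type SharedWorker struct {
	Paths map[string]StoragePath
}

// removeFrom deletes the file only from the named source location and
// leaves every other location the worker can reach untouched.
func (w SharedWorker) removeFrom(source, file string) error {
	s, ok := w.Paths[source]
	if !ok {
		return fmt.Errorf("worker has no access to storage %q", source)
	}
	delete(s, file)
	return nil
}

// simulateFixed replays the scenario with a source-scoped delete and
// reports whether the unsealed file is still present on each location.
func simulateFixed() (onSeal1, onSeal2 bool, err error) {
	const unsealed = "unsealed/s-t01000-1" // hypothetical file name
	seal1 := StoragePath{unsealed: true}
	seal2 := StoragePath{}

	c2 := SharedWorker{Paths: map[string]StoragePath{"seal1": seal1, "seal2": seal2}}

	// The unsealed sector is copied onto seal2.
	seal2[unsealed] = true
	// Only the source of the copy (seal1) is cleaned up.
	if err := c2.removeFrom("seal1", unsealed); err != nil {
		return false, false, err
	}
	return seal1[unsealed], seal2[unsealed], nil
}

func main() {
	s1, s2, err := simulateFixed()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("on seal1:", s1, "on seal2:", s2)
}
```

With the delete limited to seal1, the copy on seal2 survives even though the same worker can reach both locations.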

Repro Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...