Fix a bug that creates 0 byte block file mistakenly #17497
Merged
beinan
approved these changes
May 26, 2023
lgtm
rerun the failed test
alluxio-bot, merge this please.
merge failed:
alluxio-bot, merge this please.
Xenorith
pushed a commit
to Xenorith/alluxio
that referenced
this pull request
May 30, 2023
Fix a bug that may create a 0-byte block file on a worker when there is an issue reading a file from the UFS. Also reduce the logging, which is too spammy when HDFSUnderFileSystem fails to read a UFS file.

When we cache a file asynchronously and the file no longer exists on the UFS (perhaps because it was modified out of band), an exception is thrown from `UnderFileSystemBlockStore.createBlockReader`. The exception handling treated this case the same as a normal close and committed the temp block. This commit fixes that by aborting the temp block in error cases instead.

In addition, the exception message in `createUfsBlockReader` was constructed incorrectly: the stack trace was also attached to the error message. This is fixed as well. Finally, the warn logs on the HDFS UFS when attempting to read a file are suppressed to debug level, and only the last error is shown.

```
2023-05-17 06:43:13,039 WARN UfsInputStreamCache - Failed to create a new cached ufs instream of file id 6321787562360831 and path hdfs://nameservice1/user/hive/warehouse/some/table/period_name_desc=2023-17/period_end_date=2023-03-31/000008_0
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/some/table/period_name_desc=2023-17/period_end_date=2023-03-31/000008_0
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2168)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2049)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:583)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:94)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2272)
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547)
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:244)
    at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2317)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
    at alluxio.worker.block.UfsInputStreamCache.acquire(UfsInputStreamCache.java:227)
    at alluxio.worker.block.UnderFileSystemBlockReader.updateUnderFileSystemInputStream(UnderFileSystemBlockReader.java:373)
    at alluxio.worker.block.UnderFileSystemBlockReader.init(UnderFileSystemBlockReader.java:194)
    at alluxio.worker.block.UnderFileSystemBlockReader.create(UnderFileSystemBlockReader.java:137)
    at alluxio.worker.block.UnderFileSystemBlockStore.createBlockReader(UnderFileSystemBlockStore.java:306)
    at alluxio.worker.block.MonoBlockStore.createUfsBlockReader(MonoBlockStore.java:199)
    at alluxio.worker.block.DefaultBlockWorker.createUfsBlockReader(DefaultBlockWorker.java:413)
    at alluxio.worker.block.CacheRequestManager.cacheBlockFromUfs(CacheRequestManager.java:261)
    at alluxio.worker.block.CacheRequestManager.cacheBlock(CacheRequestManager.java:239)
    at alluxio.worker.block.CacheRequestManager.access$000(CacheRequestManager.java:56)
    at alluxio.worker.block.CacheRequestManager$CacheTask.call(CacheRequestManager.java:210)
    at alluxio.worker.block.CacheRequestManager$CacheTask.call(CacheRequestManager.java:164)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at alluxio.worker.grpc.GrpcExecutors$ImpersonateThreadPoolExecutor.lambda$execute$0(GrpcExecutors.java:159)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
```

User-facing changes: No

pr-link: Alluxio#17497
change-id: cid-92e96a46cf67606c3087115cf065a8470d929421
jiacheliu3
pushed a commit
to jiacheliu3/alluxio
that referenced
this pull request
Jun 1, 2023
Xenorith
pushed a commit
to Xenorith/alluxio
that referenced
this pull request
Jun 12, 2023
Cherry-pick of existing commit. orig-pr: Alluxio#17497 orig-commit: Alluxio/alluxio@812855f orig-commit-author: Bin Fan <fanbin103@gmail.com>
alluxio-bot
pushed a commit
that referenced
this pull request
Mar 12, 2024
### What changes are proposed in this pull request?
This fixes a bug involving corrupted data files. Previously, #17497 attempted to solve this issue but only covered the case of creating a UFS reader. Note that this change only handles the block reader, so it may not fix the `paged Block Reader` in 2.x.

pr-link: #18525
change-id: cid-bbba0feb29231e70750e5e79da5f405bb591d47a
jja725
added a commit
to jja725/alluxio
that referenced
this pull request
Mar 12, 2024
yuzhu
pushed a commit
to yuzhu/alluxio
that referenced
this pull request
Apr 1, 2024
What changes are proposed in this pull request?
Fix a bug that may create a 0-byte block file on a worker when there is an issue reading a file from the UFS.
Also reduce the logging, which is too spammy when HDFSUnderFileSystem fails to read a UFS file.
Why are the changes needed?
When we cache a file asynchronously and the file no longer exists on the UFS (perhaps because it was modified out of band), an exception is thrown from `UnderFileSystemBlockStore.createBlockReader`. The exception handling treated this case the same as a normal close and committed the temp block.
This commit fixes that by aborting the temp block in error cases instead.
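The commit-versus-abort distinction can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Alluxio code: `TempBlockStore`, `UfsSource`, and `cacheBlock` are hypothetical names, and the in-memory sets only stand in for block files on a worker.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class TempBlockAbortSketch {
  /** Hypothetical in-memory stand-in for a worker's block store. */
  static class TempBlockStore {
    final Set<Long> tempBlocks = new HashSet<>();
    final Set<Long> committedBlocks = new HashSet<>();

    void createTemp(long blockId) {
      tempBlocks.add(blockId);
    }

    /** Promotes a temp block to a committed (visible) block. */
    void commit(long blockId) {
      if (tempBlocks.remove(blockId)) {
        committedBlocks.add(blockId);
      }
    }

    /** Discards a temp block so that no empty block is left behind. */
    void abort(long blockId) {
      tempBlocks.remove(blockId);
    }
  }

  /** Hypothetical source of block data; throws if the UFS file is gone. */
  interface UfsSource {
    byte[] read(long blockId) throws IOException;
  }

  /** Returns true if the block was cached; aborts the temp block on failure. */
  static boolean cacheBlock(TempBlockStore store, UfsSource ufs, long blockId) {
    store.createTemp(blockId);
    try {
      // A real implementation would write these bytes into the temp block.
      byte[] data = ufs.read(blockId);
      store.commit(blockId);
      return true;
    } catch (IOException e) {
      // Error path: abort instead of commit, so no 0-byte block is published.
      store.abort(blockId);
      return false;
    }
  }

  public static void main(String[] args) {
    TempBlockStore store = new TempBlockStore();
    UfsSource missing = id -> {
      throw new FileNotFoundException("File does not exist");
    };
    System.out.println("cached: " + cacheBlock(store, missing, 1L));
    System.out.println("committed blocks: " + store.committedBlocks);
  }
}
```

The point of the sketch is that the failure path and the success path end in different calls; treating both as a normal close is what would publish an empty block.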
In addition, the exception message in `createUfsBlockReader` was constructed incorrectly: the stack trace was also attached to the error message. This is fixed as well.
Finally, the warn logs on the HDFS UFS when attempting to read a file are suppressed to debug level, and only the last error is shown.
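One reading of these two logging changes is sketched below. All names here (`ErrorReportingSketch`, `buildErrorMessage`, `Opener`, `openWithRetries`) are hypothetical, the log is modeled as a plain list of strings rather than a real logger, and the retry loop is an assumption about where the repeated warnings came from.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ErrorReportingSketch {
  /** Builds a concise message: identifiers plus the cause's message, no stack trace. */
  static String buildErrorMessage(long blockId, String ufsPath, Throwable cause) {
    return String.format("Failed to read block %d from UFS path %s: %s",
        blockId, ufsPath, cause.getMessage());
  }

  /** Hypothetical operation that may fail, such as opening a UFS input stream. */
  interface Opener {
    void open() throws IOException;
  }

  /**
   * Tries to open up to maxAttempts times. Each failed attempt is recorded at
   * DEBUG level; only the last error is recorded at WARN level.
   */
  static List<String> openWithRetries(Opener opener, int maxAttempts) {
    List<String> log = new ArrayList<>();
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        opener.open();
        return log; // success: nothing is reported at WARN level
      } catch (IOException e) {
        last = e;
        log.add("DEBUG attempt " + attempt + " failed: " + e.getMessage());
      }
    }
    if (last != null) {
      log.add("WARN giving up after " + maxAttempts
          + " attempts, last error: " + last.getMessage());
    }
    return log;
  }

  public static void main(String[] args) {
    Throwable cause = new FileNotFoundException("File does not exist");
    System.out.println(buildErrorMessage(42L, "hdfs://example/path", cause));
    Opener alwaysFails = () -> {
      throw new FileNotFoundException("File does not exist");
    };
    for (String line : openWithRetries(alwaysFails, 3)) {
      System.out.println(line);
    }
  }
}
```

Keeping the stack trace out of the message string leaves it available through the exception's cause chain, while the message itself stays short enough to read in a log line.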
Does this PR introduce any user facing changes?
No