
wget webhdfs failed #3676

Open
phantooom opened this issue Nov 28, 2024 · 11 comments

@phantooom

Bug report:

Download via proxy fails (Rust client):
export http_proxy=http://100.6.4.218:4001
wget "http://hdfs.internal:50070/webhdfs/v1/agent_index/dt=2024-09-15/agent05/index.rpc-server-agent05.8/_qj_Lucene84_0.doc?op=OPEN"

Result: failed

Download via proxy succeeds (Go dfdaemon):

export http_proxy=http://100.6.4.218:5002
wget "http://hdfs.internal:50070/webhdfs/v1/agent_index/dt=2024-09-15/agent05/index.rpc-server-agent05.8/_qj_Lucene84_0.doc?op=OPEN"
Result: success

  2024-11-28T15:40:01.499627706+00:00 ERROR  copy "/var/lib/dragonfly/content/tasks/b36/b36833475c0f737d276cd782d57593f91215dc5d6db835fb2f4f0253b44a225b" failed: error decoding response body
    at dragonfly-client-storage/src/content.rs:317
    in write_piece
    in download_piece_from_source_finished
    in download_from_source with piece_id: "b36833475c0f737d276cd782d57593f91215dc5d6db835fb2f4f0253b44a225b-102"
    in download_partial_with_scheduler_from_source
    in download_partial_with_scheduler
    in download
    in download_task with host_id: "100.6.4.218-lf-op-k8s-node782-pm", task_id: "b36833475c0f737d276cd782d57593f91215dc5d6db835fb2f4f0253b44a225b", peer_id: "100.6.4.218-lf-op-k8s-node782-pm-d5d926b3-decb-44f0-a95e-10073befbee9"

  2024-11-28T15:40:01.499653716+00:00 ERROR  download piece finished: error decoding response body
    at dragonfly-client/src/resource/piece.rs:640
    in download_from_source with piece_id: "b36833475c0f737d276cd782d57593f91215dc5d6db835fb2f4f0253b44a225b-102"
    in download_partial_with_scheduler_from_source
    in download_partial_with_scheduler
    in download
    in download_task with host_id: "100.6.4.218-lf-op-k8s-node782-pm", task_id: "b36833475c0f737d276cd782d57593f91215dc5d6db835fb2f4f0253b44a225b", peer_id: "100.6.4.218-lf-op-k8s-node782-pm-d5d926b3-decb-44f0-a95e-10073befbee9"

Expected behavior:

Downloading through the Rust client proxy should succeed.

How to reproduce it:

Environment:

  • Dragonfly version: client:v0.1.118 & dfdaemon:v2.1.63
  • OS: CentOS 7
  • Kernel (e.g. uname -a): 4.19
  • Others:
@phantooom phantooom added the bug label Nov 28, 2024
@BruceAko
Contributor

BruceAko commented Nov 29, 2024

dfget implements an HDFS backend using OpenDAL. Please use dfget hdfs://hdfs.internal:9870/path/to/your/file to download files from HDFS (9870 is the web port of your WebHDFS API).
For specific details, you can refer to https://docs.rs/opendal/latest/opendal/services/struct.Webhdfs.html.
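Mapping the WebHDFS URL from the bug report onto the hdfs:// form suggested above only requires changing the scheme and dropping the /webhdfs/v1 prefix and the ?op=OPEN query. The helper below is a hypothetical Rust sketch (the name webhdfs_to_dfget_url is made up for illustration). It assumes that the path dfget expects after hdfs://host:port/ is the same absolute HDFS path that follows /webhdfs/v1, and it keeps whatever port the original URL used (50070 in this report, rather than the 9870 example).

```rust
/// Hypothetical helper: rewrite a WebHDFS REST URL of the form
/// http://host:port/webhdfs/v1/<path>?op=OPEN into hdfs://host:port/<path>.
/// Returns None if the input does not look like a WebHDFS URL.
fn webhdfs_to_dfget_url(url: &str) -> Option<String> {
    // Only plain http:// WebHDFS URLs are handled in this sketch.
    let rest = url.strip_prefix("http://")?;
    // Split "host:port" from the remainder of the URL.
    let (authority, path_and_query) = rest.split_once('/')?;
    // The REST prefix must be present, otherwise this is not WebHDFS.
    let path_and_query = path_and_query.strip_prefix("webhdfs/v1/")?;
    // Drop the query string (e.g. "?op=OPEN"); dfget does not need it.
    let path = path_and_query.split('?').next()?;
    Some(format!("hdfs://{}/{}", authority, path))
}

fn main() {
    let url = "http://hdfs.internal:50070/webhdfs/v1/agent_index/dt=2024-09-15/agent05/index.rpc-server-agent05.8/_qj_Lucene84_0.doc?op=OPEN";
    match webhdfs_to_dfget_url(url) {
        Some(hdfs_url) => println!("dfget {}", hdfs_url),
        None => eprintln!("not a WebHDFS URL: {}", url),
    }
}
```

Parsing with plain string operations keeps the sketch dependency-free; a real tool would more likely use a URL parser so that https:// and encoded paths are handled too.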

@gaius-qi
Member

@phantooom What is the file size?

@phantooom
Author

2.6 GB

@gaius-qi
Member

gaius-qi commented Nov 29, 2024

@phantooom I have added retries for the HTTP backend; you can try the v0.1.120 client.
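As a rough illustration of what retrying a failed request means here, the following is a hypothetical Rust sketch of a retry loop with exponential backoff around a fallible operation. It is not the code that was added to the Dragonfly client; the function name and parameters are assumptions. A loop like this only helps with transient failures: if every attempt is cut off by the same too-short per-piece timeout, each retry fails in the same way.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical retry helper: calls `op` (always at least once, and at most
/// `max_attempts` times), sleeping base_delay * 2^(attempt - 1) between
/// failed attempts. Returns the first success, or the last error.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // Saturating arithmetic avoids overflow for large attempt counts.
                let factor = 2u32.saturating_pow(attempt - 1);
                thread::sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated operation that fails twice (e.g. a timed-out piece) and then succeeds.
    let result: Result<u32, &str> = retry_with_backoff(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 { Err("piece timed out") } else { Ok(attempt) }
    });
    println!("{:?}", result); // Ok(3)
}
```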

@phantooom
Author

Thanks.

@phantooom
Author

phantooom commented Dec 4, 2024

I still have the same problem with v0.1.120:

root@lf-op-k8s-node782-pm:/# /usr/local/bin/dfdaemon -V
dfdaemon 0.1.120 (unknown, unknown)

  2024-12-04T03:31:11.726384313+00:00  INFO  start to download piece d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87-82 from source
    at dragonfly-client/src/resource/task.rs:1227
    in download_partial_with_scheduler_from_source
    in download_partial_with_scheduler
    in download
    in download_task with host_id: "100.65.4.218-lf-op-k8s-node782-pm", task_id: "d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87", peer_id: "100.65.4.218-lf-op-k8s-node782-pm-d8db9b1b-fbb9-40ff-b119-0e988befbd51"

  2024-12-04T03:31:11.727295016+00:00 ERROR  download piece finished: error decoding response body
    at dragonfly-client/src/resource/piece.rs:644
    in download_from_source with piece_id: "d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87-1"
    in download_partial_with_scheduler_from_source
    in download_partial_with_scheduler
    in download
    in download_task with host_id: "100.65.4.218-lf-op-k8s-node782-pm", task_id: "d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87", peer_id: "100.65.4.218-lf-op-k8s-node782-pm-d8db9b1b-fbb9-40ff-b119-0e988befbd51"

  2024-12-04T03:31:11.727312365+00:00  INFO  delete piece metadata d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87-1
    at dragonfly-client-storage/src/metadata.rs:798
    in delete_piece
    in download_piece_failed
    in download_piece_failed
    in download_from_source with piece_id: "d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87-1"
    in download_partial_with_scheduler_from_source
    in download_partial_with_scheduler
    in download
    in download_task with host_id: "100.65.4.218-lf-op-k8s-node782-pm", task_id: "d09ca1f7a0b777001e708d026a573d1ff070ea56665e18dc6d4f7e476dd68e87", peer_id: "100.65.4.218-lf-op-k8s-node782-pm-d8db9b1b-fbb9-40ff-b119-0e988befbd51"

@phantooom
Author

When I use the Go seed peer as the proxy, it never fails. When I use the Rust client as the proxy, it fails 100% of the time for files larger than 1GB.

@gaius-qi
Member

gaius-qi commented Dec 4, 2024

@phantooom I can't reproduce it locally. Is there a public HDFS service I can use to reproduce it?

@bigeyefish

I also encountered this problem (downloads of files larger than 1GB fail when the Rust client is the proxy). My client version is v0.2.0.

@gaius-qi
Member

@bigeyefish Is there a public HDFS service I can use to reproduce it?

@bigeyefish

I have fixed it. The cause was the download -> pieceTimeout setting. When downloading large files with many concurrent tasks, pieces that take longer than the configured time limit fail with a timeout and are retried over and over, which eventually puts a high load on the file server.

After changing pieceTimeout from 30s to 60m, the problem was fixed.
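The reasoning above can be checked with simple arithmetic: if N pieces are in flight and share an aggregate bandwidth B evenly, a piece of length L takes roughly L / (B / N) to arrive, and any piece whose transfer time exceeds pieceTimeout fails and is retried. The Rust sketch below encodes that estimate. The function names are hypothetical, the piece length, bandwidth, and concurrency values are made up (none of them come from this thread), and even bandwidth sharing is a simplifying assumption.

```rust
use std::time::Duration;

/// Estimated wall-clock time for one piece when `concurrent` in-flight
/// pieces share `bandwidth_bytes_per_sec` evenly (a simplifying assumption).
fn estimated_piece_time(piece_len: u64, bandwidth_bytes_per_sec: u64, concurrent: u64) -> Duration {
    assert!(bandwidth_bytes_per_sec > 0 && concurrent > 0, "bandwidth and concurrency must be non-zero");
    let per_stream = bandwidth_bytes_per_sec as f64 / concurrent as f64;
    Duration::from_secs_f64(piece_len as f64 / per_stream)
}

/// True if the estimated transfer time is longer than the per-piece timeout,
/// i.e. every attempt for such a piece would be expected to time out.
fn exceeds_timeout(piece_len: u64, bandwidth_bytes_per_sec: u64, concurrent: u64, timeout: Duration) -> bool {
    estimated_piece_time(piece_len, bandwidth_bytes_per_sec, concurrent) > timeout
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Hypothetical numbers: 64 MiB pieces, 100 MiB/s shared by 64 in-flight pieces.
    let (piece_len, bandwidth, concurrent) = (64 * MIB, 100 * MIB, 64);
    let estimate = estimated_piece_time(piece_len, bandwidth, concurrent);
    println!("estimated piece time: {:.1}s", estimate.as_secs_f64()); // roughly 41 s
    for timeout in [Duration::from_secs(30), Duration::from_secs(60 * 60)] {
        println!(
            "pieceTimeout {:?}: times out = {}",
            timeout,
            exceeds_timeout(piece_len, bandwidth, concurrent, timeout)
        );
    }
}
```

With these made-up numbers a piece needs roughly 41 s, so a 30s pieceTimeout fails on every attempt while 60m leaves ample headroom. More concurrent downloads, or longer pieces, push the estimate up.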
