Ko perf train gpu one #25250
Conversation
docs/source/ko/perf_train_gpu_one.md
Outdated
> This guide focuses on training large models efficiently on a single GPU. These approaches are still valid if you have access to a machine with multiple GPUs but you will also have access to additional methods outlined in the [multi-GPU section](perf_train_gpu_many).
>
> In this section we have a look at a few tricks to reduce the memory footprint and speed up training for large models and how they are integrated in the [`Trainer`] and [🤗 Accelerate](https://huggingface.co/docs/accelerate/). Each method can improve speed or memory usage which is summarized in the table below:
chatGPT rendered this literally as "메모리 풋프린트" (a transliteration of "memory footprint"). Is it better to keep that transliteration, or would it be acceptable to translate it as "메모리 사용량" (memory usage)?
I think translating it as "메모리 사용량" (memory usage) would be fine! If you write both together, as "메모리 사용량(memory footprint)", the meaning will be even easier to understand.
docs/source/ko/perf_train_gpu_one.md
Outdated
> ### 8-bit Adam
>
> Instead of aggregating optimizer states like Adafactor, 8-bit Adam keeps the full state and quantizes it. Quantization means that it stores the state with lower precision and dequantizes it only for the optimization. This is similar to the idea behind FP16 training where using variables with lower precision saves memory.
How should "quantize" be translated here? Below is what I asked chatGPT.

Question: Translating "quantizes" here as "양자화한다" doesn't convey the meaning well to me. How should I understand "quantizes"?

Answer: In this context, "quantizes" means converting or compressing data into a form that uses less memory. Here it can be read as converting the optimizer's state so that it uses less memory: by representing the data with fewer bits, you save memory and improve compute efficiency. This approach is similar to the idea behind FP16 training, which also uses lower precision (16-bit floating point) to reduce memory usage and increase speed. So "quantizes" can reasonably be understood as "converts to lower precision" or "represents at lower precision".
"quantize" is commonly rendered as "양자화하다" these days, so translating it literally is fine! If you add a short note explaining that it means "converting to lower precision", the text will be even easier to understand. Also, to check the accepted terminology, the Korean TTA ICT terminology dictionary is a useful resource 😄
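As a rough illustration of the quantize/dequantize round-trip described in the quoted passage (a simplified per-tensor scheme, not the block-wise quantization that 8-bit Adam actually implements in bitsandbytes), the idea might be sketched as:

```python
# Minimal sketch of the quantize/dequantize idea behind 8-bit Adam:
# the optimizer state is stored as small integers and expanded back
# to floats only when the update step needs it. Hypothetical helper
# names; real implementations use block-wise scaling and dynamic
# quantization maps.

def quantize(values, n_bits=8):
    """Map floats to signed integers with a single per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1            # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values for the optimization step."""
    return [q * scale for q in quantized]

state = [0.02, -1.5, 0.77, 3.0]             # toy optimizer state
q, scale = quantize(state)
restored = dequantize(q, scale)
# `restored` is close to `state`, but each value is stored in
# 1 byte instead of 4, which is where the memory saving comes from.
```

Each round-trip loses at most half a quantization step per value, which is the precision/memory trade-off the passage refers to.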
docs/source/ko/perf_train_gpu_one.md
Outdated
> Since it has been discovered that more parameters lead to better performance, this technique allows to increase the number of parameters by an order of magnitude without increasing training costs.
>
> In this approach every other FFN layer is replaced with a MoE Layer which consists of many experts, with a gated function that trains each expert in a balanced way depending on the input token's position in a sequence.
How about translating "experts" as "특수 모듈" (specialized modules)?
In MoE (Mixture of Experts), an "expert" refers to a small submodel specialized for a particular part or function of the model, so the "특수 모듈" you suggested would also work. It is often kept as "expert" untranslated, and is also sometimes translated as "전문 모델" or "전문가", for your reference!
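To make the routing idea in the quoted passage concrete, here is a toy sketch (hypothetical expert functions, with a fixed position-based gate standing in for the learned gating network) showing how each token activates only one expert, so parameter count can grow without increasing per-token compute:

```python
# Toy illustration of an MoE layer: each "expert" is a small
# stand-in for an FFN, and a gate routes each token to exactly
# one expert. A real gate is a learned, input-dependent softmax
# over many experts; this position-based gate is only to show
# the routing mechanism.

def expert_a(x):
    return x * 2.0          # hypothetical expert 0

def expert_b(x):
    return x + 1.0          # hypothetical expert 1

EXPERTS = [expert_a, expert_b]

def gate(token_position):
    # Route by sequence position purely for demonstration.
    return token_position % len(EXPERTS)

def moe_layer(tokens):
    # Each token runs through only one expert, so adding more
    # experts adds parameters without adding per-token compute.
    return [EXPERTS[gate(i)](t) for i, t in enumerate(tokens)]

outputs = moe_layer([1.0, 1.0, 2.0])
```

The balancing the passage mentions (training each expert evenly) comes from an auxiliary load-balancing loss on the gate in real MoE implementations, which this sketch omits.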
Thanks for tackling this big doc! We've recently refactored this page in #23963 to be more concise and actionable.
Would you mind updating the translation? 🤗
The source document has been updated! If your translation is not yet finished, please update it against the new version 😄
> 따라서 GPU 메모리를 절약하거나 작업을 더 빠르게 할 수 있는 몇 가지 포인트들이 있을 수 있습니다. 첫 번째 간단한 최적화인 적절한 배치 크기 선택부터 시작해 보겠습니다.
>
> ## 배치 크기 [[batch-sizes]]
All the content above this section has been moved and translated in #25755, so it can be removed now :)
> 다음은 도커 이미지를 다운로드하고 배포하는 지침을 따르면 됩니다.
>
> ## 희소성 [[sparsity]]
I think this can be removed, there's no section on sparsity in the new docs.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Translated the `<your_file>.md` file of the documentation to Korean.
Thank you in advance for your review.
Part of #20179
Before reviewing
- [[lowercased-header]]
Who can review? (Initial)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review? (Final)