
Ko perf train gpu one #25250

Closed
HongB1 wants to merge 3 commits

Conversation

@HongB1 commented on Aug 2, 2023

What does this PR do?

Translated the <your_file>.md file of the documentation to Korean.
Thank you in advance for your review.

Part of #20179

Before reviewing

  • Check for missing / redundant translations (번역 누락/중복 검사)
  • Grammar Check (맞춤법 검사)
  • Review or Add new terms to glossary (용어 확인 및 추가)
  • Check Inline TOC (e.g. [[lowercased-header]])
  • Check live-preview for gotchas (live-preview로 정상작동 확인)

Who can review? (Initial)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review? (Final)

@HongB1 marked this pull request as ready for review on August 2, 2023, 03:48

This guide focuses on training large models efficiently on a single GPU. These approaches are still valid if you have access to a machine with multiple GPUs but you will also have access to additional methods outlined in the [multi-GPU section](perf_train_gpu_many).

In this section we have a look at a few tricks to reduce the memory footprint and speed up training for large models and how they are integrated in the [`Trainer`] and [🤗 Accelerate](https://huggingface.co/docs/accelerate/). Each method can improve speed or memory usage which is summarized in the table below:
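For context while reading the review below, here is a minimal, hypothetical sketch (not part of this documentation diff) of how several of these memory- and speed-saving techniques are toggled through `TrainingArguments`; the model checkpoint and `train_dataset` are placeholders.

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Placeholder model; any model/dataset works, the point is the flags below.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,   # small per-step batch to fit in GPU memory
    gradient_accumulation_steps=8,   # effective batch size of 32 without the memory cost
    gradient_checkpointing=True,     # trade extra compute for lower activation memory
    fp16=True,                       # mixed-precision training (bf16=True on Ampere or newer)
    optim="adafactor",               # memory-efficient optimizer instead of the default AdamW
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # train_dataset assumed defined
trainer.train()
```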
@HongB1 (Author)

ChatGPT rendered this part literally as '메모리 풋프린트' (memory footprint). Would it be better to keep '메모리 풋프린트' as is, or would it also be fine to change it to '메모리 사용량' (memory usage)?

Contributor

I think writing it as '메모리 사용량' would be fine!
If you write both together, as '메모리 사용량(memory footprint)', the meaning should be even easier to understand.


### 8-bit Adam

Instead of aggregating optimizer states like Adafactor, 8-bit Adam keeps the full state and quantizes it. Quantization means that it stores the state with lower precision and dequantizes it only for the optimization. This is similar to the idea behind FP16 training where using variables with lower precision saves memory.
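As a rough illustration of the idea (assuming the `bitsandbytes` package is installed), the 8-bit optimizer is a drop-in replacement for regular Adam; with the [`Trainer`], the same effect is available via `optim="adamw_bnb_8bit"`. The `model` and `batch` below are placeholders.

```python
import bitsandbytes as bnb

# model: any torch.nn.Module already placed on the GPU (placeholder)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)  # optimizer state stored in 8-bit

# A training step looks the same as with torch.optim.Adam;
# the state is dequantized internally only when the update is applied.
loss = model(**batch).loss  # batch: placeholder for your tokenized inputs
loss.backward()
optimizer.step()
optimizer.zero_grad()
```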
@HongB1 (Author)

How would you translate 'quantize'? Below is what I asked ChatGPT!

Question

Translating 'quantizes' here as '양자화한다' doesn't really convey the meaning to me. What should I take 'quantizes' to mean?

Answer

In this context, 'quantizes' means converting or compressing data into a form that uses less memory. Here, it can be read as converting the optimizer's state so that it uses less memory; in other words, representing the data with fewer bits saves memory and improves compute efficiency.

This approach is similar to the idea behind FP16 training, which likewise uses lower precision (i.e., 16-bit floating point) to reduce memory usage and increase speed.

So it seems appropriate to understand 'quantizes' as roughly 'converts to lower precision' or 'represents at lower precision'.

@0525hhgus (Contributor) commented on Aug 27, 2023

'Quantize' is widely translated as '양자화하다' these days, so translating it that way should be fine!
If you add a short note explaining that it means converting to lower precision, the text will be even easier to follow.
Also, the Korean TTA ICT terminology dictionary is a good resource for checking commonly used terms 😄


Since it has been discovered that more parameters lead to better performance, this technique makes it possible to increase the number of parameters by an order of magnitude without increasing training costs.

In this approach every other FFN layer is replaced with a MoE Layer which consists of many experts, with a gated function that trains each expert in a balanced way depending on the input token's position in a sequence.
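To make the gating idea concrete, here is a toy top-1 gated MoE feed-forward block in PyTorch. It is a simplified sketch only (real implementations such as Switch Transformers add load-balancing losses and capacity limits), not code from any particular library.

```python
import torch
import torch.nn as nn

class TopOneMoE(nn.Module):
    """Toy mixture-of-experts FFN with a top-1 gate routing each token to one expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(d_model, num_experts)  # router: per-token expert scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        scores = self.gate(x).softmax(dim=-1)   # (batch, seq, num_experts)
        top_prob, top_idx = scores.max(dim=-1)  # one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                 # tokens routed to expert i
            if mask.any():
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage sketch: this block would replace the dense FFN in every other Transformer layer.
layer = TopOneMoE(d_model=512, d_ff=2048, num_experts=8)
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```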
@HongB1 (Author)

How about translating 'experts' as something like '특수 모듈' (specialized modules)?

@0525hhgus (Contributor) commented on Aug 27, 2023

In MoE (Mixture of Experts), an 'expert' is a small sub-model specialized for a particular part or function of the model, so translating it as '특수 모듈' as you suggest would also work. For reference, it is also often left as 'expert' or translated as '전문 모델' or '전문가'.

@stevhliu (Member) left a comment

Thanks for tackling this big doc! We've recently refactored this page in #23963 to be more concise and actionable.

Would you mind updating the translation? 🤗

@0525hhgus (Contributor)

The document to be translated has been updated! If you haven't finished the translation yet, please update it accordingly 😄

@stevhliu (Member) left a comment

Thanks, I think you still have a couple of sections (see image below) that need to be removed to reflect the current docs! Some of the sections like "Choice of CPU" and "How to scale" have since been removed.

[Screenshot, 2023-08-29 10:16 AM: sections that still need to be removed]


So there are a few points where you can save GPU memory or make things run faster. Let's start with the first simple optimization: choosing an appropriate batch size.

## Batch sizes [[batch-sizes]]
Member

All the content above this section has been moved and translated in #25755, so it can be removed now :)


Next, just follow the instructions to download and deploy the Docker image.

## Sparsity [[sparsity]]
Member

I think this can be removed, there's no section on sparsity in the new docs.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot closed this on Oct 12, 2023