code review changes
deepanker13 committed Dec 13, 2023
1 parent 5acece0 commit f5572d8
Showing 4 changed files with 5 additions and 4 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish-sdk-images.yaml
@@ -19,5 +19,5 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - component-name: train-api-training-image
-            dockerfile: sdk/python/kubeflow/training/training_container/Dockerfile
+          - component-name: train-api-hf-image
+            dockerfile: sdk/python/kubeflow/trainer/hf_dockerfile
@@ -14,4 +14,5 @@ FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
 RUN pip install --no-cache-dir -r requirements.txt
 
 # Run storage.py when the container launches
-ENTRYPOINT ["python", "hf_llm_training.py"]
+ENTRYPOINT ["python", "hf_llm_training.py"]
+
@@ -38,7 +38,7 @@ def load_and_preprocess_data(dataset_dir, tokenizer):
     train_data = load_dataset(dataset_dir, split="train").map(
         lambda x: tokenizer(x["text"]), batched=True
     )
-    train_data = train_data.train_test_split(shuffle=True, test_size=200)
+    train_data = train_data.train_test_split(shuffle=True, test_size=0.1)
 
     try:
         eval_data = load_dataset(dataset_dir, split="eval")
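
Note on the `test_size` change: the sketch below illustrates why `test_size=0.1` is likely the safer choice than `test_size=200`. It assumes `train_test_split` follows the scikit-learn-style convention, where an `int` is an absolute number of held-out rows and a `float` in (0, 1) is a fraction of the dataset, rounded up. The helper `resolve_test_rows` is hypothetical and written only for illustration; it is not part of `datasets` or of this commit.

```python
import math


def resolve_test_rows(n_samples, test_size):
    """Hypothetical helper: rows held out for a given test_size.

    Assumption: int means an absolute row count, float in (0, 1) means
    a fraction of the dataset (rounded up), as in scikit-learn.
    """
    if isinstance(test_size, int):
        # An absolute count must leave at least one row for training.
        if not 0 < test_size < n_samples:
            raise ValueError(
                f"test_size={test_size} is invalid for {n_samples} rows"
            )
        return test_size
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("a float test_size must lie in (0, 1)")
        return math.ceil(test_size * n_samples)
    raise TypeError("test_size must be an int or a float")


if __name__ == "__main__":
    # On a 2000-row dataset the old and new settings hold out the same rows.
    print(resolve_test_rows(2000, 200), resolve_test_rows(2000, 0.1))

    # On a 150-row dataset the fraction still works; the fixed count does not.
    print(resolve_test_rows(150, 0.1))
    try:
        resolve_test_rows(150, 200)
    except ValueError as err:
        print("fixed count rejected:", err)
```

Under that assumption, a fixed count of 200 fails on datasets with 200 rows or fewer and holds out a vanishing share of very large ones, while a fraction scales with the dataset size.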