Configure remote storage
truskovskiyk committed Jul 1, 2024
1 parent 3bb8922 commit 2d51505
Showing 12 changed files with 17 additions and 21 deletions.
24 changes: 15 additions & 9 deletions module-2/README.md
@@ -167,22 +167,31 @@ git commit -m "Initialize DVC"

Add data

```
```bash
mkdir data
touch ./data/big-data.csv
```

Add to dvc

```
```bash
dvc add ./data/big-data.csv
git add data/.gitignore data/big-data.csv.dvc
git commit -m "Add raw data"
```

Add remote

You can use Minio via AWS CLI

```bash
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_ENDPOINT_URL=http://127.0.0.1:9000
```


```bash
aws s3api create-bucket --bucket ml-data

dvc remote add -d minio s3://ml-data
@@ -191,24 +200,21 @@ dvc remote modify minio endpointurl $AWS_ENDPOINT_URL

Save code to git

```
```bash
git add .dvc/config
git commit -m "Configure remote storage"
git push
```

Save data to storage

```
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
```bash
dvc push
```

- <https://dvc.org/doc/start/data-management>
- <https://github.com/iterative/dataset-registry>
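
Before running `dvc push`, it can help to confirm that the MinIO settings exported earlier are still present in the current shell, since a new terminal session will not have them. The following is a minimal sketch and not part of the original README: `check_minio_env` is a hypothetical helper name, and it only checks that the three variables used above are non-empty, not that the credentials are valid.

```bash
# Hypothetical helper (not from the original README): verify that the
# MinIO-related variables exported earlier are set in this shell.
check_minio_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_ENDPOINT_URL; do
    # Look up the variable named by $var; empty string if it is unset.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  # Returns 0 when all variables are set, 1 otherwise.
  return "$missing"
}
```

One way to use it is `check_minio_env && dvc push`, so the push is skipped when a variable is missing.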


## Labeling with Argilla

```bash
@@ -228,5 +234,5 @@ python ./labeling/create_dataset.py
Create synthetic dataset:

```bash
docker run -it --rm --name argilla -p 6900:6900 argilla/argilla-quickstart:v2.0.0rc1
```
python ./labeling/create_dataset_synthetic.py
```
3 changes: 0 additions & 3 deletions module-2/dvc_test/.dvc/.gitignore

This file was deleted.

Empty file removed module-2/dvc_test/.dvc/config
Empty file.
3 changes: 0 additions & 3 deletions module-2/dvc_test/.dvcignore

This file was deleted.

Empty file.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

Binary file removed module-2/dvc_test/mydata/label_studio.sqlite3
Binary file not shown.

This file was deleted.

3 changes: 2 additions & 1 deletion module-2/requirements.txt
@@ -8,4 +8,5 @@ datasets==2.20.0
sentence_transformers==3.0.1
lancedb==0.9.0
argilla==2.0.0rc1
dvc==3.51.2
dvc==3.51.2
dvc_s3==3.2.0
