Feature/sda download #392

Merged: 38 commits, Jun 24, 2024
Changes from 34 commits
Commits
9134b95
Start sda-download, parse args
kostas-kou May 22, 2024
5b08556
Add sda-download in commands
kostas-kou May 22, 2024
c63307d
Get response from the download service
kostas-kou May 22, 2024
fff9e3a
Add more arguments and functions for getiing the download url and the…
kostas-kou May 23, 2024
f3d58ec
Change arguments and download multiple files
kostas-kou May 23, 2024
8d1a632
Remove prints and add comments
kostas-kou May 23, 2024
fb9aa7e
Give the file path as an argument and not only the file name
kostas-kou May 24, 2024
dbd84a1
Add progress bar for sda-download and small refactoring
kostas-kou May 24, 2024
f18ff1e
Add unit tests for the sda-download
kostas-kou May 28, 2024
5e1c8ef
Add sda-download usage instructions in readme
kostas-kou May 28, 2024
32e884a
Add mockoidc container
kostas-kou May 28, 2024
189eb98
Python script for the mockoidc
kostas-kou May 28, 2024
d6b5dba
Download and reencrypt service in compose
kostas-kou May 28, 2024
d73efaf
Add test archive file and s3cmd-admin for uploading
kostas-kou May 28, 2024
aa4fbe7
Create crypt4gh keys in setup script for testing download
kostas-kou May 28, 2024
9925f6e
Add database entries in test script
kostas-kou May 28, 2024
7261e27
Add integration test for the sda-download
kostas-kou May 29, 2024
222bc68
Move part of the test to the setup script
kostas-kou May 29, 2024
f76d0e0
Fixes from linter
kostas-kou May 29, 2024
8b36261
modify tests
kostas-kou May 30, 2024
6b261a8
update to go 1.22
aaperis May 21, 2024
e0035ab
Update go.mod
aaperis May 22, 2024
2288998
Bump github.com/neicnordic/crypt4gh from 1.10.1 to 1.12.0
dependabot[bot] May 22, 2024
a9e0f1a
Bump codecov/codecov-action from 4.4.0 to 4.4.1
dependabot[bot] May 27, 2024
40b504d
Bump github.com/aws/aws-sdk-go from 1.53.5 to 1.53.10
dependabot[bot] May 27, 2024
d836f13
Help returns exit 0 code
pahatz May 24, 2024
5b04116
Review fixes
pahatz May 27, 2024
de16d1b
Merge branch 'main' into feature/sda-download
kostas-kou May 30, 2024
41533c4
Addressed some review comments
kostas-kou May 31, 2024
cd2b13f
General check about the userid
kostas-kou Jun 3, 2024
587044f
Fix linting
kostas-kou Jun 3, 2024
9406e74
Apply suggestions from code review
kostas-kou Jun 3, 2024
3eb8057
Fixes on the review commits
kostas-kou Jun 3, 2024
d4ed233
Add extra test (using url as dataset)
kostas-kou Jun 4, 2024
dd6b3cc
Update README.md
kostas-kou Jun 14, 2024
bf36fb0
Apply suggestions from code review
kostas-kou Jun 17, 2024
940c2a2
Fix for integration test
kostas-kou Jun 20, 2024
5b32b8f
Address review comments
kostas-kou Jun 24, 2024
93 changes: 93 additions & 0 deletions .github/integration/setup/setup.sh
@@ -16,9 +16,29 @@ if [ ! -f "dummy.ega.nbis.se.pem" ]; then
chmod 644 keys/dummy.ega.nbis.se.pub dummy.ega.nbis.se.pem
fi

cp s3cmd-template.conf s3cmd.conf
output=$(python sign_jwt.py)
echo "access_token=$output" >> s3cmd.conf

# Create crypt4gh keys for testing the download service
cat << EOF > c4gh.pub.pem
-----BEGIN CRYPT4GH PUBLIC KEY-----
avFAerx0ZWuJE6fTI8S/0wv3yMo1n3SuNTV6zvKdxQc=
-----END CRYPT4GH PUBLIC KEY-----
EOF

chmod 444 c4gh.pub.pem

cat << EOF > c4gh.sec.pem
-----BEGIN CRYPT4GH ENCRYPTED PRIVATE KEY-----
YzRnaC12MQAGc2NyeXB0ABQAAAAAwAs5mVkXda50vqeYv6tbkQARY2hhY2hhMjBf
cG9seTEzMDUAPAd46aTuoVWAe+fMGl3VocCKCCWmgFUsFIHejJoWxNwy62c1L/Vc
R9haQsAPfJMLJSvUXStJ04cyZnDHSw==
-----END CRYPT4GH ENCRYPTED PRIVATE KEY-----
EOF

chmod 444 c4gh.sec.pem

# get latest image tag for s3inbox
latest_tag=$(curl -s https://api.github.com/repos/neicnordic/sensitive-data-archive/tags | jq -r '.[0].name')

@@ -66,4 +86,77 @@ do echo "waiting for buckets to be created"
sleep 10
done

# Populate the database for testing the download service
# Insert entry in sda.files
file_id=$(docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.files (stable_id, submission_user, \
submission_file_path, submission_file_size, archive_file_path, \
archive_file_size, decrypted_file_size, backup_path, header, \
encryption_method) VALUES ('urn:neic:001-002', 'integration-test', '5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8_elixir-europe.org/main/subfolder/dummy_data.c4gh', \
1048729, '4293c9a7-dc50-46db-b79a-27ddc0dad1c6', 1049081, 1048605, \
'', '637279707434676801000000010000006c000000000000006af1407abc74656b8913a7d323c4bfd30bf7c8ca359f74ae35357acef29dc5073799e207ec5d022b2601340585ff082565e55fbff5b6cdbbbe6b12a0d0a19ef325a219f8b62344325e22c8d26a8e82e45f053f4dcee10c0ec4bb9e466d5253f139dcd4be', 'CRYPT4GH') RETURNING id;" | xargs)

if [ -z "$file_id" ]; then
    echo "Failed to insert file entry into database"
    exit 1
fi

# Insert entry in sda.file_event_log
docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.file_event_log (file_id, event) \
VALUES ('$file_id', 'ready');"

# Insert entries in sda.checksums
docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.checksums (file_id, checksum, type, source) \
VALUES ('$file_id', '06bb0a514b26497b4b41b30c547ad51d059d57fb7523eb3763cfc82fdb4d8fb7', 'SHA256', 'UNENCRYPTED');"

docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.checksums (file_id, checksum, type, source) \
VALUES ('$file_id', '5e9c767958cc3f6e8d16512b8b8dcab855ad1e04e05798b86f50ef600e137578', 'SHA256', 'UPLOADED');"

docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.checksums (file_id, checksum, type, source) \
VALUES ('$file_id', '74820dbcf9d30f8ccd1ea59c17d5ec8a714aabc065ae04e46ad82fcf300a731e', 'SHA256', 'ARCHIVED');"

# Insert dataset in sda.datasets
dataset_id=$(docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.datasets (stable_id) VALUES ('https://doi.example/ty009.sfrrss/600.45asasga') \
ON CONFLICT (stable_id) DO UPDATE \
SET stable_id=excluded.stable_id RETURNING id;")

if [ -z "$dataset_id" ]; then
    echo "Failed to insert dataset entry into database"
    exit 1
fi

# Add file to dataset
docker run --rm --name client --network testing_default \
neicnordic/pg-client:latest \
postgresql://postgres:rootpasswd@postgres:5432/sda \
-t -q -c "INSERT INTO sda.file_dataset (file_id, dataset_id) \
VALUES ('$file_id', $dataset_id);"

# Add file to archive
s3cmd -c directS3 put archive_data/4293c9a7-dc50-46db-b79a-27ddc0dad1c6 s3://archive/4293c9a7-dc50-46db-b79a-27ddc0dad1c6

# Get the correct token from mockoidc
token=$(curl "http://localhost:8002/tokens" | jq -r '.[0]')

# Create s3cmd-download.conf file for download
cp s3cmd-template.conf s3cmd-download.conf
echo "access_token=$token" >> s3cmd-download.conf

docker ps
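The database setup above repeats the same `docker run ... pg-client` invocation for every statement. As a possible simplification, the sketch below wraps that invocation in one function. The function name `run_sql` and the `DOCKER` override variable are hypothetical and not part of this PR. The image, network name and connection string are copied from the script above.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): run a single SQL statement
# through the pg-client container used in setup.sh.
# DOCKER is an assumed override for the docker binary, useful for dry runs.
run_sql() {
    if [ "$#" -ne 1 ]; then
        echo "usage: run_sql <sql-statement>" >&2
        return 1
    fi

    # -t prints tuples only, -q suppresses informational output,
    # -c executes the given statement and exits.
    "${DOCKER:-docker}" run --rm --name client --network testing_default \
        neicnordic/pg-client:latest \
        postgresql://postgres:rootpasswd@postgres:5432/sda \
        -t -q -c "$1"
}
```

With such a helper, an insert could be written as `run_sql "INSERT INTO sda.file_event_log (file_id, event) VALUES ('$file_id', 'ready');"`, and setting `DOCKER=echo` prints the command instead of running it.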
16 changes: 16 additions & 0 deletions .github/integration/tests/tests.sh
@@ -321,4 +321,20 @@ fi

rm -r downloads

# Download file by using the sda download service
./sda-cli sda-download -config testing/s3cmd-download.conf -dataset https://doi.example/ty009.sfrrss/600.45asasga -url http://localhost:8080 -outdir test-download main/subfolder/dummy_data.c4gh

# check if file exists in the path
if [ ! -f "test-download/main/subfolder/dummy_data" ]; then
    echo "Downloaded file not found"
    exit 1
fi

# check the first line of that file
first_line=$(head -n 1 test-download/main/subfolder/dummy_data)
if [[ $first_line != *"THIS FILE IS JUST DUMMY DATA"* ]]; then
    echo "First line does not contain the expected string"
    exit 1
fi

echo "Integration test finished successfully"
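The download test performs two checks: the decrypted file exists, and its first line contains a known marker. These could be combined into one reusable function, sketched below. The name `check_downloaded_file` is hypothetical and not part of this PR. The sketch uses a `case` pattern match in place of `[[ ... ]]`, which is only a portability choice.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): verify that a downloaded file
# exists and that its first line contains an expected string.
# Returns 0 on success; prints a message and returns 1 on failure.
check_downloaded_file() {
    local path="$1" expected="$2"

    if [ ! -f "$path" ]; then
        echo "Downloaded file not found: $path"
        return 1
    fi

    local first_line
    first_line=$(head -n 1 "$path")

    # Substring match on the first line only.
    case "$first_line" in
        *"$expected"*)
            return 0
            ;;
        *)
            echo "First line of $path does not contain the expected string"
            return 1
            ;;
    esac
}
```

In the test script this might be called as `check_downloaded_file test-download/main/subfolder/dummy_data "THIS FILE IS JUST DUMMY DATA" || exit 1`.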
24 changes: 21 additions & 3 deletions README.md
@@ -174,9 +174,14 @@ If no config is given by the user, the tool will look for a previous login from

## Download

The SDA/BP archive enables downloading files and datasets in a secure manner. This can be achieved using the `sda-cli` tool, and the process consists of the following two steps
The SDA/BP archive enables downloading files and datasets in a secure manner. This can be achieved using the `sda-cli` tool in two ways:
- by downloading from an S3 bucket
- by using the download API

### Create keys
### Download from S3 bucket
This process consists of two steps, creating the keys and downloading the file, which are explained in the following sections.

#### Create keys

In order to make sure that the files are downloaded from the archive in a secure manner, the user is supposed to create the key pair that the files will be encrypted with. The key pair can be created using the following command:
```bash
@@ -186,7 +191,7 @@ where `<keypair_name>` is the base name of the key files. This command will crea

**NOTE:** Make sure to keep these keys safe. Losing the keys could lead to sensitive data leaks.

### Download file
#### Download file

The `sda-cli` tool allows downloading files and datasets. The URLs of the dataset files that are available for download are stored in a file named `urls_list.txt`. `sda-cli` can download files only by using such a file or the URL where it is stored. There are three different ways to pass the location of the file to the tool, similar to the [dataset size section](#get-dataset-size):
1. a direct URL to `urls_list.txt` or a file with a different name but containing the locations of the dataset files
@@ -204,6 +209,19 @@ The tool also allows for selecting a folder where the files will be downloaded,
```
**Note**: If needed, the user can download a selection of files from an available dataset by providing a customized `urls_list.txt` file.

### Download using the download API

The download API allows downloading files from the archive. It requires the user to have access to the dataset, so a configuration file must be downloaded before any files can be downloaded.
To download files, the user needs the configuration file, the download service URL, the dataset ID and the path of the file. Given those four arguments, files can be downloaded using the following command:
```bash
./sda-cli sda-download -config <configuration_file> -dataset <datasetID> -url <download-service-URL> <filepath_1_to_download> <filepath_2_to_download> ...
```
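where `<configuration_file>` is the file downloaded in the previous step, `<datasetID>` is the ID of the dataset, `<download-service-URL>` is the URL of the download service, and `<filepath_1_to_download>`, `<filepath_2_to_download>`, ... are the paths of the files in the dataset.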
The tool also allows downloading multiple files at once by listing them separated by spaces, and selecting the folder where the files will be downloaded using the `-outdir` argument:
```bash
./sda-cli sda-download -config <configuration_file> -dataset <datasetID> -url <download-service-URL> -outdir <outdir> <filepath_1_to_download> <filepath_2_to_download> ...
```
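
As an illustration only, the command above could be wrapped in a small shell function for scripted use. This is a sketch, not part of `sda-cli`: the function name `sda_download_files` and the `SDA_CLI` override variable are hypothetical. It uses only the `-config`, `-dataset`, `-url` and `-outdir` flags described in this section.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper (not part of sda-cli): download one or more files
# from a dataset into a chosen output directory.
# SDA_CLI is an assumed override for the binary path; it defaults to ./sda-cli.
sda_download_files() {
    if [ "$#" -lt 5 ]; then
        echo "usage: sda_download_files <config> <datasetID> <url> <outdir> <filepath>..." >&2
        return 1
    fi

    local config="$1" dataset="$2" url="$3" outdir="$4"
    shift 4

    # The remaining positional arguments are the file paths in the dataset.
    "${SDA_CLI:-./sda-cli}" sda-download \
        -config "$config" \
        -dataset "$dataset" \
        -url "$url" \
        -outdir "$outdir" \
        "$@"
}
```

Using the values from the integration test in this PR, a call might look like `sda_download_files testing/s3cmd-download.conf https://doi.example/ty009.sfrrss/600.45asasga http://localhost:8080 test-download main/subfolder/dummy_data.c4gh`.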

## Decrypt file

Given that the instructions in the [download section](#download) have been followed, the key pair and the data files should be stored in some location. The last step is to decrypt the files in order to access their content. That can be achieved using the following command:
22 changes: 13 additions & 9 deletions main.go
@@ -13,6 +13,7 @@ import (
"github.com/NBISweden/sda-cli/helpers"
"github.com/NBISweden/sda-cli/list"
"github.com/NBISweden/sda-cli/login"
sdaDownload "github.com/NBISweden/sda-cli/sda_download"
"github.com/NBISweden/sda-cli/upload"
"github.com/NBISweden/sda-cli/version"
log "github.com/sirupsen/logrus"
@@ -34,15 +35,16 @@ type commandInfo struct {
}

var Commands = map[string]commandInfo{
"encrypt": {encrypt.Args, encrypt.Usage, encrypt.ArgHelp},
"createKey": {createKey.Args, createKey.Usage, createKey.ArgHelp},
"decrypt": {decrypt.Args, decrypt.Usage, decrypt.ArgHelp},
"download": {download.Args, download.Usage, download.ArgHelp},
"upload": {upload.Args, upload.Usage, upload.ArgHelp},
"datasetsize": {datasetsize.Args, datasetsize.Usage, datasetsize.ArgHelp},
"list": {list.Args, list.Usage, list.ArgHelp},
"login": {login.Args, login.Usage, login.ArgHelp},
"version": {version.Args, version.Usage, version.ArgHelp},
"encrypt": {encrypt.Args, encrypt.Usage, encrypt.ArgHelp},
"createKey": {createKey.Args, createKey.Usage, createKey.ArgHelp},
"decrypt": {decrypt.Args, decrypt.Usage, decrypt.ArgHelp},
"download": {download.Args, download.Usage, download.ArgHelp},
"upload": {upload.Args, upload.Usage, upload.ArgHelp},
"datasetsize": {datasetsize.Args, datasetsize.Usage, datasetsize.ArgHelp},
"list": {list.Args, list.Usage, list.ArgHelp},
"login": {login.Args, login.Usage, login.ArgHelp},
"sda-download": {sdaDownload.Args, sdaDownload.Usage, sdaDownload.ArgHelp},
"version": {version.Args, version.Usage, version.ArgHelp},
}

// Main does argument parsing, then delegates to one of the sub modules
@@ -70,6 +72,8 @@ func main() {
err = list.List(args)
case "login":
err = login.NewLogin(args)
case "sda-download":
err = sdaDownload.SdaDownload(args)
case "version":
err = version.Version(Version)
default: