Update docs
mrueda committed Jul 9, 2024
1 parent 490f2e9 commit e87cfff
Showing 4 changed files with 11 additions and 9 deletions.
3 changes: 2 additions & 1 deletion Changes
@@ -2,7 +2,8 @@ Revision history for Perl distribution Convert-Pheno

0.22 2024-0X-XXT00:00:00Z (Manuel Rueda <mrueda@cpan.org>)

- - Reduced memory usage in -stream mode by emptying CONCEPT, PERSON and VISIT_OCCURRENCE during AoH -> HoH step
+ - Reduced memory usage in -iomop -stream by emptying CONCEPT, PERSON and VISIT_OCCURRENCE during AoH -> HoH step
+ - Reduced memory usage in -iomop -no-stream by avoiding data duplication during transposition

0.21 2024-06-01T00:00:00Z (Manuel Rueda <mrueda@cpan.org>)
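
The "emptying ... during AoH -> HoH step" entry above can be illustrated with a minimal sketch. It assumes the idea is to drain each row out of the source array while building the lookup hash, so the array no longer holds the rows once the hash does. The helper name `aoh_to_hoh_draining` and the sample rows are hypothetical and are not the distribution's `convert_table_aoh_to_hoh`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the distribution's convert_table_aoh_to_hoh):
# index an array of row hashes by a key column, removing each row from
# the source array as soon as it has been placed in the lookup hash.
sub aoh_to_hoh_draining {
    my ( $rows, $key ) = @_;
    my %lookup;
    while ( my $row = shift @$rows ) {
        my $id = $row->{$key};
        next unless defined $id;    # skip rows without the key column
        $lookup{$id} = $row;        # last row wins on duplicate keys
    }
    return \%lookup;
}

# Made-up sample data shaped like an OMOP PERSON table
my $data = {
    PERSON => [
        { person_id => 1, gender_concept_id => 8507 },
        { person_id => 2, gender_concept_id => 8532 },
    ],
};

my $person = aoh_to_hoh_draining( $data->{PERSON}, 'person_id' );
printf "%d persons indexed, %d rows left in the source array\n",
  scalar( keys %$person ), scalar @{ $data->{PERSON} };
```

Because the rows are moved by reference rather than copied, the saving in this sketch comes from releasing the array container itself; whether the real code achieves its reduction the same way is not shown in this diff.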

8 changes: 4 additions & 4 deletions docs/omop-cdm.md
@@ -74,9 +74,9 @@ The **OMOP CDM** is designed to be database-agnostic, which means it can be impl
Number of rows | Estimated RAM memory | Estimated time
:---: | :---: | :---:
100K | 1GB | 5s
- 500K | 2.5GB | 15s
- 1M | 5GB | 30s
- 2M | 10GB | 1m
+ 500K | 2GB | 15s
+ 1M | 4GB | 30s
+ 2M | 8GB | 1m

1 x Intel(R) Xeon(R) W-1350P @ 4.00GHz - 32GB RAM - SSD

@@ -129,7 +129,7 @@ The **OMOP CDM** is designed to be database-agnostic, which means it can be impl

1 x Intel(R) Xeon(R) W-1350P @ 4.00GHz - 32GB RAM - SSD

- Note that the output JSON files generated in `--stream` mode will always include information from both the `PERSON` and `CONCEPT` tables. This is not a mandatory requirement, but it serves to facilitate subsequent [validation of the data against JSON schemas](https://github.com/EGA-archive/beacon2-ri-tools/tree/main/utils/bff_validator). In terms of the JSON Schema terminology, these files contain `required` properties for [BFF](bff.md) and [PXF](pxf.md).
+ Note that the output JSON files generated in `--stream` mode will always include information from the `PERSON` and `CONCEPT` tables. Therefore, **both tables must be loaded into RAM** (along with `VISIT_OCCURRENCE` if present). **The size of these tables will obviously impact RAM usage**. Although having this information is not a mandatory requirement for _MongoDB_, it helps in validating the data against Beacon v2 JSON schemas. According to JSON Schema terminology, these files contain `required` properties for [BFF](bff.md) and [PXF](pxf.md). For more details on validation, refer to the [BFF Validator](https://github.com/EGA-archive/beacon2-ri-tools/tree/main/utils/bff_validator).

??? Tip "About parallelization and speed"
`Convert-Pheno` has been optimized for speed and, in general, the CLI results are generated almost immediately. For instance, all tests with synthetic data take less than a second or a few seconds to complete. It should be noted that the speed of the results depends on the performance of the CPU and disk speed. When `Convert-Pheno` has to retrieve ontologies from a database to annotate the data, the processing takes longer.
4 changes: 2 additions & 2 deletions lib/Convert/Pheno.pm
@@ -467,7 +467,7 @@ sub omop2bff {
$self->{person} = convert_table_aoh_to_hoh( $data, 'PERSON' ); # Dynamically adding attributes (setter)
}

- # We transpose $self->{data}{VISIT_OCCURRENCE} if present
+ # We convert $self->{data}{VISIT_OCCURRENCE} if present
if ( exists $data->{VISIT_OCCURRENCE} ) {
print
"Transforming <VISIT_OCCURRENCE> from array to lookup table...\n\n"
@@ -485,7 +485,7 @@ sub omop2bff {
# NB: Transformation is due ONLY IN $omop_main_table FIELDS, the rest of the tables are not used
# The transformation is performed in --no-stream mode
$self->{data} =
- $self->{stream} ? $data : transpose_omop_data_structure($data); # Dynamically adding attributes (setter)
+ $self->{stream} ? $data : transpose_omop_data_structure($self, $data); # Dynamically adding attributes (setter)

# Giving some memory back to the system
$data = undef;
5 changes: 3 additions & 2 deletions lib/Convert/Pheno/IO/FileIO.pm
@@ -68,8 +68,9 @@ sub write_json {
my $arg = shift;
my $file = $arg->{filepath};
my $json_data = $arg->{data};
- my $json = JSON::XS->new->utf8->canonical->pretty->encode($json_data); # utf-8
- path($file)->spew($json); # already need utf-8
+ my $json =
+ JSON::XS->new->utf8->canonical->pretty->encode($json_data); # utf-8
+ path($file)->spew($json); # already need utf-8
return 1;
}
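
The two comments in `write_json` (`# utf-8` and `# already need utf-8`) point at a detail that a small sketch may make clearer: with `->utf8` enabled, the encoder returns UTF-8 bytes, so the file has to be written without a second encoding layer. The sketch below is an assumption-laden stand-in, not the module's code: it swaps in the core `JSON::PP` module (which offers the same `new->utf8->canonical->pretty->encode` chain as `JSON::XS`) and a raw filehandle instead of `path($file)->spew`, and the name `write_json_sketch` is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;                   # core module; stands in for JSON::XS here
use File::Temp qw(tempfile);

# Hypothetical stand-in for write_json: encode to UTF-8 bytes, write them raw.
sub write_json_sketch {
    my ($arg) = @_;
    my $json = JSON::PP->new->utf8->canonical->pretty->encode( $arg->{data} );
    open my $fh, '>:raw', $arg->{filepath}
      or die "Cannot write $arg->{filepath}: $!";
    print {$fh} $json;          # bytes are already UTF-8; no encoding layer
    close $fh or die "Cannot close $arg->{filepath}: $!";
    return 1;
}

my ( $tmp_fh, $file ) = tempfile( SUFFIX => '.json', UNLINK => 1 );
close $tmp_fh;

my $data = { id => 'P1', label => "caf\x{e9}" };    # one non-ASCII character
write_json_sketch( { filepath => $file, data => $data } );

# Read the bytes back and decode them with the same utf8 setting
open my $in, '<:raw', $file or die "Cannot read $file: $!";
my $bytes = do { local $/; <$in> };
close $in;
my $round_trip = JSON::PP->new->utf8->decode($bytes);
print $round_trip->{label} eq $data->{label} ? "round trip ok\n" : "mismatch\n";
```

Writing the encoded string through a UTF-8 output layer as well would encode the non-ASCII bytes twice, which is presumably why the original uses a plain `spew`.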
