Merge pull request #364 from InseeFr/master
Update docs (#362)
NicoLaval authored Oct 26, 2024
2 parents 3941d02 + f676449 commit a8d23a4
Showing 51 changed files with 5,129 additions and 572 deletions.
10 changes: 8 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ Transformation engine and validator for statistics.
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

Trevas is a Java engine for the Validation and Transformation Language (VTL), an [SDMX standard](https://sdmx.org/?page_id=5096) that allows the formal definition of algorithms to validate statistical data and calculate derived data. VTL is user oriented and provides a technology-neutral and standard view of statistical processes at the business level. Trevas supports the latest VTL version (v2.0, July 2020).
Trevas is a Java engine for the Validation and Transformation Language (VTL), an [SDMX standard](https://sdmx.org/?page_id=5096) that allows the formal definition of algorithms to validate statistical data and calculate derived data. VTL is user oriented and provides a technology-neutral and standard view of statistical processes at the business level. Trevas supports the latest VTL version (v2.1, July 2024).

For actual execution, VTL expressions need to be translated to the target runtime environment. Trevas provides this step for the Java platform, by using the VTL formal grammar and the [Antlr](https://www.antlr.org/) tool. For a given execution, Trevas receives the VTL expression and the data bindings that associate variable names in the expression to actual data sets. The execution results can then be retrieved from the bindings for further treatments.
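
As a rough illustration of the flow described above, the sketch below binds one input, evaluates a VTL expression, and reads the result back from the bindings through the standard `javax.script` (JSR 223) API. It assumes the Trevas engine is on the classpath and registered under the name `vtl`; the class and method names (`VtlBindingsSketch`, `evaluate`) are made up for this example.

```java
import java.util.Map;
import java.util.Optional;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

// Hypothetical helper, not part of Trevas: only javax.script calls are used.
final class VtlBindingsSketch {

    /**
     * Evaluates a script against the given inputs and returns the value bound
     * to resultName. Returns Optional.empty() when no engine named "vtl" is
     * registered (assumption: Trevas registers itself under that name).
     */
    static Optional<Object> evaluate(String script, Map<String, ?> inputs, String resultName) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("vtl");
        if (engine == null) {
            return Optional.empty();
        }
        // Bindings associate the variable names used in the expression with data.
        Bindings bindings = new SimpleBindings();
        bindings.putAll(inputs);
        engine.setBindings(bindings, ScriptContext.ENGINE_SCOPE);
        try {
            engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script evaluation failed", e);
        }
        // Results are read back from the same bindings after execution.
        return Optional.ofNullable(engine.getBindings(ScriptContext.ENGINE_SCOPE).get(resultName));
    }

    public static void main(String[] args) {
        Optional<Object> res = evaluate("res := a + 1;", Map.of("a", 1L), "res");
        System.out.println(res.map(Object::toString).orElse("no engine named \"vtl\" on the classpath"));
    }
}
```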

@@ -32,6 +32,12 @@ Open JDK 8+ is required.

## References

<p align="center">
<img width="100px" src="./docs/static/img/sdmx-logo.svg" />
</p>

Trevas is listed among the [SDMX](https://sdmx.org/?page_id=4500) tools.

<p align="center">
<img width="100px" src="./docs/static/img/sdmx-io-logo.svg" />
</p>
@@ -42,4 +48,4 @@ Trevas is part of the [sdmx.io](https://www.sdmx.io/) ecosystem.
<img src="https://awesome.re/mentioned-badge.svg" />
</p>

Trevas is referencing by [_Awesome official statistics software_](https://github.com/SNStatComp/awesome-official-statistics-software)
Trevas is referenced by [_Awesome official statistics software_](https://github.com/SNStatComp/awesome-official-statistics-software)
239 changes: 239 additions & 0 deletions docs/blog/2024-10-07-trevas-provenance.mdx
@@ -0,0 +1,239 @@
---
slug: /trevas-provenance
title: Trevas - Provenance
authors: [nicolas]
tags: [Trevas, provenance, SDTH]
---

import useBaseUrl from '@docusaurus/useBaseUrl';
import Link from '@theme/Link';

### News

Trevas 1.6.0 introduces the VTL Prov module.

This module produces lineage metadata from Trevas, based on two RDF ontologies: `PROV-O` and `SDTH`.

#### SDTH model overview

```mermaid
classDiagram
class Program["sdth:Program"] {
rdfs:label
}
class ProgramStep["sdth:ProgramStep"] {
rdfs:label
sdth:hasSourceCode
sdth:hasSDTL
}
class VariableInstance["sdth:VariableInstance"] {
rdfs:label
sdth:hasName
}
class DataframeInstance["sdth:DataframeInstance"] {
rdfs:label
sdth:hasName
}
class FileInstance["sdth:FileInstance"] {
rdfs:label
sdth:hasName
}
ProgramStep <-- Program : sdth_hasProgramStep
ProgramStep <-- ProgramStep : sdth_hasProgramStep
ProgramStep --> VariableInstance : sdth_usesVariable
ProgramStep --> VariableInstance : sdth_assignsVariable
ProgramStep --> DataframeInstance : sdth_consumesDataframe
ProgramStep --> DataframeInstance : sdth_producesDataframe
ProgramStep --> FileInstance : sdth_loadsFile
ProgramStep --> FileInstance : sdth_savesFile
DataframeInstance --> VariableInstance : sdth_hasVariableInstance
FileInstance --> VariableInstance : sdth_hasVariableInstance
DataframeInstance --> DataframeInstance : sdth_derivedFrom
DataframeInstance --> DataframeInstance : sdth_elaborationOf
FileInstance --> FileInstance : sdth_derivedFrom
FileInstance --> FileInstance : sdth_elaborationOf
VariableInstance --> VariableInstance : sdth_derivedFrom
VariableInstance --> VariableInstance : sdth_elaborationOf
```

#### Adopted model

The `vtl-prov` module, version 1.6.0, uses the following partial model:

```mermaid
classDiagram
class Agent {
}
class Program {
rdfs:label
}
class ProgramStep {
rdfs:label
}
class VariableInstance {
rdfs:label
sdth:hasName
}
class DataframeInstance {
rdfs:label
sdth:hasName
}
Agent <|-- Program
ProgramStep <-- Program : sdth_hasProgramStep
ProgramStep --> VariableInstance : sdth_usesVariable
ProgramStep --> VariableInstance : sdth_assignsVariable
ProgramStep --> DataframeInstance : sdth_consumesDataframe
ProgramStep --> DataframeInstance : sdth_producesDataframe
DataframeInstance --> VariableInstance : sdth_hasVariableInstance
DataframeInstance --> DataframeInstance : sdth_wasDerivedFrom
VariableInstance --> VariableInstance : sdth_wasDerivedFrom
```
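
To make the partial model more concrete, here is a minimal sketch that mirrors it with plain Java records. This is not the `vtl-prov` API: every type, field, and dataset name below (`AdoptedModelSketch`, `ds_in`, `ds_out`, `m_x2`, ...) is hypothetical and only follows the diagram above. Records require Java 16 or later.

```java
import java.util.List;

// Hypothetical plain-Java mirror of the partial model; not the vtl-prov API.
final class AdoptedModelSketch {

    // Agent <|-- Program in the diagram.
    interface Agent {}

    record VariableInstance(String label, String name,
                            List<VariableInstance> derivedFrom) {}   // sdth:wasDerivedFrom

    record DataframeInstance(String label, String name,
                             List<VariableInstance> variables,       // sdth:hasVariableInstance
                             List<DataframeInstance> derivedFrom) {} // sdth:wasDerivedFrom

    record ProgramStep(String label,
                       List<DataframeInstance> consumes,             // sdth:consumesDataframe
                       List<DataframeInstance> produces,             // sdth:producesDataframe
                       List<VariableInstance> uses,                  // sdth:usesVariable
                       List<VariableInstance> assigns) {}            // sdth:assignsVariable

    record Program(String label, List<ProgramStep> steps)            // sdth:hasProgramStep
            implements Agent {}

    // Builds a one-step program: ds_out is derived from ds_in and gains one variable.
    static Program example() {
        VariableInstance id = new VariableInstance("id", "id", List.of());
        VariableInstance measure = new VariableInstance("m", "m", List.of());
        VariableInstance doubled = new VariableInstance("m_x2", "m_x2", List.of(measure));

        DataframeInstance source =
                new DataframeInstance("ds_in", "ds_in", List.of(id, measure), List.of());
        DataframeInstance target =
                new DataframeInstance("ds_out", "ds_out", List.of(id, measure, doubled), List.of(source));

        ProgramStep step = new ProgramStep(
                "step 1", List.of(source), List.of(target), List.of(measure), List.of(doubled));
        return new Program("program 1", List.of(step));
    }
}
```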

Improvements will follow in the coming weeks.

#### Tools available

Trevas provenance tools are documented <Link label={"here"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')} />.

#### Example

##### Business use case

Two source datasets are transformed to produce transient datasets and a final permanent one.

```mermaid
flowchart TD
OP1{add +}
OP2{multiply *}
OP3{filter}
OP4{create variable}
SC3([3])
ds1 --> OP1
ds2 --> OP1
OP1 --> ds_sum
SC3 --> OP2
ds_sum --> OP2
OP2 --> ds_mul
ds_mul --> OP3
OP3 --> OP4
OP4 --> ds_res
```

### Inputs

`ds1` & `ds2` metadata:

| id | var1 | var2 |
| :--------: | :-----: | :-----: |
| STRING | INTEGER | NUMBER |
| IDENTIFIER | MEASURE | MEASURE |

### VTL script

```vtl
ds_sum := ds1 + ds2;
ds_mul := ds_sum * 3;
ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];
```
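
To show what each statement of the script does to the data, the sketch below replays the three steps in plain Java on made-up rows. It does not call Trevas; the class, record, and method names are hypothetical, and the assumptions are that `+` matches rows on the identifier and adds the measures, and that `* 3` multiplies every measure. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java walk-through of the script on made-up rows; Trevas is not involved.
final class ScriptWalkthrough {

    record Row(String id, long var1, double var2) {}
    record ResultRow(String id, long var1, double var2, double varSum) {}

    // ds_sum := ds1 + ds2; rows are matched on the identifier and measures are added.
    static List<Row> add(List<Row> left, List<Row> right) {
        Map<String, Row> rightById = new HashMap<>();
        for (Row r : right) rightById.put(r.id(), r);
        List<Row> out = new ArrayList<>();
        for (Row l : left) {
            Row r = rightById.get(l.id());
            if (r != null) out.add(new Row(l.id(), l.var1() + r.var1(), l.var2() + r.var2()));
        }
        return out;
    }

    // ds_mul := ds_sum * 3; every measure is multiplied by the scalar.
    static List<Row> multiply(List<Row> ds, long factor) {
        List<Row> out = new ArrayList<>();
        for (Row r : ds) out.add(new Row(r.id(), r.var1() * factor, r.var2() * factor));
        return out;
    }

    // ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];
    static List<ResultRow> filterAndCalc(List<Row> ds) {
        List<ResultRow> out = new ArrayList<>();
        for (Row r : ds) {
            if (Math.floorMod(r.var1(), 2L) == 0) {
                out.add(new ResultRow(r.id(), r.var1(), r.var2(), r.var1() + r.var2()));
            }
        }
        return out;
    }

    static List<ResultRow> run(List<Row> ds1, List<Row> ds2) {
        return filterAndCalc(multiply(add(ds1, ds2), 3));
    }

    public static void main(String[] args) {
        List<Row> ds1 = List.of(new Row("A", 1, 1.5), new Row("B", 2, 0.5));
        List<Row> ds2 = List.of(new Row("A", 3, 2.5), new Row("B", 1, 1.0));
        run(ds1, ds2).forEach(System.out::println);
    }
}
```

With these rows, only `A` passes the filter: its `var1` is 12 after the multiplication, whereas `B` ends up with `var1 = 9`.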

### RDF model target

```ttl
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>
# --- Program and steps
<http://example.com/program1> a sdth:Program ;
a prov:Agent ; # Agent? Or an activity
rdfs:label "My program 1"@en, "Mon programme 1"@fr ;
sdth:hasProgramStep <http://example.com/program1/program-step1>,
<http://example.com/program1/program-step2>,
<http://example.com/program1/program-step3> .
<http://example.com/program1/program-step1> a sdth:ProgramStep ;
rdfs:label "Program step 1"@en, "Étape 1"@fr ;
sdth:hasSourceCode "ds_sum := ds1 + ds2;" ;
sdth:consumesDataframe <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:producesDataframe <http://example.com/dataset/ds_sum> .
<http://example.com/program1/program-step2> a sdth:ProgramStep ;
rdfs:label "Program step 2"@en, "Étape 2"@fr ;
sdth:hasSourceCode "ds_mul := ds_sum * 3;" ;
sdth:consumesDataframe <http://example.com/dataset/ds_sum> ;
sdth:producesDataframe <http://example.com/dataset/ds_mul> .
<http://example.com/program1/program-step3> a sdth:ProgramStep ;
rdfs:label "Program step 3"@en, "Étape 3"@fr ;
sdth:hasSourceCode "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];" ;
sdth:consumesDataframe <http://example.com/dataset/ds_mul> ;
sdth:producesDataframe <http://example.com/dataset/ds_res> ;
sdth:usesVariable <http://example.com/variable/var1>,
<http://example.com/variable/var2> ;
sdth:assignsVariable <http://example.com/variable/var_sum> .
# --- Variables
# i think here it's not instances but names we refer to...
<http://example.com/variable/id1> a sdth:VariableInstance ;
rdfs:label "id1" .
<http://example.com/variable/var1> a sdth:VariableInstance ;
rdfs:label "var1" .
<http://example.com/variable/var2> a sdth:VariableInstance ;
rdfs:label "var2" .
<http://example.com/variable/var_sum> a sdth:VariableInstance ;
rdfs:label "var_sum" .
# --- Data frames
<http://example.com/dataset/ds1> a sdth:DataframeInstance ;
rdfs:label "ds1" ;
sdth:hasName "ds1" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds2> a sdth:DataframeInstance ;
rdfs:label "ds2" ;
sdth:hasName "ds2" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_sum> a sdth:DataframeInstance ;
rdfs:label "ds_sum" ;
sdth:hasName "ds_sum" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_mul> a sdth:DataframeInstance ;
rdfs:label "ds_mul" ;
sdth:hasName "ds_mul" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_sum> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_res> a sdth:DataframeInstance ;
rdfs:label "ds_res" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_mul> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2>,
<http://example.com/variable/var_sum> .
```
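
The dataset-level `sdth:wasDerivedFrom` links in the target above follow directly from the assignments in the script. The sketch below recovers them with a deliberately naive text scan and prints them as Turtle-like lines. It is an illustration only, not how the `vtl-prov` module works: the class and method names are hypothetical, statements are split on `;` (so string literals containing `;` would break it), and any identifier on the right-hand side that names a known dataset is treated as a source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive lineage extraction from VTL assignments; illustration only.
final class NaiveLineageSketch {

    // target (:= or <-) expression
    private static final Pattern ASSIGNMENT =
            Pattern.compile("^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*(:=|<-)\\s*(.*)$", Pattern.DOTALL);
    private static final Pattern IDENTIFIER = Pattern.compile("\\b[A-Za-z_][A-Za-z0-9_]*\\b");
    private static final String BASE = "http://example.com/dataset/";

    // Maps each assigned dataset to the known datasets its expression mentions.
    static Map<String, Set<String>> derivedFrom(String script, Set<String> inputs) {
        Set<String> known = new LinkedHashSet<>(inputs);
        Map<String, Set<String>> lineage = new LinkedHashMap<>();
        for (String statement : script.split(";")) {
            Matcher assignment = ASSIGNMENT.matcher(statement);
            if (!assignment.matches()) continue;
            Set<String> sources = new LinkedHashSet<>();
            Matcher ids = IDENTIFIER.matcher(assignment.group(3));
            while (ids.find()) {
                if (known.contains(ids.group())) sources.add(ids.group());
            }
            lineage.put(assignment.group(1), sources);
            known.add(assignment.group(1)); // later statements may consume this dataset
        }
        return lineage;
    }

    // Renders one sdth:wasDerivedFrom statement per derived dataset.
    static List<String> toTurtle(Map<String, Set<String>> lineage) {
        List<String> lines = new ArrayList<>();
        lineage.forEach((target, sources) -> {
            if (sources.isEmpty()) return;
            List<String> iris = new ArrayList<>();
            for (String source : sources) iris.add("<" + BASE + source + ">");
            lines.add("<" + BASE + target + "> sdth:wasDerivedFrom " + String.join(", ", iris) + " .");
        });
        return lines;
    }

    public static void main(String[] args) {
        String script = "ds_sum := ds1 + ds2;\n"
                + "ds_mul := ds_sum * 3;\n"
                + "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];";
        toTurtle(derivedFrom(script, Set.of("ds1", "ds2"))).forEach(System.out::println);
    }
}
```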
28 changes: 28 additions & 0 deletions docs/blog/2024-10-09-trevas-vtl-21.mdx
@@ -0,0 +1,28 @@
---
slug: /trevas-vtl-21
title: Trevas - VTL 2.1
authors: [nicolas]
tags: [Trevas, 'VTL 2.1']
---

import useBaseUrl from '@docusaurus/useBaseUrl';
import Link from '@theme/Link';

Trevas 1.7.0 upgrades to version 2.1 of VTL.

This version introduces two new operators:

- `random`
- `case`

`random` produces a decimal number between 0 and 1.

`case` allows for clearer multi-conditional branching, for example:

`ds2 := ds1[ calc c := case when r < 0.2 then "Low" when r > 0.8 then "High" else "Medium" ]`
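
For readers more at home in Java, the branching in the `case` expression above reads like the following sketch. It is only an analogy written in plain Java, not Trevas code: `CaseOperatorSketch` and `classify` are hypothetical names, and `java.util.Random` merely stands in for a source of values for `r`.

```java
import java.util.Random;

// Plain-Java analogy of the VTL `case` example; not Trevas code.
final class CaseOperatorSketch {

    // Conditions are tested in order; the first match wins, otherwise the else branch applies.
    static String classify(double r) {
        if (r < 0.2) return "Low";
        if (r > 0.8) return "High";
        return "Medium";
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            double r = random.nextDouble(); // value in [0.0, 1.0)
            System.out.println(r + " -> " + classify(r));
        }
    }
}
```

Values exactly equal to 0.2 or 0.8 satisfy neither strict comparison and therefore fall through to `"Medium"`.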

Both operators are already available in Trevas!

The new grammar also provides time operators and includes corrections, without any breaking changes compared to the 2.0 version.

See the <Link label={"coverage"} href={useBaseUrl('/user-guide/coverage')} /> section for more details.
2 changes: 1 addition & 1 deletion docs/docs/developer-guide/basic-mode/data-sources/jdbc.mdx
@@ -12,7 +12,7 @@ custom_edit_url: null
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-jdbc</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

2 changes: 1 addition & 1 deletion docs/docs/developer-guide/basic-mode/data-sources/json.mdx
@@ -12,7 +12,7 @@ custom_edit_url: null
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-jackson</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

10 changes: 9 additions & 1 deletion docs/docs/developer-guide/index-developer-guide.mdx
@@ -15,7 +15,7 @@ import Card from '@theme/Card';
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-engine</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

@@ -64,3 +64,11 @@ PersistentDataset result = (PersistentDataset) engine.getBindings(ScriptContext.
<Card title="Spark mode" page={useBaseUrl('/developer-guide/spark-mode')} />
</div>
</div>

### Provenance

<div className="row">
<div className="col">
<Card title="Provenance" page={useBaseUrl('/developer-guide/provenance')} />
</div>
</div>