Commit
feat(rule): NON_UNIQUE_ENTITY_VALUE rule implementation
Closes #197
Showing 11 changed files with 376 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# NON_UNIQUE_ENTITY_VALUE | ||
|
||
## Rule test folder | ||
|
||
`test/fixtures/rules-cases/non-unique-entity-value` | ||
|
||
## Description | ||
This rule raises an issue when an entity id value is not unique within a particular kind of entity (for example, geo-country). | ||
|
||
## Examples of correct data | ||
|
||
`ddf--entities--geo--income_groups.csv` | ||
``` | ||
income_groups,name,gwid,is--income_groups | ||
high_income,High income,i268,TRUE | ||
lower_middle_income,Lower middle income,i269,TRUE | ||
low_income,Low income,i266,TRUE | ||
upper_middle_income,Upper middle income,i267,TRUE | ||
``` | ||
|
||
## Examples of incorrect data | ||
|
||
`ddf--entities--geo--income_groups.csv` | ||
``` | ||
income_groups,name,gwid,is--income_groups | ||
high_income,High income,i268,TRUE | ||
high_income,High income,i268,TRUE | ||
lower_middle_income,Lower middle income,i269,TRUE | ||
low_income,Low income,i266,TRUE | ||
upper_middle_income,Upper middle income,i267,TRUE | ||
``` | ||
|
||
## Output data format | ||
|
||
* `value` - duplicated entity id value | ||
|
||
## Extra information | ||
|
||
### @jheeffer | ||
|
||
ddf--entities--geo--country.csv | ||
``` | ||
geo name is--country | ||
swe Sweden 1 | ||
``` | ||
|
||
ddf--entities--geo--un_state.csv | ||
``` | ||
geo name un_membership_year is--un_state | ||
swe Sweden 1946 1 | ||
``` | ||
|
||
The above is valid, but should arguably give a warning: `geo.name` of `swe` is defined twice, though the values are equal, so there is no error. | ||
I am not sure about the warning here. On the one hand, it is good to limit the amount of redundant data in a dataset, and warnings could help with spotting these redundancies. | ||
On the other hand, this could lead to many useless warnings. Maybe they should be per file: "`geo.name` is defined in both `ddf--entities--geo--country.csv` and `ddf--entities--geo--un_state.csv` and causes redundancy" instead of a per-entity warning. However, I think this is a more complex validation: it's possible for `geo.name` to be in multiple files without causing overlap/redundancy. Plus, duplicating `geo.name` across files could be useful for overview, so maybe a warning is not always appropriate? What do you think? | ||
|
||
--------------- | ||
|
||
ddf--entities--geo--country.csv | ||
``` | ||
geo name is--country | ||
swe Sweden 1 | ||
``` | ||
|
||
ddf--entities--geo--un_state.csv | ||
``` | ||
geo name un_membership_year is--un_state | ||
swe Kingdom of Sweden 1946 1 | ||
``` | ||
|
||
The above is invalid and should throw an error because `geo.name` for `swe` has two different values and thus there's a conflict for the value of `geo.name` for `swe`. | ||
|
||
### @jheeffer | ||
|
||
Also, I'm fine with raising an error on a duplicate ID within one file, as in your first case. | ||
|
||
ddf--entities--geo--country.csv | ||
``` | ||
geo name is--country | ||
swe Sweden 1 | ||
ukr Ukraine 1 | ||
swe Sweden 1 | ||
``` | ||
Invalid because of duplicate `swe` entity in one file, even though properties are all the same. | ||
|
||
### @buchslava | ||
|
||
Yes, I understand the idea: I'll take the intersection of the keys of the two records. For example, | ||
|
||
for `name is--country` and `name un_membership_year is--un_state` the intersection is `name`. | ||
|
||
Then I'll compare the values of those fields (`name`): if they are equal, it is OK; otherwise, it is an error. |
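
The key-intersection check described in the last comment could look roughly like the sketch below. It is only an illustration in Python, not the code from this commit: `find_conflicts` and the sample records are hypothetical names, and the id column (`geo`) is assumed to be excluded from the comparison because both records are already known to share it.

```python
def find_conflicts(record_a, record_b, id_column="geo"):
    """Return the shared property names whose values differ between two records.

    Both records are assumed to describe the same entity id, so the id
    column itself is left out of the comparison.
    """
    shared = (set(record_a) & set(record_b)) - {id_column}
    return sorted(key for key in shared if record_a[key] != record_b[key])


# Records taken from the examples in the rule description.
country = {"geo": "swe", "name": "Sweden", "is--country": "1"}
un_state_same = {
    "geo": "swe",
    "name": "Sweden",
    "un_membership_year": "1946",
    "is--un_state": "1",
}
un_state_diff = dict(un_state_same, name="Kingdom of Sweden")

print(find_conflicts(country, un_state_same))  # [] -> redundant but consistent
print(find_conflicts(country, un_state_diff))  # ['name'] -> conflicting values
```

An empty result corresponds to the "valid, maybe warn" case above, and a non-empty result to the "invalid, error" case.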
3 changes: 3 additions & 0 deletions
test/fixtures/rules-cases/non-unique-entity-value/ddf--concepts--measures.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
concept,concept_type,domain,name | ||
lat,measure,,Latitude | ||
lng,measure,,Longitude |
11 changes: 11 additions & 0 deletions
test/fixtures/rules-cases/non-unique-entity-value/ddf--concepts.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
concept,concept_type,domain,name,drill_up | ||
name,string,,, | ||
drill_up,string,,, | ||
geo,entity_domain,,, | ||
tag,entity_domain,,, | ||
domain,string,,Domain, | ||
region,entity_set,geo,Region,"[""country"",""capital""]" | ||
country,entity_set,geo,Country, | ||
capital,entity_set,geo,Capital, | ||
pop,measure,geo,Population, | ||
year,time,,year, |
3 changes: 3 additions & 0 deletions
.../fixtures/rules-cases/non-unique-entity-value/ddf--datapoints--pop--by--country--year.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
country,year,pop | ||
vat,1960,100000 | ||
vat,1970,"1,100.196" |
3 changes: 3 additions & 0 deletions
test/fixtures/rules-cases/non-unique-entity-value/ddf--entities--geo--capital.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
geo,name,lat,lng,is--region,is--country,is--capital | ||
vat,Vatican2,,,0,1,1 | ||
and,Andorra,,,0,1,0 |
10 changes: 10 additions & 0 deletions
test/fixtures/rules-cases/non-unique-entity-value/ddf--entities--geo--country.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
geo,name,lat,lng,is--region,is--country,is--capital | ||
and,Andorra,,,0,1,0 | ||
afg,Afghanistan,,,0,1,0 | ||
afg,Afghanistan2,,,0,1,0 | ||
dza,Algeria,,,0,1,0 | ||
africa,Africa,,,1,0,0 | ||
europe,Europe,,,1,0,0 | ||
americas,Americas,,,1,0,0 | ||
asia,Asia,,,1,0,0 | ||
vat,Vatican,,,0,1,1 |
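
The fixture above repeats the id `afg` within one file, which is the case this rule is meant to catch. A minimal Python sketch of such a per-file check follows; it is an assumption about how the check could be written, not the validator's actual implementation. `duplicated_ids` is a hypothetical helper, and the CSV text is an abbreviated copy of the fixture.

```python
import csv
import io
from collections import Counter

# Abbreviated copy of ddf--entities--geo--country.csv from this fixture folder.
COUNTRY_CSV = """geo,name,lat,lng,is--region,is--country,is--capital
and,Andorra,,,0,1,0
afg,Afghanistan,,,0,1,0
afg,Afghanistan2,,,0,1,0
dza,Algeria,,,0,1,0
vat,Vatican,,,0,1,1
"""


def duplicated_ids(csv_text, id_column):
    """Return the id values that occur more than once in a single entity file."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


print(duplicated_ids(COUNTRY_CSV, "geo"))  # ['afg']
```

Each returned value matches the `value` field listed under "Output data format" in the rule description.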