Frictionless Data
Liz Dobbins edited this page Mar 31, 2024
·
2 revisions
Frictionless Data is a way of documenting data sets so they can be accessed with code more easily. The project includes APIs for Python and R, and a few online tools to help generate the files. The project is in the midst of creating version 2 of its schema, which suggests the community is active.
Here are some tools and users whose pages will be investigated further:
- Validator for the schema: https://create.frictionlessdata.io
- Catalyst uses the frictionless tabular schema to document their PUDL data set
- Catalyst's GitHub repo searched for frictionless.
- On Zenodo, Catalyst's [ferc1_xbrl_datapackage.json](https://zenodo.org/records/10708669/files/ferc1_xbrl_datapackage.json?download=1) has so many resources!
- Kaggle uses frictionless
Here is an example of a data package descriptor that contains a table schema:
{
  "profile": "tabular-data-package",
  "resources": [
    {
      "name": "resource1",
      "path": "acep_regions.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {
            "name": "acep_region_id",
            "type": "integer",
            "format": "default"
          },
          {
            "name": "acep_region_name",
            "type": "string",
            "format": "default"
          }
        ]
      }
    }
  ],
  "licenses": [
    {
      "name": "CC-BY-4.0",
      "title": "Creative Commons Attribution 4.0",
      "path": "https://creativecommons.org/licenses/by/4.0/"
    }
  ]
}
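To show how a descriptor like the one above makes data easier to use from code, here is a minimal sketch that reads the schema and casts CSV values to the declared types. It uses only the Python standard library rather than the project's own Python API, which does far more. The `CASTERS` mapping, the `read_resource` and `license_names` helpers, and the sample rows are all assumptions made for illustration; the real contents of `acep_regions.csv` are not shown on this page.

```python
import csv
import io
import json

# Descriptor copied from the example above (one tabular resource plus a license).
DESCRIPTOR = json.loads("""
{
  "profile": "tabular-data-package",
  "resources": [
    {
      "name": "resource1",
      "path": "acep_regions.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {"name": "acep_region_id", "type": "integer", "format": "default"},
          {"name": "acep_region_name", "type": "string", "format": "default"}
        ]
      }
    }
  ],
  "licenses": [
    {"name": "CC-BY-4.0",
     "title": "Creative Commons Attribution 4.0",
     "path": "https://creativecommons.org/licenses/by/4.0/"}
  ]
}
""")

# Hypothetical, deliberately tiny mapping from schema types to Python casts.
# Unknown types fall back to str.
CASTERS = {"integer": int, "number": float, "string": str}


def read_resource(resource, csv_text):
    """Parse CSV text and cast each column using the resource's schema."""
    fields = resource["schema"]["fields"]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for field in fields:
            cast = CASTERS.get(field["type"], str)
            row[field["name"]] = cast(raw[field["name"]])
        rows.append(row)
    return rows


def license_names(descriptor):
    """Return the license names declared in a descriptor (empty list if none)."""
    return [lic.get("name") for lic in descriptor.get("licenses", [])]


# Made-up sample rows standing in for acep_regions.csv.
SAMPLE_CSV = "acep_region_id,acep_region_name\n1,Region A\n2,Region B\n"

rows = read_resource(DESCRIPTOR["resources"][0], SAMPLE_CSV)
print(rows)
print(license_names(DESCRIPTOR))
```

Because the schema declares `acep_region_id` as an integer, the values arrive as Python `int` rather than as the strings a plain CSV reader would return. A descriptor with no `"licenses"` key makes `license_names` return an empty list, which is one quick way to spot a package that ships without a license.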
Excerpts from Catalyst's FERC schema (really long, but it has no license):
{
  "profile": "tabular-data-package",
  "name": "ferc1-extracted-xbrl",
  "title": "Ferc1 data extracted from XBRL filings",
  "resources": [
    {
      "path": "sqlite:////home/mambauser/pudl_work/output/ferc1_xbrl.sqlite",
      "profile": "tabular-data-resource",
      "name": "corporate_officer_certification_001_duration",
      "dialect": {
        "table": "corporate_officer_certification_001_duration"
      },
      "title": "001 - Schedule - Corporate Officer Certification - duration",
      "description": "ferc:ScheduleIdentificationAbstract",
      "format": "sqlite",
      "mediatype": "application/vnd.sqlite3",
      "schema": {
        "fields": [
          {
            "name": "entity_id",
            "title": "Entity Identifier",
            "type": "string",
            "format": "default",
            "description": "Unique identifier of respondent"
          },
          {
            "name": "filing_name",
            "title": "Filing Name",
            "type": "string",
            "format": "default",
            "description": "Name of filing"
            ...
        "primary_key": [
          "entity_id",
          "filing_name",
          "publication_time",
          "start_date",
          "end_date"
        ]
      }
    },
    {
      "path": "sqlite:////home/mambauser/pudl_work/output/ferc1_xbrl.sqlite",
      "profile": "tabular-data-resource",
      "name": "corporate_officer_certification_001_instant",
      "dialect": {
        "table": "corporate_officer_certification_001_instant"
      },
      "title": "001 - Schedule - Corporate Officer Certification - instant",
      "description": "ferc:ScheduleIdentificationAbstract",
      "format": "sqlite",
      "mediatype": "application/vnd.sqlite3",
      "schema": {
        "fields": [
          {
            "name": "entity_id",
            "title": "Entity Identifier",
            "type": "string",
            "format": "default",
            "description": "Unique identifier of respondent"
            ...