Implementor Guide #438

rufuspollock · 2017-05-27T05:53:15Z

This is an epic for creating a detailed guide for implementors of the specs. The guide would be the main place for implementors to understand the nuances of the specs and how best to proceed.

User Stories

As a Developer creating an implementation I want a single implementation guide that complements the bare specs, so that I understand the nuances of the specs and how best to proceed.

- Suggested API, existing implementations to look at
- FAQs, e.g. how to deal with INF and -INF if your language does not support them
- Backwards compatibility (pre-v1 support)
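Table Schema's `number` type permits the special strings `NaN`, `INF` and `-INF`, matched case-insensitively. Below is a minimal sketch of how an implementation might cast them, using a hypothetical `cast_number` helper (not any library's API). Where the language has IEEE 754 infinities and NaN, they map to native floats. Where it does not, the caller can ask for tagged sentinel values instead. Other number options, such as `decimalChar` and `groupChar`, are deliberately left out.

```python
import math
from enum import Enum
from typing import Union


class Special(Enum):
    """Hypothetical sentinels for a runtime without native NaN/infinity."""
    NAN = "NAN"
    POS_INF = "INF"
    NEG_INF = "-INF"


NATIVE = {Special.NAN: math.nan, Special.POS_INF: math.inf, Special.NEG_INF: -math.inf}


def cast_number(value: str, native_specials: bool = True) -> Union[float, Special]:
    """Cast a Table Schema `number` cell, handling NaN, INF and -INF.

    The special strings are matched case-insensitively. With
    native_specials=False they come back as Special sentinels instead of
    floats, mimicking a language that cannot represent them natively.
    """
    text = value.strip()
    for special in Special:
        if text.upper() == special.value:
            return NATIVE[special] if native_specials else special
    number = float(text)
    if not math.isfinite(number):
        # float() also accepts spellings such as "Infinity" or "+inf" and
        # overflows "1e999" to inf; treat those as invalid input here.
        raise ValueError(f"not a valid Table Schema number: {value!r}")
    return number
```

Python has native infinities, so `native_specials=False` only simulates the constrained case. In a language without them, the sentinel branch is the one that matters.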

As a Developer looking to create an implementation I want to know what the interface of my library should look like so that I can design it well and consistently with libraries in other languages

As a Developer looking to use one of the existing Data Package libraries I want to get up and running as quickly as possible so that I can be very productive

I want a quick walkthrough, in my language, of how to do the simple things quickly

Acceptance criteria

- Guide on either fd.io or specs.fd.io
- FAQs

Tasks

- Interface design / code walkthrough
  - Python code for walkthrough
  - JavaScript code for walkthrough
- Collect FAQ items
  - INF, -INF (see #393, "Disallow NaN, INF, -INF and enum for number type?")
  - resource path concatenation
- Backwards compatibility
  - resource.url
  - fmt: and datetime changes (see #436, "Add an implementer note about fmt: support")
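For the resource path concatenation FAQ, a frequent pitfall is joining a relative `path` onto the descriptor's own URL or file path instead of onto the directory that contains it. Below is a minimal sketch with a hypothetical `resolve_resource_path` helper. It assumes `base` is the location the descriptor was loaded from. It also rejects absolute and `../` paths, which the Data Resource spec disallows for security reasons.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def _is_url(location: str) -> bool:
    return urlparse(location).scheme in ("http", "https")


def resolve_resource_path(base: str, path: str) -> str:
    """Resolve a resource `path` relative to the descriptor at `base`.

    `base` is the URL or POSIX file path the descriptor was loaded from.
    Fully qualified http(s) URLs in `path` are returned unchanged.
    """
    if _is_url(path):
        return path
    # Absolute and parent-relative paths are not allowed by the spec.
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"unsafe resource path: {path!r}")
    if _is_url(base):
        # urljoin replaces the last segment of base (e.g. datapackage.json).
        return urljoin(base, path)
    # For local files, join onto the directory containing the descriptor.
    return posixpath.join(posixpath.dirname(base), path)
```

`urljoin` treats a base without a trailing slash as a file, so passing a directory URL such as `https://example.com/pkg` would silently drop `pkg`. Passing the descriptor's own URL avoids that.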

Analysis

Existing Work on a Guide

Stack reference - https://github.com/frictionlessdata/stack#frictionless-data-stack

For implementors - http://specs.frictionlessdata.io/implementation/

Neither of these provides simple examples.

Backwards Compatibility

There is an implementer note about resource.url support in the Table Resource spec. I suppose we could do the same for Table Schema's date/time/datetime formats and fmt:, since temporal formats are the most common source of mistakes I have seen. Implementations had better support the previous version too.
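Below is a minimal sketch of the kind of upgrade step such an implementer note could describe. It assumes the two pre-v1 forms named in this thread: a resource `url` property (now `path`) and temporal field formats written as `fmt:<PATTERN>` (now the bare pattern, see #436). The helper name `upgrade_descriptor` is hypothetical, and a real implementation would likely need to handle more pre-v1 differences than these two.

```python
import copy
from typing import Any, Dict

TEMPORAL_TYPES = {"date", "time", "datetime"}


def upgrade_descriptor(descriptor: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a pre-v1 descriptor with two legacy forms upgraded.

    - a resource `url` is moved to `path` (when `path` is absent)
    - temporal field formats written as "fmt:<pattern>" become "<pattern>"
    """
    upgraded = copy.deepcopy(descriptor)
    for resource in upgraded.get("resources", []):
        if "url" in resource and "path" not in resource:
            resource["path"] = resource.pop("url")
        schema = resource.get("schema")
        if not isinstance(schema, dict):
            continue  # url-or-path schemas must be dereferenced first
        for field in schema.get("fields", []):
            fmt = field.get("format")
            if (field.get("type") in TEMPORAL_TYPES
                    and isinstance(fmt, str) and fmt.startswith("fmt:")):
                field["format"] = fmt[len("fmt:"):]
    return upgraded
```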

FAQs

Dereferencing schemas before validation: should we also have a note here about dereferencing? It means that if schema is a url-or-path, it should be dereferenced before the descriptor is validated (a separate issue is that the spec doesn't address the concept of validation at all).
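Below is a minimal sketch of dereferencing before validation. It assumes `schema` is the only url-or-path property to resolve, and that relative paths have already been resolved against the descriptor's location. `load_json` and `dereference_schemas` are hypothetical helpers built on the standard library. The loader is a parameter so that it can be swapped for one with caching or path checks.

```python
import copy
import json
from pathlib import Path
from typing import Any, Callable, Dict
from urllib.request import urlopen


def load_json(reference: str) -> Any:
    """Load JSON from an http(s) URL or a local file path."""
    if reference.startswith(("http://", "https://")):
        with urlopen(reference) as response:
            return json.load(response)
    return json.loads(Path(reference).read_text(encoding="utf-8"))


def dereference_schemas(
    descriptor: Dict[str, Any],
    loader: Callable[[str], Any] = load_json,
) -> Dict[str, Any]:
    """Return a copy of `descriptor` with url-or-path schemas replaced by objects.

    Running this before validation means the validator sees the schema
    object itself instead of a bare string.
    """
    result = copy.deepcopy(descriptor)
    for resource in result.get("resources", []):
        if isinstance(resource.get("schema"), str):
            resource["schema"] = loader(resource["schema"])
    return result
```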

Interface design and code walkthrough

Walkthrough: simple, elegant code for each of these (this could be one big code block or separate ones; together may be easier, as later steps can reuse earlier ones)

- Creating / loading a data package from a URL or path and getting the data
  - Tabular and non-tabular (e.g. GeoJSON)
  - Bonus (?): inline data (create from a descriptor with inline data)
- Create a data package (from a descriptor or null), set properties and then save
- Create a data package with a CSV, guess the schema and then save ...
- Load a Tabular Data Package and then save to a database ...
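For the "guess schema" step, below is a minimal sketch of type inference over CSV text using only the standard library. It is a hypothetical `guess_schema` helper, not the inference any Frictionless library performs. It only distinguishes `integer`, `number` and `string`, ignores empty cells, and relies on Python's `int()`/`float()`, which accept a few spellings (such as `1_000`) that a stricter implementation would reject.

```python
import csv
import io
from typing import Callable, Dict, List


def _all_cast(cast: Callable[[str], object], values: List[str]) -> bool:
    """True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def guess_schema(csv_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Guess a Table Schema from CSV text whose first row is the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    fields = []
    for index, name in enumerate(header):
        values = [row[index] for row in rows if index < len(row) and row[index] != ""]
        if values and _all_cast(int, values):
            field_type = "integer"
        elif values and _all_cast(float, values):
            field_type = "number"
        else:
            field_type = "string"
        fields.append({"name": name, "type": field_type})
    return {"fields": fields}
```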

We recommend library implementors support an interface similar to the following:

# dp_identifier is a file path or a URL
dp = DataPackage(dp_identifier)
dp.descriptor['title']

resource = dp.resources[0]
# raw byte stream for this resource
datastream = resource.dataraw()

# do we have this default to table or have option below
datastream = resource.data()

resource.parent_package()  # does that work or not?

if resource.profile == 'tabular-data-resource':
    # table is a TabularResource
    table = resource.table()  # .table or .table()
    for row in table:
        print(row)
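To make the suggested interface concrete, below is a minimal standard-library sketch. `DataPackage` and `Resource` here are hypothetical classes, not the datapackage-py or datapackage-js API; the method names simply mirror the ones proposed above. It assumes a local descriptor path or an inline dict (no URLs), a single string `path` per resource, and UTF-8 CSV files for tabular resources. Rows come back as dicts of strings, without casting against the schema.

```python
import csv
import io
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


class DataPackage:
    """A package loaded from a local descriptor file or an inline dict."""

    def __init__(self, source: Union[str, Dict[str, Any]]) -> None:
        if isinstance(source, dict):
            self.descriptor = source
            self.base_path = Path(".")
        else:
            self.descriptor = json.loads(Path(source).read_text(encoding="utf-8"))
            # Relative resource paths are resolved against this directory.
            self.base_path = Path(source).parent
        self.resources = [Resource(r, self) for r in self.descriptor.get("resources", [])]


class Resource:
    """One resource, keeping a back-reference to its parent package."""

    def __init__(self, descriptor: Dict[str, Any], package: DataPackage) -> None:
        self.descriptor = descriptor
        self._package = package

    @property
    def profile(self) -> str:
        return self.descriptor.get("profile", "data-resource")

    def parent_package(self) -> DataPackage:
        return self._package

    def dataraw(self) -> bytes:
        """Raw bytes: inline `data` serialized as JSON, or the file at `path`."""
        if "data" in self.descriptor:
            return json.dumps(self.descriptor["data"]).encode("utf-8")
        return (self._package.base_path / self.descriptor["path"]).read_bytes()

    def table(self) -> Iterator[Dict[str, str]]:
        """Iterate the rows of a tabular CSV resource as header-keyed dicts."""
        if self.profile != "tabular-data-resource":
            raise TypeError(f"resource {self.descriptor.get('name')!r} is not tabular")
        if "data" in self.descriptor:
            raise NotImplementedError("inline tabular data is not handled in this sketch")
        text = self.dataraw().decode("utf-8")
        return iter(csv.DictReader(io.StringIO(text)))
```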

Python

# $ pip install datapackage==1.0.0a4
from datapackage import DataPackage

# With datapackage-v1 [WIP]:
# - validate and update logic should be synced with the JavaScript version
# - added functions like add/get/remove_resource etc.

# Remote tabular
dataPackage = DataPackage('https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/datapackage.json')
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Local tabular
dataPackage = DataPackage('datapackage/datapackage.json')
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Inline tabular
dataPackage = DataPackage({
    'name': "datapackage",
    'resources': [
        {
            'name': "data",
            'path': ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
            'profile': "tabular-data-resource",
            'dialect': {
                'quoteChar': "|"
            },
            'schema': {
                'fields': [
                    {
                        'name': "id",
                        'type': "integer"
                    },
                    {
                        'name': "city",
                        'type': "string"
                    }
                ]
            }
        }
    ]
})
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Create from scratch and update datapackage
dataPackage = DataPackage({})
dataPackage.descriptor['name'] = 'datapackage'
dataPackage.descriptor['description'] = 'Good data package'
dataPackage.descriptor['resources'] = [{
  'name': 'cities',
  'profile': 'tabular-data-resource',
  'path': ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
  'schema': {
    'fields': [
    {
      'name': "id",
      'type': "integer"
    },
    {
      'name': "city",
      'type': "string"
    }
    ]
  }
}]
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Non-tabular datapackage
dataPackage = DataPackage({
    'name': 'geojson',
    'resources': [
      {
        'name': 'point',
        'data': {
          "type": "Feature",
          "geometry": {
            "type": "Point",
            "coordinates": [125.6, 10.1]
          },
          "properties": {
            "name": "Dinagat Islands"
          }
        }
      }
    ]
})
print(dataPackage.resources[0].source['type']) # Feature

# Load a Tabular Data Package and then save to a database
# This API is WIP - https://github.com/frictionlessdata/datapackage-py/issues/132
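Until that API lands, below is a minimal sketch of the "save to a database" step using Python's built-in `sqlite3` module. It assumes rows have already been read and cast as keyed dicts, as `read(keyed=True)` does in the examples above. `save_to_sqlite` and the type mapping are hypothetical, and field types other than integer, number and boolean are simply stored as TEXT.

```python
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical mapping from Table Schema field types to SQLite column types;
# anything not listed is stored as TEXT.
SQLITE_TYPES = {"integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def _quote(identifier: str) -> str:
    """Quote an SQL identifier, escaping embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def save_to_sqlite(
    conn: sqlite3.Connection,
    table_name: str,
    schema: Dict[str, Any],
    rows: Iterable[Dict[str, Any]],
) -> int:
    """Create `table_name` from a Table Schema, insert keyed rows, return the count."""
    names = [field["name"] for field in schema["fields"]]
    columns = ", ".join(
        f"{_quote(field['name'])} {SQLITE_TYPES.get(field.get('type', 'string'), 'TEXT')}"
        for field in schema["fields"]
    )
    conn.execute(f"CREATE TABLE {_quote(table_name)} ({columns})")
    insert = (
        f"INSERT INTO {_quote(table_name)} ({', '.join(_quote(n) for n in names)}) "
        f"VALUES ({', '.join('?' for _ in names)})"
    )
    values = [[row.get(name) for name in names] for row in rows]
    conn.executemany(insert, values)
    conn.commit()
    return len(values)
```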

JavaScript

// ES6 with async/await
// $ npm install datapackage@latest
// $ node7 --harmony-async-await example.js
const DataPackage = require('datapackage').DataPackage

// With tableschema-v1 [WIP]:
// - no need to await resource.table
// - resource.table.read({keyed: true})
// https://github.com/frictionlessdata/tableschema-js/pull/69


// Remote tabular
async function example1() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load('https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/datapackage.json')
    const table = await dataPackage.resources[0].table
    // Read will throw an error if data is not compliant with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Local tabular
async function example2() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load('datapackage/datapackage.json')
    const table = await dataPackage.resources[0].table
    // Read will throw an error if data is not compliant with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Inline tabular
async function example3() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load({
      name: "datapackage",
      resources: [
        {
          name: "data",
          path: ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
          profile: "tabular-data-resource",
          dialect: {
            quoteChar: "|"
          },
          schema: {
            fields: [
            {
              name: "id",
              type: "integer"
            },
            {
              name: "city",
              type: "string"
            }
            ]
          }
        }
      ]
    })
    const table = await dataPackage.resources[0].table
    // Read will throw an error if data is not compliant with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Create from scratch and update datapackage
async function example4() {

    // In non-strict mode we can provide an invalid descriptor
    const dataPackage = await DataPackage.load({resources: []}, {strict: false})
    dataPackage.descriptor.name = 'datapackage'
    dataPackage.descriptor.description = 'Good data package'
    dataPackage.update()
    dataPackage.addResource({
      name: 'cities',
      profile: 'tabular-data-resource',
      path: ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
      schema: {
        fields: [
        {
          name: "id",
          type: "integer"
        },
        {
          name: "city",
          type: "string"
        }
        ]
      }
    })
    // Check for errors if updated descriptor is not valid
    if (!dataPackage.valid) {
      for (let error of dataPackage.errors) {
        console.log(error)
      }
    }
    const table = await dataPackage.resources[0].table
    // Read will throw an error if data is not compliant with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Non-tabular datapackage
// TODO: this doesn't work yet because of a pending path/data change in the spec
async function example5() {

  // Load will throw an error on invalid descriptor
  const dataPackage = await DataPackage.load({
    name: 'geojson',
    resources: [
      {
        name: 'point',
        data: {
          "type": "Feature",
          "geometry": {
            "type": "Point",
            "coordinates": [125.6, 10.1]
          },
          "properties": {
            "name": "Dinagat Islands"
          }
        }
      }
    ]
  })
  console.log(dataPackage.resources[0].source.type) // Feature

}


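// Run all examples; report failures instead of leaving promise rejections unhandled
for (const example of [example1, example2, example3, example4, example5]) {
  example().catch(error => console.error(`${example.name} failed:`, error))
}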

pwalsh · 2017-05-29T06:45:48Z

@rufuspollock

We have an implementors guide here: http://specs.frictionlessdata.io/implementation/

rufuspollock · 2017-05-30T16:07:01Z

@pwalsh yes and I read it before I wrote this issue 😉 - this issue reflects the things I think are missing from that implementors guide. I could be mistaken on this so please comment against the items in the description 😄 (and add items).

This was referenced May 28, 2017

Add an implementer note about fmt: support #436 (closed)

FAQ or "rationale" page? #395 (closed)

pwalsh added this to the Backlog milestone May 29, 2017

rufuspollock mentioned this issue May 29, 2017

TableSchema v1: missingValues + required clarifications #446 (closed)

rufuspollock mentioned this issue Jun 21, 2017

Possible lack of clarity around items that can be url, path, or object #433 (closed)

roll added this to Open Knowledge Apr 14, 2023

roll removed this from the Backlog milestone Apr 14, 2023

roll added the docs label Jan 3, 2024

frictionlessdata locked and limited conversation to collaborators Oct 21, 2024

roll converted this issue into discussion #1038 Oct 21, 2024

github-project-automation bot moved this to Done in Open Knowledge Oct 21, 2024

This issue was moved to a discussion.
