This issue was moved to a discussion.

Implementor Guide #438

Closed
2 of 7 tasks
rufuspollock opened this issue May 27, 2017 · 2 comments

rufuspollock commented May 27, 2017

This is an epic for creating a detailed guide for implementors of the specs. The guide would be the main place for implementors to understand the nuances of the specs and how best to proceed.

User Stories

As a Developer creating an implementation, I want a guide in one place on implementation to complement the bare specs so that I understand the nuances of the specs and how best to proceed.

  • Suggested API, existing implementations to look at
  • FAQs, e.g. how to deal with INF and -INF if your language does not support them
  • Backwards compatibility (pre-v1 support)
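
For the INF / -INF FAQ item, here is a minimal sketch of one possible approach. It assumes the special values of the Table Schema number type are NaN, INF and -INF, matched case-insensitively. The name parse_number is hypothetical and not part of any Data Package library. Python floats already carry IEEE 754 infinities; a language without them would need a sentinel or tagged value in place of the constants used here.

```python
import math

# Special string values assumed for the Table Schema `number` type,
# stored lowercase so the lookup can be case-insensitive.
_SPECIAL_NUMBERS = {
    "inf": math.inf,
    "-inf": -math.inf,
    "nan": math.nan,
}


def parse_number(text):
    """Cast the lexical form of a number to a float.

    Hypothetical helper for illustration only.
    """
    cleaned = text.strip()
    special = _SPECIAL_NUMBERS.get(cleaned.lower())
    if special is not None:
        return special
    # Raises ValueError for anything that is not a valid number.
    return float(cleaned)
```

Keeping the special values in one table makes it easy to swap the right-hand side for whatever representation the host language offers.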

As a Developer looking to create an implementation, I want to know what the interface of my library should look like so that I can design it well and consistently with libraries in other languages.

As a Developer looking to use one of the existing Data Package libraries, I want to get up and running as quickly as possible so that I can be very productive.

  • I want a quick walkthrough, in my language, of doing the simple things quickly

Acceptance criteria

  • Guide on either fd.io or specs.fd.io
  • FAQs

Tasks


Analysis

Existing Work on a Guide

Stack reference - https://github.com/frictionlessdata/stack#frictionless-data-stack

For implementors - http://specs.frictionlessdata.io/implementation/

Neither of these provides simple examples.

Backwards Compatibility

There is an implementer note about resource.url support in the Table Resource spec. We could do the same for Table Schema's date/time/datetime types and the fmt: prefix; temporal formats are the most common mistake I have seen. Implementations should preferably support the previous version too.
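
A minimal sketch of what such a compatibility shim might look like. It assumes the pre-v1 conventions were a url property on resources (now path) and a fmt: prefix on temporal patterns (now the bare pattern). The helper name upgrade_resource is hypothetical and does not come from any existing library.

```python
import copy

TEMPORAL_TYPES = {"date", "time", "datetime"}
LEGACY_FORMAT_PREFIX = "fmt:"


def upgrade_resource(resource):
    """Return a copy of a resource descriptor with pre-v1 conventions normalised.

    Hypothetical helper for illustration only.
    """
    upgraded = copy.deepcopy(resource)

    # Assumed pre-v1 behaviour: remote data was referenced with `url`.
    if "url" in upgraded and "path" not in upgraded:
        upgraded["path"] = upgraded.pop("url")

    schema = upgraded.get("schema")
    if isinstance(schema, dict):
        for field in schema.get("fields", []):
            fmt = field.get("format")
            # Assumed pre-v1 behaviour: temporal patterns were `fmt:<PATTERN>`.
            if (
                field.get("type") in TEMPORAL_TYPES
                and isinstance(fmt, str)
                and fmt.startswith(LEGACY_FORMAT_PREFIX)
            ):
                field["format"] = fmt[len(LEGACY_FORMAT_PREFIX):]
    return upgraded
```

Working on a deep copy keeps the caller's descriptor untouched, so an implementation can still report what the original descriptor said.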

FAQs

  • Dereferencing schemas before validation: should we also have a note here about dereferencing? That is, if a schema is given as a url-or-path, it should be dereferenced before the descriptor is validated. (A separate question: the spec doesn't touch on the concept of validation at all.)
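
The dereferencing step could be sketched as follows, using only the Python standard library. The function name dereference_schema and the base_path parameter are assumptions made for illustration; they are not taken from the specs or from any existing library.

```python
import json
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def dereference_schema(resource, base_path="."):
    """Return a copy of the resource whose url-or-path `schema` is replaced by its JSON content.

    Hypothetical helper for illustration only.
    """
    schema = resource.get("schema")
    if not isinstance(schema, str):
        # Already an inline object (or absent): nothing to dereference.
        return dict(resource)

    if urlparse(schema).scheme in ("http", "https"):
        with urlopen(schema) as response:
            loaded = json.load(response)
    else:
        # Treat anything else as a path relative to the descriptor's directory.
        with open(os.path.join(base_path, schema), encoding="utf-8") as handle:
            loaded = json.load(handle)

    return {**resource, "schema": loaded}
```

Validation would then run on the returned descriptor, so the validator only ever sees inline schema objects.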

Interface design and code walkthrough

Walkthrough - simple elegant code for each of these (could do as one big code block or as separate ones -- maybe easier together as you can reuse earlier pages)

  1. Creating / loading a data package from a url, path and getting the data
    • Tabular and Non-Tabular (e.g. GeoJSON)
    • Bonus: ?? inline data (create from descriptor with inline data)
  2. Create a data package (from descriptor or null) and set properties and then save
  3. Create a data package with CSV and guess schema and then save ...
  4. Load a Tabular Data Package and then save to a database ...
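
Step 3 (guessing a schema from a CSV) is not covered by the library examples in this issue, so here is a rough standard-library sketch of the idea. The functions guess_type and guess_schema are hypothetical, handle only the integer, number and string types, and assume a rectangular CSV with a header row. This is not how any existing library infers schemas.

```python
import csv
import io


def guess_type(values):
    """Return the narrowest of integer, number, string that fits every value."""
    def all_cast(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and all_cast(int):
        return "integer"
    if values and all_cast(float):
        return "number"
    return "string"


def guess_schema(csv_text):
    """Build a Table Schema style descriptor from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in body]
        fields.append({"name": name, "type": guess_type(column)})
    return {"fields": fields}
```

The resulting dictionary can be placed under a resource's schema key and written out with json.dump, which covers the "and then save" part of the step.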

We recommend library implementors support an interface similar to the following:

// Pseudocode: a language-neutral sketch, not runnable as written
dp_identifier = file_path or url
var dp = new DataPackage(dp_identifier)
dp.descriptor.title

var resource = dp.resources[0]
// raw bit stream for this resource
var datastream = resource.dataraw()

// open question: should data() default to the table, or do we use the option below?
var datastream = resource.data()

// open question: should a resource expose its parent package?
resource.parent_package()

if resource.profile == "tabular":
    // table is a TabularResource
    var table = resource.table() // open question: property (.table) or method (.table())?
    for row in table:
        print row
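
To make the shape of that interface concrete, here is a toy Python rendering of it. Everything in it is an assumption for illustration: the class names ToyDataPackage and ToyResource are hypothetical, only inline CSV strings in the data property are handled, and the profile value follows the tabular-data-resource name used in the examples in this issue. It is not the API of the datapackage library.

```python
import csv
import io
import json


class ToyResource:
    """Toy resource: wraps one resource descriptor and knows its package."""

    def __init__(self, descriptor, package):
        self.descriptor = descriptor
        self._package = package

    @property
    def profile(self):
        return self.descriptor.get("profile", "data-resource")

    def parent_package(self):
        return self._package

    def dataraw(self):
        """Return the resource's content as a binary stream (inline CSV text only)."""
        return io.BytesIO(self.descriptor["data"].encode("utf-8"))

    def table(self):
        """Iterate over rows as mappings keyed by the header row."""
        text = io.TextIOWrapper(self.dataraw(), encoding="utf-8", newline="")
        return csv.DictReader(text)


class ToyDataPackage:
    """Toy package: accepts a JSON string or a dict; path and URL loading are left out."""

    def __init__(self, descriptor):
        self.descriptor = json.loads(descriptor) if isinstance(descriptor, str) else descriptor
        self.resources = [ToyResource(item, self) for item in self.descriptor.get("resources", [])]


dp = ToyDataPackage({
    "title": "Cities",
    "resources": [{
        "name": "data",
        "profile": "tabular-data-resource",
        "format": "csv",
        "data": "id,city\n1,london\n2,paris\n",
    }],
})
resource = dp.resources[0]
if resource.profile == "tabular-data-resource":
    for row in resource.table():
        print(dict(row))
```

Making table() a method rather than a property, as done here, leaves room for options such as keyed rows; that is one possible answer to the open question in the pseudocode, not a settled decision.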

Python

# $ pip install datapackage==1.0.0a4
from datapackage import DataPackage

# With datapackage-v1 [WIP]:
# - validate and update logic should be synced with the JavaScript version
# - added functions like add/get/remove_resource etc.

# Remote tabular
dataPackage = DataPackage('https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/datapackage.json')
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Local tabular
dataPackage = DataPackage('datapackage/datapackage.json')
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Inline tabular
dataPackage = DataPackage({
    'name': "datapackage",
    'resources': [
        {
            'name': "data",
            'path': ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
            'profile': "tabular-data-resource",
            'dialect': {
                'quoteChar': "|"
            },
            'schema': {
                'fields': [
                    {
                        'name': "id",
                        'type': "integer"
                    },
                    {
                        'name': "city",
                        'type': "string"
                    }
                ]
            }
        }
    ]
})
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Create from scratch and update datapackage
dataPackage = DataPackage({})
dataPackage.descriptor['name'] = 'datapackage'
dataPackage.descriptor['description'] = 'Good data package'
dataPackage.descriptor['resources'] = [{
    'name': 'cities',
    'profile': 'tabular-data-resource',
    'path': ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
    'schema': {
        'fields': [
            {
                'name': "id",
                'type': "integer"
            },
            {
                'name': "city",
                'type': "string"
            }
        ]
    }
}]
for item in dataPackage.resources[0].table.read(keyed=True):
    print('City %s has an id %s' % (item['city'], item['id']))

# Non-tabular datapackage
dataPackage = DataPackage({
    'name': 'geojson',
    'resources': [
      {
        'name': 'point',
        'data': {
          "type": "Feature",
          "geometry": {
            "type": "Point",
            "coordinates": [125.6, 10.1]
          },
          "properties": {
            "name": "Dinagat Islands"
          }
        }
      }
    ]
})
print(dataPackage.resources[0].source['type']) # Feature

# Load a Tabular Data Package and then save to a database
# This API is WIP - https://github.com/frictionlessdata/datapackage-py/issues/132
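
Until that API settles, the idea can be sketched with the standard library's sqlite3 module. This is not the datapackage-py API. The function save_table and the SQL_TYPES mapping are hypothetical. The sketch takes a Table Schema style descriptor plus keyed rows (dictionaries, as in the keyed reads above) and assumes the table and field names are trusted, since they are interpolated into the SQL.

```python
import sqlite3

# Assumed mapping from Table Schema types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "number": "REAL", "string": "TEXT"}


def save_table(connection, table_name, schema, rows):
    """Create `table_name` from a schema descriptor and insert keyed rows."""
    names = [field["name"] for field in schema["fields"]]
    columns = ", ".join(
        '"{}" {}'.format(field["name"], SQL_TYPES.get(field["type"], "TEXT"))
        for field in schema["fields"]
    )
    # Identifiers are interpolated directly, so they must come from a trusted descriptor.
    connection.execute('CREATE TABLE "{}" ({})'.format(table_name, columns))
    placeholders = ", ".join("?" for _ in names)
    connection.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders),
        ([row[name] for name in names] for row in rows),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
cities_schema = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "city", "type": "string"},
]}
save_table(connection, "cities", cities_schema, [
    {"id": 1, "city": "london"},
    {"id": 2, "city": "paris"},
])
```

Row values go through placeholders rather than string formatting, so only the identifiers need to be trusted.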

JavaScript

// ES6 with async/await
// $ npm install datapackage@latest
// $ node7 --harmony-async-await example.js
const DataPackage = require('datapackage').DataPackage

// With tableschema-v1 [WIP]:
// - no need to await resource.table
// - resource.table.read({keyed: true})
// https://github.com/frictionlessdata/tableschema-js/pull/69


// Remote tabular
async function example1() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load('https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/datapackage.json')
    const table = await dataPackage.resources[0].table
    // Read will throw an error if the data does not comply with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Local tabular
async function example2() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load('datapackage/datapackage.json')
    const table = await dataPackage.resources[0].table
    // Read will throw an error if the data does not comply with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Inline tabular
async function example3() {

    // Load will throw an error on invalid descriptor
    const dataPackage = await DataPackage.load({
      name: "datapackage",
      resources: [
        {
          name: "data",
          path: ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
          profile: "tabular-data-resource",
          dialect: {
            quoteChar: "|"
          },
          schema: {
            fields: [
            {
              name: "id",
              type: "integer"
            },
            {
              name: "city",
              type: "string"
            }
            ]
          }
        }
      ]
    })
    const table = await dataPackage.resources[0].table
    // Read will throw an error if the data does not comply with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Create from scratch and update datapackage
async function example4() {

    // In non-strict mode we can provide an invalid descriptor
    const dataPackage = await DataPackage.load({resources: []}, {strict: false})
    dataPackage.descriptor.name = 'datapackage'
    dataPackage.descriptor.description = 'Good data package'
    dataPackage.update()
    dataPackage.addResource({
      name: 'cities',
      profile: 'tabular-data-resource',
      path: ["https://raw.githubusercontent.com/frictionlessdata/datapackage-py/master/tests/fixtures/datapackage/data.csv"],
      schema: {
        fields: [
        {
          name: "id",
          type: "integer"
        },
        {
          name: "city",
          type: "string"
        }
        ]
      }
    })
    // Check for errors if updated descriptor is not valid
    if (!dataPackage.valid) {
      for (let error of dataPackage.errors) {
        console.log(error)
      }
    }
    const table = await dataPackage.resources[0].table
    // Read will throw an error if the data does not comply with the schema
    const data = await table.read(true)
    for (const {id, city} of data) {
      console.log(`City ${city} has an id ${id}`)
    }

}

// Non-tabular datapackage
// TODO: it doesn't work because of pending path/data change in spec
async function example5() {

  // Load will throw an error on invalid descriptor
  const dataPackage = await DataPackage.load({
    name: 'geojson',
    resources: [
      {
        name: 'point',
        data: {
          "type": "Feature",
          "geometry": {
            "type": "Point",
            "coordinates": [125.6, 10.1]
          },
          "properties": {
            "name": "Dinagat Islands"
          }
        }
      }
    ]
  })
  console.log(dataPackage.resources[0].source.type) // Feature

}


example1()
example2()
example3()
example4()
example5()


pwalsh commented May 29, 2017

@rufuspollock

We have an implementors guide here: http://specs.frictionlessdata.io/implementation/


rufuspollock commented May 30, 2017

@pwalsh yes and I read it before I wrote this issue 😉 - this issue reflects the things I think are missing from that implementors guide. I could be mistaken on this so please comment against the items in the description 😄 (and add items).

@roll roll removed this from the Backlog milestone Apr 14, 2023
@roll roll added the docs label Jan 3, 2024
@frictionlessdata frictionlessdata locked and limited conversation to collaborators Oct 21, 2024
@roll roll converted this issue into discussion #1038 Oct 21, 2024
