Skip to content
Steve L. Nyemba edited this page Jul 9, 2023 · 1 revision

This document shows how to use data-transport to read/write data from and to a traditional RDBMS. The supported databases are PostgreSQL; SQLite; MySQL; Netezza ...

Read/Write functions respectively leverage pandas and sqlalchemy

Bigquery

To handle a dataset and a private_key parameters must be supplied even if the execution of the query doesn't involve the dataset in question.

Bigquery Read Functionality :

import transport
reader = transport.factory.instance(provider='bigquery',context='read',dataset='covid19_italy',private_key='/home/me/my-service-key.json')
_df = reader.read(table='data_by_province',limit=1000)
print (_df.head())
#
# or 
_df = reader.read(sql='SELECT * FROM covid19_italy.data_by_province limit 100')
print (_df.head())
reader.close()

Bigquery Write Functionality :

Write functions can be done against a pandas data-frame or a list of objects that can be used in lieu of pandas

import transport
import pandas as pd

_df = pd.DataFrame({"name":["steve","nico","elon"],"age":[40,30,5]})
writer = transport.factory.instance(provider='bigquery',dataset='out_dataset',private_key='/home/me/my-service-key.json')
writer.write(_df,table='friends')
writer.close()

PostgreSQL

Reading From PostgreSQL

import transport
#
# reading the table of logs
reader = transport.factory.instance(provider="postgresql",database="mydb", table="logs","username"="steve","password="foo")
_df = reader.read(limit=10) #-- Will only read 10 rows, it is optional
#
# Assuming we are interested in running an SQL query against another table or a set of tables
# We assume we have two tables, foo (_id,info) and bar (_id,flag) 
# and we are interested in the resulting join of the two tables, so we do the following

sql = """
select x.* from foo as x INNER JOIN bar y ON x._id = y._id AND y.flag = 'N'
"""
_xdf = reader.read(sql=sql) ;

If we are interested in limiting the records being read always provide a limit to the read

Writing to PostgreSQL

In the advent of a database with security enabled, you can/should provide username/password from a file the structure of the file is as follows (JSON) and should be in a secure location. For the sake of the example, this file will be stored in /home/steve/transport/pg.json

{
"username":"steve","password":"let-me-!n",
"host":"empire.io","port":5432,
"database":"mydb"
}
import transport
import pandas as pd

_df = pd.DataFrame({"name":["steve","nico","elon"],"age":[40,30,5]})
writer = transport.factory.instance(provider='provider',context='write', database='mydb',table='distro', auth_file='/home/steve/transport/pg.json')
writer.write(_df)
Clone this wiki locally