
DataFrame

src/dataframe.js:11-1259

DataFrame data structure providing an immutable, flexible and powerful way to manipulate data with columns and rows.

Parameters

  • data (Array | Object | DataFrame) The data of the DataFrame.
  • columns Array The DataFrame column names.
  • options Object Additional options. Example: modules. (optional, default {})

toDSV

src/dataframe.js:314-316

Convert the DataFrame into a delimiter-separated values (DSV) string. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • sep String Column separator. (optional, default ' ')
  • header Boolean Whether to write the header as the first line. If false, no header is written. (optional, default true)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toDSV()
df.toDSV(';')
df.toDSV(';', true)
// From node.js only
df.toDSV(';', true, '/my/absolute/path/dataframe.txt')

Returns String The text file as a raw string.
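
To make the roles of `sep` and `header` concrete, here is a minimal plain-JavaScript sketch of delimiter-separated serialization. It is not the library's implementation: `toDsvString`, `columns` and `rows` are hypothetical names used only for illustration, and values are not quoted or escaped.

```javascript
// Hypothetical helper (not part of dataframe-js): joins column names and
// rows with a separator. Values are written as-is, without escaping.
function toDsvString(columns, rows, sep, header) {
  const lines = rows.map((row) => row.join(sep));
  if (header) {
    // The header, when requested, becomes the first line.
    lines.unshift(columns.join(sep));
  }
  return lines.join('\n');
}

const dsvText = toDsvString(['id', 'name'], [[1, 'a'], [2, 'b']], ';', true);
console.log(dsvText);
```

With `header` set to false, the same call would return only the two data lines.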

toCSV

src/dataframe.js:329-331

Convert the DataFrame into a comma-separated values string. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • header Boolean Whether to write the header as the first line. If false, no header is written. (optional, default true)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toCSV()
df.toCSV(true)
// From node.js only
df.toCSV(true, '/my/absolute/path/dataframe.csv')

Returns String The CSV file as a raw string.

toTSV

src/dataframe.js:344-346

Convert the DataFrame into a tab-separated values string. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • header Boolean Whether to write the header as the first line. If false, no header is written. (optional, default true)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toTSV()
df.toTSV(true)
// From node.js only
df.toTSV(true, '/my/absolute/path/dataframe.tsv')

Returns String The TSV file as a raw string.

toPSV

src/dataframe.js:359-361

Convert the DataFrame into a pipe-separated values string. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • header Boolean Whether to write the header as the first line. If false, no header is written. (optional, default true)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toPSV()
df.toPSV(true)
// From node.js only
df.toPSV(true, '/my/absolute/path/dataframe.psv')

Returns String The PSV file as a raw string.

toText

src/dataframe.js:376-378

Convert the DataFrame into a delimiter-separated values (DSV) string. Alias for .toDSV. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • sep String Column separator. (optional, default ' ')
  • header Boolean Whether to write the header as the first line. If false, no header is written. (optional, default true)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toText()
df.toText(';')
df.toText(';', true)
// From node.js only
df.toText(';', true, '/my/absolute/path/dataframe.txt')

Returns String The text file as a raw string.

toJSON

src/dataframe.js:390-392

Convert the DataFrame into a JSON string. You can also save the file if you are using Node.js.

Parameters

  • args ...any
  • asCollection Boolean Whether to write the JSON as a collection of Objects. (optional, default false)
  • path String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.

Examples

df.toJSON()
// From node.js only
df.toJSON(false, '/my/absolute/path/dataframe.json')

Returns String The JSON file as a raw string.

toDict

src/dataframe.js:400-407

Convert DataFrame into dict / hash / object.

Examples

df.toDict()

Returns Object The DataFrame converted into dict.

toArray

src/dataframe.js:416-420

Convert DataFrame into Array of Arrays. You can also extract only one column as Array.

Parameters

  • columnName String? Column Name to extract. By default, all columns are transformed.

Examples

df.toArray()

Returns Array The DataFrame (or the column) converted into Array.

toCollection

src/dataframe.js:429-433

Convert DataFrame into an Array of dictionaries. You can also return Rows instead of dictionaries.

Parameters

  • ofRows Boolean? Return a collection of Rows instead of dictionaries.

Examples

df.toCollection()

Returns Array The DataFrame converted into an Array of dictionaries (or Rows).
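
As an illustration of the difference between the Array-of-Arrays shape and the collection shape, the following sketch converts one into the other using plain JavaScript. The helper name `rowsToCollection` and its inputs are hypothetical; it shows the idea only and is not how the library builds its result.

```javascript
// Hypothetical helper (not part of dataframe-js): turns rows stored as
// arrays into objects keyed by column name.
function rowsToCollection(columns, rows) {
  return rows.map((row) =>
    Object.fromEntries(columns.map((name, i) => [name, row[i]]))
  );
}

const collection = rowsToCollection(['id', 'name'], [[1, 'a'], [2, 'b']]);
console.log(collection);
```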

show

src/dataframe.js:445-474

Display the DataFrame as a String table. In quiet mode, the table is returned as a string without being printed.

Parameters

  • rows Number The number of lines to display. (optional, default 10)
  • quiet Boolean Quiet mode. If true, only returns a string instead of console.log(). (optional, default false)

Examples

df.show()
df.show(10)
const stringDF = df.show(10, true)

Returns String The DataFrame as String Table.

dim

src/dataframe.js:482-484

Get the DataFrame dimensions.

Examples

const [height, width] = df.dim()

Returns Array The DataFrame dimensions. [height, width]

transpose

src/dataframe.js:493-508

Transpose a DataFrame. Rows become columns and vice versa. n x p => p x n.

Parameters

  • transposeColumnNames Boolean An option to transpose columnNames in a rowNames column. (optional, default false)

Examples

df.transpose()

Returns DataFrame A new transposed DataFrame.
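
The n x p => p x n rule can be sketched on a plain Array of Arrays. The `transposeRows` helper below is a hypothetical illustration, assuming rectangular input; it ignores column names entirely and is not the library's code.

```javascript
// Hypothetical helper (not part of dataframe-js): transposes a rectangular
// array of arrays, so an n x p table becomes a p x n table.
function transposeRows(rows) {
  if (rows.length === 0) return [];
  return rows[0].map((_, col) => rows.map((row) => row[col]));
}

// A 2 x 3 table becomes a 3 x 2 table.
const transposed = transposeRows([
  [1, 2, 3],
  [4, 5, 6],
]);
console.log(transposed);
```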

count

src/dataframe.js:516-518

Get the number of rows.

Examples

df.count()

Returns Int The number of DataFrame rows.

countValue

src/dataframe.js:529-533

Count the occurrences of a value in a column.

Parameters

  • valueToCount The value to count in the selected column.
  • columnName String The column in which to count the value. (optional, default this.listColumns()[0])

Examples

df.countValue(5, 'column2')
df.select('column1').countValue(5)

Returns Int The number of times the selected value appears.

push

src/dataframe.js:542-544

Push new rows into the DataFrame.

Parameters

  • rows (Array | Row) The rows to add.

Examples

df.push([1,2,3], [1,4,9])

Returns DataFrame A new DataFrame with the new rows.

replace

src/dataframe.js:555-568

Replace a value with another, either in the whole DataFrame or in selected columns.

Parameters

  • value The value to replace.
  • replacement The new value.
  • columnNames (String | Array) The columns in which to apply the replacement. (optional, default this.listColumns())

Examples

df.replace(undefined, 0, ['column1', 'column2'])

Returns DataFrame A new DataFrame with replaced values.

distinct

src/dataframe.js:577-582

Compute unique values in a column.

Parameters

  • columnName String The column from which to extract distinct values.

Examples

df.distinct('column1')

Returns DataFrame A DataFrame containing the column with distinct values.

unique

src/dataframe.js:592-594

Compute unique values in a column. Alias of .distinct()

Parameters

  • columnName String The column from which to extract distinct values.

Examples

df.unique('column1')

Returns DataFrame A DataFrame containing the column with distinct values.

listColumns

src/dataframe.js:602-604

List DataFrame columns.

Examples

df.listColumns()

Returns Array An Array containing DataFrame columnNames.

select

src/dataframe.js:613-618

Select columns in the DataFrame.

Parameters

  • columnNames ...String The columns to select.

Examples

df.select('column1', 'column3')

Returns DataFrame A new DataFrame containing selected columns.

withColumn

src/dataframe.js:629-638

Add a new column or set an existing one.

Parameters

  • columnName String The column to modify or to create.
  • func Function The function to create the column. (optional, default (row,index)=>undefined)

Examples

df.withColumn('column4', () => 2)
df.withColumn('column2', (row) => row.get('column2') * 2)

Returns DataFrame A new DataFrame containing the new or modified column.

restructure

src/dataframe.js:649-651

Modify the structure of the DataFrame by changing columns order, creating new columns or removing some columns.

Parameters

  • newColumnNames Array The new columns of the DataFrame.

Examples

df.restructure(['column1', 'column4', 'column2', 'column3'])
df.restructure(['column1', 'column4'])
df.restructure(['column1', 'newColumn', 'column4'])

Returns DataFrame A new DataFrame with restructured columns (reordered, added or removed).

renameAll

src/dataframe.js:660-665

Rename each column.

Parameters

  • newColumnNames Array The new column names of the DataFrame.

Examples

df.renameAll(['column1', 'column3', 'column4'])

Returns DataFrame A new DataFrame with the new column names.

rename

src/dataframe.js:675-680

Rename a column.

Parameters

  • columnName String The column to rename.
  • replacement String The new name for the column.

Examples

df.rename('column1', 'columnRenamed')

Returns DataFrame A new DataFrame with the new column name.

castAll

src/dataframe.js:689-702

Cast each column into a given type.

Parameters

  • typeFunctions Array The functions used to cast columns.

Examples

df.castAll([Number, String, (val) => new CustomClass(val)])

Returns DataFrame A new DataFrame with the columns having new types.

cast

src/dataframe.js:713-717

Cast a column into a given type.

Parameters

  • columnName String The column to cast.
  • typeFunction Function The function used to cast the column.

Examples

df.cast('column1', Number)
df.cast('column1', (val) => new MyCustomClass(val))

Returns DataFrame A new DataFrame with the column having a new type.

drop

src/dataframe.js:726-731

Remove a single column.

Parameters

  • columnName String The column to drop.

Examples

df.drop('column2')

Returns DataFrame A new DataFrame without the dropped column.

chain

src/dataframe.js:746-751

Chain map and filter functions on the DataFrame while optimizing their execution. If a function returns a boolean, it is a filter; otherwise, it is a map. It can be 10 to 100 times faster than standard chains of .map() and .filter().

Parameters

  • funcs ...Function Functions to apply on the DataFrame rows taking the row as parameter.

Examples

df.chain(
     row => row.get('column1') > 3, // filter
     row => row.set('column1', 3),  // map
     row => row.get('column2') === '5' // filter
)

Returns DataFrame A new DataFrame with modified rows.
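
The source does not show how `.chain()` is optimized internally. The sketch below only illustrates the stated rule (a boolean result acts as a filter, anything else as a map) by applying every function in a single pass over plain objects. `chainRows` is a hypothetical name, and rows here are ordinary objects rather than library Rows.

```javascript
// Hypothetical helper (not part of dataframe-js): applies maps and filters
// in one pass. A boolean result filters the row; any other result replaces it.
function chainRows(rows, ...funcs) {
  const out = [];
  for (const original of rows) {
    let row = original;
    let keep = true;
    for (const func of funcs) {
      const result = func(row);
      if (typeof result === 'boolean') {
        if (!result) {
          keep = false; // filtered out: skip the remaining functions
          break;
        }
      } else {
        row = result; // mapped: later functions see the new row
      }
    }
    if (keep) out.push(row);
  }
  return out;
}

const chained = chainRows(
  [{ a: 1 }, { a: 4 }, { a: 7 }],
  (row) => row.a > 3,                 // filter
  (row) => ({ ...row, a: row.a * 2 }), // map
  (row) => row.a < 12                 // filter
);
console.log(chained);
```

Compared with separate `.map()` and `.filter()` calls, a single pass avoids building an intermediate collection after each step, which is one plausible source of the speedup mentioned above.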

filter

src/dataframe.js:761-775

Filter DataFrame rows.

Parameters

  • condition (Function | Object) A filter function or a column/value object.

Examples

df.filter(row => row.get('column1') >= 3)
df.filter({'column2': 5, 'column1': 3})

Returns DataFrame A new filtered DataFrame.

where

src/dataframe.js:786-788

Filter DataFrame rows. Alias of .filter()

Parameters

  • condition (Function | Object) A filter function or a column/value object.

Examples

df.where(row => row.get('column1') >= 3)
df.where({'column2': 5, 'column1': 3})

Returns DataFrame A new filtered DataFrame.

find

src/dataframe.js:798-800

Find the first row matching a condition.

Parameters

  • condition (Function | Object) A filter function or a column/value object.

Examples

df.find(row => row.get('column1') === 3)
df.find({'column1': 3})

Returns Row The targeted Row.

map

src/dataframe.js:809-814

Map on DataFrame rows. /!\ Prefer to use .chain().

Parameters

  • func Function A function to apply on each row taking the row as parameter.

Examples

df.map(row => row.set('column1', row.get('column1') * 2))

Returns DataFrame A new DataFrame with modified rows.

reduce

src/dataframe.js:828-832

Reduce DataFrame into a value.

Parameters

  • func Function The reduce function taking 2 parameters, previous and next.
  • init The initial value of the reducer.

Examples

df.reduce((p, n) => n.get('column1') + p, 0)
df2.reduce((p, n) => (
         n.set('column1', p.get('column1') + n.get('column1'))
          .set('column2', p.get('column2') + n.get('column2'))
))

Returns any A reduced value.

reduceRight

src/dataframe.js:842-846

Reduce DataFrame into a value, starting from the last row (see .reduce()).

Parameters

  • func Function The reduce function taking 2 parameters, previous and next.
  • init The initial value of the reducer.

Examples

df.reduceRight((p, n) => p > n ? p : n, 0)

Returns any A reduced value.
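
Assuming these methods mirror `Array.prototype.reduce` and `Array.prototype.reduceRight`, as their descriptions suggest, the difference in traversal order can be seen on a plain array with an operation where order matters:

```javascript
// Plain-array analogy (an assumption, not dataframe-js code): the callback
// receives the accumulated value first and the current element second.
const letters = ['a', 'b', 'c'];

// Starts from the first element.
const leftToRight = letters.reduce((p, n) => p + n, '');

// Starts from the last element.
const rightToLeft = letters.reduceRight((p, n) => p + n, '');

console.log(leftToRight, rightToLeft);
```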

dropDuplicates

src/dataframe.js:855-861

Return a DataFrame without duplicated rows.

Parameters

  • columnNames ...String The columns used to check the uniqueness of rows. If omitted, uniqueness is checked on all columns.

Examples

df.dropDuplicates('id', 'name')

Returns DataFrame A DataFrame without duplicated rows.
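
The effect of restricting the check to some columns can be sketched on plain objects. `dropDuplicateRows` is a hypothetical helper; keeping the first occurrence of each key is an assumption of this sketch, not something the source states.

```javascript
// Hypothetical helper (not part of dataframe-js): keeps the first row seen
// for each combination of values in the selected columns.
// Note: JSON.stringify writes undefined and NaN as null, so those values
// are treated as equal by this sketch.
function dropDuplicateRows(rows, ...columnNames) {
  const seen = new Set();
  return rows.filter((row) => {
    const names = columnNames.length > 0 ? columnNames : Object.keys(row);
    const key = JSON.stringify(names.map((name) => row[name]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const people = [
  { id: 1, name: 'a' },
  { id: 1, name: 'a' },
  { id: 2, name: 'a' },
];
const uniqueByIdAndName = dropDuplicateRows(people, 'id', 'name');
const uniqueByName = dropDuplicateRows(people, 'name');
console.log(uniqueByIdAndName.length, uniqueByName.length);
```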

dropMissingValues

src/dataframe.js:870-884

Return a DataFrame without rows containing missing values (undefined, NaN, null).

Parameters

  • columnNames Array The columns to consider. All columns are considered by default.

Examples

df.dropMissingValues(['id', 'name'])

Returns DataFrame A DataFrame without rows containing missing values.

fillMissingValues

src/dataframe.js:894-896

Return a DataFrame with missing values (undefined, NaN, null) filled with a default value.

Parameters

  • replacement The new value.
  • columnNames Array The columns to consider. All columns are considered by default.

Examples

df.fillMissingValues(0, ['id', 'name'])

Returns DataFrame A DataFrame with missing values replaced.
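
The notion of a missing value used here (undefined, NaN, null) can be expressed as a small predicate. The sketch below applies it to plain objects; `isMissing` and `fillMissing` are hypothetical names, and the code is an illustration rather than the library's implementation.

```javascript
// Hypothetical predicate: the three values the documentation lists as missing.
function isMissing(value) {
  return value === undefined || value === null || Number.isNaN(value);
}

// Hypothetical helper (not part of dataframe-js): returns new rows in which
// missing values of the selected columns are replaced. Input rows are not mutated.
function fillMissing(rows, replacement, columnNames) {
  return rows.map((row) => {
    const names = columnNames || Object.keys(row);
    const filled = { ...row };
    for (const name of names) {
      if (isMissing(filled[name])) filled[name] = replacement;
    }
    return filled;
  });
}

const rawRows = [{ id: 1, name: null }, { id: NaN, name: 'b' }];
const filledRows = fillMissing(rawRows, 0, ['id', 'name']);
console.log(filledRows);
```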

shuffle

src/dataframe.js:904-915

Return a DataFrame with shuffled rows.

Examples

df.shuffle()

Returns DataFrame A shuffled DataFrame.

sample

src/dataframe.js:924-938

Return a random sample of rows.

Parameters

  • percentage Number A percentage of the original DataFrame giving the sample size.

Examples

df.sample(0.3)

Returns DataFrame A sample DataFrame

bisect

src/dataframe.js:947-964

Randomly split a DataFrame into 2 DataFrames.

Parameters

  • percentage Number A percentage of the original DataFrame giving the first DataFrame size. The second takes the rest.

Examples

const [df30, df70] = df.bisect(0.3)

Returns Array An Array containing the two DataFrames: first the X% DataFrame, then the DataFrame holding the rest.
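
A random split of this kind can be sketched as a shuffle followed by a cut. The helper below is hypothetical: the Fisher-Yates shuffle and the use of `Math.round` for the cut point are assumptions of this sketch, since the source does not say how the library shuffles or rounds.

```javascript
// Hypothetical helper (not part of dataframe-js): shuffles a copy of the
// rows, then cuts it at the requested fraction.
function bisectRows(rows, percentage) {
  const shuffled = [...rows];
  // Fisher-Yates shuffle on the copy.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Assumption: the cut point is rounded to the nearest whole row.
  const cut = Math.round(shuffled.length * percentage);
  return [shuffled.slice(0, cut), shuffled.slice(cut)];
}

const allRows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const [firstPart, restPart] = bisectRows(allRows, 0.3);
console.log(firstPart.length, restPart.length);
```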

groupBy

src/dataframe.js:977-979

Group DataFrame rows by columns giving a GroupedDataFrame object. See its doc for more examples.

Parameters

  • args ...any
  • columnNames ...String The columns used for the groupBy.

Examples

df.groupBy('column1')
df.groupBy('column1', 'column2')
df.groupBy('column1', 'column2').listGroups()
df.groupBy('column1', 'column2').show()
df.groupBy('column1', 'column2').aggregate((group) => group.count())

Returns GroupedDataFrame A GroupedDataFrame object.
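
Conceptually, grouping partitions the rows by the values of the chosen columns, and an aggregation then reduces each group to one value. The sketch below shows this on plain objects with a `Map`; `groupRowsBy` is a hypothetical helper and does not reflect how GroupedDataFrame is implemented.

```javascript
// Hypothetical helper (not part of dataframe-js): partitions rows by the
// values found in the given columns. Map keeps groups in first-seen order.
function groupRowsBy(rows, ...columnNames) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(columnNames.map((name) => row[name]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

const grouped = groupRowsBy(
  [{ c: 'a', v: 1 }, { c: 'b', v: 2 }, { c: 'a', v: 3 }],
  'c'
);

// Aggregate each group into its row count.
const counts = [...grouped.values()].map((group) => ({
  c: group[0].c,
  count: group.length,
}));
console.log(counts);
```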

sortBy

src/dataframe.js:992-1056

Sort DataFrame rows based on column values. Each column used for sorting should contain only one variable type. Columns are sorted left-to-right.

Parameters

  • columnNames (String | Array<string>) The columns giving order.
  • reverse Boolean Reverse mode. Reverse the order if true. (optional, default false)
  • missingValuesPosition String Define the position of missing values (undefined, null and NaN) in the order. (optional, default 'first')

Examples

df.sortBy('id')
df.sortBy(['id1', 'id2'])
df.sortBy(['id1'], true)

Returns DataFrame An ordered DataFrame.
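
The interplay of the three parameters can be sketched with a comparator over plain objects. `sortRowsBy` is a hypothetical helper. In particular, keeping missing values at the requested position regardless of `reverse`, and accepting 'last' as the other position value, are assumptions of this sketch that the source does not confirm.

```javascript
// Hypothetical predicate: values the documentation lists as missing.
function isMissingValue(v) {
  return v === undefined || v === null || Number.isNaN(v);
}

// Hypothetical helper (not part of dataframe-js): sorts a copy of the rows,
// comparing columns left-to-right and moving on to the next column on ties.
function sortRowsBy(rows, columnNames, reverse = false, missingValuesPosition = 'first') {
  const names = Array.isArray(columnNames) ? columnNames : [columnNames];
  const missingRank = missingValuesPosition === 'first' ? -1 : 1;
  const direction = reverse ? -1 : 1;
  return [...rows].sort((a, b) => {
    for (const name of names) {
      const x = a[name];
      const y = b[name];
      const xMissing = isMissingValue(x);
      const yMissing = isMissingValue(y);
      if (xMissing && yMissing) continue;
      if (xMissing) return missingRank;
      if (yMissing) return -missingRank;
      if (x < y) return -direction;
      if (x > y) return direction;
    }
    return 0;
  });
}

const unsorted = [
  { id1: 2, id2: 'b' },
  { id1: 1, id2: 'z' },
  { id1: null, id2: 'a' },
  { id1: 2, id2: 'a' },
];
const sortedAsc = sortRowsBy(unsorted, ['id1', 'id2']);
const sortedDesc = sortRowsBy(unsorted, 'id1', true, 'last');
console.log(sortedAsc, sortedDesc);
```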

union

src/dataframe.js:1065-1078

Concatenate two DataFrames.

Parameters

  • dfToUnion DataFrame The DataFrame to concatenate.

Examples

df.union(df2)

Returns DataFrame A new concatenated DataFrame resulting from the union.

join

src/dataframe.js:1089-1098

Join two DataFrames.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.
  • how String The join mode. Can be: full, inner, outer, left, right. (optional, default 'inner')

Examples

df.join(df2, 'column1', 'full')

Returns DataFrame The joined DataFrame.
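
For readers unfamiliar with join modes, the sketch below contrasts the inner and left modes on plain objects. `joinRows` is a hypothetical helper that covers only those two modes with a single key column. It is not the library's algorithm, and unlike a DataFrame it simply omits the columns that have no match instead of filling them.

```javascript
// Hypothetical helper (not part of dataframe-js): joins two arrays of
// objects on one key. Supports only 'inner' and 'left' in this sketch.
function joinRows(left, right, key, how = 'inner') {
  const result = [];
  for (const l of left) {
    const matches = right.filter((r) => r[key] === l[key]);
    if (matches.length > 0) {
      for (const r of matches) result.push({ ...l, ...r });
    } else if (how === 'left') {
      // Left mode keeps unmatched left rows; inner mode drops them.
      result.push({ ...l });
    }
  }
  return result;
}

const leftTable = [{ id: 1, a: 'x' }, { id: 2, a: 'y' }];
const rightTable = [{ id: 2, b: 'p' }, { id: 3, b: 'q' }];

const innerRows = joinRows(leftTable, rightTable, 'id', 'inner');
const leftRows = joinRows(leftTable, rightTable, 'id', 'left');
console.log(innerRows, leftRows);
```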

innerJoin

src/dataframe.js:1110-1112

Join two DataFrames with inner mode.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.

Examples

df.innerJoin(df2, 'id')
df.join(df2, 'id')
df.join(df2, 'id', 'inner')

Returns DataFrame The joined DataFrame.

fullJoin

src/dataframe.js:1123-1125

Join two DataFrames with full mode.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.

Examples

df.fullJoin(df2, 'id')
df.join(df2, 'id', 'full')

Returns DataFrame The joined DataFrame.

outerJoin

src/dataframe.js:1136-1138

Join two DataFrames with outer mode.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.

Examples

df.outerJoin(df2, 'id')
df.join(df2, 'id', 'outer')

Returns DataFrame The joined DataFrame.

leftJoin

src/dataframe.js:1149-1151

Join two DataFrames with left mode.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.

Examples

df.leftJoin(df2, 'id')
df.join(df2, 'id', 'left')

Returns DataFrame The joined DataFrame.

rightJoin

src/dataframe.js:1162-1164

Join two DataFrames with right mode.

Parameters

  • dfToJoin DataFrame The DataFrame to join.
  • columnNames (String | Array) The selected columns for the join.

Examples

df.rightJoin(df2, 'id')
df.join(df2, 'id', 'right')

Returns DataFrame The joined DataFrame.

diff

src/dataframe.js:1174-1176

Find the differences between two DataFrames (reverse of join).

Parameters

  • dfToDiff DataFrame The DataFrame to diff.
  • columnNames (String | Array) The selected columns for the diff.

Examples

df.diff(df2, 'id')

Returns DataFrame The differences DataFrame.

head

src/dataframe.js:1186-1188

Create a new subset DataFrame based on the first rows.

Parameters

  • nRows Number The number of first rows to get. (optional, default 10)

Examples

df2.head()
df2.head(5)

Returns DataFrame The subset DataFrame.

tail

src/dataframe.js:1198-1200

Create a new subset DataFrame based on the last rows.

Parameters

  • nRows Number The number of last rows to get. (optional, default 10)

Examples

df2.tail()
df2.tail(5)

Returns DataFrame The subset DataFrame.

slice

src/dataframe.js:1213-1221

Create a new subset DataFrame based on the given indexes. Similar to Array.slice.

Parameters

  • startIndex Number The index to start the slice (included). (optional, default 0)
  • endIndex Number The index to end the slice (excluded). (optional, default this.count())

Examples

df2.slice()
df2.slice(0)
df2.slice(0, 20)
df2.slice(10, 30)

Returns DataFrame The subset DataFrame.
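
Since the method is described as similar to Array.slice, the included start and excluded end can be checked on a plain array. This is an analogy with `Array.prototype.slice`, not a call into the library:

```javascript
// Plain-array analogy: start index is included, end index is excluded.
const sampleRows = ['r0', 'r1', 'r2', 'r3', 'r4'];

const wholeCopy = sampleRows.slice();      // no arguments: every row
const fromOne = sampleRows.slice(1);       // from index 1 to the end
const oneToThree = sampleRows.slice(1, 3); // indexes 1 and 2 only

console.log(wholeCopy.length, fromOne.length, oneToThree);
```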

getRow

src/dataframe.js:1230-1232

Return a Row by its index.

Parameters

  • index Number The index to select the row. (optional, default 0)

Examples

df2.getRow(1)

Returns Row The Row.

setRow

src/dataframe.js:1242-1246

Modify a Row at the given index.

Parameters

  • index Number The index to select the row. (optional, default 0)
  • func Function The function applied to the selected row. (optional, default row=>row)

Examples

df2.setRow(1, row => row.set("column1", 33))

Returns DataFrame A new DataFrame with the modified Row.

setRowInPlace

src/dataframe.js:1255-1258

Modify a Row in place (by mutation) at the given index.

Parameters

  • index Number The index to select the row. (optional, default 0)
  • func Function The function applied to the selected row. (optional, default row=>row)

Examples

df2.setRowInPlace(1, row => row.set("column1", 33))

Returns DataFrame The current DataFrame with the modified row.

setDefaultModules

src/dataframe.js:18-20

Set the default modules used in DataFrame instances.

Parameters

  • defaultModules ...Object DataFrame modules used by default.

Examples

DataFrame.setDefaultModules(SQL, Stat)

fromDSV

src/dataframe.js:35-37

Create a DataFrame from a delimiter separated values text file. It returns a Promise.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.
  • sep String The separator used to parse the file.
  • header Boolean A boolean indicating if the text has a header or not. (optional, default true)

Examples

DataFrame.fromDSV('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromDSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromDSV('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromDSV('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
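
What the `sep` and `header` parameters control during parsing can be sketched on an in-memory string. `parseDsv` is a hypothetical helper: it does no file or network access, returns synchronously rather than as a Promise, handles no quoting, and the index-based column names used when there is no header are an assumption of this sketch.

```javascript
// Hypothetical helper (not part of dataframe-js): splits text into lines,
// then each line into fields. Quoted fields are not supported.
function parseDsv(text, sep, header = true) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const table = lines.map((line) => line.split(sep));
  if (header) {
    const [columns, ...rows] = table;
    return { columns, rows };
  }
  // Assumption: without a header, columns are named by their index.
  const width = table.length > 0 ? table[0].length : 0;
  const columns = Array.from({ length: width }, (_, i) => String(i));
  return { columns, rows: table };
}

const parsed = parseDsv('id;name\n1;a\n2;b', ';');
console.log(parsed);
```

Every parsed field is a string in this sketch, which is why a casting step such as `.cast()` or `.castAll()` is typically useful after loading delimited text.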

fromText

src/dataframe.js:52-54

Create a DataFrame from a delimiter separated values text file. It returns a Promise. Alias of DataFrame.fromDSV.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.
  • sep String The separator used to parse the file.
  • header Boolean A boolean indicating if the text has a header or not. (optional, default true)

Examples

DataFrame.fromText('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromText(myFile).then(df => df.show())
// From node.js only
DataFrame.fromText('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromText('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())

fromCSV

src/dataframe.js:68-70

Create a DataFrame from a comma separated values file. It returns a Promise.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.
  • header Boolean A boolean indicating if the csv has a header or not. (optional, default true)

Examples

DataFrame.fromCSV('http://myurl/myfile.csv').then(df => df.show())
// For browser only
DataFrame.fromCSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromCSV('/my/absolute/path/myfile.csv').then(df => df.show())
DataFrame.fromCSV('/my/absolute/path/myfile.csv', true).then(df => df.show())

fromTSV

src/dataframe.js:84-86

Create a DataFrame from a tab separated values file. It returns a Promise.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.
  • header Boolean A boolean indicating if the tsv has a header or not. (optional, default true)

Examples

DataFrame.fromTSV('http://myurl/myfile.tsv').then(df => df.show())
// For browser only
DataFrame.fromTSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromTSV('/my/absolute/path/myfile.tsv').then(df => df.show())
DataFrame.fromTSV('/my/absolute/path/myfile.tsv', true).then(df => df.show())

fromPSV

src/dataframe.js:100-102

Create a DataFrame from a pipe separated values file. It returns a Promise.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.
  • header Boolean A boolean indicating if the psv has a header or not. (optional, default true)

Examples

DataFrame.fromPSV('http://myurl/myfile.psv').then(df => df.show())
// For browser only
DataFrame.fromPSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromPSV('/my/absolute/path/myfile.psv').then(df => df.show())
DataFrame.fromPSV('/my/absolute/path/myfile.psv', true).then(df => df.show())

fromJSON

src/dataframe.js:114-116

Create a DataFrame from a JSON file. It returns a Promise.

Parameters

  • args ...any
  • pathOrFile (String | File) A path to the file (url or local) or a browser File object.

Examples

DataFrame.fromJSON('http://myurl/myfile.json').then(df => df.show())
// For browser only
DataFrame.fromJSON(myFile).then(df => df.show())
// From node.js only
DataFrame.fromJSON('/my/absolute/path/myfile.json').then(df => df.show())