- DataFrame
- Parameters
- toDSV
- toCSV
- toTSV
- toPSV
- toText
- toJSON
- toDict
- toArray
- toCollection
- show
- dim
- transpose
- count
- countValue
- push
- replace
- distinct
- unique
- listColumns
- select
- withColumn
- restructure
- renameAll
- rename
- castAll
- cast
- drop
- chain
- filter
- where
- find
- map
- reduce
- reduceRight
- dropDuplicates
- dropMissingValues
- fillMissingValues
- shuffle
- sample
- bisect
- groupBy
- sortBy
- union
- join
- innerJoin
- fullJoin
- outerJoin
- leftJoin
- rightJoin
- diff
- head
- tail
- slice
- getRow
- setRow
- setRowInPlace
- setDefaultModules
- fromDSV
- fromText
- fromCSV
- fromTSV
- fromPSV
- fromJSON
DataFrame data structure providing an immutable, flexible and powerful way to manipulate data with columns and rows.
data
(Array | Object | DataFrame) The data of the DataFrame.
columns
Array The DataFrame column names.
options
Object Additional options. Example: modules. (optional, default {})
Convert the DataFrame into a delimiter-separated values string. If you are using Node.js, you can also save it to a file.
args
...any
sep
String Column separator. (optional, default ' ')
header
Boolean Write the header in the first line. If false, no header is written. (optional, default true)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toDSV()
df.toDSV(';')
df.toDSV(';', true)
// From node.js only
df.toDSV(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file as a raw string.
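A minimal sketch of what delimiter-separated output looks like, written in plain JavaScript rather than with dataframe-js. The hypothetical `toDSVSketch` helper takes column names and rows as arrays and joins them with the separator. It does no quoting or escaping, so values containing the separator or a newline would break the layout; it illustrates the format, not the library's serializer.

```javascript
// Hypothetical helper (not part of dataframe-js): serialize column names and rows
// into delimiter-separated text. No quoting or escaping is performed.
function toDSVSketch(columns, rows, sep = ' ', header = true) {
  // Each row becomes one line, its values joined with the separator.
  const lines = rows.map((row) => row.join(sep));
  // Optionally put the column names on the first line.
  if (header) lines.unshift(columns.join(sep));
  return lines.join('\n');
}

const columns = ['id', 'name'];
const rows = [[1, 'a'], [2, 'b']];
console.log(toDSVSketch(columns, rows, ';'));
// id;name
// 1;a
// 2;b
```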
Convert the DataFrame into a comma-separated values string. If you are using Node.js, you can also save it to a file.
args
...any
header
Boolean Write the header in the first line. If false, no header is written. (optional, default true)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toCSV()
df.toCSV(true)
// From node.js only
df.toCSV(true, '/my/absolute/path/dataframe.csv')
Returns String The CSV file as a raw string.
Convert the DataFrame into a tab-separated values string. If you are using Node.js, you can also save it to a file.
args
...any
header
Boolean Write the header in the first line. If false, no header is written. (optional, default true)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toTSV()
df.toTSV(true)
// From node.js only
df.toTSV(true, '/my/absolute/path/dataframe.tsv')
Returns String The TSV file as a raw string.
Convert the DataFrame into a pipe-separated values string. If you are using Node.js, you can also save it to a file.
args
...any
header
Boolean Write the header in the first line. If false, no header is written. (optional, default true)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toPSV()
df.toPSV(true)
// From node.js only
df.toPSV(true, '/my/absolute/path/dataframe.psv')
Returns String The PSV file as a raw string.
Convert the DataFrame into a delimiter-separated values string. Alias of .toDSV(). If you are using Node.js, you can also save it to a file.
args
...any
sep
String Column separator. (optional, default ' ')
header
Boolean Write the header in the first line. If false, no header is written. (optional, default true)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toText()
df.toText(';')
df.toText(';', true)
// From node.js only
df.toText(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file as a raw string.
Convert the DataFrame into a JSON string. If you are using Node.js, you can also save it to a file.
args
...any
asCollection
Boolean Write the JSON as a collection of objects. (optional, default false)
path
String? The path where the file is saved. /!\ Works only in Node.js, not in the browser.
df.toJSON()
// From node.js only
df.toJSON('/my/absolute/path/dataframe.json')
Returns String The JSON file as a raw string.
Convert DataFrame into dict / hash / object.
df.toDict()
Returns Object The DataFrame converted into dict.
Convert DataFrame into Array of Arrays. You can also extract only one column as Array.
columnName
String? Column name to extract. By default, all columns are transformed.
df.toArray()
Returns Array The DataFrame (or the column) converted into Array.
Convert the DataFrame into an Array of dictionaries. You can also return Rows instead of dictionaries.
ofRows
Boolean? Return a collection of Rows instead of dictionaries.
df.toCollection()
Returns Array The DataFrame converted into an Array of dictionaries (or Rows).
Display the DataFrame as a string table. It can also return the string instead of displaying the DataFrame.
rows
Number The number of lines to display. (optional, default 10)
quiet
Boolean Quiet mode. If true, only returns a string instead of calling console.log(). (optional, default false)
df.show()
df.show(10)
const stringDF = df.show(10, true)
Returns String The DataFrame as String Table.
Get the DataFrame dimensions.
const [height, width] = df.dim()
Returns Array The DataFrame dimensions. [height, width]
Transpose a DataFrame. Rows become columns and conversely: n x p => p x n.
transposeColumnNames
Boolean An option to transpose the column names into a rowNames column. (optional, default false)
df.transpose()
Returns DataFrame A new transposed DataFrame.
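The n x p => p x n reshaping can be sketched on a plain array of rows. The `transposeSketch` helper below is hypothetical and works on arrays of arrays, not on DataFrame objects; it only shows that column j of the input becomes row j of the output.

```javascript
// Hypothetical helper: transpose an n x p array of rows into a p x n array.
function transposeSketch(rows) {
  if (rows.length === 0) return [];
  // Column j of the input becomes row j of the output.
  return rows[0].map((_, j) => rows.map((row) => row[j]));
}

const m = [[1, 2, 3], [4, 5, 6]]; // 2 x 3
console.log(transposeSketch(m)); // [ [ 1, 4 ], [ 2, 5 ], [ 3, 6 ] ], i.e. 3 x 2
```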
Get the number of rows.
df.count()
Returns Int The number of DataFrame rows.
Count the occurrences of a value in a column.
valueToCount
The value to count in the selected column.
columnName
String The column in which to count the value. (optional, default this.listColumns()[0])
df.countValue(5, 'column2')
df.select('column1').countValue(5)
Returns Int The number of times the selected value appears.
Push new rows into the DataFrame.
rows
(Array | Row) The rows to add.
df.push([1,2,3], [1,4,9])
Returns DataFrame A new DataFrame with the new rows.
Replace a value by another in the whole DataFrame or in specific columns.
value
The value to replace.
replacement
The new value.
columnNames
(String | Array) The columns to apply the replacement to. (optional, default this.listColumns())
df.replace(undefined, 0, 'column1', 'column2')
Returns DataFrame A new DataFrame with replaced values.
Compute the unique values of a column.
columnName
String The column to compute distinct values for.
df.distinct('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
Compute the unique values of a column. Alias of .distinct().
columnName
String The column to compute distinct values for.
df.unique('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
List DataFrame columns.
df.listColumns()
Returns Array An Array containing DataFrame columnNames.
Select columns in the DataFrame.
columnNames
...String The columns to select.
df.select('column1', 'column3')
Returns DataFrame A new DataFrame containing selected columns.
Add a new column or set an existing one.
columnName
String The column to modify or to create.
func
Function The function used to compute the column, taking the row and its index as parameters. (optional, default (row, index) => undefined)
df.withColumn('column4', () => 2)
df.withColumn('column2', (row) => row.get('column2') * 2)
Returns DataFrame A new DataFrame containing the new or modified column.
Modify the structure of the DataFrame by changing the column order, creating new columns or removing some columns.
newColumnNames
Array The new columns of the DataFrame.
df.restructure(['column1', 'column4', 'column2', 'column3'])
df.restructure(['column1', 'column4'])
df.restructure(['column1', 'newColumn', 'column4'])
Returns DataFrame A new DataFrame with restructured columns (reordered, added or deleted).
Rename each column.
newColumnNames
Array The new column names of the DataFrame.
df.renameAll(['column1', 'column3', 'column4'])
Returns DataFrame A new DataFrame with the new column names.
Rename a column.
df.rename('column1', 'columnRenamed')
Returns DataFrame A new DataFrame with the new column name.
Cast each column into a given type.
typeFunctions
Array The functions used to cast columns.
df.castAll([Number, String, (val) => new CustomClass(val)])
Returns DataFrame A new DataFrame with the columns having new types.
Cast a column into a given type.
columnName
String The column to cast.
typeFunction
Function The function used to cast the column.
df.cast('column1', Number)
df.cast('column1', (val) => new MyCustomClass(val))
Returns DataFrame A new DataFrame with the column having a new type.
Remove a single column.
columnName
String The column to drop.
df.drop('column2')
Returns DataFrame A new DataFrame without the dropped column.
Chain map and filter functions on the DataFrame, optimizing their execution. If a function returns a boolean, it acts as a filter; otherwise it acts as a map. It can be 10 to 100 times faster than standard chains of .map() and .filter().
funcs
...Function Functions to apply on the DataFrame rows taking the row as parameter.
df.chain(
row => row.get('column1') > 3, // filter
row => row.set('column1', 3), // map
row => row.get('column2') === '5' // filter
)
Returns DataFrame A new DataFrame with modified rows.
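The single-pass idea behind .chain() can be sketched in plain JavaScript over an array of row objects. This is a hypothetical illustration (the `chainSketch` name and the plain-object rows are assumptions), not the dataframe-js code: each row goes through every function once, a boolean result decides whether the row is kept, and any other result replaces the row. Unlike `.map().filter()` chains, no intermediate arrays are built between steps.

```javascript
// Hypothetical sketch of fusing maps and filters into one pass over the rows.
// A function returning a boolean acts as a filter; any other return value is the mapped row.
function chainSketch(rows, ...funcs) {
  const result = [];
  for (const row of rows) {
    let current = row;
    let keep = true;
    for (const func of funcs) {
      const out = func(current);
      if (typeof out === 'boolean') {
        // Filter: stop processing this row as soon as one filter rejects it.
        if (!out) { keep = false; break; }
      } else {
        // Map: the returned value replaces the current row.
        current = out;
      }
    }
    if (keep) result.push(current);
  }
  return result;
}

const rows = [
  { column1: 2, column2: '5' },
  { column1: 4, column2: '5' },
  { column1: 7, column2: '1' },
];
const out = chainSketch(
  rows,
  (row) => row.column1 > 3,          // filter
  (row) => ({ ...row, column1: 3 }), // map (returns a new object)
  (row) => row.column2 === '5',      // filter
);
console.log(out); // [ { column1: 3, column2: '5' } ]
```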
Filter DataFrame rows.
df.filter(row => row.get('column1') >= 3)
df.filter({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
Filter DataFrame rows. Alias of .filter()
df.where(row => row.get('column1') >= 3)
df.where({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
Find the first row matching a condition.
df.find(row => row.get('column1') === 3)
df.find({'column1': 3})
Returns Row The targeted Row.
Map on DataFrame rows. /!\ Prefer to use .chain().
func
Function A function to apply on each row taking the row as parameter.
df.map(row => row.set('column1', row.get('column1') * 2))
Returns DataFrame A new DataFrame with modified rows.
Reduce DataFrame into a value.
func
Function The reduce function taking 2 parameters, previous and next.
init
The initial value of the reducer.
df.reduce((p, n) => n.get('column1') + p, 0)
df2.reduce((p, n) => (
n.set('column1', p.get('column1') + n.get('column1'))
.set('column2', p.get('column2') + n.get('column2'))
))
Returns any A reduced value.
Reduce DataFrame into a value, starting from the last row (see .reduce()).
func
Function The reduce function taking 2 parameters, previous and next.
init
The initial value of the reducer.
df.reduceRight((p, n) => Math.max(p, n.get('column1')), 0)
Returns any A reduced value.
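The only difference between .reduce() and .reduceRight() is the traversal order. Plain JavaScript arrays have the same pair of methods, which makes the order easy to observe. The example below uses `Array.prototype.reduce` and `Array.prototype.reduceRight` on a plain array standing in for one column (an illustrative assumption), not the DataFrame methods themselves.

```javascript
// Values standing in for one DataFrame column (illustrative data).
const column1 = [1, 2, 3];

// Record the visiting order: reduce goes first -> last, reduceRight goes last -> first.
const leftOrder = column1.reduce((acc, value) => acc.concat(value), []);
const rightOrder = column1.reduceRight((acc, value) => acc.concat(value), []);

// Order-insensitive reducers such as sum or max give the same result either way.
const sum = column1.reduce((p, n) => p + n, 0);
const max = column1.reduceRight((p, n) => Math.max(p, n), -Infinity);

console.log(leftOrder);  // [ 1, 2, 3 ]
console.log(rightOrder); // [ 3, 2, 1 ]
console.log(sum, max);   // 6 3
```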
Return a DataFrame without duplicated rows.
columnNames
...String The columns used to check the uniqueness of rows. If omitted, uniqueness is checked on all columns.
df.dropDuplicates('id', 'name')
Returns DataFrame A DataFrame without duplicated rows.
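A minimal sketch of row deduplication, assuming rows are stored as plain `{ columnName: value }` objects that share the same column order (an assumption; dataframe-js uses its own Row class). The hypothetical `dropDuplicatesSketch` keeps the first row seen for each combination of values in the given columns and falls back to all columns when none are given, mirroring the parameter description above.

```javascript
// Hypothetical sketch: keep the first row for each combination of key column values.
function dropDuplicatesSketch(rows, ...columnNames) {
  const seen = new Set();
  return rows.filter((row) => {
    // With no key columns given, every column of the row is used.
    const keys = columnNames.length > 0 ? columnNames : Object.keys(row);
    // Serialize the key values so they can be stored in a Set.
    // Simplification: JSON.stringify maps undefined and NaN to null.
    const key = JSON.stringify(keys.map((name) => row[name]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const people = [
  { id: 1, name: 'Ann', age: 30 },
  { id: 1, name: 'Ann', age: 31 },
  { id: 2, name: 'Bob', age: 25 },
];
console.log(dropDuplicatesSketch(people, 'id', 'name').length); // 2
console.log(dropDuplicatesSketch(people).length);               // 3
```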
Return a DataFrame without rows containing missing values (undefined, NaN, null).
columnNames
Array The columns to consider. All columns are considered by default.
df.dropMissingValues(['id', 'name'])
Returns DataFrame A DataFrame without rows containing missing values.
Return a DataFrame with missing values (undefined, NaN, null) filled with a replacement value.
replacement
The new value.
columnNames
Array The columns to consider. All columns are considered by default.
df.fillMissingValues(0, ['id', 'name'])
Returns DataFrame A DataFrame with missing values replaced.
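The missing-value rule used by .dropMissingValues() and .fillMissingValues() (undefined, NaN and null) can be sketched as a single predicate. The helpers below are hypothetical, operate on rows stored as plain objects, and take the default column list from the first row; they illustrate the behavior described above rather than reproduce the library's code.

```javascript
// Missing values as defined above: undefined, null and NaN.
// Number.isNaN avoids treating non-numeric strings as NaN.
const isMissing = (value) => value === undefined || value === null || Number.isNaN(value);

// Hypothetical sketches; the default columns are the keys of the first row.
function dropMissingSketch(rows, columnNames = Object.keys(rows[0] ?? {})) {
  return rows.filter((row) => columnNames.every((name) => !isMissing(row[name])));
}

function fillMissingSketch(rows, replacement, columnNames = Object.keys(rows[0] ?? {})) {
  return rows.map((row) => {
    const copy = { ...row }; // do not mutate the input rows
    for (const name of columnNames) {
      if (isMissing(copy[name])) copy[name] = replacement;
    }
    return copy;
  });
}

const data = [{ id: 1, name: 'a' }, { id: NaN, name: null }, { id: 3, name: undefined }];
console.log(dropMissingSketch(data).length);        // 1
console.log(fillMissingSketch(data, 0, ['id'])[1]); // { id: 0, name: null }
```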
Return a DataFrame with its rows shuffled.
df.shuffle()
Returns DataFrame A shuffled DataFrame.
Return a random sample of rows.
percentage
Number A percentage of the original DataFrame, given as a fraction between 0 and 1 (0.3 for 30%), setting the sample size.
df.sample(0.3)
Returns DataFrame A sample DataFrame
Randomly split a DataFrame into 2 DataFrames.
percentage
Number A percentage of the original DataFrame, given as a fraction between 0 and 1, setting the size of the first DataFrame. The second one takes the remaining rows.
const [df30, df70] = df.bisect(0.3)
Returns Array An Array containing the two DataFrames: first the X% DataFrame, then the DataFrame with the remaining rows.
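A random split such as .bisect(0.3) can be sketched by shuffling a copy of the rows and cutting it at the requested fraction. The `bisectSketch` helper is hypothetical; it uses a Fisher-Yates shuffle and rounds the cut point with `Math.round`, and both choices are assumptions rather than documented dataframe-js behavior.

```javascript
// Hypothetical sketch of a random split: shuffle a copy, then cut at the given fraction.
function bisectSketch(rows, percentage) {
  const copy = rows.slice();
  // Fisher-Yates shuffle of the copy.
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  // Rounding the cut point is an assumption of this sketch.
  const cut = Math.round(copy.length * percentage);
  return [copy.slice(0, cut), copy.slice(cut)];
}

const rows = Array.from({ length: 10 }, (_, i) => i);
const [df30, df70] = bisectSketch(rows, 0.3);
console.log(df30.length, df70.length); // 3 7
```

In this sketch, a sample in the sense of .sample(0.3) would correspond to keeping only the first array.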
Group DataFrame rows by columns, giving a GroupedDataFrame object. See its documentation for more examples.
args
...any
columnNames
...String The columns used for the groupBy.
df.groupBy('column1')
df.groupBy('column1', 'column2')
df.groupBy('column1', 'column2').listGroups()
df.groupBy('column1', 'column2').show()
df.groupBy('column1', 'column2').aggregate((group) => group.count())
Returns GroupedDataFrame A GroupedDataFrame object.
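Grouping can be sketched as bucketing rows by the tuple of values in the grouping columns. The hypothetical `groupBySketch` returns a `Map` from a JSON-encoded key to the rows of that group; it stands in for a GroupedDataFrame only to show which rows end up together and does not reproduce that object's API.

```javascript
// Hypothetical sketch: group plain-object rows by the values of one or more columns.
function groupBySketch(rows, ...columnNames) {
  const groups = new Map();
  for (const row of rows) {
    // The group key is the tuple of values in the grouping columns.
    const key = JSON.stringify(columnNames.map((name) => row[name]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

const sales = [
  { column1: 'a', column2: 1 },
  { column1: 'a', column2: 2 },
  { column1: 'b', column2: 1 },
];
// An aggregation such as group.count() corresponds to the size of each bucket here.
for (const [key, group] of groupBySketch(sales, 'column1')) {
  console.log(key, group.length); // ["a"] 2, then ["b"] 1
}
```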
Sort DataFrame rows based on column values. Each column used for sorting should contain only one variable type. When several columns are given, they are compared from left to right.
columnNames
(String | Array<string>) The columns giving the order.
reverse
Boolean Reverse mode. Reverse the order if true. (optional, default false)
missingValuesPosition
String Define the position of missing values (undefined, null and NaN) in the order. (optional, default 'first')
df.sortBy('id')
df.sortBy(['id1', 'id2'])
df.sortBy(['id1'], true)
Returns DataFrame An ordered DataFrame.
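A multi-column comparison with a configurable position for missing values can be sketched as follows. The `sortBySketch` helper and its plain-object rows are hypothetical; in particular, keeping missing values at the chosen position even in reverse mode is an assumption of this sketch, not documented dataframe-js behavior.

```javascript
// Hypothetical sketch of a multi-column sort; not the dataframe-js implementation.
const isMissing = (v) => v === undefined || v === null || Number.isNaN(v);

function sortBySketch(rows, columnNames, reverse = false, missingValuesPosition = 'first') {
  const names = Array.isArray(columnNames) ? columnNames : [columnNames];
  const compare = (a, b) => {
    // Columns are compared left to right; the first difference decides.
    for (const name of names) {
      const x = a[name];
      const y = b[name];
      if (isMissing(x) || isMissing(y)) {
        if (isMissing(x) && isMissing(y)) continue;
        // Missing values stay first or last regardless of reverse (assumption).
        const missingFirst = missingValuesPosition === 'first' ? -1 : 1;
        return isMissing(x) ? missingFirst : -missingFirst;
      }
      if (x < y) return reverse ? 1 : -1;
      if (x > y) return reverse ? -1 : 1;
    }
    return 0;
  };
  // Sort a copy so the input stays untouched.
  return rows.slice().sort(compare);
}

const rows = [{ id1: 2, id2: 'b' }, { id1: null, id2: 'a' }, { id1: 1, id2: 'z' }, { id1: 2, id2: 'a' }];
const label = (r) => `${r.id1}${r.id2}`;
console.log(sortBySketch(rows, ['id1', 'id2']).map(label)); // [ 'nulla', '1z', '2a', '2b' ]
```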
Concatenate two DataFrames.
dfToUnion
DataFrame The DataFrame to concatenate.
df.union(df2)
Returns DataFrame A new concatenated DataFrame resulting from the union.
Join two DataFrames.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
how
String The join mode. Can be: full, inner, outer, left, right. (optional, default 'inner')
df.join(df2, 'column1', 'full')
Returns DataFrame The joined DataFrame.
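The inner, left, right and full modes can be sketched with a nested loop over two arrays of plain-object rows joined on one key column. The `joinSketch` helper is hypothetical and simplified: on a column-name clash the right row's value wins, and the `outer` mode is deliberately not implemented because this page does not spell out how it differs from `full`.

```javascript
// Hypothetical sketch of join modes on a single key column, using plain objects as rows.
function joinSketch(left, right, key, how = 'inner') {
  if (!['inner', 'left', 'right', 'full'].includes(how)) {
    throw new Error(`Unsupported join mode in this sketch: ${how}`);
  }
  const result = [];
  const matchedRight = new Set();
  for (const l of left) {
    // Nested loop for clarity: find every right row sharing the key value.
    const matches = right.filter((r) => r[key] === l[key]);
    matches.forEach((r) => matchedRight.add(r));
    if (matches.length > 0) {
      for (const r of matches) result.push({ ...l, ...r });
    } else if (how === 'left' || how === 'full') {
      result.push({ ...l }); // unmatched left row kept, right-only columns absent
    }
  }
  if (how === 'right' || how === 'full') {
    // Unmatched right rows are appended after the matched ones.
    for (const r of right) if (!matchedRight.has(r)) result.push({ ...r });
  }
  return result;
}

const df1 = [{ id: 1, a: 'x' }, { id: 2, a: 'y' }];
const df2 = [{ id: 2, b: 'p' }, { id: 3, b: 'q' }];
console.log(joinSketch(df1, df2, 'id'));                 // [ { id: 2, a: 'y', b: 'p' } ]
console.log(joinSketch(df1, df2, 'id', 'left').length);  // 2
console.log(joinSketch(df1, df2, 'id', 'right').length); // 2
console.log(joinSketch(df1, df2, 'id', 'full').length);  // 3
```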
Join two DataFrames with inner mode.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
df.innerJoin(df2, 'id')
df.join(df2, 'id')
df.join(df2, 'id', 'inner')
Returns DataFrame The joined DataFrame.
Join two DataFrames with full mode.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
df.fullJoin(df2, 'id')
df.join(df2, 'id', 'full')
Returns DataFrame The joined DataFrame.
Join two DataFrames with outer mode.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
df.outerJoin(df2, 'id')
df.join(df2, 'id', 'outer')
Returns DataFrame The joined DataFrame.
Join two DataFrames with left mode.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
df.leftJoin(df2, 'id')
df.join(df2, 'id', 'left')
Returns DataFrame The joined DataFrame.
Join two DataFrames with right mode.
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The columns used for the join.
df.rightJoin(df2, 'id')
df.join(df2, 'id', 'right')
Returns DataFrame The joined DataFrame.
Find the differences between two DataFrames (reverse of join).
dfToDiff
DataFrame The DataFrame to diff.
columnNames
(String | Array) The columns used for the diff.
df.diff(df2, 'id')
Returns DataFrame The differences DataFrame.
Create a new subset DataFrame based on the first rows.
nRows
Number The number of first rows to get. (optional, default 10)
df2.head()
df2.head(5)
Returns DataFrame The subset DataFrame.
Create a new subset DataFrame based on the last rows.
nRows
Number The number of last rows to get. (optional, default 10)
df2.tail()
df2.tail(5)
Returns DataFrame The subset DataFrame.
Create a new subset DataFrame based on the given indexes. Similar to Array.slice.
startIndex
Number The index to start the slice (included). (optional, default 0)
endIndex
Number The index to end the slice (excluded). (optional, default this.count())
df2.slice()
df2.slice(0)
df2.slice(0, 20)
df2.slice(10, 30)
Returns DataFrame The subset DataFrame.
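Because .slice() is described as similar to Array.slice, its index rules can be checked on a plain array: the start index is included, the end index is excluded, and the first or last n elements correspond to what head and tail select. This uses only `Array.prototype.slice` on an illustrative array of row indexes, not the DataFrame methods.

```javascript
// Illustrative row indexes standing in for a 30-row DataFrame.
const rows = Array.from({ length: 30 }, (_, i) => i);

const all = rows.slice();          // like slice() with no arguments: every row
const middle = rows.slice(10, 20); // rows 10..19: start included, end excluded
const first5 = rows.slice(0, 5);   // the rows head(5) selects
const last5 = rows.slice(-5);      // the rows tail(5) selects

console.log(all.length, middle[0], middle[middle.length - 1]); // 30 10 19
console.log(first5, last5); // [ 0, 1, 2, 3, 4 ] [ 25, 26, 27, 28, 29 ]
```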
Return a Row by its index.
index
Number The index to select the row. (optional, default 0)
df2.getRow(1)
Returns Row The Row.
Modify a Row at the given index.
index
Number The index to select the row. (optional, default 0)
func
Function The function to apply on the selected row. (optional, default row => row)
df2.setRow(1, row => row.set("column1", 33))
Returns DataFrame A new DataFrame with the modified Row.
Modify a Row in place (by mutation) at the given index.
index
Number The index to select the row. (optional, default 0)
func
Function The function to apply on the selected row. (optional, default row => row)
df2.setRowInPlace(1, row => row.set("column1", 33))
Returns DataFrame The current DataFrame with the modified row.
Set the default modules used in DataFrame instances.
defaultModules
...Object DataFrame modules used by default.
DataFrame.setDefaultModules(SQL, Stat)
Create a DataFrame from a delimiter-separated values text file. It returns a Promise.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
sep
String The separator used to parse the file.
header
Boolean Whether the text has a header. (optional, default true)
DataFrame.fromDSV('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromDSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromDSV('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromDSV('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
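Parsing delimiter-separated text can be sketched as splitting into lines and then into fields. The hypothetical `parseDSVSketch` below works on a string already in memory (no file or URL loading), splits naively on the separator, and invents `column1`, `column2`, ... names when there is no header; all of these are assumptions of the sketch, not necessarily what DataFrame.fromDSV does.

```javascript
// Hypothetical sketch of parsing delimiter-separated text into columns and rows.
// It splits naively: quoted fields containing the separator are not handled.
function parseDSVSketch(text, sep = ';', header = true) {
  // Accept both \n and \r\n line endings and skip empty lines.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const rows = lines.map((line) => line.split(sep));
  // Without a header, generate placeholder column names (assumption of this sketch).
  const columns = header ? rows.shift() : (rows[0] ?? []).map((_, i) => `column${i + 1}`);
  return { columns, rows };
}

const { columns, rows } = parseDSVSketch('id;name\n1;a\n2;b\n', ';');
console.log(columns); // [ 'id', 'name' ]
console.log(rows);    // [ [ '1', 'a' ], [ '2', 'b' ] ]
```

Every value stays a string here; converting columns to numbers or other types is a separate step, which is what .cast() and .castAll() are for on a DataFrame.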
Create a DataFrame from a delimiter-separated values text file. It returns a Promise. Alias of DataFrame.fromDSV.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
sep
String The separator used to parse the file.
header
Boolean Whether the text has a header. (optional, default true)
DataFrame.fromText('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromText(myFile).then(df => df.show())
// From node.js only
DataFrame.fromText('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromText('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
Create a DataFrame from a comma-separated values file. It returns a Promise.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean Whether the CSV has a header. (optional, default true)
DataFrame.fromCSV('http://myurl/myfile.csv').then(df => df.show())
// For browser only
DataFrame.fromCSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromCSV('/my/absolute/path/myfile.csv').then(df => df.show())
DataFrame.fromCSV('/my/absolute/path/myfile.csv', true).then(df => df.show())
Create a DataFrame from a tab-separated values file. It returns a Promise.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean Whether the TSV has a header. (optional, default true)
DataFrame.fromTSV('http://myurl/myfile.tsv').then(df => df.show())
// For browser only
DataFrame.fromTSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromTSV('/my/absolute/path/myfile.tsv').then(df => df.show())
DataFrame.fromTSV('/my/absolute/path/myfile.tsv', true).then(df => df.show())
Create a DataFrame from a pipe-separated values file. It returns a Promise.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean Whether the PSV has a header. (optional, default true)
DataFrame.fromPSV('http://myurl/myfile.psv').then(df => df.show())
// For browser only
DataFrame.fromPSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromPSV('/my/absolute/path/myfile.psv').then(df => df.show())
DataFrame.fromPSV('/my/absolute/path/myfile.psv', true).then(df => df.show())
Create a DataFrame from a JSON file. It returns a Promise.
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
DataFrame.fromJSON('http://myurl/myfile.json').then(df => df.show())
// For browser only
DataFrame.fromJSON(myFile).then(df => df.show())
// From node.js only
DataFrame.fromJSON('/my/absolute/path/myfile.json').then(df => df.show())