Danfo.js is a powerful JavaScript library for data manipulation and analysis, similar to Python's pandas. It provides robust functionalities for working with data frames, series, and performing various data manipulation tasks. This comprehensive tutorial will cover the following topics:
- Setting Up Danfo.js
- Working with Data Frames
- Working with Series
- Data Manipulation Techniques
- Advanced Data Analysis Examples
By mastering these techniques, you can perform sophisticated data analysis directly in the browser or Node.js environment.
To use Danfo.js, you can include it in your HTML file or install it via npm if you're using Node.js.
HTML:
<!DOCTYPE html>
<html>
<head>
<title>Data Analysis with Danfo.js</title>
<script src="https://cdn.jsdelivr.net/npm/danfojs@0.4.0/lib/bundle.min.js"></script>
</head>
<body>
<h1>Data Analysis with Danfo.js</h1>
<div id="output"></div>
</body>
</html>
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a table or a spreadsheet.
JavaScript:
<script>
// Creating a DataFrame
const df = new dfd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, 22, 32, 29],
'Salary': [50000, 60000, 45000, 70000, 65000]
});
// Displaying the DataFrame in the console
df.print();
// Displaying the DataFrame in the HTML
df.plot('output').table();
</script>
Explanation:
- A DataFrame is created with columns 'Name', 'Age', and 'Salary'.
- The DataFrame is displayed in the console using the
print
method. - The DataFrame is displayed in the HTML using the
plot
method withtable
.
JavaScript:
<script>
// Accessing a single column
const ageColumn = df['Age'];
ageColumn.print();
// Accessing multiple columns
const subsetDf = df[['Name', 'Salary']];
subsetDf.print();
// Accessing rows by index
const firstRow = df.iloc([0]);
firstRow.print();
// Accessing rows by condition
const youngEmployees = df.query(df['Age'].lt(30));
youngEmployees.print();
</script>
Explanation:
- Accessing a single column returns a Series.
- Accessing multiple columns returns a new DataFrame.
iloc
allows accessing rows by index.query
allows filtering rows based on a condition.
A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.
JavaScript:
<script>
// Creating a Series
const series = new dfd.Series([50000, 60000, 45000, 70000, 65000], {index: ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], name: 'Salary'});
// Displaying the Series in the console
series.print();
// Displaying the Series in the HTML
series.plot('output').table();
</script>
Explanation:
- A Series is created with data and an index representing names.
- The Series is displayed in the console using the
print
method. - The Series is displayed in the HTML using the
plot
method withtable
.
JavaScript:
<script>
// Accessing a single element
const aliceSalary = series.loc(['Alice']);
console.log('Alice\'s Salary:', aliceSalary.values[0]);
// Accessing multiple elements
const selectedSalaries = series.loc(['Alice', 'David']);
selectedSalaries.print();
</script>
Explanation:
loc
allows accessing elements by label.- Accessing multiple elements returns a new Series.
Danfo.js provides various methods for data manipulation, including filtering, sorting, grouping, and more.
JavaScript:
<script>
// Filtering data where Age > 25
const filteredDf = df.query(df['Age'].gt(25));
filteredDf.print();
</script>
Explanation:
- The DataFrame is filtered to include only rows where 'Age' is greater than 25.
JavaScript:
<script>
// Sorting data by Salary in descending order
const sortedDf = df.sortValues('Salary', { ascending: false });
sortedDf.print();
</script>
Explanation:
- The DataFrame is sorted by 'Salary' in descending order.
JavaScript:
<script>
// Creating a DataFrame with Department column
const df = new dfd.DataFrame({
'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'HR'],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, 22, 32, 29],
'Salary': [50000, 60000, 45000, 70000, 65000]
});
// Grouping by Department and calculating the average Salary
const groupedDf = df.groupby(['Department']).col(['Salary']).mean();
groupedDf.print();
</script>
Explanation:
- The DataFrame is grouped by 'Department', and the average 'Salary' is calculated.
JavaScript:
<script>
// Creating two DataFrames
const df1 = new dfd.DataFrame({
'ID': [1, 2, 3, 4, 5],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
});
const df2 = new dfd.DataFrame({
'ID': [1, 2, 3, 4, 5],
'Salary': [50000, 60000, 45000, 70000, 65000]
});
// Merging DataFrames on 'ID' column
const mergedDf = dfd.merge({ left: df1, right: df2, on: ['ID'] });
mergedDf.print();
</script>
Explanation:
- Two DataFrames are merged on the 'ID' column.
JavaScript:
<script>
// Calculating summary statistics for the DataFrame
const summary = df.describe();
summary.print();
</script>
Explanation:
- The
describe
method is used to calculate summary statistics for the DataFrame, including count, mean, std, min, max, and quartiles.
JavaScript:
<script>
// Creating a DataFrame with missing values
const df = new dfd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, null, 22, 32, 29],
'Salary': [50000, 60000, null, 70000, 65000]
});
// Filling missing values with a specified value
const filledDf = df.fillna({ value: 0 });
filledDf.print();
// Dropping rows with missing values
const droppedDf = df.dropna();
droppedDf.print();
</script>
Explanation:
- Missing values in the DataFrame are handled by filling them with a specified value or by dropping rows with missing values.
JavaScript:
<script>
// Creating a DataFrame
const df = new dfd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, 22, 32, 29],
'Salary': [50000, 60000, 45000, 70000, 65000]
});
// Applying a function to each element of a column
const increasedSalaryDf = df['Salary'].apply(value => value * 1.1);
increasedSalaryDf.print();
// Applying a function to each row
const ageGroupDf = df.apply(row => {
return row.get('Age') > 25 ? 'Senior' : 'Junior';
}, { axis: 1 });
ageGroupDf.print();
</script>
Explanation:
- The
apply
method is used to apply a function to each element of a column or to each row of the DataFrame.
By understanding and using Danfo.js for data analysis through
these comprehensive examples, you can efficiently work with data frames, series, and perform various data manipulation tasks. These skills are crucial for building sophisticated data-driven applications directly in JavaScript.