Skip to content

Latest commit

 

History

History
296 lines (229 loc) · 8.41 KB

07_data_analysis_with_danfo.md

File metadata and controls

296 lines (229 loc) · 8.41 KB

Advanced Tutorial: Comprehensive Data Analysis with Danfo.js

Introduction

Danfo.js is a powerful JavaScript library for data manipulation and analysis, similar to Python's pandas. It provides robust functionalities for working with data frames, series, and performing various data manipulation tasks. This comprehensive tutorial will cover the following topics:

  1. Setting Up Danfo.js
  2. Working with Data Frames
  3. Working with Series
  4. Data Manipulation Techniques
  5. Advanced Data Analysis Examples

By mastering these techniques, you can perform sophisticated data analysis directly in the browser or Node.js environment.

1. Setting Up Danfo.js

To use Danfo.js, you can include it in your HTML file or install it via npm if you're using Node.js.

Example: Including Danfo.js in an HTML File

HTML:

<!DOCTYPE html>
<html>
<head>
    <title>Data Analysis with Danfo.js</title>
    <script src="https://cdn.jsdelivr.net/npm/danfojs@0.4.0/lib/bundle.min.js"></script>
</head>
<body>
    <h1>Data Analysis with Danfo.js</h1>
    <div id="output"></div>
</body>
</html>

2. Working with Data Frames

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a table or a spreadsheet.

Example: Creating and Displaying a DataFrame

JavaScript:

<script>
    // Creating a DataFrame
    const df = new dfd.DataFrame({
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Salary': [50000, 60000, 45000, 70000, 65000]
    });

    // Displaying the DataFrame in the console
    df.print();

    // Displaying the DataFrame in the HTML
    df.plot('output').table();
</script>

Explanation:

  • A DataFrame is created with columns 'Name', 'Age', and 'Salary'.
  • The DataFrame is displayed in the console using the print method.
  • The DataFrame is displayed in the HTML using the plot method with table.

Example: Accessing DataFrame Columns and Rows

JavaScript:

<script>
    // Accessing a single column
    const ageColumn = df['Age'];
    ageColumn.print();

    // Accessing multiple columns
    const subsetDf = df[['Name', 'Salary']];
    subsetDf.print();

    // Accessing rows by index
    const firstRow = df.iloc([0]);
    firstRow.print();

    // Accessing rows by condition
    const youngEmployees = df.query(df['Age'].lt(30));
    youngEmployees.print();
</script>

Explanation:

  • Accessing a single column returns a Series.
  • Accessing multiple columns returns a new DataFrame.
  • iloc allows accessing rows by index.
  • query allows filtering rows based on a condition.

3. Working with Series

A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.

Example: Creating and Displaying a Series

JavaScript:

<script>
    // Creating a Series
    const series = new dfd.Series([50000, 60000, 45000, 70000, 65000], {index: ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], name: 'Salary'});

    // Displaying the Series in the console
    series.print();

    // Displaying the Series in the HTML
    series.plot('output').table();
</script>

Explanation:

  • A Series is created with data and an index representing names.
  • The Series is displayed in the console using the print method.
  • The Series is displayed in the HTML using the plot method with table.

Example: Accessing Series Elements

JavaScript:

<script>
    // Accessing a single element
    const aliceSalary = series.loc(['Alice']);
    console.log('Alice\'s Salary:', aliceSalary.values[0]);

    // Accessing multiple elements
    const selectedSalaries = series.loc(['Alice', 'David']);
    selectedSalaries.print();
</script>

Explanation:

  • loc allows accessing elements by label.
  • Accessing multiple elements returns a new Series.

4. Data Manipulation Techniques

Danfo.js provides various methods for data manipulation, including filtering, sorting, grouping, and more.

Example: Filtering Data

JavaScript:

<script>
    // Filtering data where Age > 25
    const filteredDf = df.query(df['Age'].gt(25));
    filteredDf.print();
</script>

Explanation:

  • The DataFrame is filtered to include only rows where 'Age' is greater than 25.

Example: Sorting Data

JavaScript:

<script>
    // Sorting data by Salary in descending order
    const sortedDf = df.sortValues('Salary', { ascending: false });
    sortedDf.print();
</script>

Explanation:

  • The DataFrame is sorted by 'Salary' in descending order.

Example: Grouping and Aggregating Data

JavaScript:

<script>
    // Creating a DataFrame with Department column
    const df = new dfd.DataFrame({
        'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'HR'],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Salary': [50000, 60000, 45000, 70000, 65000]
    });

    // Grouping by Department and calculating the average Salary
    const groupedDf = df.groupby(['Department']).col(['Salary']).mean();
    groupedDf.print();
</script>

Explanation:

  • The DataFrame is grouped by 'Department', and the average 'Salary' is calculated.

Example: Merging DataFrames

JavaScript:

<script>
    // Creating two DataFrames
    const df1 = new dfd.DataFrame({
        'ID': [1, 2, 3, 4, 5],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
    });

    const df2 = new dfd.DataFrame({
        'ID': [1, 2, 3, 4, 5],
        'Salary': [50000, 60000, 45000, 70000, 65000]
    });

    // Merging DataFrames on 'ID' column
    const mergedDf = dfd.merge({ left: df1, right: df2, on: ['ID'] });
    mergedDf.print();
</script>

Explanation:

  • Two DataFrames are merged on the 'ID' column.

5. Advanced Data Analysis Examples

Example: Calculating Summary Statistics

JavaScript:

<script>
    // Calculating summary statistics for the DataFrame
    const summary = df.describe();
    summary.print();
</script>

Explanation:

  • The describe method is used to calculate summary statistics for the DataFrame, including count, mean, std, min, max, and quartiles.

Example: Handling Missing Data

JavaScript:

<script>
    // Creating a DataFrame with missing values
    const df = new dfd.DataFrame({
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, null, 22, 32, 29],
        'Salary': [50000, 60000, null, 70000, 65000]
    });

    // Filling missing values with a specified value
    const filledDf = df.fillna({ value: 0 });
    filledDf.print();

    // Dropping rows with missing values
    const droppedDf = df.dropna();
    droppedDf.print();
</script>

Explanation:

  • Missing values in the DataFrame are handled by filling them with a specified value or by dropping rows with missing values.

Example: Applying Functions to Data

JavaScript:

<script>
    // Creating a DataFrame
    const df = new dfd.DataFrame({
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Salary': [50000, 60000, 45000, 70000, 65000]
    });

    // Applying a function to each element of a column
    const increasedSalaryDf = df['Salary'].apply(value => value * 1.1);
    increasedSalaryDf.print();

    // Applying a function to each row
    const ageGroupDf = df.apply(row => {
        return row.get('Age') > 25 ? 'Senior' : 'Junior';
    }, { axis: 1 });
    ageGroupDf.print();
</script>

Explanation:

  • The apply method is used to apply a function to each element of a column or to each row of the DataFrame.

Conclusion

By understanding and using Danfo.js for data analysis through

these comprehensive examples, you can efficiently work with data frames, series, and perform various data manipulation tasks. These skills are crucial for building sophisticated data-driven applications directly in JavaScript.

Further Reading