A Python library that allows you to easily access, traverse, and convert your Notion workspace data into Markdown and Pandas DataFrames. It simplifies interacting with the Notion API, enabling you to index your pages, databases, and blocks for various use cases like data analysis, documentation generation, and more.

joe-stifler/notion_indexer

notion_indexer

A Python library to index and traverse Notion workspaces, converting pages and databases into easily parsable formats like Markdown and Pandas DataFrames.

Features

  • Traverse Notion Hierarchy: Explore pages, databases, and blocks within a Notion workspace up to a specified depth.
  • Markdown Conversion: Transform Notion pages and databases into Markdown format for easy integration with other tools and platforms. Supports various block types including headings, lists, toggles, tables, and more.
  • DataFrame Representation: Convert Notion databases into Pandas DataFrames for convenient data analysis and manipulation.
  • Filtering and Sorting: Query databases with filters and sorts to retrieve specific data.
  • Easy-to-Use API: Simple and intuitive interface for interacting with Notion data.
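
As an illustration of what traversing "up to a specified depth" means, here is a minimal sketch of a depth-limited walk over a nested structure. It is not notion_indexer's implementation: the `workspace` dictionary and the `walk` function are hypothetical stand-ins, and counting the root as depth 0 is an assumption made for this example.

```python
from typing import Iterator, Tuple

# Hypothetical stand-in for a Notion hierarchy: each node has a title and children.
workspace = {
    "title": "Home",
    "children": [
        {"title": "Projects", "children": [
            {"title": "Roadmap", "children": [
                {"title": "Q1 notes", "children": []},
            ]},
        ]},
        {"title": "Archive", "children": []},
    ],
}


def walk(node: dict, max_depth: int, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) pairs, descending at most max_depth levels below the root."""
    yield depth, node["title"]
    if depth >= max_depth:
        return  # depth limit reached: do not visit this node's children
    for child in node["children"]:
        yield from walk(child, max_depth, depth + 1)


visited = list(walk(workspace, max_depth=2))
for depth, title in visited:
    print("  " * depth + title)
```

With `max_depth=2`, "Q1 notes" (three levels below the root) is skipped. A smaller limit keeps the number of visited nodes down, which presumably matters for a real workspace where each level costs additional API requests.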

Installation

pip install notion-sdk pandas python-dotenv # python-dotenv is only needed for the .env loading in the usage example
pip install git+https://github.com/joe-stifler/notion_indexer.git # or clone and install locally

Usage

Basic Example

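import os

from dotenv import load_dotenv  # provided by the python-dotenv package
from notion_indexer.notion_reader import NotionReader

# Load the Notion API key from a .env file (expects a NOTION_API_KEY entry)
load_dotenv()
integration_token = os.getenv("NOTION_API_KEY")
if not integration_token:
    raise RuntimeError("NOTION_API_KEY is not set; add it to .env or the environment")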

# Initialize NotionReader
reader = NotionReader(integration_token=integration_token)

# Load data from a Notion page or database URL
page_url = "https://www.notion.so/your-workspace/your-page-or-database-id"  # Replace with your actual URL
page_data = reader.load_data(page_url, max_depth=2)  # Adjust max_depth as needed


# Convert to Markdown
markdown_output = page_data.to_markdown()
print(markdown_output)


# If it's a database, convert to DataFrame
if hasattr(page_data, 'to_dataframe'):
    df = page_data.to_dataframe()
    print(df)


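# Working with databases
database_url = "https://www.notion.so/your-workspace/your-database-id"  # Replace with your actual URL
# Named status_filter to avoid shadowing Python's built-in filter()
status_filter = {
    "property": "Status",  # Replace with your property name
    "select": {  # Matches a Select property; a Status-type property uses the "status" key instead
        "equals": "In Progress"  # Replace with your filter value
    }
}
database_data = reader.load_data(database_url, max_depth=1, filter=status_filter)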
df = database_data.to_dataframe()
print(df)
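
To give a feel for what turning a database into a table involves, the sketch below flattens page objects shaped like Notion API responses into plain row dictionaries. It is not notion_indexer's `to_dataframe` implementation: `plain_value`, `page_to_row`, and the sample pages are hypothetical, and only a handful of property types are handled.

```python
def plain_value(prop: dict):
    """Reduce one Notion-style property object to a plain Python value."""
    kind = prop["type"]
    payload = prop.get(kind)
    if payload is None:
        return None  # empty select, date, number, ...
    if kind in ("title", "rich_text"):
        return "".join(part["plain_text"] for part in payload)
    if kind in ("select", "status"):
        return payload["name"]
    if kind == "multi_select":
        return [option["name"] for option in payload]
    if kind == "date":
        return payload["start"]
    return payload  # number, checkbox, url are already plain values


def page_to_row(page: dict) -> dict:
    """Map each property name of a page to its flattened value."""
    return {name: plain_value(prop) for name, prop in page["properties"].items()}


# Hand-written samples shaped like Notion API page objects (not real data).
sample_pages = [
    {"properties": {
        "Name": {"type": "title", "title": [{"plain_text": "Write docs"}]},
        "Status": {"type": "select", "select": {"name": "In Progress"}},
        "Date": {"type": "date", "date": {"start": "2024-01-15", "end": None}},
        "Estimate": {"type": "number", "number": 3},
    }},
    {"properties": {
        "Name": {"type": "title", "title": [{"plain_text": "Ship release"}]},
        "Status": {"type": "select", "select": None},
        "Date": {"type": "date", "date": None},
        "Estimate": {"type": "number", "number": None},
    }},
]

rows = [page_to_row(page) for page in sample_pages]
for row in rows:
    print(row)
```

Because every row is a flat dictionary with the same keys, `pandas.DataFrame(rows)` would produce one column per property and one row per page.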

Filtering and Sorting Databases

You can filter and sort the results when querying databases:

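# Avoid naming this variable `filter`, which would shadow the Python built-in
date_filter = {
    "property": "Date",
    "date": {
        "on_or_after": "2024-01-01"  # ISO 8601 date
    }
}

sorts = [
    {
        "property": "Name",
        "direction": "ascending"  # or "descending"
    }
]

# database_url is the database URL defined in the basic example
database_data = reader.load_data(database_url, filter=date_filter, sorts=sorts)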
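
The Notion API also accepts compound filters, where several property conditions are wrapped in an `"and"` (or `"or"`) list. The sketch below builds one with a hypothetical helper, `all_of`. Whether `load_data` forwards a compound filter unchanged is an assumption, so the call itself is shown only as a comment.

```python
def all_of(*conditions: dict) -> dict:
    """Combine property filters into a compound filter that requires every condition."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]  # a single condition needs no "and" wrapper
    return {"and": list(conditions)}


status_is_in_progress = {"property": "Status", "select": {"equals": "In Progress"}}
dated_2024_or_later = {"property": "Date", "date": {"on_or_after": "2024-01-01"}}

combined_filter = all_of(status_is_in_progress, dated_2024_or_later)

# Several sort criteria can be listed; earlier entries take precedence.
sorts = [
    {"property": "Date", "direction": "descending"},
    {"property": "Name", "direction": "ascending"},
]

print(combined_filter)

# Assumed usage, mirroring the example above:
# database_data = reader.load_data(database_url, filter=combined_filter, sorts=sorts)
```

Keeping each condition in its own named dictionary makes the combined filter easier to read and lets the same condition be reused across queries.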

Contributing

Contributions are welcome! Feel free to open issues and submit pull requests.

License

MIT

Acknowledgements

Built using the Notion Python SDK.

Utilizes Pandas for DataFrame functionality.
