
Bank Scraper

This is a Python-based application designed to scrape financial data from various banking and investment platforms. It supports multiple entities and features, providing a unified interface to gather and process financial information.

This project is not actively maintained and was meant only for personal use, so some banks/entities/features/instruments may not work, may be outdated, or may be only partially implemented. That's why this documentation is so sparse.

Features

  • Scrape financial data from multiple entities
  • Support for various financial products (stocks, funds, real estate, etc.)
  • Dynamic and customizable data export to Google Sheets
  • Virtual scraping for simulated data

Supported Entities

  • URBANITAE (wallet & investments)
  • MY_INVESTOR (periodic automatic fund contributions, funds, stocks/ETFs, main account position & related cards)
  • SEGO (wallet & factoring)
  • TRADE_REPUBLIC (stocks/ETFs/Crypto & account) [requires 2FA]
  • UNICAJA (main account and related cards & mortgage)
    • As the web login uses Akamai bot protection, setting UNICAJA_ABCK is required.
  • WECITY (wallet & investments) [requires 2FA]
  • MINTOS (wallet & loan distribution) (experimental)
  • F24 (savings and brokerage account & deposits)

Entity Features

Not all entities support the same features. Each entity supports some or all of the following (see the sketch after this list):

  • POSITION: Fetch the current financial position.
  • AUTO_CONTRIBUTIONS: Fetch the entity's auto-contributions.
  • TRANSACTIONS: Fetch all account/investment transactions.
  • HISTORIC: Aggregate past positions and transactions to build a history of past and current investments.
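
A minimal sketch of how these feature names are combined in a scrape request (the endpoint and field names come from the Usage section below):

    # Feature names as used in the POST /api/v1/scrape request body (see Usage)
    request_body = {
        "entity": "MY_INVESTOR",
        "features": ["POSITION", "AUTO_CONTRIBUTIONS", "TRANSACTIONS"],
    }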

Google Sheets Export

This project allows you to create complex dashboards and tables in Google Sheets, aggregating and formatting the scraped data. Check Export & Import Configuration for more info.

Setup

Docker

Two Docker images are available: a Selenium one and a light one without it (ex-selenium). The Selenium image is the default and is currently required for Mintos, since it bundles Selenium plus the Python and OS dependencies used for reCAPTCHA resolution (like ffmpeg).

Both are available on Docker Hub at marcosav/bank-scraper.

A very basic front end is available at marcosav/bank-scraper-front.

Development

  1. Clone the repository:

    git clone https://github.com/marcosav/bank-scraper.git
    cd bank-scraper
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate
  3. Install the required dependencies:

    pip install -r requirements.txt
    pip install -r requirements-selenium.txt  # If you want to use Selenium for reCAPTCHA

Export & Import Configuration

Check out the default template_config.yml config, which is created on first start and contains some examples of tables and summary dashboards.

Environment Variables

Check out the example docker-compose.yml for the environment variables that can be used to override the default config, set MongoDB connection options, provide Google credentials, configure entity session caches, and so on.

Credentials are stored in the environment variables {ENTITY_NAME}_USERNAME and {ENTITY_NAME}_PASSWORD. The exceptions are MYI_USERNAME and MYI_PASSWORD for MyInvestor, and TR_PHONE and TR_PIN for Trade Republic.

Also, credentials_reader.py is a basic and insecure implementation for retrieving credentials from environment variables; there you can find the exact variable names each entity expects.
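
As a rough sketch of that lookup (mirroring the naming rules described above, not the actual credentials_reader.py code):

    import os

    # Entities whose variables don't follow the {ENTITY_NAME}_* pattern,
    # per the exceptions described above
    PREFIX_OVERRIDES = {"MY_INVESTOR": "MYI", "TRADE_REPUBLIC": "TR"}

    def read_credentials(entity: str) -> tuple[str, str]:
        if entity == "TRADE_REPUBLIC":
            # Trade Republic authenticates with a phone number and PIN
            return os.environ["TR_PHONE"], os.environ["TR_PIN"]
        prefix = PREFIX_OVERRIDES.get(entity, entity)
        return os.environ[f"{prefix}_USERNAME"], os.environ[f"{prefix}_PASSWORD"]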

Usage

  1. Start the application:

    python app.py
  2. Use the provided API endpoints to interact with the scraper:

    • GET /api/v1/scrape: Get available entities.
    • POST /api/v1/scrape: Start a scraping process for a specific entity. This endpoint also prompts for the 2FA code if the entity requires one, and it returns all scraped data (see the sketch after this list).
    {
         "entity": "TRADE_REPUBLIC",
         "features": ["POSITION", "TRANSACTIONS"],
         "code": "0000",                              // Only if 2FA is needed
         "processId": "xxxxxxxxxxx",                  // Only if 2FA is needed
         "avoidNewLogin": false                       // Avoid a new login if the session has expired and 2FA is required (optional, defaults to false)
    }
    
    • POST /api/v1/update-sheets: Update Google Sheets with the latest data.
    • POST /api/v1/scrape/virtual: Perform a virtual scrape.
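
For example, a scrape of an entity that requires 2FA could be driven like this (a sketch assuming the service runs on localhost:5000 and that the first response carries the processId of the pending login; the exact response shape isn't documented here):

    import requests

    BASE = "http://localhost:5000"  # Assumed host/port, adjust to your deployment
    payload = {"entity": "TRADE_REPUBLIC", "features": ["POSITION", "TRANSACTIONS"]}

    # First attempt: entities that need 2FA will prompt for a code
    resp = requests.post(f"{BASE}/api/v1/scrape", json=payload).json()

    if "processId" in resp:  # Assumed indicator that a 2FA code is pending
        payload["code"] = input("2FA code: ")
        payload["processId"] = resp["processId"]
        resp = requests.post(f"{BASE}/api/v1/scrape", json=payload).json()

    print(resp)  # All scraped data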

Credits

  • Selenium reCAPTCHA resolution is based on the sarperavci/GoogleRecaptchaBypass project, using a slightly modified version of its Selenium implementation. In an attempt to use Playwright, I made an adaptation for it here; it works, but has some trouble with headless mode.
  • The Trade Republic client is based on the pytr-org/pytr project, although it has been heavily modified to allow resumable sessions, fetch transactions, and other minor changes; this library has been vital for this project.