Skip to content

Data Scanners

Pedro Holanda edited this page Jun 22, 2024 · 1 revision

Scrooge Data Scanners

Scrooge supports various data scanners that access online sources to facilitate the retrieval of financial data. This page provides documentation for the scanners and information on their usage.

Ethereum

With the Ethereum scanner, it is possible to directly scan the eth_getLogs RPC Method and return the data as a DuckDB table. This scanner parallelizes the processing of blocks across multiple threads.

This scanner requires an Ethereum node and uses JSON-RPC to retrieve data from it. By default, it tries to connect to a local node at http://127.0.0.1:8545. However, it can be configured to use hosted nodes that the user has access to by setting the eth_node_url variable.

Set node example:

set eth_node_url= 'https://mainnet.infura.io/v3/{key}';

The scanner is called read_eth and takes five parameters:

  1. The address to be tracked. This must be either a hex value or a known coin (e.g., USDT).
  2. The topic. This must be either a hex value or a known topic (e.g., Transfer).
  3. From Block. The start block ID from which the scanner starts.
  4. To Block. The end block ID at which the scanner stops.
  5. blocks_per_thread (Optional). The number of blocks to be processed per thread. This not only affects performance but might also be important if you are using a provider that limits the number of blocks you can query per request.

read_eth example:

set eth_node_url= 'https://mainnet.infura.io/v3/{key}';

FROM read_eth(
    'USDT',
    'Transfer',
    20034078,
    20034100, 
    blocks_per_thread=1
)
order by all
limit 1
----
"0xdac17f958d2ee523a2206206994597c13d831ec7"	Transfer	"0x0208a65db173bee464fe73c39156a25aa63b8e7f21b658dce7b933b563fe31f6"	20034081	0	121	false	[0x000000000000000000000000d0e4ccee1bbaf4bb701dd2a659c88ea849972095, 0x000000000000000000000000076aaf6914030df63fed94e4bcf22bdcfe05648e]	"0xe58cdbcc396c7bfcca433315c45192b81b9d0f2431b3fe0dcc3f4994242fd7b0"	19

Tip

Accessing node data can be much slower than accessing native DuckDB data. It currently incurs network and JSON parsing costs. In general, it is useful to create DuckDB tables directly from the scanner: CREATE TABLE t as FROM FROM read_eth('USDT','Transfer',20034078,20034100) This makes subsequent access of the table much faster.

Yahoo Finance

This scanner directly retrieves data from Yahoo Finance and transforms it into a DuckDB Table. The function takes 4 parameters.

  1. The symbol of the financial object we want to query (e.g., ^GSPC is the symbol for the S&P500 index).

  2. The date start period we want to analyze.

  3. The date end period we want to analyze.

  4. The interval we want to fetch data from yahoo. We currently support: "1d" for 1 day. "5d" for 5 days. "1wk" for a week. "1mo" for a month. "3mo" for 3 months.

yahoo_finance example

select * FROM yahoo_finance("^GSPC", "2017-12-01", "2017-12-10", "1d")

This returns a table with 7 columns.

  1. Date: The reference date of the row.

  2. Open: Refers to the price at which the financial object started trading at the beginning of the trading day.

  3. High: The highest value of the financial object on that trading day.

  4. Low: The lowest value of the financial object on that trading day.

  5. Close: Refers to the final price at which the financial object was traded at the end of the trading day.

  6. Adj Close: The adjusted closing price of the financial object. It is calculated by considering any corporate actions that affect its price, such as stock splits, dividends, or rights offerings.

  7. Volume: The total number of shares or contracts of the financial object traded that day.

Additionally, you can also submit a list of financial symbols, instead of only one financial symbols, to the yahoo_finance function. This will return the same 7 columns as the previous example, with one additional Symbol column, with the respective symbol.

yahoo_finance stock list example

select * FROM yahoo_finance(["^GSPC","BTC-USD"], "2017-12-01", "2017-12-10", "1d")

Tip

Similar to the ethereum scanner, accessing yahoo data can be much slower than native DuckDB data. It currently incurs network and CSV parsing costs. In general, it is useful to create DuckDB tables directly from the scanner.