A project to design a fact and dimension star schema for optimizing queries on a Ethereum Chain database using MSSQL (Microsoft SQL Server), a relational database management system.
A data warehouse provides a consistent view of Ethereum data over time. The designed schema allows for efficient querying of data such as transaction dates, gas fees, and transaction volume information.
The ETL process for the Ethereum Chain Analysis is designed using Microsoft SQL Server Integration Services (SSIS).
The Ethereum blockchain generates a significant amount of data due to its intrinsic transparency and decentralized nature. It is also referred to as on-chain data and is openly accessible to the world.
A peer-to-peer network of mutually distrusting nodes maintains a common view of the global state and executes code upon request. The stated is stored in a blockchain secured by a proof-of-state consensus mechanism that work by selecting validators in proportion to their quantity of holdings.
Moreover, the on-chain data is timestamped, integrated, and validated into an open ledger. This important blockchain feature enables us to assess the network’s health and usage. It serves as a massive data warehouse for complex prediction algorithms, network adoption and much more.
On-chain metrics such as active addresses, total addresses and transaction volume indicate the usage and adoption of the network.
- What is the total transaction volume on the eth chain over a specific time period?
- What are the most popularly exchanged digital tokens, represented by ERC-721 and ERC-20 smart contracts?
- The biggest transactions over the last 24 hours?
- Transactions with the highest gas fee over the last 24 hours?
While the US dollar can only be divided into 100 equal parts i.e. into pennies, cryptocurrencies can be divided into much smaller atomic units.
Wei is the smallest denomination of ether, the currency used to facilitate transactional operations on the Ethereum blockchain network, where 10^18
or 1,000,000,000,000,000,000 wei
is equivalent to one ether.
Therefore, ether has multiple different levels of wei. Each subunit indicates the quantity of units but as mentioned above, have alternative names.
The table below illustrates the various units of wei.
Unit | Alternative Name | Wei Value | Gwei Value | Ether Value |
---|---|---|---|---|
Wei | - | 1 | 10^-9 Gwei | 10^-18 ETH |
Kwei | Babbage | 1,000 | 10^-3 Wei | 10^-6 Gwei |
Mwei | Lovelace | 1,000,000 | 10^-6 Wei | 10^-3 Gwei |
Gwei | Shannon | 1,000,000,000 | 10^-9 Wei | 1 Gwei |
Microether | Szabo | 1,000,000,000,000 | 10^-12 Wei | 10^-3 Gwei |
Milliether | Finney | 1,000,000,000,000,000 | 10^-15 Wei | 10^-6 Gwei |
Ether | - | 1,000,000,000,000,000,000 | 10^-18 Wei | 10^-9 Gwei |
As gas is the pricing value deemed necessary to execute a contract or other form of transaction on the Ethereum network, gwei is needed to pay for the computational power.
The standard transaction fee is 21,000 gwei on the Ethereum blockchain network. Typically, transaction fees will rise proportionately to the underlying complexity of the transaction. The methodology to calculate the transaction fee is by using the following formula:
(receipt_gas_used * gas_price) / 10^-18
CREATE TABLE [dbo].[Transfers](
[TransferID] [int] IDENTITY(1,1) NOT NULL,
[TokenAddress] [varchar](max),
[FromAddress] [varchar](max),
[ToAddress] [varchar](max),
[TokenTransferTimestamp] [datetime2](6),
[TransactionGas] [bigint],
[TransactionGasPrice] [bigint],
[TransactionGasUsed] [bigint],
[BlockHash] [varchar](max),
[BlockGasLimit] [bigint],
[BlockBaseFeePerGas] [bigint],
[BlockGasUsed] [bigint],
[BlockTransactionCount] [bigint],
[TokenName] [varchar](max),
[TokenSymbol] [varchar](max),
[TokenTotalSupply] [varchar](max),
[TokenDecimals] [varchar](max),
[ContractIsERC20] [bit],
[ContractIsERC721] [bit],
[ValueETH] [numeric](12, 6),
[ValueEUR] [numeric](8, 2),
[ValueUSD] [numeric](8, 2)
)
WITH Tokens AS (
SELECT
t.address AS TokenAddress,
t.symbol AS TokenSymbol,
t.name AS TokenName,
t.decimals AS TokenDecimals,
t.total_supply AS TokenTotalSupply,
c.is_erc20 AS ContractIsERC20,
c.is_erc721 AS ContractIsERC721
FROM
`bigquery-public-data.crypto_ethereum.tokens` t
JOIN `bigquery-public-data.crypto_ethereum.contracts` c ON c.address = t.address
)
WITH TokenTransfers AS (
SELECT
tt.token_address AS TokenAddress,
tt.from_address AS FromAddress,
tt.to_address AS ToAddress,
tt.value AS TokenTransferValueETH,
tt.block_timestamp AS TokenTransferTimestamp,
tt.transaction_hash AS TokenTransferHash,
tt.block_number AS TokenTransferBlock,
t.gas AS TransactionGas,
t.gas_price AS TransactionGasPrice,
t.receipt_gas_used AS TransactionGasUsed
FROM
`bigquery-public-data.crypto_ethereum.token_transfers` tt
JOIN `bigquery-public-data.crypto_ethereum.transactions` t ON tt.transaction_hash = t.hash
WHERE date(tx.block_timestamp) BETWEEN date_sub(current_date, INTERVAL 1 MONTH) AND current_date()
)
SELECT
tx.token_address AS TokenAddress,
tx.from_address AS FromAddress,
tx.to_address AS ToAddress,
tx.value AS TokenTransferValueETH,
tx.block_timestamp AS TokenTransferTimestamp,
trans.gas AS TransactionGas,
trans.gas_price AS TransactionGasPrice,
trans.receipt_gas_used AS TransactionGasUsed,
blox.hash AS BlockHash,
blox.gas_limit AS BlockGasLimit,
blox.base_fee_per_gas AS BlockBaseFeePerGas,
blox.gas_used AS BlockGasUsed,
blox.transaction_count AS BlockTransactionCount,
t.symbol AS TokenSymbol,
t.name AS TokenName,
t.decimals AS TokenDecimals,
t.total_supply AS TokenTotalSupply,
contx.is_erc20 AS ContractIsERC20,
contx.is_erc721 AS ContractIsERC721
FROM
`bigquery-public-data.crypto_ethereum.token_transfers` tx
JOIN `bigquery-public-data.crypto_ethereum.tokens` t ON tx.token_address = t.address
JOIN `bigquery-public-data.crypto_ethereum.transactions` trans ON trans.`hash` = tx.transaction_hash
JOIN `bigquery-public-data.crypto_ethereum.blocks` blox ON blox.number = tx.block_number
JOIN `bigquery-public-data.crypto_ethereum.contracts` contx ON contx.address = tx.token_address
WHERE date(blox.timestamp) BETWEEN date_sub(current_date, INTERVAL 1 MONTH) AND current_date()
ORDER BY date(tx.block_timestamp) DESC
LIMIT 50
The star schema for the Ethereum Chain Analysis and Data Warehouse Design consists of the following dimensions and facts:
Facts | Description |
---|---|
Transaction | To store information about individual Ethereum transactions, such as transaction hash, block hash, transaction value, and transaction gas. |
Balance | To store information about an address' balance at a given time. |
Dimensions | Description |
---|---|
Token | To store information about ERC-20 tokens, such as token contract address, token symbol, token name, and token total supply. |
Timestamp | To store timestamps for various Ethereum-related events. |
Token Snapshot | Slowly Changing Dimension to store circulating supply. |
Block | An Ethereum block that stores Ethereum transactions and defines the gas fee for the next block. |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | Dimension |
BlockKey | int | Dimension |
TokenKey | int | Dimension |
TimeStampKey | bigint | Dimension |
TransactionHash | binary(1) | Degenerated Dimension |
FromAddress | varbinary(1) | Degenerated Dimension |
ToAddress | varbinary(1) | Degenerated Dimension |
ValueETH | bigint | Metric |
GasUsed | bigint | Metric |
GasPrice | bigint | Metric |
GasPaid | bigint | Metric |
TransactionType | int |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | Dimension |
TimestampKey | int | Dimension |
Address | varbinary(1) | Degenerated Dimension |
BalanceETH | bigint | Metric |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | - |
Address | varbinary(1) | |
Symbol | varchar(4) | |
Name | varchar(50) | |
Decimals | bigint | |
ERC20 | bit | |
ERC721 | bit |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | - |
UnixTimestamp | bigint | |
HH | int | |
MM | int | |
SS | int | |
AM_PM | int | |
Day | int | |
Week | int | |
WeekName | varchar(9) | |
Month | int | |
MonthName | varchar(9) | |
Quarter | int | |
Year | int |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | - |
TimestampKey | int | Dimension |
BlockHash | varbinary(1) | Degenerated Dimension |
GasLimit | bigint | Metric |
GasUsed | bigint | Metric |
TransactionCount | bigint | Metric |
BaseFeePerGas | bigint | Metric |
Column | Type | Designation |
---|---|---|
SurrogateKey | int | - |
TokenKey | int | Dimension |
TotalSupply | bigint | Metric |
create database DataWarehouse
go
use DataWarehouse
go
CREATE TABLE DimTimestamp
(
SurrogateKey INT NOT NULL
CONSTRAINT DimTimestamp_SurrogateKey PRIMARY KEY,
UnixTimestamp BIGINT NOT NULL,
HH INT NOT NULL CHECK (HH >= 0 AND HH <= 23),
MM INT NOT NULL CHECK (MM >= 0 AND MM <= 59),
SS INT NOT NULL CHECK (SS >= 0 AND SS <= 59),
AM_PM INT NOT NULL CHECK (AM_PM IN (0, 1)),
Day INT NOT NULL CHECK (Day >= 1 AND Day <= 31),
Week INT NOT NULL CHECK (Week >= 1 AND Week <= 53),
WeekName VARCHAR(9) NOT NULL,
Month INT NOT NULL CHECK (Month >= 1 AND Month <= 12),
MonthName VARCHAR(9) NOT NULL,
Quarter INT NOT NULL CHECK (Quarter >= 1 AND Quarter <= 4),
Year INT NOT NULL
)
go
create table DimBlock
(
SurrogateKey int not null
constraint DimBlock_SurrogateKey
primary key,
BlockHash varbinary not null,
GasLimit bigint not null,
GasUsed bigint not null,
TimestampKey int not null
constraint DimBlock_DimTimestamp_SurrogateKey_Foreign_Key
references DimTimestamp,
TransactionCount bigint not null,
BaseFeePerGas bigint not null
)
go
create table DimToken
(
Address varbinary not null
constraint DimToken_Address
unique,
Symbol varchar(4) not null,
Name varchar(50) not null,
Decimals bigint not null,
ERC20 bit not null,
ERC721 bit not null,
SurrogateKey int not null
constraint DimToken_SurrogateKey_Foreign_Key
primary key
)
go
create table DimTokenSnapshot
(
SurrogateKey int not null
constraint TokenSnapshot_SurrogateKey
primary key,
TokenKey int not null
constraint DimTokenSnapshot_DimToken_SurrogateKey_Foreign_Key
references DimToken,
TotalSupply bigint not null
)
go
create table FactBalance
(
SurrogateKey int not null
constraint FactBalance_SurrogateKey
primary key,
Address varbinary not null,
BalanceETH bigint not null,
TimestampKey int not null
constraint FactBalance_DimTimestamp_SurrogateKey_Foreign_Key
references DimTimestamp
)
go
create table FactTransaction
(
TransactionHash binary not null,
BlockKey int not null
constraint FactTransaction_DimBlock_SurrogateKey_Foreign_Key
references DimBlock,
FromAddress varbinary not null,
ToAddress varbinary not null,
TokenKey int not null
constraint FactTransaction_DimToken_SurrogateKey_Foreign_Key
references DimToken,
ValueETH bigint not null,
GasUsed bigint not null,
GasPrice bigint not null,
GasPaid bigint not null,
TransactionType int not null,
SurrogateKey int not null
constraint FactTransaction_SurrogateKey
primary key
)
go