The ethereum-datafarm aims to provide researches quick access to Ethereum blockchain data by offering an easy-to-use interface to scrap event logs from contracts and save them in .csv and .pickle format and/or directly sync to Google BigQuery.
29-04-2021 The ethereum-datafarm is still a project in progress, the latest feature added was BigQuery support and a flexible switch between contract config files, which allows to run the application with extremely low memory requirements.
- Scraps every type of event data from pre-defined contracts
- Saves each day in a separate file and/or syncs to Google BigQuery
- No local or Infura node needed => Etherscan.io API is used
- Possibility to add or remove contracts to/from the queue during runtime
- Possibility to securely terminate execution (config/end.txt file)
- Already pre-configured for multiple events used by various contracts
- Low CPU, RAM and SSD requirements (AWS S3 is used)
- .Csv, .pickle and SQL (GBQ) support
- Throttles down the number of requests to the API when the latest block is reached
- Logging (optional)
Or check out this sample output file of dai transfers
Tornado Cash Dasboard (mobile only)
ERC-20s Dasboard (mobile only)
from farm.Farm import *
aws_bucket = "ethereum-datahub" # Your AWS bucket
# Run
if __name__=="__main__":
# Load contracts
contracts = load_contracts(aws_bucket=aws_bucket)
# Initialize Farm and get status
farm = Farm(contracts=contracts, aws_bucket=aws_bucket).status()
farm.start_farming()$ git clone https://github.com/Nerolation/ethereum-datafarm
$ cd ethereum-datafarm
$ virtualenv ./venv
$ . ./venv/bin/activate
$ pip3 install -r requirements.txt
$ nano .apikey/key.txt => <API_Key>
$ nano .aws/credentials => <AWS_credentials>
$ nano .aws/config => <Configs>The .aws/credentials file might look like the following (replace the aws_access_key_id and aws_secret_access_key with your own details):
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
[default]
region=eu-central-1
output=json
Check out the official documentation for further infos
- Python 3.5 or higher
- AWS S3 bucket and the related credentials
- Etherscan API key (for free at etherscan.io)
- Right filesystem structure (see below)
ethereum-datafarm/
|-- farm/
| |-- helpers/
| | |-- __init__.py
| | |-- Contract.py
| | |-- DailyResult.py
| | |-- Method.py
| | |-- EventHelper.py
| | |-- ContractHelper.py
| |
| |-- __init__.py
| |-- Farm.py
|
|-- .aws/
| |-- credentials // Not in repo
| |-- config // Not in repo
|
|-- .apikey/
| |-- apikey.txt // Not in repo
|
|-- config/
| |-- end.txt
|
|-- logs/ // optional, if there, then logging is activated
| |-- datafarm.log
|
|-- README
s3://
|-- <Your Bucket's name>/
| |-- config/
| | |-- contracts/
| | | |-- contracts.csv
| | |-- contracts2/ // optional, if there, then fileswitch is activated
| | | |-- contracts.csv
Make sure that contracts.csv has the following structure: (Contract address, custom name, canonical Event, start block, chunksize)
0x35A18000230DA775CAc24873d00Ff85BccdeD550,compound_uni,Approval(address,address,uint256),10921410,20000
0x35A18000230DA775CAc24873d00Ff85BccdeD550,compound_uni,Transfer(address,address,uint256),10921410,20000
0xf650C3d88D12dB855b8bf7D11Be6C55A4e07dCC9,compound_tether,Approval(address,address,uint256),9879363,20000If you'd like to remove a contract during runtime, just switch the contracts address in the contracts.csv file with the word "remove":
remove,compound_uni,Approval(address,address,uint256),10921410,20000The end.txt file is used to securely terminate the programme at a known position in the loop.
False means "no end", True equals "end, stop the programme"
Make usage of logging by creating a directory named "logs" (see above @ local file structure)
This will automatically enable the logging module.
Initialize farm and starts scraping data:
- Loads contracts from config/contracts.csv file and creates Contract objects
- Starts farm instance
- Loops over contracts and saves data into .csv and .pickle files
Logging Output:
$ Timestamp - Current timestamp
$ Contract - Address of contract being processed
$ Current Chunk - Block range being processed
$ Chunk Timestamp - Timestamp of processed block range
$ Events - Number of events found in last chunk
$ Chsz - Current Chunksize
$ Fc - Filecounter - Files safed since initializationComparison of the number of Transfers of the largest Ethereum-based Stablecoins
Click here for more examples using the data (more coming...)
- Safely terminated execution with KeyboardInterrupt commands
- Fix table-schema-bug that occurs for certain contracts
Visit toniwahrstaetter.com for further details!
Anton Wahrstätter, 03.04.2021
