Setup
Requirements
To operate Spacebox you'll need an x86-64 or ARM machine with more than 8 GB of RAM and more than 4 CPU cores. For storage it is better to use fast SSD or NVMe drives; the required size depends on the chain and its TX volume (a cosmoshub-4 index utilizes ~1 TB). Spacebox runs in Docker with the help of Docker Compose. You will also need the RPC and GRPC endpoints of the chain you are going to index, with the appropriate historical state.
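Before launching anything, it is worth checking that both endpoints are reachable. A minimal sketch, assuming the node runs locally on the default ports, jq and grpcurl are installed, and the node has gRPC reflection enabled (the addresses and tools here are illustrative, not Spacebox requirements):
console
$ curl -s http://localhost:26657/status | jq -r '.result.node_info.network'
$ grpcurl -plaintext localhost:9090 list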
Quick start
Get
Clone the Spacebox repo; it contains everything you need to get started.
console
$ git clone https://github.com/bro-n-bro/spacebox.git
Cloning into 'spacebox'...
remote: Enumerating objects: 1111, done.
remote: Counting objects: 100% (59/59), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 1111 (delta 17), reused 23 (delta 16), pack-reused 1052
Receiving objects: 100% (1111/1111), 2.82 MiB | 10.48 MiB/s, done.
Resolving deltas: 100% (601/601), done.
$ ls spacebox/
config docker-compose-local.yaml docker-compose.yaml docs go.mod LICENSE migrations mkdocs.yml README.md
Set
Fill .env with chain settings such as the chain prefix, node RPC & GRPC endpoints, and the start/stop heights:
.env
START_HEIGHT=5048767 # Start block height
STOP_HEIGHT=0 # Stop block height, 0 for actual height
WORKERS_COUNT=15 # Go workers to pull data in async mode
SUBSCRIBE_NEW_BLOCKS=true # pull actual blocks
# Chain settings
CHAIN_PREFIX=cosmos # Prefix of indexing chain
RPC_URL=http://0.0.0.0:26657 # RPC API
GRPC_URL=0.0.0.0:9090 # GRPC API, no http(s):// prefix
GRPC_SECURE_CONNECTION=false # GRPC secure connection
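If your node is pruned, START_HEIGHT has to fall within the range of blocks it still serves. One way to find the lowest available height is to query the RPC status endpoint; a sketch assuming a local node and jq installed (both are assumptions, not Spacebox requirements):
console
$ curl -s http://localhost:26657/status | jq -r '.result.sync_info.earliest_block_height'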
Launch
Once .env is filled, run the following to start all containers:
console
$ docker-compose up -d
Creating network "spacebox_default" with the default driver
Creating spacebox_hasura_1 ... done
Creating spacebox_mongo-crawler_1 ... done
Creating spacebox_ndc-clickhouse_1 ... done
Creating spacebox_clickhouse_1 ... done
Creating spacebox-zookeeper ... done
Creating spacebox_postgres_1 ... done
Creating spacebox_kafka_1 ... done
Creating spacebox-kafka-ui ... done
Creating spacebox_migration_1 ... done
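You can also confirm that every service came up before digging into logs; run the check below from the repo root, every container should be listed in the Up state:
console
$ docker-compose ps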
If everything is set correctly you will see the following in spacebox_crawler_1:
console
$ docker logs spacebox_crawler_1 -f
INF starting app cmp=app version=v1.2.0
INF module registered cmp=app name=raw version=v1.2.0
INF all modules registered cmp=app count=1 version=v1.2.0
INF starting cmp=app service=storage version=v1.2.0
INF started cmp=app service=storage version=v1.2.0
INF starting cmp=app service=grpc_client version=v1.2.0
INF started cmp=app service=grpc_client version=v1.2.0
INF starting cmp=app service=rpc_client version=v1.2.0
INF started cmp=app service=rpc_client version=v1.2.0
INF starting cmp=app service=broker version=v1.2.0
INF started cmp=app service=broker version=v1.2.0
INF starting cmp=app service=worker version=v1.2.0
INF started cmp=app service=worker version=v1.2.0
INF starting cmp=app service=server version=v1.2.0
INF listening for new block events cmp=worker version=v1.2.0
INF started cmp=app service=server version=v1.2.0
INF starting cmp=app service=health_checker version=v1.2.0
INF exit not needed cmp=worker version=v1.2.0
INF started cmp=app service=health_checker version=v1.2.0
INF start metrics scraper cmp=server version=v1.2.0
INF application started cmp=app version=v1.2.0
INF parse block cmp=worker height=1 version=v1.2.0 worker_number=1
INF parse block cmp=worker height=2 version=v1.2.0 worker_number=11
INF parse block cmp=worker height=4 version=v1.2.0 worker_number=19
INF parse block cmp=worker height=5 version=v1.2.0 worker_number=17
INF parse block cmp=worker height=3 version=v1.2.0 worker_number=6
To stop and remove all containers use the following:
console
docker-compose down
Issues?
If after docker-compose up you see the error:
console
ERROR: for crawler Container "d0d8ccc463b8" is unhealthy.
ERROR: Encountered errors while bringing up the project.
then you need to grant ownership of the volumes folder to the docker user (uid 1001):
console
$ docker-compose down
$ chown -R 1001:1001 volumes/
$ docker-compose up -d
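To verify the fix took effect, you can inspect the numeric ownership of the folder before bringing the stack back up; every entry should be owned by uid/gid 1001:
console
$ ls -ln volumes/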
It usually takes around 2 minutes to start everything. Check the log of each container with docker logs <container_name> to see what's going on under the hood. Some parameters can be adjusted on the fly: edit .env and restart the appropriate container.
Upgrading
Spacebox's main component, spacebox-crawler, follows semantic versioning: MAJOR.MINOR.PATCH. Some releases are chain-specific and contain the chain name in the tag, e.g. v2.0.0-neutron.
As long as Spacebox is considered beta software, every major version upgrade will require a re-sync; check the release details for more info on that.
We recommend pinning the crawler to a specific version in docker-compose.yaml instead of :latest; that will ease troubleshooting and upgrades:
docker-compose.yaml
version: "3.9"
services:
crawler:
image: bronbro/spacebox-crawler:v1.2.0
For an upgrade with a full re-sync, stop everything, remove the volumes folder, and start over with the newer version of the crawler:
console
$ docker-compose down
$ rm -rf volumes/
$ docker-compose up -d
If an upgrade happens alongside a chain upgrade, it is possible to relaunch only the crawler:
console
$ docker stop spacebox_crawler-1
$ docker rm spacebox_crawler-1
Set new version in docker-compose.yaml
services:
crawler:
- image: bronbro/spacebox-crawler:v1.2.0
+ image: bronbro/spacebox-crawler:v1.3.0
Start crawler container
console
docker-compose up -d crawler
In this case, indexing with the new version will start from the block at which you upgrade the crawler's container. To avoid data inconsistency it might be a good idea to set the upgrade height in .env as STOP_HEIGHT and perform the container upgrade safely.
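For example, if the chain upgrade is planned for a known height, the relevant .env lines could look like the sketch below; the height is hypothetical, and setting SUBSCRIBE_NEW_BLOCKS=false makes the crawler stop at STOP_HEIGHT without crashing (see the config options below):
console
$ grep -E 'STOP_HEIGHT|SUBSCRIBE_NEW_BLOCKS' .env
STOP_HEIGHT=12345678
SUBSCRIBE_NEW_BLOCKS=false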
Config options
Spacebox is configured via environment variables set in the .env file.
[!TIP] To apply .env changes, just restart the spacebox_crawler-1 container. However, the same does not apply to the rest of the containers (e.g. Kafka or ClickHouse): they need to be recreated.
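A minimal sketch of both cases, assuming the compose service names (crawler, kafka, clickhouse) match the container names shown in the Quick start output:
console
$ docker restart spacebox_crawler-1
$ docker-compose up -d --force-recreate kafka clickhouse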
Crawler
START_TIMEOUT
Timeout to start application. Do not change without a strict purpose.
STOP_TIMEOUT
Timeout to stop application. Do not change without a strict purpose.
LOG_LEVEL
Supported values: info, debug, error.
Debug level will produce more logs, including checks of the previously parsed blocks. Error level will show only errors, if there are any.
RECOVERY_MODE
Default value: false
Intended mostly for debugging purposes. In case of a panic, the crawler will output a full error log without crashing. Significantly (!) lowers performance.
START_HEIGHT
Default value: 0
Height to start crawling from. E.g. for cosmoshub-4 it must be set to 5200792, as that was the first block of this chain.
If set to 0, the genesis will be parsed automatically.
STOP_HEIGHT
Default value: 0
Height to stop indexing at. Might be useful when you need to index only a certain part of the chain. When set to 0, the crawler will pull the latest block from the RPC at the moment of launch and use it as the stop height.
WORKERS_COUNT
The number of asynchronous Go workers inside the crawler. Recommended value: <= CPU cores in your system. A huge number of workers may exhaust the RPC node, resulting in missed blocks. Usually more workers don't mean faster crawling; the best-performing number is 30-40 workers (even on an 88-core machine).
SUBSCRIBE_NEW_BLOCKS
Default value: true
If set to false, the crawler will stop indexing at STOP_HEIGHT without an app crash. If set to true, the crawler will subscribe to new incoming blocks over the websocket and will parse them alongside the historical blocks.
[!WARNING] If SUBSCRIBE_NEW_BLOCKS=true, the spacebox_crawler_last_processed_block_height metric will display the highest indexed block, even though historical blocks might still be in progress.
PROCESS_ERROR_BLOCKS
If set to true, the crawler will attempt to re-index blocks that had errors during previous processing (based on info stored in Mongo). Re-indexing will be attempted every PROCESS_ERROR_BLOCKS_INTERVAL.
PROCESS_GENESIS
If set to true, the crawler will parse the genesis to get the initial genesis data into the DB. If START_HEIGHT is set to 0, the genesis will be parsed as well.
Text metrics
METRICS_ENABLED
Enable or disable the crawler's text metrics endpoint. Compatible with Prometheus.
SERVER_PORT
Specify the port for text metrics.
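To verify that metrics are exposed, you can query the endpoint from the host. A sketch assuming SERVER_PORT is the 2112 port published in the crawler's docker-compose.yaml section; the metric name is the one mentioned in the warning above:
console
$ curl -s http://localhost:2112/metrics | grep spacebox_crawler_last_processed_block_height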
Chain settings
CHAIN_PREFIX
Specify the account prefix of the chain you are going to index. It is typically found at the beginning of every address; e.g. for cosmos106yp7zw35wftheyyv9f9pe69t8rteumjxjql7m the chain prefix is cosmos.
RPC_URL & GRPC_URL
Node RPC and GRPC endpoints. By default served on ports 26657 and 9090.
GRPC_TIMEOUT
Timeout for the GRPC requests. Default value: 15s.
RPC_TIMEOUT
Timeout for the RPC requests. Default value: 15s.
[!TIP] For better performance, we advise running the indexer as "close" to the chain node as possible, ideally on the same machine.
If you are running both the crawler and the chain node on the same host, you need to add the following lines to the crawler's section of docker-compose.yaml:
docker-compose.yaml
ports:
- '2112:2112'
+ extra_hosts:
+ - "host.docker.internal:host-gateway"
And set the RPC and GRPC addresses in .env accordingly:
RPC_URL=http://host.docker.internal:26657
GRPC_URL=host.docker.internal:9090
Also, depending on your firewall setup, it might be required to allow connections from the docker subnet to the chain endpoint.
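For example, with ufw the rules could look like the sketch below; the subnet is hypothetical, so check the actual one with docker network inspect first:
console
$ docker network inspect spacebox_default | grep Subnet
$ sudo ufw allow from 172.18.0.0/16 to any port 26657 proto tcp
$ sudo ufw allow from 172.18.0.0/16 to any port 9090 proto tcp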
GRPC_SECURE_CONNECTION
Set it to true if your GRPC endpoint is running with SSL encryption enabled. Leave it as false if connecting directly to the node port or within the same localhost.
GRPC_MAX_RECEIVE_MESSAGE_SIZE_BYTES
Defines the maximum size of a single message the crawler will process. If you see something similar to ...failed to get block: rpc error: code = ResourceExhausted desc = grpc: received message larger than max... in the crawler log, increase this parameter and restart the spacebox_crawler container. Chains might have very heavy blocks, e.g. Neutron's testnet pion-1 has some block_results of ~170 MB in size.
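A sketch of bumping the limit for such a chain; the value below is a hypothetical ~200 MB, pick one larger than your heaviest block_results:
console
$ grep GRPC_MAX_RECEIVE_MESSAGE_SIZE_BYTES .env
GRPC_MAX_RECEIVE_MESSAGE_SIZE_BYTES=209715200
$ docker restart spacebox_crawler_1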
Broker settings
BROKER_SERVER
Address of Kafka broker. Do not change without a strict purpose.
BROKER_ENABLED
Enable/disable publishing messages to the broker. Do not change without a strict purpose.
PARTITIONS_COUNT
Number of partitions in the broker. Do not change without a strict purpose.
MAX_MESSAGE_MAX_BYTES
Defines the maximum message size in Kafka. Changing it requires recreating the Kafka container. Default value: 5242880 (5 MB).
BATCH_PRODUCER
Enables the batch producer, which publishes messages to Kafka in batches. Might increase performance, but is considered an experimental feature.
KAFKA_UI_PASSWORD
Password for kafka-UI. Change for production deployment!
Mongo settings
MONGO_CRAWLER_URI
Address of the Mongo DB. The crawler uses MongoDB as a checklist of parsed block statuses.
MONGO_USER
Mongo username used by the crawler.
MONGO_PASSWORD
Mongo user password used by the crawler. Change for production deployment!
MAX_POOL_SIZE
Maximum pool size in Mongo. Do not change without a strict purpose.
MAX_CONNECTING
Maximum Mongo connections. Do not change without a strict purpose.
Clickhouse settings
CLICKHOUSE_PASSWORD
Password for the ClickHouse default user. Change for production deployment! The default user will require a privileges upgrade to allow it to create new users.
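To check the password or run ad-hoc queries, you can attach to the ClickHouse container with its bundled client; a sketch assuming the container name from the Quick start output and the password from .env:
console
$ docker exec -it spacebox_clickhouse_1 clickhouse-client --password '<CLICKHOUSE_PASSWORD>'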
Health checker
HEALTHCHECK_ENABLED
If set to true the crawler will periodically check the health of the node endpoints, and behave accordingly.
HEALTHCHECK_FATAL_ON_CHECK
If set to true, the crawler will crash if the health check fails. That will result in a container restart.
HEALTHCHECK_MAX_LAST_BLOCK_LAG
Maximum allowed interval for a new block to appear in the RPC for the endpoint to be counted as healthy.
HEALTHCHECK_INTERVAL
Interval to perform a health check.
HEALTHCHECK_START_DELAY
Delay for the first health check after the start.
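When HEALTHCHECK_FATAL_ON_CHECK is enabled, repeated failures show up as container restarts; one way to spot them, assuming the crawler container name from the Quick start output (a non-zero count means the crawler has been restarted):
console
$ docker inspect --format '{{.RestartCount}}' spacebox_crawler_1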