Model Setup Documentation

Pnyx makes use of the Pocket Network to enable any model provider to join the router rotation and earn incentives for doing so. The better the model, the more likely the router is to select it!

Note: Currently the only way to connect a model is through the Pocket Network. This process requires you to stake an amount of tokens in the network (i.e. you will need to lock up some funds; they are refunded 21 days after the supplier is unstaked). We are working to lower the fees and provide other methods to join the Pnyx routing, stay tuned!

The paths to Pnyx:

  • Fully Permissionless - Pocket Network (Advanced difficulty)

Fully Permissionless - Pocket Network

If you want to add a model to the Pnyx router but prefer to stay in the dark, or you just don't want to talk to anybody, this is the way. We won't ask who you are, nor keep you from providing services: you will enter the playground like any other humble model, and if you prove to be worthy (i.e. good at any given task) the router will automatically assign work to you. To achieve this we use the Pocket Network, a blockchain service.

The Pocket Network is a decentralised API protocol that enables any kind of RPC-based service to join: no gatekeeping, no KYC, just connect and relay. In this document we will show you how to set up your model and connect it to the network, and hence become part of Pnyx.

Overview

We will be launching four different processes using this tutorial:

  • vLLM Engine: A high-performance LLM serving platform.
  • Poncho Sidecar: A lightweight service that abstracts your model details from the Pocket Network.
  • RelayMiner: The entry point to the Pocket Network; this service manages the verification of relays and requests payment for service.
  • Pocket Network Full Node: Used by the RelayMiner to communicate with the network.

Requirements

Storage

300 GB of disk space for the Pocket Network full node (~250 GB) plus the vLLM image and model weights

Memory

16 GB RAM minimum

Processing

6 vCPUs + GPU (NVIDIA RTX 20 series or newer for this tutorial)

Tokens

60,000 POKT Tokens for staking

  • Docker Compose: You will need it installed, and make sure the NVIDIA container toolkit (nvidia-docker) is working correctly.
  • A Domain Name: Your service needs to be reachable, so you will need a domain name pointing to your RelayMiner deployment.

Staking your Supplier

The first step is creating and staking your supplier in the Pocket Network. The supplier is your identity on the blockchain and will be the recipient of the payments you receive for traffic.

For this you will first need to install the pocketd binary on your system; you can get it directly from the Pocket Network repository. To install it, follow the provided guide here. After that, proceed to create your account:

Create Supplier Account
pocketd keys add my-supplier

After that you will see a message with a field like - address: pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj (not this one exactly...); that will be your supplier address. It is your public identifier (addresses all start with pokt). Save it, and also store your recovery phrase safely!

The next step is to fund your wallet with 60,010 POKT tokens; the extra 10 will be used for fees and node operation (your account cannot be empty). You can buy POKT tokens on many exchanges; which one depends on your country of residence (you can find a list of exchanges on CoinMarketCap).

Once you have the tokens in your supplier address, you can proceed to stake it in the Pocket Network. To do this, first create a yaml file with the following content:

supplier_stake_config.yaml
owner_address: pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
operator_address: pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
stake_amount: 60000000000upokt
default_rev_share_percent:
  pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj: 100
services:
  - service_id: "text-generation"
    endpoints:
      - publicly_exposed_url: https://relayminer.your.domain.name
        rpc_type: JSON_RPC

(Replace the address and domain name!)
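Note that stake_amount is denominated in upokt, the base unit of POKT: 1 POKT = 1,000,000 upokt, which is consistent with the 60,000 POKT stake becoming 60000000000upokt in the config above. A quick sanity-check sketch (the helper name is ours, purely illustrative):

```python
# Sanity check: stake_amount uses upokt, the base unit of POKT.
# 1 POKT = 1,000,000 upokt (matches 60,000 POKT -> "60000000000upokt" above).

def pokt_to_upokt(pokt: int) -> str:
    """Convert whole POKT tokens to the upokt string used in stake configs."""
    return f"{pokt * 1_000_000}upokt"

print(pokt_to_upokt(60_000))  # -> 60000000000upokt
```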

We are using a service called text-generation, which is a generic service for LLM models; since this is a permissionless network, we cannot force you to use any specific model. Save this file as supplier_stake_config.yaml and send the stake command:

Stake Supplier
pocketd tx supplier stake-supplier \
    --config=supplier_stake_config.yaml \
    --from=my-supplier \
    --gas=auto \
    --gas-prices=0.000001upokt \
    --gas-adjustment=1.7 \
    --node https://shannon-grove-rpc.mainnet.poktroll.com \
    --yes

At the end you will receive a txhash; copy it and search for it in POKTscan. It should take less than 5 minutes to see it successfully processed. Nevertheless, it can take up to 30 minutes for your supplier to be detected by the network's applications.

Congratulations, your supplier identity is already in the Pocket Network! Now let's deploy the backend...

Model Backend Configuration

We will start by writing the configuration for all the model backend services to be deployed and then start all of them at once using the provided docker-compose file (see below).

The first thing we need up and running is the vLLM model. Create a .env file and add the following lines:

.env
# Your HuggingFace token, vLLM will use this to download the model if needed
HF_TOKEN=hf_zzzzzzz
# Where you want to keep the models in the host machine
MODELS_PATH=/your/host/huggingface_hub/
# The model to deploy (this is just a tutorial, aim for 30B and up for this service)
MODEL_NAME=meta-llama/Llama-3.2-1B-Instruct
# Some configs, feel free to change
DTYPE=auto
GPU_MEMORY_UTILIZATION=0.90
MAX_MODEL_LEN=4096

The next thing you will need is to deploy the sidecar application, which we call Poncho. The job of this app is to abstract (and gatekeep) your deployment from the Pocket Network. Its main task is to mask your model's name: since the applications in the Pocket Network won't know which model you deploy, this simple service inspects each request and replaces the incoming model name in the JSON payload with your deployed model's name. So, create a file named sidecar.yaml and add this content:

sidecar.yaml
log_level: "info"
# Here is where the Poncho will respond to (where the relayminer will talk to)
server:
    host: "0.0.0.0"
    port: 8000
# This is the backend of vLLM, deployed in the docker-compose network
vllm_backend:
    host: "http://vllm-relayminer"
    # vLLM's container port inside the docker network (the 8800 host mapping does not apply here)
    port: 8000
    # This is your real model name, this is used to patch incoming requests
    model_name_override: "meta-llama/Llama-3.2-1B-Instruct"
    # This is a nice extra: logprobs will spike your VRAM usage (and can leak model data), so you can just reject any request that asks for logprobs.
    allow_logprobs: false
routing:
    timeout_seconds: 600
    max_payload_size_mb: 10
model_config_data:
    # This will be the name that you use to respond to requests; it replaces whatever comes in the vLLM response
    model_public_name: "pocket_network"
    # This can be used to limit your context length when serving the Pocket Network
    max_position_embeddings: 4096
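To make the masking concrete, here is a minimal sketch (in Python, not the actual Poncho implementation; names and structure are illustrative) of what the sidecar does with each incoming request body, using the settings above:

```python
import json

# Illustrative sketch of the sidecar's request patching (not the real Poncho code).
MODEL_NAME_OVERRIDE = "meta-llama/Llama-3.2-1B-Instruct"  # your real deployed model
ALLOW_LOGPROBS = False

def patch_request(raw_body: bytes) -> bytes:
    """Rewrite the incoming model name and optionally reject logprobs requests."""
    payload = json.loads(raw_body)
    if not ALLOW_LOGPROBS and payload.get("logprobs"):
        raise ValueError("logprobs requests are not allowed")
    # Whatever model name the caller sent (e.g. "pocket_network") is replaced
    # with the model actually deployed behind vLLM.
    payload["model"] = MODEL_NAME_OVERRIDE
    return json.dumps(payload).encode()
```

On the way back, Poncho does the reverse: the model name in vLLM's response is replaced with model_public_name before the relay is returned.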

Also, add the following lines to the .env file you created before:

.env (continued)
# This will be mounted later in the docker compose
PONCHO_CONFIG_FILE=/path/to/your/sidecar.yaml

Note: You can skip using Poncho by setting the --served-model-name flag when starting the vLLM service. You can pass a list of names there, so you can include pocket_network alongside any others you are using. We recommend using the Poncho sidecar because it has other tools that might be useful and keeps your vLLM away from exotic requests.
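If you take that route, the vLLM entrypoint list in the docker-compose.yaml would gain entries along these lines (a sketch; --served-model-name accepts several names):

```yaml
# Sketch: extra args for the vLLM entrypoint list
"--served-model-name",
"${MODEL_NAME}",
"pocket_network",
```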

Model Backend Deployment

Now that we have all config in place, we can deploy the services. Create a docker-compose.yaml file, containing the following lines:

docker-compose.yaml
version: '3'

networks:
    pokt-relayminer-ai:
        name: pokt-relayminer-ai
        driver: bridge

services:
    vllm-relayminer:
        container_name: vllm-relayminer
        image: vllm/vllm-openai:v0.10.2
        volumes:
            - ${MODELS_PATH}:/root/.cache/huggingface/hub/
        environment:
            - HF_TOKEN=${HF_TOKEN}
            - MODEL_NAME=${MODEL_NAME}
            - GPU_MEMORY_UTILIZATION=${GPU_MEMORY_UTILIZATION}
            - MAX_MODEL_LEN=${MAX_MODEL_LEN}
            - DTYPE=${DTYPE}
        entrypoint: ["python3",
                      "-m",
                      "vllm.entrypoints.openai.api_server",
                      "--model",
                      "${MODEL_NAME}",
                      "--dtype",
                      "${DTYPE}",
                      "--gpu-memory-utilization",
                      "${GPU_MEMORY_UTILIZATION}",
                      "--max-model-len",
                      "${MAX_MODEL_LEN}",
                      "--trust-remote-code",
                      ]
        ports:
            - "8800:8000"
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          device_ids: ['0']
                          capabilities: [gpu]
        networks:
          - pokt-relayminer-ai

    poncho-sidecar:
        container_name: poncho-sidecar
        image: poktscan/pokt_ml_poncho:latest
        deploy:
            replicas: 1
        environment:
            CONFIG_PATH: /config/config.yaml
        volumes:
            - ${PONCHO_CONFIG_FILE}:/config/config.yaml
        ports:
            - "8000:8000"
        networks:
          - pokt-relayminer-ai
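With the .env, sidecar.yaml, and docker-compose.yaml in place, bring the model backend up (run from the directory holding these files; output depends on your environment):

```shell
# Start vLLM and the Poncho sidecar in the background
docker compose up -d

# Follow the vLLM logs until the model finishes loading
docker logs -f vllm-relayminer
```

Once vLLM is up, its OpenAI-compatible API should answer on the mapped host port (e.g. http://localhost:8800/v1/models).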

Pocket Network Backend

Now we need to set up the RelayMiner and the full node. The RelayMiner will respond to Pocket Network requests and the full node will provide the required data (staying in sync with the blockchain).

The easiest way to deploy these is by following the Stakenodes guide. There you will find the steps you need to follow to deploy the node and the miner using docker compose. Take some time to read through it before continuing here, as we will comment on some of the steps. In order to make the guide work with an AI backend you will need to make some tweaks.

Note: Please update the RelayMiner image in the repo from pocketd:v0.1.29 to poktscan/pocketd:v0.1.29-streaming-support; this will enable streaming support for your model. We will need to use custom images until the PR is merged.

In the docker-compose.yaml (found in the repository) the following environment variables should be set:

Environment Variables
# We are staking in mainnet
NETWORK=mainnet
# Your external IP
EXTERNAL_IP=
# Your node name
NODE_MONIKER="my-great-node"
# To save some space, modify this
SNAPSHOT_TYPE=pruned

Also, add an external network so the RelayMiner can reach the model backend (the Poncho sidecar and vLLM):

Network Configuration
...

networks:
    pokt-relayminer-ai:
        name: pokt-relayminer-ai
        external: true
...

And to each service (pocketd and relayminer) add:

Service Configuration
        networks:
          - pokt-relayminer-ai

Comments on "RelayMiner Setup Guide":

You can skip step 1 ("1. Prepare Your Supplier Stake File") since we already did that.

For step 2 you will fill the relayminer-config.yaml with the following lines:

relayminer-config.yaml
default_signing_key_names:
  # The one we created at the beginning of this document
  - my-supplier
# Very important!
default_request_timeout_seconds: 600
default_max_body_size: 30MB
smt_store_path: ":memory:"
# This will likely never happen, but just in case we will be generous...
enable_over_servicing: true
# This will communicate with your full node
pocket_node:
  query_node_rpc_url: tcp://pocketd-node:26657
  query_node_grpc_url: tcp://pocketd-node:9090
  tx_node_rpc_url: tcp://pocketd-node:26657
# This section describes the suppliers that this relayminer is representing
suppliers:
  - service_id: text-generation
    # This is the default port; make sure it is open and that the URL you staked (https://relayminer.your.domain.name) points here.
    listen_url: http://0.0.0.0:8545
    # Rule #2: Double Tap
    request_timeout_seconds: 600
    service_config:
      # This is where the relayminer will relay, this will point to the poncho
      backend_url: http://poncho-sidecar:8000
metrics:
  enabled: true
  addr: :9000
pprof:
  enabled: false
  addr: localhost:6060
ping:
  enabled: false
  addr: localhost:8081

Make sure you copy this config into the RelayMiner container (while it is running):

Copy Config
docker cp relayminer-config.yaml relayminer:/home/pocket/.pocket/relayminer-config.yaml

For step 4, you will need to add your key to the RelayMiner. To do this, execute the second version of the command:

Add Key to RelayMiner
docker exec -it relayminer sh
pocketd keys add my-supplier --recover

This will start an interactive process that asks for your recovery phrase (the mnemonic words printed when you created the key). The key will be stored safely in the docker volume.

DONE, Now What??

Congratulations, you have your supplier deployed! Have a beer! 🍺

Now you will need to wait... The Pnyx router works alongside the Pocket-ML-Testbench; it will take approximately 24 hours to detect you, then you will start receiving "liveness" tests, and after that (if your model proves to be real) you will be slowly benchmarked. The full process can take up to 4 days.

You can check the state and rewards of your node on POKTscan's operator page by entering the address of your supplier (the one starting with pokt...).