RL Policy

ONNX Model Requirements

| Property | Value |
|---|---|
| Input shape | (1, 47) float32 |
| Output shape | (1, 12) float32 |
| File location | models/policy.onnx |

Observation Vector (47 elements)

Your model receives this observation every 20 ms (50 Hz):

| Index | Name | Size | Units | Scaling | Source |
|---|---|---|---|---|---|
| 0-2 | base_ang_vel | 3 | rad/s | × 0.25 | hardcoded |
| 3-5 | projected_gravity | 3 | normalized | none | - |
| 6-8 | velocity_commands | 3 | m/s, m/s, rad/s | none | - |
| 9-20 | joint_pos | 12 | rad | − default_pos | ONNX: default_joint_pos |
| 21-32 | joint_vel | 12 | rad/s | × 0.05 | hardcoded |
| 33-44 | previous_actions | 12 | rad | none | - |
| 45-46 | gait_clock | 2 | - | cos/sin of phase | - |
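The layout above can be sketched in Python as a reference for checking a training pipeline against it. This is a minimal sketch, not the firmware's implementation: the function name `build_observation` and its argument names are hypothetical, the gait phase is assumed to be in radians, and the gait clock is assumed to be ordered cos first, then sin.

```python
import math
from typing import List, Sequence

ANG_VEL_SCALE = 0.25    # base_ang_vel scaling from the table above
JOINT_VEL_SCALE = 0.05  # joint_vel scaling from the table above
NUM_JOINTS = 12
OBS_SIZE = 47


def build_observation(
    base_ang_vel: Sequence[float],       # rad/s, body frame
    projected_gravity: Sequence[float],  # unit vector, body frame
    velocity_commands: Sequence[float],  # vx (m/s), vy (m/s), wz (rad/s)
    joint_pos: Sequence[float],          # rad, 12 joints
    joint_vel: Sequence[float],          # rad/s, 12 joints
    previous_actions: Sequence[float],   # last policy output, 12 joints
    default_joint_pos: Sequence[float],  # from ONNX metadata, 12 joints
    gait_phase: float,                   # assumed to be in radians
) -> List[float]:
    """Assemble the 47-element observation in the documented index order."""
    obs: List[float] = []
    obs.extend(w * ANG_VEL_SCALE for w in base_ang_vel)                  # 0-2
    obs.extend(projected_gravity)                                        # 3-5
    obs.extend(velocity_commands)                                        # 6-8
    obs.extend(q - q0 for q, q0 in zip(joint_pos, default_joint_pos))    # 9-20
    obs.extend(v * JOINT_VEL_SCALE for v in joint_vel)                   # 21-32
    obs.extend(previous_actions)                                         # 33-44
    obs.extend((math.cos(gait_phase), math.sin(gait_phase)))             # 45-46 (assumed order)
    if len(obs) != OBS_SIZE:
        raise ValueError(f"expected {OBS_SIZE} elements, got {len(obs)}")
    return obs


# Example: robot standing upright, commanded to walk forward at 0.5 m/s.
obs = build_observation(
    base_ang_vel=[0.4, 0.0, -0.8],
    projected_gravity=[0.0, 0.0, -1.0],
    velocity_commands=[0.5, 0.0, 0.0],
    joint_pos=[0.1] * NUM_JOINTS,
    joint_vel=[2.0] * NUM_JOINTS,
    previous_actions=[0.0] * NUM_JOINTS,
    default_joint_pos=[0.0] * NUM_JOINTS,
    gait_phase=0.0,
)
```

The final length check catches a wrongly sized input before it reaches the model, which expects exactly 47 values.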

Joint Order (indices 9-20, 21-32, 33-44)

| Index | Joint |
|---|---|
| 0 | L_Hip_Pitch |
| 1 | L_Hip_Roll |
| 2 | L_Hip_Yaw |
| 3 | L_Knee_Pitch |
| 4 | L_Ankle_Pitch |
| 5 | L_Ankle_Roll |
| 6 | R_Hip_Pitch |
| 7 | R_Hip_Roll |
| 8 | R_Hip_Yaw |
| 9 | R_Knee_Pitch |
| 10 | R_Ankle_Pitch |
| 11 | R_Ankle_Roll |

Coordinate Frame

Body frame follows ROS REP-103 (right-handed):

                 Z (up)
                 │
                 │
                 │
                 └───────── X (forward)
                /
               /
              Y (left)

| Signal | Frame | Convention |
|---|---|---|
| Angular velocity | Body | x=roll rate, y=pitch rate, z=yaw rate |
| Gravity projection | Body | Points toward ground (normalized) |
| Velocity commands | Body | vx=forward, vy=left, wz=CCW |
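As an illustration of the gravity projection convention, the sketch below rotates the world "down" direction into the body frame. It assumes the orientation is available as a unit quaternion in (w, x, y, z) order that rotates body-frame vectors into a Z-up world frame; that input format and the function name `projected_gravity` are assumptions, not something the firmware documents here.

```python
import math
from typing import Sequence, Tuple


def projected_gravity(quat_wxyz: Sequence[float]) -> Tuple[float, float, float]:
    """Return the unit gravity direction expressed in the body frame.

    quat_wxyz is assumed to rotate body-frame vectors into a Z-up world frame.
    """
    w, x, y, z = quat_wxyz
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    w, x, y, z = (c / norm for c in (w, x, y, z))
    # World gravity direction is (0, 0, -1). Rotating it into the body frame
    # (R transposed times the vector) gives the negated third row of R.
    gx = -2.0 * (x * z - w * y)
    gy = -2.0 * (y * z + w * x)
    gz = -(1.0 - 2.0 * (x * x + y * y))
    return (gx, gy, gz)


# Upright robot: gravity lies along body -Z.
upright = projected_gravity([1.0, 0.0, 0.0, 0.0])

# Rolled +90 degrees about X (forward): gravity lies along body -Y.
half = math.sqrt(0.5)
rolled = projected_gravity([half, half, 0.0, 0.0])
```

With Y pointing left, a positive roll lifts the left side, so gravity ends up along body -Y, which matches the right-handed frame described above.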

Action Vector (12 elements)

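Your model outputs 12 raw action values, one per joint in the joint order above. The firmware converts each value to a target joint position:

target_position = raw_action * action_scale + action_offset

Here action_scale and action_offset are per-joint values read from the ONNX metadata; action_offset is the default_joint_pos standing pose.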

Action Scaling (from ONNX metadata)

| Joint | Scale | Offset (rad) |
|---|---|---|
| Hip Pitch | 0.140 | 0.0 |
| Hip Roll | 0.068 | 0.0 |
| Hip Yaw | 0.133 | 0.0 |
| Knee | 0.173 | 0.0 |
| Ankle Pitch | 0.116 | 0.0 |
| Ankle Roll | 0.116 | 0.0 |
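The scaling step can be reproduced offline to check what joint targets a given policy output produces. The following is a minimal sketch using the values from the table, repeated for the left and right leg; the function name `actions_to_targets` is hypothetical and the firmware's own code may differ.

```python
from typing import List, Sequence

# Scales from the table above: left leg (indices 0-5), then right leg (6-11).
LEG_SCALE = [0.140, 0.068, 0.133, 0.173, 0.116, 0.116]
ACTION_SCALE = LEG_SCALE + LEG_SCALE
ACTION_OFFSET = [0.0] * 12  # default_joint_pos from the ONNX metadata


def actions_to_targets(
    raw_actions: Sequence[float],
    action_scale: Sequence[float] = ACTION_SCALE,
    action_offset: Sequence[float] = ACTION_OFFSET,
) -> List[float]:
    """Apply target_position = raw_action * action_scale + action_offset per joint."""
    if not (len(raw_actions) == len(action_scale) == len(action_offset) == 12):
        raise ValueError("expected 12 values for actions, scales, and offsets")
    return [a * s + o for a, s, o in zip(raw_actions, action_scale, action_offset)]


# A raw action of 1.0 on every joint moves each joint by exactly its scale.
targets = actions_to_targets([1.0] * 12)
```

Because every offset is 0.0 in this example, a raw action of 0.0 on every joint commands the standing pose.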

ONNX Metadata

The firmware reads policy parameters directly from ONNX custom metadata. Swapping the ONNX file automatically updates gains, scaling, and standing pose.

Required Metadata Keys

| Key | Format | Description |
|---|---|---|
| action_scale | comma-separated floats | Per-joint action scaling (12 values) |
| default_joint_pos | comma-separated floats | Standing pose / action offset (12 values) |
| joint_stiffness | comma-separated floats | Kp gains for policy execution (12 values) |
| joint_damping | comma-separated floats | Kd gains for policy execution (12 values) |
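Before deploying, it can help to check that all four keys are present and that each parses to 12 floats. The sketch below is a hypothetical pre-flight check, not the firmware's parser. It takes a plain dict so it runs without extra dependencies; with the onnx package, such a dict could be built as `{p.key: p.value for p in model.metadata_props}`.

```python
from typing import Dict, List

REQUIRED_KEYS = ("action_scale", "default_joint_pos", "joint_stiffness", "joint_damping")
NUM_JOINTS = 12


def parse_policy_metadata(props: Dict[str, str]) -> Dict[str, List[float]]:
    """Parse the required comma-separated float lists and validate their length."""
    parsed: Dict[str, List[float]] = {}
    for key in REQUIRED_KEYS:
        if key not in props:
            raise KeyError(f"missing required metadata key: {key}")
        # float() raises ValueError on empty or malformed tokens.
        values = [float(token) for token in props[key].split(",")]
        if len(values) != NUM_JOINTS:
            raise ValueError(f"{key}: expected {NUM_JOINTS} values, got {len(values)}")
        parsed[key] = values
    return parsed


# Example metadata using the values shown in this document.
example_props = {
    "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
    "default_joint_pos": ",".join(["0.0"] * 12),
    "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
    "joint_damping": ",".join(["5.0"] * 12),
}
policy_params = parse_policy_metadata(example_props)
```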

Adding Metadata to Your ONNX Model

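import onnx

model = onnx.load("policy.onnx")

# Each value is a comma-separated list of 12 floats in the joint order above
# (left leg at indices 0-5, right leg at indices 6-11).
metadata = {
    "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
    "default_joint_pos": "0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0",
    "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
    "joint_damping": "5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0",
}

# Attach each key/value pair as custom model metadata.
# Note: running this on a file that already has these keys appends duplicates.
for key, value in metadata.items():
    model.metadata_props.append(onnx.StringStringEntryProto(key=key, value=value))

# The firmware loads models/policy.onnx, so deploy this file under that name.
onnx.save(model, "policy_with_metadata.onnx")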

PyTorch Export Example

import torch

class MyPolicy(torch.nn.Module):
    """Example MLP: 47 observations in, 12 raw actions out."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(47, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 12),
        )

    def forward(self, obs):
        return self.net(obs)

# Load the trained weights on CPU and switch to inference mode
model = MyPolicy()
model.load_state_dict(torch.load("policy.pt", map_location="cpu"))
model.eval()

# The dummy input fixes the exported input shape to (1, 47) float32
dummy_input = torch.zeros(1, 47)
torch.onnx.export(
    model,
    dummy_input,
    "policy.onnx",
    input_names=["obs"],
    output_names=["actions"],
    opset_version=11,
)

Deploying

  1. Export your model to ONNX and add the required metadata keys

  2. Copy it to the robot as policy.onnx: scp policy.onnx robot@192.168.1.100:~/models/

  3. Restart the firmware

Source Files

| File | What it does |
|---|---|
| source/control/policy_adapter/policy_adapter.c | Builds observations, parses actions |
| source/hal/hardware/motor/motor_map.h | Action scales, joint limits, gains |
| source/app/policy/policy_thread.c | Runs the ONNX model |