RL Policy

ONNX Model Requirements

Property	Value
Input shape	`(1, 47)` float32
Output shape	`(1, 12)` float32
File location	`models/policy.onnx`

Observation Vector (47 elements)

Your model receives this observation every 20 ms (50 Hz):

Index	Name	Size	Units	Transform	Parameter source
0-2	base_ang_vel	3	rad/s	× 0.25	hardcoded in firmware
3-5	projected_gravity	3	unit vector	none	-
6-8	velocity_commands	3	m/s, m/s, rad/s	none	-
9-20	joint_pos	12	rad	− default_joint_pos	ONNX: `default_joint_pos`
21-32	joint_vel	12	rad/s	× 0.05	hardcoded in firmware
33-44	previous_actions	12	rad	none	-
45-46	gait_clock	2	-	cos/sin of phase	-
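The table can be read as a single assembly step. The Python below is a minimal sketch of that step and is not the firmware's C code in `policy_adapter.c`. The function and argument names are hypothetical. It also assumes that the gait clock stores cos(phase) at index 45 and sin(phase) at index 46, and that the caller supplies the phase in radians; this page does not say how the phase is generated. Converting the result to a `(1, 47)` float32 array is left to the caller.

```python
import math
from typing import Sequence

# Scale factors from the observation table (hardcoded in the firmware).
ANG_VEL_SCALE = 0.25
JOINT_VEL_SCALE = 0.05


def build_observation(
    base_ang_vel: Sequence[float],       # rad/s, body frame: roll, pitch, yaw rate
    projected_gravity: Sequence[float],  # unit vector, body frame
    velocity_commands: Sequence[float],  # vx (m/s), vy (m/s), wz (rad/s)
    joint_pos: Sequence[float],          # rad, 12 joints in the documented order
    joint_vel: Sequence[float],          # rad/s, 12 joints
    default_joint_pos: Sequence[float],  # rad, from ONNX metadata `default_joint_pos`
    previous_actions: Sequence[float],   # the 12 values the policy output last step
    gait_phase: float,                   # radians (hypothetical input)
) -> list[float]:
    """Assemble the 47-element observation in the documented index order."""
    for name, vec, size in (
        ("base_ang_vel", base_ang_vel, 3),
        ("projected_gravity", projected_gravity, 3),
        ("velocity_commands", velocity_commands, 3),
        ("joint_pos", joint_pos, 12),
        ("joint_vel", joint_vel, 12),
        ("default_joint_pos", default_joint_pos, 12),
        ("previous_actions", previous_actions, 12),
    ):
        if len(vec) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(vec)}")

    obs: list[float] = []
    obs += [w * ANG_VEL_SCALE for w in base_ang_vel]                 # 0-2
    obs += list(projected_gravity)                                   # 3-5
    obs += list(velocity_commands)                                   # 6-8
    obs += [q - q0 for q, q0 in zip(joint_pos, default_joint_pos)]  # 9-20
    obs += [dq * JOINT_VEL_SCALE for dq in joint_vel]                # 21-32
    obs += list(previous_actions)                                    # 33-44
    obs += [math.cos(gait_phase), math.sin(gait_phase)]              # 45-46 (assumed order)
    return obs
```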

Joint Order (indices 9-20, 21-32, 33-44)

The same order applies to the 12 action outputs and to every 12-value metadata list.

Index	Joint
0	L_Hip_Pitch
1	L_Hip_Roll
2	L_Hip_Yaw
3	L_Knee_Pitch
4	L_Ankle_Pitch
5	L_Ankle_Roll
6	R_Hip_Pitch
7	R_Hip_Roll
8	R_Hip_Yaw
9	R_Knee_Pitch
10	R_Ankle_Pitch
11	R_Ankle_Roll
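To find the observation slot for a joint, add the block's start index to the joint's position in this list. The helper below is a hypothetical illustration and is not part of the firmware. The same list position also indexes the 12-element action output, assuming actions follow this joint order as the per-joint metadata values do.

```python
# Joint order shared by the per-joint observation blocks.
JOINT_NAMES = [
    "L_Hip_Pitch", "L_Hip_Roll", "L_Hip_Yaw",
    "L_Knee_Pitch", "L_Ankle_Pitch", "L_Ankle_Roll",
    "R_Hip_Pitch", "R_Hip_Roll", "R_Hip_Yaw",
    "R_Knee_Pitch", "R_Ankle_Pitch", "R_Ankle_Roll",
]

# First observation index of each 12-element per-joint block.
BLOCK_START = {"joint_pos": 9, "joint_vel": 21, "previous_actions": 33}


def obs_index(joint: str, block: str) -> int:
    """Return the observation index that holds `block` for `joint`."""
    return BLOCK_START[block] + JOINT_NAMES.index(joint)
```

For example, `obs_index("R_Knee_Pitch", "joint_vel")` gives 30.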

Coordinate Frame

Body frame follows ROS REP-103 (right-handed):

```
                 Z (up)
                 │
                 │
                 │
                 └───────── X (forward)
                /
               /
              Y (left)
```

Signal	Frame	Convention
Angular velocity	Body	x=roll rate, y=pitch rate, z=yaw rate
Gravity projection	Body	Unit vector pointing toward the ground; (0, 0, -1) when upright
Velocity commands	Body	vx=forward, vy=left, wz=yaw rate (positive = counter-clockwise seen from above)
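Projected gravity is world gravity expressed in the body frame. The sketch below is minimal and rests on assumptions: the IMU is taken to provide a quaternion `(qw, qx, qy, qz)` giving the body orientation in a Z-up world frame. That quaternion source and the function name are hypothetical and do not come from the firmware.

```python
import math


def projected_gravity(qw: float, qx: float, qy: float, qz: float) -> tuple[float, float, float]:
    """Rotate world gravity (0, 0, -1) into the body frame.

    With R the body-to-world rotation matrix of the quaternion, the result is
    R^T @ (0, 0, -1), which equals the negated third row of R.
    """
    # Normalize so that slightly drifted IMU quaternions still give a unit vector.
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n
    gx = -2.0 * (qx * qz - qw * qy)
    gy = -2.0 * (qy * qz + qw * qx)
    gz = -(1.0 - 2.0 * (qx * qx + qy * qy))
    return (gx, gy, gz)
```

An upright robot (identity quaternion) gives (0, 0, -1). A 90° nose-down pitch moves gravity onto +x, and a 90° roll that lifts the left side moves it onto -y.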

Action Vector (12 elements)

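Your model outputs 12 raw action values in the joint order above. The firmware converts each one into a joint position target, element-wise:

`target_position = raw_action * action_scale + action_offset`

Here `action_offset` is the `default_joint_pos` metadata value (the standing pose).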

Action Scaling (from ONNX metadata)

Joint (left and right)	Scale (rad per unit action)	Offset (rad)
Hip Pitch	0.140	0.0
Hip Roll	0.068	0.0
Hip Yaw	0.133	0.0
Knee	0.173	0.0
Ankle Pitch	0.116	0.0
Ankle Roll	0.116	0.0
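Applying the formula per joint with the table values gives the sketch below. The names are hypothetical. The sketch does not model clamping: the source files list joint limits in `motor_map.h`, but this page does not say how the firmware applies them.

```python
from typing import Sequence

# Per-joint scales from the table, in joint order (left leg, then right leg).
ACTION_SCALE = [0.140, 0.068, 0.133, 0.173, 0.116, 0.116] * 2
# action_offset is default_joint_pos; all zeros in this example.
ACTION_OFFSET = [0.0] * 12


def actions_to_targets(
    raw_action: Sequence[float],
    action_scale: Sequence[float],
    action_offset: Sequence[float],
) -> list[float]:
    """Element-wise target_position = raw_action * action_scale + action_offset (rad)."""
    if not (len(raw_action) == len(action_scale) == len(action_offset) == 12):
        raise ValueError("expected 12 values in each vector")
    return [a * s + o for a, s, o in zip(raw_action, action_scale, action_offset)]


# A raw action of 1.0 on every joint moves each target by exactly its scale.
targets = actions_to_targets([1.0] * 12, ACTION_SCALE, ACTION_OFFSET)
```

Because the scales are small, for example 0.068 rad for hip roll, a unit raw action corresponds to only a few degrees of joint motion around the standing pose.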

ONNX Metadata

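The firmware reads policy parameters directly from the ONNX model's custom metadata (`metadata_props`). The parameters travel inside the model file, so replacing the ONNX file and restarting the firmware updates the gains, action scaling, and standing pose together.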

Required Metadata Keys

Key	Format	Description
`action_scale`	comma-separated floats	Per-joint action scaling (12 values)
`default_joint_pos`	comma-separated floats	Standing pose / action offset (12 values)
`joint_stiffness`	comma-separated floats	Kp gains for policy execution (12 values)
`joint_damping`	comma-separated floats	Kd gains for policy execution (12 values)
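Every value arrives as a comma-separated string, so a loader has to split, convert, and length-check it. The Python sketch below shows that validation under stated assumptions. The firmware does its own parsing in C, which may differ, and the function name is hypothetical. The `props` dict could come from onnxruntime's `InferenceSession(path).get_modelmeta().custom_metadata_map`, or from `{p.key: p.value for p in onnx.load(path).metadata_props}`.

```python
REQUIRED_KEYS = ("action_scale", "default_joint_pos", "joint_stiffness", "joint_damping")
NUM_JOINTS = 12


def parse_policy_metadata(props: dict[str, str]) -> dict[str, list[float]]:
    """Parse and validate the required comma-separated metadata entries."""
    parsed: dict[str, list[float]] = {}
    for key in REQUIRED_KEYS:
        if key not in props:
            raise KeyError(f"missing ONNX metadata key: {key}")
        # float() tolerates surrounding whitespace, e.g. "0.1, 0.2".
        values = [float(v) for v in props[key].split(",")]
        if len(values) != NUM_JOINTS:
            raise ValueError(f"{key}: expected {NUM_JOINTS} values, got {len(values)}")
        parsed[key] = values
    return parsed
```

Failing loudly on a missing key or a wrong count is safer than falling back to defaults, because a silently wrong gain or offset would only show up once the robot moves.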

Adding Metadata to Your ONNX Model

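```python
import onnx

model = onnx.load("policy.onnx")

# 12 values per key, in joint order: left leg (L_Hip_Pitch ... L_Ankle_Roll), then right leg.
metadata = {
    "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
    "default_joint_pos": "0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0",
    "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
    "joint_damping": "5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0",
}

# Overwrite keys that already exist so re-running the script does not add duplicates.
existing = {prop.key: prop for prop in model.metadata_props}
for key, value in metadata.items():
    assert len(value.split(",")) == 12, f"{key} needs 12 comma-separated values"
    if key in existing:
        existing[key].value = value
    else:
        model.metadata_props.append(onnx.StringStringEntryProto(key=key, value=value))

# Deploy this file to the robot as models/policy.onnx.
onnx.save(model, "policy_with_metadata.onnx")
```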

PyTorch Export Example

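```python
import torch


class MyPolicy(torch.nn.Module):
    """MLP mapping the 47-element observation to 12 raw actions.

    The layer sizes must match the network you trained.
    """

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(47, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 12),
        )

    def forward(self, obs):
        return self.net(obs)


# Load trained weights; state_dict keys must match this module (e.g. "net.0.weight").
model = MyPolicy()
model.load_state_dict(torch.load("policy.pt", map_location="cpu"))
model.eval()

# Export with a batch of one, matching the (1, 47) float32 input the firmware provides.
dummy_input = torch.zeros(1, 47, dtype=torch.float32)
torch.onnx.export(
    model,
    dummy_input,
    "policy.onnx",
    input_names=["obs"],
    output_names=["actions"],
    opset_version=11,
)
# The exported file has no policy metadata yet; add it before deploying.
```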

Deploying

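1. Export your model to ONNX.
2. Add the required metadata keys.
3. Copy the file to the robot as `models/policy.onnx`, for example: `scp policy_with_metadata.onnx robot@192.168.1.100:~/models/policy.onnx`
4. Restart the firmware so that it loads the new model and its metadata.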

Source Files

File	What it does
`source/control/policy_adapter/policy_adapter.c`	Builds observations, parses actions
`source/hal/hardware/motor/motor_map.h`	Action scales, joint limits, gains
`source/app/policy/policy_thread.c`	Runs the ONNX model