RL Policy
ONNX Model Requirements
| Property | Value |
|---|---|
| Input shape | |
| Output shape | |
| File location | |
Observation Vector (47 elements)
Your model receives this observation every 20 ms (50 Hz):
| Index | Name | Size | Units | Scaling | Source |
|---|---|---|---|---|---|
| 0-2 | base_ang_vel | 3 | rad/s | × 0.25 | hardcoded |
| 3-5 | projected_gravity | 3 | normalized | none | - |
| 6-8 | velocity_commands | 3 | m/s, m/s, rad/s | none | - |
| 9-20 | joint_pos | 12 | rad | − default_pos | ONNX: |
| 21-32 | joint_vel | 12 | rad/s | × 0.05 | hardcoded |
| 33-44 | previous_actions | 12 | rad | none | - |
| 45-46 | gait_clock | 2 | - | cos/sin of phase | - |
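As an illustration of the layout above, the following sketch assembles the 47-element vector with NumPy. It is not the firmware's implementation: the function name `build_observation`, its argument names, the cos-then-sin ordering of the gait clock, and the assumption that the gait phase is an angle in radians are all hypothetical.

```python
import numpy as np

ANG_VEL_SCALE = 0.25    # base_ang_vel scaling from the table
JOINT_VEL_SCALE = 0.05  # joint_vel scaling from the table


def build_observation(base_ang_vel, projected_gravity, velocity_commands,
                      joint_pos, joint_vel, previous_actions,
                      gait_phase, default_joint_pos):
    """Assemble the observation in the index order listed in the table.

    gait_phase is assumed to be an angle in radians (hypothetical convention).
    """
    f32 = np.float32
    obs = np.concatenate([
        np.asarray(base_ang_vel, dtype=f32) * ANG_VEL_SCALE,        # 0-2
        np.asarray(projected_gravity, dtype=f32),                   # 3-5
        np.asarray(velocity_commands, dtype=f32),                   # 6-8
        np.asarray(joint_pos, dtype=f32)
        - np.asarray(default_joint_pos, dtype=f32),                 # 9-20
        np.asarray(joint_vel, dtype=f32) * JOINT_VEL_SCALE,         # 21-32
        np.asarray(previous_actions, dtype=f32),                    # 33-44
        np.array([np.cos(gait_phase), np.sin(gait_phase)], dtype=f32),  # 45-46
    ])
    if obs.shape != (47,):
        raise ValueError(f"expected 47 elements, got {obs.shape}")
    return obs
```

Feeding training-time observations through a helper like this is a quick way to check that your simulator applies the same scaling and ordering as the table.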
Joint Order (indices 9-20, 21-32, 33-44)
| Index | Joint |
|---|---|
| 0 | L_Hip_Pitch |
| 1 | L_Hip_Roll |
| 2 | L_Hip_Yaw |
| 3 | L_Knee_Pitch |
| 4 | L_Ankle_Pitch |
| 5 | L_Ankle_Roll |
| 6 | R_Hip_Pitch |
| 7 | R_Hip_Roll |
| 8 | R_Hip_Yaw |
| 9 | R_Knee_Pitch |
| 10 | R_Ankle_Pitch |
| 11 | R_Ankle_Roll |
Coordinate Frame
Body frame follows ROS REP-103 (right-handed):
        Z (up)
        │
        │
        │
        └───────── X (forward)
       /
      /
     Y (left)
| Signal | Frame | Convention |
|---|---|---|
| Angular velocity | Body | x=roll rate, y=pitch rate, z=yaw rate |
| Gravity projection | Body | Points toward ground (normalized) |
| Velocity commands | Body | vx=forward, vy=left, wz=CCW |
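To make the gravity-projection convention concrete, here is a minimal sketch that rotates the world-frame gravity direction `(0, 0, -1)` into the body frame. It assumes the orientation is available as a unit quaternion in `(w, x, y, z)` order describing the body-to-world rotation; that input format and the function name `projected_gravity` are assumptions, not something this document specifies.

```python
import math


def projected_gravity(quat_wxyz):
    """Return the unit gravity direction expressed in the body frame.

    quat_wxyz: (w, x, y, z) body-to-world quaternion (assumed convention).
    """
    w, x, y, z = quat_wxyz
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    # g_body = R^T @ (0, 0, -1), i.e. the negated third row of R.
    gx = -2.0 * (x * z - w * y)
    gy = -2.0 * (y * z + w * x)
    gz = -(1.0 - 2.0 * (x * x + y * y))
    return (gx, gy, gz)
```

With the robot upright (identity quaternion) this yields a vector along body −Z, matching "points toward ground" in the table.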
Action Vector (12 elements)
Your model outputs 12 raw action values. The firmware applies:
    target_position = raw_action * action_scale + action_offset
Action Scaling (from ONNX metadata)
| Joint | Scale | Offset (rad) |
|---|---|---|
| Hip Pitch | 0.140 | 0.0 |
| Hip Roll | 0.068 | 0.0 |
| Hip Yaw | 0.133 | 0.0 |
| Knee | 0.173 | 0.0 |
| Ankle Pitch | 0.116 | 0.0 |
| Ankle Roll | 0.116 | 0.0 |
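The formula and scale values above can be sketched in plain Python as follows. The helper name `actions_to_targets` is hypothetical, and the sketch assumes the six per-joint values repeat for the left and right leg in the joint order given earlier.

```python
# Per-joint values from the table, repeated for left then right leg (assumed).
ACTION_SCALE = [0.140, 0.068, 0.133, 0.173, 0.116, 0.116] * 2
ACTION_OFFSET = [0.0] * 12


def actions_to_targets(raw_actions, scale=ACTION_SCALE, offset=ACTION_OFFSET):
    """Apply target_position = raw_action * action_scale + action_offset."""
    if not (len(raw_actions) == len(scale) == len(offset) == 12):
        raise ValueError("expected 12 values for actions, scale, and offset")
    return [a * s + o for a, s, o in zip(raw_actions, scale, offset)]
```

For example, a raw action of 1.0 on the left knee (index 3) maps to a target of 0.173 rad with these values.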
ONNX Metadata
The firmware reads policy parameters directly from ONNX custom metadata. Swapping the ONNX file automatically updates gains, scaling, and standing pose.
Required Metadata Keys
| Key | Format | Description |
|---|---|---|
| | comma-separated floats | Per-joint action scaling (12 values) |
| | comma-separated floats | Standing pose / action offset (12 values) |
| | comma-separated floats | Kp gains for policy execution (12 values) |
| | comma-separated floats | Kd gains for policy execution (12 values) |
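For intuition about what the Kp and Kd gains do, a textbook joint-space PD law is sketched below. This document does not show the firmware's actual control loop, so treat the function `pd_torques` and the exact form of the law as assumptions rather than a description of the firmware.

```python
def pd_torques(target_pos, joint_pos, joint_vel, kp, kd):
    """Textbook PD law: tau = Kp * (q_target - q) - Kd * q_dot, per joint."""
    return [
        p * (t - q) - d * qd
        for t, q, qd, p, d in zip(target_pos, joint_pos, joint_vel, kp, kd)
    ]
```

Under this form, a higher Kp tracks the policy's target position more stiffly, while Kd damps joint velocity.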
Adding Metadata to Your ONNX Model
    import onnx

    model = onnx.load("policy.onnx")

    metadata = {
        "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
        "default_joint_pos": "0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0",
        "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
        "joint_damping": "5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0",
    }

    for key, value in metadata.items():
        model.metadata_props.append(onnx.StringStringEntryProto(key=key, value=value))

    onnx.save(model, "policy_with_metadata.onnx")
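Before deploying, it can help to confirm that the metadata parses back into 12 floats per key. The sketch below validates a plain `dict` of strings using the four keys from the example above; the function name `parse_policy_metadata` is hypothetical, and the firmware's own parser may behave differently.

```python
REQUIRED_KEYS = ("action_scale", "default_joint_pos",
                 "joint_stiffness", "joint_damping")
NUM_JOINTS = 12


def parse_policy_metadata(props):
    """Parse comma-separated float strings and check each has 12 values."""
    parsed = {}
    for key in REQUIRED_KEYS:
        if key not in props:
            raise KeyError(f"missing metadata key: {key}")
        values = [float(v) for v in props[key].split(",")]
        if len(values) != NUM_JOINTS:
            raise ValueError(
                f"{key}: expected {NUM_JOINTS} values, got {len(values)}")
        parsed[key] = values
    return parsed
```

To check a saved file, build the input with `{p.key: p.value for p in onnx.load("policy_with_metadata.onnx").metadata_props}` and pass it to the function.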
PyTorch Export Example
    import torch
    import torch.onnx

    class MyPolicy(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(47, 256),
                torch.nn.ELU(),
                torch.nn.Linear(256, 256),
                torch.nn.ELU(),
                torch.nn.Linear(256, 12),
            )

        def forward(self, obs):
            return self.net(obs)

    # Export
    model = MyPolicy()
    model.load_state_dict(torch.load("policy.pt"))
    model.eval()

    dummy_input = torch.zeros(1, 47)
    torch.onnx.export(
        model,
        dummy_input,
        "policy.onnx",
        input_names=["obs"],
        output_names=["actions"],
        opset_version=11
    )
Deploying
1. Export your model to ONNX.
2. Copy it to the robot:

       scp policy.onnx robot@192.168.1.100:~/models/

3. Restart the firmware.
Source Files
| File | What it does |
|---|---|
| | Builds observations, parses actions |
| | Action scales, joint limits, gains |
| | Runs the ONNX model |