# RL Policy

## ONNX Model Requirements

| Property | Value |
|----------|-------|
| Input shape | `(1, 47)` float32 |
| Output shape | `(1, 12)` float32 |
| File location | `models/policy.onnx` |

## Observation Vector (47 elements)

Your model receives this observation every 20 ms:

| Index | Name | Size | Units | Scaling | Source |
|-------|------|------|-------|---------|--------|
| 0-2 | base_ang_vel | 3 | rad/s | × 0.25 | hardcoded |
| 3-5 | projected_gravity | 3 | normalized | none | - |
| 6-8 | velocity_commands | 3 | m/s, m/s, rad/s | none | - |
| 9-20 | joint_pos | 12 | rad | − default_pos | ONNX: `default_joint_pos` |
| 21-32 | joint_vel | 12 | rad/s | × 0.05 | hardcoded |
| 33-44 | previous_actions | 12 | rad | none | - |
| 45-46 | gait_clock | 2 | - | cos/sin of phase | - |

### Joint Order (indices 9-20, 21-32, 33-44)

| Index | Joint |
|-------|-------|
| 0 | L_Hip_Pitch |
| 1 | L_Hip_Roll |
| 2 | L_Hip_Yaw |
| 3 | L_Knee_Pitch |
| 4 | L_Ankle_Pitch |
| 5 | L_Ankle_Roll |
| 6 | R_Hip_Pitch |
| 7 | R_Hip_Roll |
| 8 | R_Hip_Yaw |
| 9 | R_Knee_Pitch |
| 10 | R_Ankle_Pitch |
| 11 | R_Ankle_Roll |

### Coordinate Frame

Body frame follows ROS REP-103 (right-handed):

```
        Z (up)
        │
        │
        │
        └───────── X (forward)
       /
      /
     Y (left)
```

| Signal | Frame | Convention |
|--------|-------|------------|
| Angular velocity | Body | x=roll rate, y=pitch rate, z=yaw rate |
| Gravity projection | Body | Points toward ground (normalized) |
| Velocity commands | Body | vx=forward, vy=left, wz=CCW |

## Action Vector (12 elements)

Your model outputs 12 raw action values. The firmware applies:

```
target_position = raw_action * action_scale + action_offset
```

### Action Scaling (from ONNX metadata)

| Joint | Scale | Offset (rad) |
|-------|-------|--------------|
| Hip Pitch | 0.140 | 0.0 |
| Hip Roll | 0.068 | 0.0 |
| Hip Yaw | 0.133 | 0.0 |
| Knee | 0.173 | 0.0 |
| Ankle Pitch | 0.116 | 0.0 |
| Ankle Roll | 0.116 | 0.0 |

## ONNX Metadata

The firmware reads policy parameters directly from ONNX custom metadata. Swapping the ONNX file automatically updates gains, scaling, and standing pose.

### Required Metadata Keys

| Key | Format | Description |
|-----|--------|-------------|
| `action_scale` | comma-separated floats | Per-joint action scaling (12 values) |
| `default_joint_pos` | comma-separated floats | Standing pose / action offset (12 values) |
| `joint_stiffness` | comma-separated floats | Kp gains for policy execution (12 values) |
| `joint_damping` | comma-separated floats | Kd gains for policy execution (12 values) |

### Adding Metadata to Your ONNX Model

```python
import onnx

model = onnx.load("policy.onnx")

metadata = {
    "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
    "default_joint_pos": "0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0",
    "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
    "joint_damping": "5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0",
}

for key, value in metadata.items():
    model.metadata_props.append(onnx.StringStringEntryProto(key=key, value=value))

onnx.save(model, "policy_with_metadata.onnx")
```

## PyTorch Export Example

```python
import torch
import torch.onnx

class MyPolicy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(47, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 12),
        )

    def forward(self, obs):
        return self.net(obs)

# Export
model = MyPolicy()
model.load_state_dict(torch.load("policy.pt"))
model.eval()

dummy_input = torch.zeros(1, 47)
torch.onnx.export(
    model,
    dummy_input,
    "policy.onnx",
    input_names=["obs"],
    output_names=["actions"],
    opset_version=11
)
```

## Deploying

1. Export your model to ONNX
2. Copy to robot: `scp policy.onnx robot@192.168.1.100:~/models/`
3. Restart the firmware

## Source Files

| File | What it does |
|------|--------------|
| `source/control/policy_adapter/policy_adapter.c` | Builds observations, parses actions |
| `source/hal/hardware/motor/motor_map.h` | Action scales, joint limits, gains |
| `source/app/policy/policy_thread.c` | Runs the ONNX model |
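
When checking a policy off-robot, it can help to reproduce the observation layout and the action formula on the host. The sketch below is a plain-Python reference written from the tables in this document, not a port of `policy_adapter.c`. The function names (`parse_floats`, `build_observation`, `actions_to_targets`) are hypothetical, and it assumes the gait phase is given in radians, that the gait clock is ordered cos then sin, and that `action_offset` equals `default_joint_pos`.

```python
import math

NUM_JOINTS = 12
OBS_SIZE = 47
ANG_VEL_SCALE = 0.25    # "× 0.25" from the observation table
JOINT_VEL_SCALE = 0.05  # "× 0.05" from the observation table


def parse_floats(value, expected=NUM_JOINTS):
    """Parse a comma-separated metadata string into a list of floats."""
    values = [float(v) for v in value.split(",")]
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values


def build_observation(ang_vel, gravity, commands, joint_pos, joint_vel,
                      prev_actions, phase, default_joint_pos):
    """Assemble the 47-element observation in the documented index order."""
    obs = []
    obs += [w * ANG_VEL_SCALE for w in ang_vel]                     # 0-2
    obs += list(gravity)                                            # 3-5
    obs += list(commands)                                           # 6-8
    obs += [q - q0 for q, q0 in zip(joint_pos, default_joint_pos)]  # 9-20
    obs += [dq * JOINT_VEL_SCALE for dq in joint_vel]               # 21-32
    obs += list(prev_actions)                                       # 33-44
    # Assumption: phase in radians, cos stored before sin.
    obs += [math.cos(phase), math.sin(phase)]                       # 45-46
    if len(obs) != OBS_SIZE:
        raise ValueError(f"observation has {len(obs)} elements, expected {OBS_SIZE}")
    return obs


def actions_to_targets(raw_actions, action_scale, action_offset):
    """Apply target_position = raw_action * action_scale + action_offset."""
    return [a * s + o for a, s, o in zip(raw_actions, action_scale, action_offset)]


# Usage with the metadata values shown in this document.
action_scale = parse_floats(
    "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116"
)
default_joint_pos = parse_floats("0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0")

obs = build_observation(
    ang_vel=[0.4, 0.0, 0.0],
    gravity=[0.0, 0.0, -1.0],      # upright: gravity points along -Z in the body frame
    commands=[0.5, 0.0, 0.0],      # walk forward at 0.5 m/s
    joint_pos=[0.1] * NUM_JOINTS,
    joint_vel=[2.0] * NUM_JOINTS,
    prev_actions=[0.0] * NUM_JOINTS,
    phase=0.0,
    default_joint_pos=default_joint_pos,
)
targets = actions_to_targets([1.0] * NUM_JOINTS, action_scale, default_joint_pos)
```

The resulting `obs` list can be wrapped as a `(1, 47)` float32 array and fed to the exported model. Comparing `targets` against the joint limits before deploying is a cheap way to catch a mis-ordered `action_scale` string.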