# RL Policy

## ONNX Model Requirements

| Property | Value |
|----------|-------|
| Input shape | `(1, 47)` float32 |
| Output shape | `(1, 12)` float32 |
| File location | `models/policy.onnx` |

## Observation Vector (47 elements)

Your model receives this observation every 20 ms (50 Hz):

| Index | Name | Size | Units | Scaling | Scale source |
|-------|------|------|-------|---------|--------------|
| 0-2 | base_ang_vel | 3 | rad/s | × 0.25 | hardcoded |
| 3-5 | projected_gravity | 3 | unit vector | none | - |
| 6-8 | velocity_commands | 3 | m/s, m/s, rad/s | none | - |
| 9-20 | joint_pos | 12 | rad | − default_joint_pos | ONNX: `default_joint_pos` |
| 21-32 | joint_vel | 12 | rad/s | × 0.05 | hardcoded |
| 33-44 | previous_actions | 12 | rad | none | - |
| 45-46 | gait_clock | 2 | - | cos(phase), sin(phase) | - |
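The table can be read as a concatenation recipe. The Python sketch below assembles the 47 values in that order. The function name and argument layout are hypothetical. It also assumes that `phase` is already an angle in radians, and it passes `prev_actions` through unchanged because the table lists no scaling for it. The firmware does this in C (`policy_adapter.c`), so treat the sketch as a way to cross-check your training-side observation code, not as the firmware's implementation.

```python
import math

ANG_VEL_SCALE = 0.25    # base_ang_vel scale (hardcoded in firmware)
JOINT_VEL_SCALE = 0.05  # joint_vel scale (hardcoded in firmware)

def build_observation(ang_vel, gravity, commands, joint_pos,
                      default_joint_pos, joint_vel, prev_actions, phase):
    """Assemble the 47-element observation in the documented index order.

    Hypothetical helper. Assumption: `phase` is the gait phase in radians.
    """
    if not (len(ang_vel) == len(gravity) == len(commands) == 3):
        raise ValueError("ang_vel, gravity and commands need 3 values each")
    if not (len(joint_pos) == len(default_joint_pos) == len(joint_vel) == len(prev_actions) == 12):
        raise ValueError("joint arrays need 12 values each, in joint order")
    obs = []
    obs += [w * ANG_VEL_SCALE for w in ang_vel]                     # 0-2
    obs += list(gravity)                                            # 3-5: projected gravity
    obs += list(commands)                                           # 6-8: vx, vy, wz
    obs += [q - q0 for q, q0 in zip(joint_pos, default_joint_pos)]  # 9-20
    obs += [dq * JOINT_VEL_SCALE for dq in joint_vel]               # 21-32
    obs += list(prev_actions)                                       # 33-44, passed through unscaled
    obs += [math.cos(phase), math.sin(phase)]                       # 45-46
    return obs
```

The result should be converted to a float32 array of shape `(1, 47)` before it is fed to the model.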

### Joint Order (indices 9-20, 21-32, 33-44)

| Index | Joint |
|-------|-------|
| 0 | L_Hip_Pitch |
| 1 | L_Hip_Roll |
| 2 | L_Hip_Yaw |
| 3 | L_Knee_Pitch |
| 4 | L_Ankle_Pitch |
| 5 | L_Ankle_Roll |
| 6 | R_Hip_Pitch |
| 7 | R_Hip_Roll |
| 8 | R_Hip_Yaw |
| 9 | R_Knee_Pitch |
| 10 | R_Ankle_Pitch |
| 11 | R_Ankle_Roll |
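When logging or debugging, it can help to look up a joint's entry by name. The following is a small sketch that assumes the observation is a flat Python list. `JOINT_NAMES`, `JOINT_BLOCKS`, and `joint_value` are hypothetical helpers, with the block start offsets taken from the observation table.

```python
# Joint order shared by the joint_pos, joint_vel and previous_actions blocks
JOINT_NAMES = [
    "L_Hip_Pitch", "L_Hip_Roll", "L_Hip_Yaw", "L_Knee_Pitch", "L_Ankle_Pitch", "L_Ankle_Roll",
    "R_Hip_Pitch", "R_Hip_Roll", "R_Hip_Yaw", "R_Knee_Pitch", "R_Ankle_Pitch", "R_Ankle_Roll",
]

# Start index of each 12-wide joint block in the 47-element observation
JOINT_BLOCKS = {"joint_pos": 9, "joint_vel": 21, "previous_actions": 33}

def joint_value(obs, block, joint):
    """Return one joint's entry from a joint block of the observation (hypothetical helper)."""
    return obs[JOINT_BLOCKS[block] + JOINT_NAMES.index(joint)]
```

For example, `joint_value(obs, "joint_vel", "R_Ankle_Roll")` reads index 32.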

### Coordinate Frame

Body frame follows ROS REP-103 (right-handed):

```
                 Z (up)
                 │
                 │
                 │
                 └───────── X (forward)
                /
               /
              Y (left)
```

| Signal | Frame | Convention |
|--------|-------|------------|
| Angular velocity | Body | x=roll rate, y=pitch rate, z=yaw rate |
| Gravity projection | Body | Unit vector pointing toward the ground; (0, 0, -1) when upright and level |
| Velocity commands | Body | vx=forward, vy=left, wz=yaw rate (positive CCW viewed from above) |
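If your simulator or state estimator provides base orientation as a quaternion, projected gravity is the world gravity direction `(0, 0, -1)` rotated into the body frame. The stdlib-only sketch below assumes a unit quaternion `(w, x, y, z)` that rotates body-frame vectors into the world frame. Check your source's convention, since some libraries order the components `(x, y, z, w)`. The function names are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def projected_gravity(q):
    """Express world gravity (0, 0, -1) in the body frame.

    Assumes q = (w, x, y, z) is a unit quaternion rotating body-frame
    vectors into the world frame, so we rotate by its conjugate.
    """
    w, x, y, z = q
    u = (-x, -y, -z)                 # vector part of the conjugate (inverse rotation)
    v = (0.0, 0.0, -1.0)             # gravity direction in the world frame (z up)
    t = tuple(2.0 * c for c in cross(u, v))
    ut = cross(u, t)
    # Quaternion rotation of v: v' = v + w*t + u x t, with t = 2 (u x v)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))
```

For an upright robot (identity quaternion), this gives `(0, 0, -1)`. After a 90° nose-down pitch about +Y, gravity lines up with body +X.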

## Action Vector (12 elements)

Your model outputs 12 raw action values, one per joint in the joint order above. The firmware converts each one to a target joint position:

```
target_position = raw_action * action_scale + action_offset
```
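Per joint, the formula above is a multiply-add. The following is a minimal sketch, in which the function name is hypothetical and `action_offset` is supplied from `default_joint_pos` as listed in the metadata table. It does no clamping, because this page does not describe how the firmware applies joint limits.

```python
def actions_to_targets(raw_actions, action_scale, action_offset):
    """target_position = raw_action * action_scale + action_offset, per joint (hypothetical helper)."""
    if not (len(raw_actions) == len(action_scale) == len(action_offset) == 12):
        raise ValueError("expected 12 values for actions, scales and offsets")
    return [a * s + o for a, s, o in zip(raw_actions, action_scale, action_offset)]
```

With all offsets at 0.0, a raw action of 1.0 maps to exactly the joint's scale in radians.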

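### Action Scaling (from ONNX metadata)

The left and right legs use the same values. Scale is read from `action_scale` and Offset from `default_joint_pos`.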

| Joint | Scale | Offset (rad) |
|-------|-------|--------------|
| Hip Pitch | 0.140 | 0.0 |
| Hip Roll | 0.068 | 0.0 |
| Hip Yaw | 0.133 | 0.0 |
| Knee | 0.173 | 0.0 |
| Ankle Pitch | 0.116 | 0.0 |
| Ankle Roll | 0.116 | 0.0 |
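Both legs share the same per-type values, so the 12-value metadata arrays can be generated from the six table rows instead of typed by hand. The sketch below uses hypothetical helper names and assumes the left-then-right joint order from the joint order table.

```python
# Per-joint-type scales from the table, in joint order within one leg
LEG_SCALES = [0.140, 0.068, 0.133, 0.173, 0.116, 0.116]

def per_leg_to_12(values):
    """Repeat 6 per-leg values for the left then right leg (joint indices 0-5, 6-11)."""
    if len(values) != 6:
        raise ValueError("expected 6 per-leg values")
    return list(values) * 2

def to_metadata_string(values):
    """Format floats as the comma-separated string used in ONNX metadata."""
    return ",".join(repr(float(v)) for v in values)
```

The same helpers work for the gains, e.g. `to_metadata_string(per_leg_to_12([257.4, 394.8, 135.4, 130.3, 93.2, 93.2]))` for `joint_stiffness`.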

## ONNX Metadata

The firmware reads policy parameters directly from ONNX custom metadata. Swapping the ONNX file updates gains, scaling, and standing pose on the next firmware restart.

### Required Metadata Keys

| Key | Format | Description |
|-----|--------|-------------|
| `action_scale` | comma-separated floats | Per-joint action scaling (12 values) |
| `default_joint_pos` | comma-separated floats | Standing pose / action offset (12 values) |
| `joint_stiffness` | comma-separated floats | Kp gains for policy execution (12 values) |
| `joint_damping` | comma-separated floats | Kd gains for policy execution (12 values) |
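Before copying a model to the robot, it can be worth checking that all four keys are present and hold exactly 12 numbers. The sketch below is a hypothetical pre-deploy check in Python. It works on a plain `{key: value}` dict, so it does not depend on how you load the file. The firmware's own C parser may be stricter or looser, for example about whitespace.

```python
REQUIRED_KEYS = ("action_scale", "default_joint_pos", "joint_stiffness", "joint_damping")

def parse_policy_metadata(props):
    """Parse the required keys from a {key: value} dict into lists of 12 floats (hypothetical helper)."""
    parsed = {}
    for key in REQUIRED_KEYS:
        if key not in props:
            raise KeyError(f"missing ONNX metadata key: {key}")
        values = [float(v) for v in props[key].split(",")]
        if len(values) != 12:
            raise ValueError(f"{key}: expected 12 values, got {len(values)}")
        parsed[key] = values
    return parsed
```

With the `onnx` package, the dict can be built from a model file as `{p.key: p.value for p in onnx.load("policy.onnx").metadata_props}`.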

### Adding Metadata to Your ONNX Model

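```python
import onnx

model = onnx.load("policy.onnx")

# 12 comma-separated values per key, in joint order (left leg, then right leg)
metadata = {
    "action_scale": "0.140,0.068,0.133,0.173,0.116,0.116,0.140,0.068,0.133,0.173,0.116,0.116",
    "default_joint_pos": "0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0",
    "joint_stiffness": "257.4,394.8,135.4,130.3,93.2,93.2,257.4,394.8,135.4,130.3,93.2,93.2",
    "joint_damping": "5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0",
}
for key, value in metadata.items():
    assert len(value.split(",")) == 12, f"{key} must have 12 values"

# Merge with existing metadata so re-running the script overwrites keys instead of duplicating them
props = {p.key: p.value for p in model.metadata_props}
props.update(metadata)
onnx.helper.set_model_props(model, props)

# Deploy this file to the robot as models/policy.onnx
onnx.save(model, "policy_with_metadata.onnx")
```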

## PyTorch Export Example

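```python
import torch
import torch.onnx

class MyPolicy(torch.nn.Module):
    """MLP: 47-element observation -> 12 raw actions."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(47, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 256),
            torch.nn.ELU(),
            torch.nn.Linear(256, 12),
        )

    def forward(self, obs):
        return self.net(obs)

# Export
model = MyPolicy()
# The state_dict keys in policy.pt must match this module's layer names (net.0, net.2, net.4)
model.load_state_dict(torch.load("policy.pt", map_location="cpu"))
model.eval()

# Batch of 1, float32, matching the (1, 47) input the firmware expects
dummy_input = torch.zeros(1, 47, dtype=torch.float32)
torch.onnx.export(
    model,
    dummy_input,
    "policy.onnx",
    input_names=["obs"],
    output_names=["actions"],
    opset_version=11,
    # If your PyTorch version defaults to the dynamo-based exporter, you may need dynamo=False here
)
# The exported file has no custom metadata yet; add the required keys before deploying
```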

## Deploying

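1. Export your model to ONNX
2. Add the required metadata keys (`action_scale`, `default_joint_pos`, `joint_stiffness`, `joint_damping`)
3. Copy the file to the robot as `models/policy.onnx`: `scp policy_with_metadata.onnx robot@192.168.1.100:~/models/policy.onnx`
4. Restart the firmware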

## Source Files

| File | What it does |
|------|--------------|
| `source/control/policy_adapter/policy_adapter.c` | Builds observations, parses actions |
| `source/hal/hardware/motor/motor_map.h` | Action scales, joint limits, gains |
| `source/app/policy/policy_thread.c` | Runs the ONNX model |