| Category | Tasks | Best |
|---|---|---|
| Reasoning | 195 | AnyGen |
| Tool-Use | 180 | AnyGen |
| Memory | 30 | AnyGen |
| Multimodal | 25 | AnyGen |
| Collaboration | 30 | AnyGen |

| Framework | Reasoning | Tool-Use | Memory | Multimodal | Collaboration | Avg |
|---|---|---|---|---|---|---|
| AnyGen | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Claude Code | 100.00 | 91.00 | 87.50 | 88.00 | 84.00 | 90.10 |
| CodeBuddy | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| H.O.P.E. | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Manus | 100.00 | 91.00 | 87.50 | 88.00 | 84.00 | 90.10 |
| OpenClaw | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| OpenClaw (Miaoda) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| WorkBuddy | 100.00 | 91.00 | 87.50 | 88.00 | 84.00 | 90.10 |
| Hermes Agent | 73.73 | 83.20 | 84.22 | 72.18 | 85.00 | 79.67 |
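
The Avg column appears to be the plain (unweighted) mean of the five category scores, not a mean weighted by the number of tasks per category. A minimal Python sketch to check this against two rows copied from the table; the function names `unweighted_avg` and `task_weighted_avg` are illustrative assumptions, not part of the benchmark itself.

```python
# Category scores copied from the table, in column order:
# Reasoning, Tool-Use, Memory, Multimodal, Collaboration.
scores = {
    "Claude Code": [100.00, 91.00, 87.50, 88.00, 84.00],
    "Hermes Agent": [73.73, 83.20, 84.22, 72.18, 85.00],
}

# Task counts per category, in the same column order.
task_counts = [195, 180, 30, 25, 30]


def unweighted_avg(row):
    """Plain mean of the category scores, rounded to two decimals."""
    return round(sum(row) / len(row), 2)


def task_weighted_avg(row, counts):
    """Mean weighted by tasks per category (shown for comparison only)."""
    total = sum(score * n for score, n in zip(row, counts))
    return round(total / sum(counts), 2)


for name, row in scores.items():
    print(name, unweighted_avg(row), task_weighted_avg(row, task_counts))
```

The unweighted values reproduce the table's Avg entries for both rows (90.10 and 79.67). The task-weighted figures differ because Reasoning and Tool-Use account for 375 of the 460 tasks.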