benchflow/env0-qwen35-9b-mobile300-prime-sft
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks