We host the application Petri so that it can be run on our online workstations, either with Wine or directly.


Quick description of Petri:

Petri is an open-source alignment auditing agent that lets researchers rapidly test concrete safety hypotheses against target models using realistic, multi-turn scenarios. Instead of requiring bespoke evals, Petri automatically generates audit environments from seed "special instructions," orchestrates an auditor model to probe a target model, and simulates tool use and rollbacks to surface risky behaviors. Each interaction transcript is then scored by a judge model using a consistent rubric, so results are comparable across runs and models. The system supports major model APIs and comes with starter seeds and judge dimensions, so questions about reward hacking, self-preservation, or eval awareness can be explored in minutes. Petri is designed for parallel exploration: it runs many audits concurrently, aggregates findings, and highlights transcripts that deserve human review.
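The auditor, target, and judge roles described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Petri's actual API: `Transcript`, `run_audit`, and the three stub functions are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types and names for illustration; not Petri's real interface.
Message = Tuple[str, str]  # (role, text)


@dataclass
class Transcript:
    seed: str
    messages: List[Message] = field(default_factory=list)

    def rollback(self, n_turns: int) -> None:
        """Drop the last n_turns exchanges (one auditor and one target message each)."""
        if n_turns > 0:
            del self.messages[-2 * n_turns:]


def run_audit(
    seed: str,
    auditor: Callable[[str, List[Message]], str],
    target: Callable[[List[Message]], str],
    judge: Callable[[List[Message]], float],
    max_turns: int = 3,
) -> Tuple[Transcript, float]:
    """Alternate auditor probes and target replies, then score the transcript."""
    transcript = Transcript(seed=seed)
    for _ in range(max_turns):
        probe = auditor(seed, transcript.messages)
        reply = target(transcript.messages + [("auditor", probe)])
        transcript.messages.append(("auditor", probe))
        transcript.messages.append(("target", reply))
    return transcript, judge(transcript.messages)


# Stub models (assumptions): a real setup would call model APIs here.
def stub_auditor(seed: str, history: List[Message]) -> str:
    return f"Probe {len(history) // 2 + 1} for: {seed}"


def stub_target(history: List[Message]) -> str:
    # Pretend the target gives in once the conversation gets longer.
    return "I will comply." if len(history) >= 3 else "I cannot help with that."


def stub_judge(messages: List[Message]) -> float:
    # Toy rubric: fraction of target replies that contain "comply".
    replies = [text for role, text in messages if role == "target"]
    return sum("comply" in text for text in replies) / len(replies)


transcript, score = run_audit("self-preservation", stub_auditor, stub_target, stub_judge)
print(len(transcript.messages), round(score, 2))
```

Calling `transcript.rollback(1)` afterwards would discard the final exchange, which is one simple way to model an auditor retrying from an earlier point in the conversation.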

Features:
  • Scenario generator that turns seed instructions into realistic audit setups
  • Multi-turn auditor orchestration with simulated tool use and rollbacks
  • Judge model that scores transcripts via a consistent safety rubric
  • Parallel execution to explore many hypotheses and surface the riskiest traces first
  • Built-in starters for seeds and judge dimensions plus guidance for customization
  • API support for popular model providers with reproducible runs and reports
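The parallel-execution feature listed above, running many audits and surfacing the riskiest first, can be sketched roughly as follows. This is a minimal sketch under assumptions: `fake_audit`, `run_parallel`, and the placeholder scores are hypothetical and only stand in for real audit runs; they are not part of Petri.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Made-up placeholder scores (assumption); a real run would get these from a judge model.
PLACEHOLDER_SCORES = {
    "reward hacking": 0.2,
    "self-preservation": 0.9,
    "eval awareness": 0.5,
}


def fake_audit(seed: str) -> Dict[str, object]:
    """Stand-in for one complete audit; returns the seed with its placeholder score."""
    return {"seed": seed, "score": PLACEHOLDER_SCORES[seed]}


def run_parallel(
    seeds: List[str],
    audit_fn: Callable[[str], Dict[str, object]] = fake_audit,
    max_workers: int = 4,
) -> List[Dict[str, object]]:
    """Run one audit per seed in a thread pool and rank results, riskiest first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(audit_fn, seeds))
    return sorted(results, key=lambda r: r["score"], reverse=True)


ranked = run_parallel(["reward hacking", "self-preservation", "eval awareness"])
for result in ranked:
    print(result["seed"], result["score"])
```

Threads are used here because calls to remote model APIs are typically I/O-bound; a different concurrency model could serve equally well.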


Programming Language: Python.
Categories:
AI Agents

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.