Overview
Agent Contracts provides verification capabilities to evaluate whether your agent’s behavior matches the defined contracts.
Verify a run
Prerequisites:
- Installed and set up Agent Contracts for offline verification (see installation).
- Created an offline specification for your agent (see contract specifications).
1
Run your application through all predefined scenarios
app.py
http://localhost:16686).
2
Retrieve the run
A run is a collection of traces. You can retrieve a run using the CLI.
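To make the run/trace relationship concrete, the sketch below groups trace records by run ID. The `TraceRecord` type and the `group_by_run` helper are hypothetical names introduced for illustration and are not part of the Agent Contracts API; the IDs are copied from the trace listing shown later on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical record mirroring three columns of the trace listing."""
    trace_id: str
    run_id: str
    scenario_id: str


def group_by_run(traces):
    """Group trace records by run ID (a run is a collection of traces)."""
    runs = defaultdict(list)
    for trace in traces:
        runs[trace.run_id].append(trace)
    return dict(runs)


traces = [
    TraceRecord("150a4110d9de4134e577f7f4c0c56bd4", "cd26ad7e", "de_ratio"),
    TraceRecord("a8ac6ae61209833f7abd723bdd175770", "cd26ad7e", "nike_vs_adidas"),
    TraceRecord("a7bb5cfe26811835333c087e151f5eda", "cd26ad7e", "tesla_income"),
]

runs = group_by_run(traces)
for run_id, members in runs.items():
    # Print each run with the scenarios its traces cover.
    print(run_id, [t.scenario_id for t in members])
```

All three traces share the run ID `cd26ad7e`, so they end up in a single group.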
In the agent contract environment, you can use the cli command to list the traces or runs.
$ poetry run cli ls run --timespan 1d
Listing runs from 2025-03-05 21:54:40 to 2025-03-07 21:54:40…
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Run ID   ┃ Project Name        ┃ Specifications ID ┃ Start Time          ┃ End Time            ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ cd26ad7e │ langgraph-fin-agent │ u8spz6vw          │ 2025-03-06 12:45:32 │ 2025-03-06 12:45:32 │
└──────────┴─────────────────────┴───────────────────┴─────────────────────┴─────────────────────┘
3
Verify the run against the specification
$ poetry run cli verify run cd26ad7e fin-agent-022225.json --timespan 1d
Verifying run cd26ad7e with specifications from .local/fin-agent-022225.json…
Output will be saved to output/verify_cd26ad7e.json
Contract Right Tickers: 100%|████████████████████████████████████████████████████████████████| 3/3 [00:05 < 00:00, 1.93s/it]
Contract Right Tickers: 100%|████████████████████████████████████████████████████████████████| 4/4 [00:07 < 00:00, 1.91s/it]
Contract Right Tickers: 100%|████████████████████████████████████████████████████████████████| 4/4 [00:08 < 00:00, 2.06s/it]
───────────────────────────────────────── Trace 150a4110d9de4134e577f7f4c0c56bd4 ──────────────────────────────────────────
Right Tickers (UNSATISFIED)
┏━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Type ┃ Qualifier ┃ Requirement                                             ┃ Satisfied ┃
┡━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ PRE  │ MUST      │ Question about the debt-to-equity ratio                 │ Yes       │
│ PATH │ MUST      │ Retrieve the financials of at least 3 car manufacturers │ No        │
│ POST │ MUST      │ Output a table                                          │ No        │
│ POST │ SHOULD    │ Include at least Tesla, Ford, and General Motors        │ No        │
└──────┴───────────┴─────────────────────────────────────────────────────────┴───────────┘
───────────────────────────────────────── Trace a8ac6ae61209833f7abd723bdd175770 ──────────────────────────────────────────
Right Tickers (SATISFIED)
┏━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Type ┃ Qualifier ┃ Requirement                                                   ┃ Satisfied ┃
┡━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ PRE  │ MUST      │ Comparison between Nike and Adidas                            │ Yes       │
│ PATH │ MUST      │ Retrieve Nike financials with the ticker NKE                  │ Yes       │
│ PATH │ MUST      │ Retrieve Adidas financials with the ticker ADDYY              │ Yes       │
│ POST │ SHOULD    │ A numeric value for operating margins expressed in percentage │ Yes       │
└──────┴───────────┴───────────────────────────────────────────────────────────────┴───────────┘
───────────────────────────────────────── Trace a7bb5cfe26811835333c087e151f5eda ──────────────────────────────────────────
Right Tickers (SATISFIED)
┏━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Type ┃ Qualifier ┃ Requirement                                         ┃ Satisfied ┃
┡━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ PRE  │ MUST      │ Question about Tesla’s net income                   │ Yes       │
│ PATH │ MUST      │ Retrieve Tesla’s net income with the ticker TSLA    │ Yes       │
│ POST │ SHOULD    │ A numeric value for net income expressed in dollars │ Yes       │
└──────┴───────────┴─────────────────────────────────────────────────────┴───────────┘
Verify a single trace
1
Run your application on a single scenario
2
List available traces
$ poetry run cli ls trace --timespan 1d
Listing traces from 2025-03-06 01:55:50 to 2025-03-07 01:55:50…
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Trace ID                         ┃ Project Name        ┃ Run ID   ┃ Specifications ID ┃ Scenario ID      ┃ Start Time          ┃ End Time            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ 150a4110d9de4134e577f7f4c0c56bd4 │ langgraph-fin-agent │ cd26ad7e │ u8spz6vw          │ de_ratio         │ 2025-03-06 12:45:32 │ 2025-03-06 12:45:32 │
│ a8ac6ae61209833f7abd723bdd175770 │ langgraph-fin-agent │ cd26ad7e │ u8spz6vw          │ nike_vs_adidas   │ 2025-03-06 12:45:32 │ 2025-03-06 12:45:32 │
│ a7bb5cfe26811835333c087e151f5eda │ langgraph-fin-agent │ cd26ad7e │ u8spz6vw          │ tesla_income     │ 2025-03-06 12:45:32 │ 2025-03-06 12:45:32 │
└──────────────────────────────────┴─────────────────────┴──────────┴───────────────────┴──────────────────┴─────────────────────┴─────────────────────┘
3
Verify the trace
get command:
Interpret the results
In addition to the summary printed to the terminal, the verification results are saved in the .output directory as verify_RUN_ID.json or verify_TRACE_ID.json. You can specify a different output directory using the --output-dir flag.
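Because the results are written as JSON files, they can be inspected from a short script. The sketch below is a minimal example that assumes the default `.output` directory and the `verify_*.json` naming described above. The JSON schema is not documented on this page, so the code reports only the top-level shape of each file and assumes nothing about field names. `find_result_files` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def find_result_files(output_dir=".output"):
    """Return the verify_*.json files in output_dir, sorted by name."""
    return sorted(Path(output_dir).glob("verify_*.json"))


def summarize(path):
    """Load one result file and describe its top-level shape only.

    No field names are assumed, because the result schema is not
    documented on this page.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    name = Path(path).name
    if isinstance(data, dict):
        return f"{name}: object with keys {sorted(data)}"
    if isinstance(data, list):
        return f"{name}: array with {len(data)} items"
    return f"{name}: {type(data).__name__}"


if __name__ == "__main__":
    # Pass the same directory given to --output-dir if it was changed.
    for result_file in find_result_files():
        print(summarize(result_file))
```

Listing the top-level keys first is a cautious way to learn the file layout before writing code that depends on specific fields.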
Here’s an example of the verification results. You can explore the detailed explanation of each requirement check.
.output/verify_150a4110d9de4134e577f7f4c0c56bd4.json
verify_RUN_ID.json in the specified output directory.
TODO: add example results
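As a reading aid for the summary tables shown earlier, the sketch below rolls per-requirement outcomes up into a single verdict. The roll-up rule is an assumption made for illustration: a contract counts as satisfied when no MUST requirement failed, and SHOULD requirements are treated as advisory. This page does not document the rule the tool actually applies. The `Requirement` type and `contract_satisfied` helper are hypothetical names, and the rows are copied from two of the traces above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical row of a verification summary table."""
    kind: str        # "PRE", "PATH", or "POST" (the Type column)
    qualifier: str   # "MUST" or "SHOULD"
    text: str
    satisfied: bool


def contract_satisfied(requirements):
    """ASSUMED rule: satisfied when every MUST requirement is satisfied."""
    return all(r.satisfied for r in requirements if r.qualifier == "MUST")


de_ratio = [
    Requirement("PRE", "MUST", "Question about the debt-to-equity ratio", True),
    Requirement("PATH", "MUST", "Retrieve the financials of at least 3 car manufacturers", False),
    Requirement("POST", "MUST", "Output a table", False),
    Requirement("POST", "SHOULD", "Include at least Tesla, Ford, and General Motors", False),
]

tesla_income = [
    Requirement("PRE", "MUST", "Question about Tesla’s net income", True),
    Requirement("PATH", "MUST", "Retrieve Tesla’s net income with the ticker TSLA", True),
    Requirement("POST", "SHOULD", "A numeric value for net income expressed in dollars", True),
]

for name, reqs in [("de_ratio", de_ratio), ("tesla_income", tesla_income)]:
    verdict = "SATISFIED" if contract_satisfied(reqs) else "UNSATISFIED"
    failed = [r.text for r in reqs if not r.satisfied]
    # Show the verdict and any requirements that were not met.
    print(name, verdict, failed)
```

Under this assumed rule the two verdicts agree with the ones in the tables above, but that agreement does not confirm that the tool uses the same rule.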
Next Steps
- Explore runtime certification capabilities
- View example contracts for common use cases