agentevals Community Evaluators

Community-maintained evaluators for agentevals -- the agent evaluation framework built on Google ADK.

Evaluators are standalone scoring programs that evaluate agent traces. They read EvalInput JSON from stdin and write EvalResult JSON to stdout. This repository is the official index of community-contributed evaluators.

Using community evaluators

Browse available evaluators

agentevals evaluator list --source github

Reference a community evaluator in your eval config

Add a type: remote entry to your eval_config.yaml:

metrics:
  - tool_trajectory_avg_score

  - name: response_quality
    type: remote
    source: github
    ref: evaluators/response_quality/response_quality.py
    threshold: 0.7
    config:
      min_response_length: 20

  - name: tool_coverage
    type: remote
    source: github
    ref: evaluators/tool_coverage/tool_coverage.py
    threshold: 1.0
    config:
      min_tool_calls: 1

Then run as usual:

agentevals run traces/my_trace.json \
  --config eval_config.yaml \
  --eval-set eval_set.json

Each remote evaluator is downloaded automatically and cached in ~/.cache/agentevals/evaluators/.
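If you script around the config, note that the metrics list above mixes two kinds of entries: plain strings (presumably built-in metric names such as tool_trajectory_avg_score) and mappings that describe an evaluator. The sketch below works on an already-parsed config, that is, the plain dict a YAML loader would return, and pulls out the remote entries. The helper name remote_evaluators is hypothetical and not part of agentevals.

```python
from typing import Any

# Parsed form of the eval_config.yaml shown above (what a YAML loader would return).
CONFIG: dict[str, Any] = {
    "metrics": [
        "tool_trajectory_avg_score",
        {
            "name": "response_quality",
            "type": "remote",
            "source": "github",
            "ref": "evaluators/response_quality/response_quality.py",
            "threshold": 0.7,
            "config": {"min_response_length": 20},
        },
        {
            "name": "tool_coverage",
            "type": "remote",
            "source": "github",
            "ref": "evaluators/tool_coverage/tool_coverage.py",
            "threshold": 1.0,
            "config": {"min_tool_calls": 1},
        },
    ]
}


def remote_evaluators(config: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, ref) pairs for every `type: remote` metric entry."""
    found = []
    for entry in config.get("metrics", []):
        # String entries carry no type; only mapping entries can be remote.
        if isinstance(entry, dict) and entry.get("type") == "remote":
            found.append((entry["name"], entry["ref"]))
    return found


if __name__ == "__main__":
    for name, ref in remote_evaluators(CONFIG):
        print(f"{name}: {ref}")
```

Each ref is a path inside this repository, so the same listing tells you which files will be fetched into the cache.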

Contributing an evaluator

1. Scaffold a new evaluator

pip install agentevals
agentevals evaluator init my_evaluator

This creates a directory ready to be added to this repo:

my_evaluator/
├── my_evaluator.py     # your scoring logic
└── evaluator.yaml      # metadata manifest

2. Implement your scoring logic

Edit my_evaluator.py. Your function receives an EvalInput with the agent's invocations and returns an EvalResult with a score between 0.0 and 1.0.

from agentevals_grader_sdk import grader, EvalInput, EvalResult

@grader
def my_evaluator(input: EvalInput) -> EvalResult:
    scores = []
    for inv in input.invocations:
        # Your scoring logic here
        scores.append(1.0)

    return EvalResult(
        score=sum(scores) / len(scores) if scores else 0.0,
        per_invocation_scores=scores,
    )

if __name__ == "__main__":
    my_evaluator.run()

Install the SDK standalone with pip install agentevals-grader-sdk (no heavy dependencies).
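As an illustration of what could replace the placeholder scores.append(1.0), here is a minimal sketch of a length-based check. It is written as a plain function with no SDK import so it can be tested on its own. This README does not show how to extract the response text from an invocation, so the function takes plain strings; score_response_lengths is a hypothetical name, and min_response_length is borrowed from the config example earlier. The actual response_quality evaluator may work differently.

```python
def score_response_lengths(
    responses: list[str], min_response_length: int = 20
) -> tuple[float, list[float]]:
    """Score each response 1.0 if it is long enough, else 0.0, and average the results."""
    per_invocation = [
        1.0 if len(text.strip()) >= min_response_length else 0.0
        for text in responses
    ]
    # Same guard as the template: an empty trace scores 0.0 instead of dividing by zero.
    overall = sum(per_invocation) / len(per_invocation) if per_invocation else 0.0
    return overall, per_invocation


if __name__ == "__main__":
    overall, per_invocation = score_response_lengths(
        ["Paris is the capital of France.", "ok"], min_response_length=20
    )
    print(overall, per_invocation)
```

Inside the decorated function, the two returned values would map onto the score and per_invocation_scores fields of EvalResult.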

3. Update the manifest

Edit evaluator.yaml with a description, tags, and your GitHub username:

name: my_evaluator
description: What this evaluator checks
language: python
entrypoint: my_evaluator.py
tags: [quality, tools]
author: your-github-username
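Before submitting, it can help to sanity-check the manifest yourself. The sketch below mirrors the manifest checks this README lists for the validation script (required fields, entrypoint exists, name matches directory). It is not the repository's scripts/validate_evaluator.py: check_manifest is a hypothetical helper, the set of required fields is an assumption taken from the example manifest, and the manifest is passed in as an already-parsed dict.

```python
from pathlib import Path

# Assumed required keys, taken from the example manifest; the real validator may differ.
REQUIRED_FIELDS = ("name", "description", "language", "entrypoint")


def check_manifest(manifest: dict, evaluator_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks fine."""
    problems = []
    for key in REQUIRED_FIELDS:
        if not manifest.get(key):
            problems.append(f"missing required field: {key}")

    # The manifest name is expected to match the directory name.
    name = manifest.get("name")
    if name and name != evaluator_dir.name:
        problems.append(
            f"name {name!r} does not match directory {evaluator_dir.name!r}"
        )

    # The entrypoint must be a file inside the evaluator directory.
    entrypoint = manifest.get("entrypoint")
    if entrypoint and not (evaluator_dir / entrypoint).is_file():
        problems.append(f"entrypoint not found: {entrypoint}")
    return problems


if __name__ == "__main__":
    manifest = {
        "name": "my_evaluator",
        "description": "What this evaluator checks",
        "language": "python",
        "entrypoint": "my_evaluator.py",
    }
    for problem in check_manifest(manifest, Path("evaluators/my_evaluator")):
        print(problem)
```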

4. Validate locally

Run the validation script to catch issues before submitting:

pip install pyyaml agentevals-evaluator-sdk
python scripts/validate_evaluator.py evaluators/my_evaluator

This checks:

- Manifest schema -- required fields, entrypoint exists, name matches directory
- Syntax and imports -- compiles cleanly, uses @evaluator decorator
- Smoke run -- runs the evaluator with synthetic input and validates the EvalResult output (correct types for score, details, status, etc.)
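To make the smoke-run idea concrete, here is a rough sketch of what such a check might look like: pipe a small JSON document to the evaluator on stdin, parse what comes back on stdout, and confirm the score is a number between 0.0 and 1.0. This is not the repository's validation script. smoke_run and SYNTHETIC_INPUT are hypothetical names, and the synthetic input (one empty invocation) is an assumption; the real EvalInput schema has more fields, and an evaluator may reject input this sparse.

```python
import json
import subprocess
import sys

# Hypothetical synthetic input: a single empty invocation.
SYNTHETIC_INPUT = {"invocations": [{}]}


def smoke_run(entrypoint: str, timeout: float = 30.0) -> dict:
    """Run an evaluator script once and return its parsed, sanity-checked result."""
    proc = subprocess.run(
        [sys.executable, entrypoint],
        input=json.dumps(SYNTHETIC_INPUT),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    result = json.loads(proc.stdout)

    score = result.get("score")
    # bool is a subclass of int, so rule it out explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError(f"score must be a number, got {score!r}")
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    print(smoke_run(sys.argv[1]))
```

Running the evaluator as a subprocess, instead of importing it, exercises the same stdin/stdout path that agentevals relies on.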

You can also test with a full eval run by adding a type: code entry that points at your local file:

metrics:
  - name: my_evaluator
    type: code
    path: ./evaluators/my_evaluator/my_evaluator.py
    threshold: 0.5

agentevals run traces/sample.json --config eval_config.yaml --eval-set eval_set.json

5. Submit a pull request

Fork this repository
Copy your evaluator directory into evaluators/:

evaluators/
├── my_evaluator/
│   ├── evaluator.yaml
│   └── my_evaluator.py
├── response_quality/
│   └── ...
└── tool_coverage/
    └── ...

Open a PR against main

CI will automatically validate your evaluator (manifest, syntax, and smoke run). Once merged, a separate workflow regenerates index.yaml, and your evaluator becomes available to everyone via agentevals evaluator list.

Supported languages

Evaluators can be written in any language that can read JSON from stdin and write JSON to stdout.

Language	Extension	SDK available
Python	`.py`	`pip install agentevals-grader-sdk`
JavaScript	`.js`	No SDK yet -- just read stdin, write stdout
TypeScript	`.ts`	No SDK yet -- just read stdin, write stdout

See the custom evaluators documentation for the full protocol reference.
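For languages without an SDK, the evaluator has to handle the stdin/stdout exchange itself. The sketch below shows that shape in Python, without the SDK, to stay consistent with the other examples here; a JavaScript or TypeScript evaluator would follow the same steps. It assumes the JSON keys mirror the attribute names used in the SDK example (invocations, score, per_invocation_scores). That mapping is an assumption, and the protocol reference remains the authority on the actual schema.

```python
import json
import sys


def evaluate(eval_input: dict) -> dict:
    """Give every invocation a passing score; replace with real scoring logic."""
    invocations = eval_input.get("invocations", [])
    scores = [1.0 for _ in invocations]
    return {
        "score": sum(scores) / len(scores) if scores else 0.0,
        "per_invocation_scores": scores,
    }


def main() -> None:
    raw = sys.stdin.read()  # EvalInput JSON arrives on stdin
    # Treat empty stdin as an empty input so a bare run does not crash.
    eval_input = json.loads(raw) if raw.strip() else {}
    json.dump(evaluate(eval_input), sys.stdout)  # EvalResult JSON goes to stdout
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

Keeping the scoring in a pure function (evaluate) separate from the I/O in main makes the logic easy to unit test without spawning a process.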
