Set 'temperature = 0' for extract_entities e2e test variants #3189
Conversation
We had a model inference cache regen failure because OpenAI and Anthropic generated different NER output (which made the judge requests differ, leading to a cache miss) instead of exactly the same output. Setting `temperature = 0` should make the test more consistent.
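For context, the change amounts to adding a `temperature` setting to each `extract_entities` variant block in the fixture configs. A minimal sketch of what one such variant might look like, assuming a standard TensorZero chat-completion variant (the variant and model names here are illustrative, not the actual fixture contents):

```toml
# Hypothetical excerpt from ui/fixtures/config/tensorzero.toml;
# the variant name and model are illustrative.
[functions.extract_entities.variants.gpt4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
temperature = 0  # pin sampling so NER output is stable across cache regens
```

With temperature-0 (greedy) sampling, repeated runs on the same prompt are far more likely to produce identical NER output, so the downstream judge requests match the cached entries instead of missing.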
Pull Request Overview
This PR adds a `temperature = 0` configuration to all `extract_entities` function variants across two configuration files to improve test consistency. The change addresses a model inference cache regeneration failure where OpenAI and Anthropic models generated different Named Entity Recognition (NER) outputs, causing cache misses due to differing judge requests.
- Sets temperature to 0 for all extract_entities variants in both main and e2e configuration files
- Ensures deterministic model outputs to prevent cache misses during testing
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ui/fixtures/config/tensorzero.toml | Added `temperature = 0` to 5 `extract_entities` variants for consistent model outputs |
| ui/fixtures/config/tensorzero.e2e.toml | Added `temperature = 0` to 5 `extract_entities` variants in the e2e test configuration |
/regen-fixtures
blind
blind
Important

Set `temperature = 0` for `extract_entities` variants to ensure consistent NER output, and partially re-enable cache validation in the workflow.

- Set `temperature = 0` for `extract_entities` variants in `tensorzero.e2e.toml` and `tensorzero.toml` to ensure consistent NER output.
- Partially re-enable cache validation in `.github/workflows/ui-tests-e2e-model-inference-cache.yml`, but keep the exit condition commented out.

This description was created for 1babbae. You can customize this summary. It will automatically update as commits are pushed.