Set 'temperature = 0' for extract_entities e2e test variants #3189


Open · wants to merge 4 commits into base: main

Conversation

@Aaron1011 (Member) commented Aug 20, 2025

We had a model inference cache regeneration failure: OpenAI and Anthropic generated different NER output instead of exactly the same output, which made the judge requests differ and led to a cache miss.

Setting temperature = 0 should make the test more consistent.


Important

Set temperature = 0 for extract_entities variants to ensure consistent NER output and partially re-enable cache validation in workflow.

  • Behavior:
    • Set temperature = 0 for extract_entities variants in tensorzero.e2e.toml and tensorzero.toml to ensure consistent NER output.
  • Workflow:
    • Partially re-enable row count validation in .github/workflows/ui-tests-e2e-model-inference-cache.yml, but keep the exit condition commented out.
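
Concretely, the change adds a temperature key under each extract_entities variant in the fixture configs. A minimal sketch of what such a variant block might look like (the variant and model names below are illustrative, not copied from the actual fixtures):

```toml
# Hypothetical excerpt in the style of ui/fixtures/config/tensorzero.toml;
# variant and model names are placeholders for illustration only.
[functions.extract_entities.variants.openai_variant]
type = "chat_completion"
model = "openai::gpt-4o-mini"
temperature = 0  # greedy decoding, so regenerated NER output stays stable

[functions.extract_entities.variants.anthropic_variant]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku"
temperature = 0  # identical output keeps judge requests identical, so the cache hits
```

With temperature = 0, both providers decode (near-)greedily, so repeated fixture regenerations should produce the same NER output and therefore the same downstream judge requests. Note that a zero temperature reduces but does not strictly guarantee determinism across API calls.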

This description was created by Ellipsis for 1babbae and will automatically update as commits are pushed.

@Copilot Copilot AI review requested due to automatic review settings August 20, 2025 14:22
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds temperature = 0 configuration to all extract_entities function variants across two configuration files to improve test consistency. The change addresses a model inference cache regeneration failure where OpenAI and Anthropic models were generating different Named Entity Recognition (NER) outputs, causing cache misses due to different judge requests.

  • Sets temperature to 0 for all extract_entities variants in both main and e2e configuration files
  • Ensures deterministic model outputs to prevent cache misses during testing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Files changed:
  • ui/fixtures/config/tensorzero.toml: added temperature = 0 to 5 extract_entities variants for consistent model outputs
  • ui/fixtures/config/tensorzero.e2e.toml: added temperature = 0 to the same 5 variants in the e2e test configuration


@Aaron1011 (Member, Author) commented:

/regen-fixtures

virajmehta previously approved these changes Aug 20, 2025
@virajmehta virajmehta enabled auto-merge August 20, 2025 18:48
anndvision previously approved these changes Aug 20, 2025

@anndvision (Contributor) left a comment:

blind
@virajmehta virajmehta added this pull request to the merge queue Aug 20, 2025
@virajmehta virajmehta removed this pull request from the merge queue due to a manual request Aug 20, 2025
@virajmehta virajmehta added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@virajmehta virajmehta dismissed stale reviews from anndvision and themself via 1babbae August 21, 2025 13:47
@GabrielBianconi (Member) left a comment:

blind

@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
@virajmehta virajmehta added this pull request to the merge queue Aug 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
4 participants