Set 'temperature = 0' for extract_entities e2e test variants #3189


Open · wants to merge 4 commits into base: main

Conversation

@Aaron1011 (Member) commented Aug 20, 2025

We had a model inference cache regeneration failure: OpenAI and Anthropic generated different NER output instead of exactly the same output, which made the judge requests differ and led to a cache miss.

Setting temperature = 0 should make the test more consistent.


Important

Set temperature = 0 for extract_entities variants to ensure consistent NER output and partially re-enable cache validation in workflow.

  • Behavior:
    • Set temperature = 0 for extract_entities variants in tensorzero.e2e.toml and tensorzero.toml to ensure consistent NER output.
  • Workflow:
    • Partially re-enable row count validation in .github/workflows/ui-tests-e2e-model-inference-cache.yml, but keep the exit condition commented out.
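
Concretely, the change adds a temperature key under each extract_entities variant in the fixture configs. A minimal sketch of what such a variant block might look like (the variant and model names below are illustrative, not copied from the actual fixtures):

```toml
# Hypothetical excerpt in the style of ui/fixtures/config/tensorzero.toml;
# variant and model names are placeholders for illustration only.
[functions.extract_entities.variants.openai_variant]
type = "chat_completion"
model = "openai::gpt-4o-mini"
temperature = 0  # greedy decoding, so regenerated NER output stays stable

[functions.extract_entities.variants.anthropic_variant]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku"
temperature = 0  # identical output keeps judge requests identical, so the cache hits
```

With temperature = 0, both providers decode (near-)greedily, so repeated fixture regenerations should produce the same NER output and therefore the same downstream judge requests. Note that a zero temperature reduces but does not strictly guarantee determinism across API calls.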

This description was created by Ellipsis for 1babbae and will automatically update as commits are pushed.

@Copilot Copilot AI review requested due to automatic review settings August 20, 2025 14:22
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds temperature = 0 configuration to all extract_entities function variants across two configuration files to improve test consistency. The change addresses a model inference cache regeneration failure where OpenAI and Anthropic models were generating different Named Entity Recognition (NER) outputs, causing cache misses due to different judge requests.

  • Sets temperature to 0 for all extract_entities variants in both main and e2e configuration files
  • Ensures deterministic model outputs to prevent cache misses during testing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Files changed:
  • ui/fixtures/config/tensorzero.toml: added temperature = 0 to 5 extract_entities variants for consistent model outputs
  • ui/fixtures/config/tensorzero.e2e.toml: added temperature = 0 to the same 5 variants in the e2e test configuration


@Aaron1011 (Member, Author) commented:

/regen-fixtures

virajmehta previously approved these changes Aug 20, 2025
@virajmehta virajmehta enabled auto-merge August 20, 2025 18:48
anndvision previously approved these changes Aug 20, 2025

@anndvision (Contributor) left a comment:

blind
@virajmehta virajmehta added this pull request to the merge queue Aug 20, 2025
@virajmehta virajmehta removed this pull request from the merge queue due to a manual request Aug 20, 2025
@virajmehta virajmehta added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@virajmehta virajmehta dismissed stale reviews from anndvision and themself via 1babbae August 21, 2025 13:47
@GabrielBianconi (Member) left a comment:

blind

@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
@GabrielBianconi GabrielBianconi added this pull request to the merge queue Aug 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
@virajmehta virajmehta added this pull request to the merge queue Aug 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
4 participants