Collaborator

@cdrini cdrini commented Aug 15, 2025

Closes #10290
Closes #10239

  1. Adds a graph that specifically monitors the homepage appearing with no books (work towards "Home page cache breaking (solr queries)" #11145)
image

When this is 0, it means the homepage is empty.
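
A minimal sketch of the kind of check that could sit behind this graph, assuming the monitor fetches the homepage HTML and counts book links. The function names, the link pattern, and the metric name are illustrative assumptions, not the code in this PR; the output line follows Graphite's plaintext format (`path value timestamp`), which is also an assumption about the backend.

```python
import re

# Assumption: homepage book links look like /works/OL123W or /books/OL123M.
BOOK_LINK_RE = re.compile(r'href="(/(?:works|books)/OL\d+[WM])')


def count_homepage_books(html: str) -> int:
    """Count distinct book/work links in the homepage HTML."""
    return len(set(BOOK_LINK_RE.findall(html)))


def to_metric_line(metric: str, value: int, timestamp: int) -> str:
    """Format one data point in Graphite's plaintext style."""
    return f"{metric} {value} {timestamp}"


if __name__ == "__main__":
    sample = '<a href="/works/OL1W">A</a><a href="/books/OL2M">B</a>'
    # Hypothetical metric name, for illustration only.
    print(to_metric_line("stats.ol.homepage.book_count", count_homepage_books(sample), 1755290640))
```

A count of 0 from a check like this is what the graph would surface as an empty homepage.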

  1. Adds a monitor that watches the Solr logs and tracks metrics such as query run time
    • Note: some queries are so long that their log entries get truncated. These are currently skipped.
image
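
As a rough illustration of the log-watching idea, the sketch below pulls the request path and run time out of a Solr request log line and skips lines that appear truncated. It assumes request lines end with `path=... params={...} hits=N status=N QTime=N` and that a truncated line simply lacks the closing fields; `SolrRequest` and `parse_request` are hypothetical names, not the ones in `solr_logs_monitor.py`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a complete request line contains "path=... params={...} ... QTime=N".
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} .*?QTime=(?P<qtime>\d+)"
)


@dataclass
class SolrRequest:
    path: str
    qtime_ms: int


def parse_request(line: str) -> Optional[SolrRequest]:
    """Return the parsed request, or None if the line is not a complete request entry."""
    match = REQUEST_RE.search(line)
    if match is None:
        # Truncated (or unrelated) line: skip rather than guess at the missing fields.
        return None
    return SolrRequest(path=match["path"], qtime_ms=int(match["qtime"]))
```

Returning `None` for incomplete lines mirrors the "currently skipped" behaviour noted above; counting those skips separately would show how often truncation happens.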
  1. Adds a query_label option to a few of the places where we make Solr queries, so that we can monitor specific types of queries

    • Note: we will likely want another option, client, to distinguish e.g. internal queries from public API queries
    • Not yet tested; this must be patch deployed to the web nodes to see whether the labels are applied correctly
  2. Adds ol-solr0/ol-solr1 to our deploy flow, so that these monitoring changes stay up to date

  3. Switches nginx monitoring to use obfi_previous_minute instead of the last 17500 entries. This gives more accurate numbers and lets us get absolute values, so we can now easily see whether e.g. a certain bot has had a genuine spike in traffic, as opposed to only appearing to spike because everything else went down.

The flat line is the "before" behaviour, since we were always sampling a constant number of entries. Now we get the actual previous minute in full.

image
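
The difference between a fixed-size sample and a time window can be sketched as follows. This is not how `obfi_previous_minute` is implemented (that helper is not shown in this thread); it only illustrates the idea of counting every entry in the last complete minute, with hypothetical function names and `(timestamp, key)` pairs standing in for parsed log entries.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def previous_minute_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) of the last complete minute before `now`."""
    end = now.replace(second=0, microsecond=0)
    return end - timedelta(minutes=1), end


def count_previous_minute(entries: Iterable[Tuple[datetime, str]], now: datetime) -> Counter:
    """Count entries per key (e.g. per bot) that fall inside the previous minute."""
    start, end = previous_minute_window(now)
    return Counter(key for ts, key in entries if start <= ts < end)
```

Because the window is a fixed span of time rather than a fixed number of lines, the totals are absolute requests per minute, so one bot's count does not change when other traffic rises or falls.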

Technical

Testing

https://grafana.us.archive.org/d/000000176/open-library-dev?orgId=1&refresh=1m&from=1755279851212&to=1755290651214&viewPanel=41

Screenshot

Stakeholders

@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch from 6064271 to 5a76175 Compare August 15, 2025 20:41
@cdrini cdrini added the Patch Deployed This PR has been deployed to production independently, outside of the regular deploy cycle. label Aug 15, 2025
@cdrini
Collaborator Author

cdrini commented Aug 15, 2025

Patch deployed to monitor frequency of this issue.

@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch from 62da713 to f988fba Compare August 18, 2025 16:07
@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch 2 times, most recently from a136444 to dde5b2e Compare August 19, 2025 15:02
@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch from dde5b2e to 652b10b Compare August 19, 2025 15:07
@cdrini cdrini changed the title Add new monitor for empty homepage Improve solr + nginx performance monitoring Aug 19, 2025
@cdrini cdrini marked this pull request as ready for review August 19, 2025 15:44
@Copilot Copilot AI review requested due to automatic review settings August 19, 2025 15:44
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances monitoring capabilities for Solr and Nginx performance with more accurate metrics and query tracking. It adds specific monitoring for homepage book display and switches from fixed-size log sampling to time-based analysis for better accuracy.

  • Introduces comprehensive Solr log monitoring with query labeling and performance tracking
  • Refactors Nginx monitoring to use time-based sampling (obfi_previous_minute) instead of fixed entry counts
  • Adds homepage monitoring to track book display issues and query labeling throughout the search system

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/monitoring/solr_logs_monitor.py | New comprehensive Solr log parser and monitoring system |
| scripts/monitoring/utils.sh | Refactored to use obfi_previous_minute for more accurate Nginx metrics |
| scripts/monitoring/monitor.py | Added Solr and homepage monitoring jobs |
| openlibrary/plugins/worksearch/code.py | Added query labeling system for tracking different search types |
| scripts/deployment/deploy.sh | Extended deployment to include Solr servers |
| compose.production.yaml | Added monitoring profile for Solr servers |


def safe_parse_log_entry(log_line: str) -> RequestLogEntry | SolrLogEntry | None:
    try:
        return parse_log_entry(log_line)
    except Exception as e:  # noqa: BLE001

Copilot AI Aug 19, 2025

Catching a bare Exception is too broad and could hide important errors. Consider catching the specific exceptions that parse_log_entry is expected to raise, such as ValueError or ParseError.

Suggested change
- except Exception as e: # noqa: BLE001
+ except (ValueError, ParseError) as e:
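
To illustrate the trade-off in this suggestion, here is a self-contained sketch of the narrower variant. The `parse_log_entry` below is a stand-in that raises `ValueError` on malformed input; whether the real parser raises only `ValueError`, and whether a `ParseError` type exists in this module, is not confirmed anywhere in this thread.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_log_entry(log_line: str) -> dict:
    """Stand-in parser (hypothetical): expects whitespace-separated 'key=value' tokens."""
    entry = {}
    for token in log_line.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        entry[key] = value
    return entry


def safe_parse_log_entry(log_line: str) -> Optional[dict]:
    try:
        return parse_log_entry(log_line)
    except ValueError as e:
        # Expected parse failure: log it and skip this line.
        logger.warning("Skipping unparseable log line: %s", e)
        return None
```

With the narrow catch, an unexpected error type propagates and stops the monitor, which makes bugs visible; the broad catch in the diff keeps a long-running monitor alive through unanticipated log shapes. Which is preferable depends on how the monitor is supervised.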


@cdrini cdrini marked this pull request as draft August 20, 2025 13:43
@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch 3 times, most recently from 708d172 to 3b7d1e4 Compare August 20, 2025 22:43
@cdrini cdrini force-pushed the feature/monitor-empty-homepage branch from 3b7d1e4 to ca92ed4 Compare August 20, 2025 22:51
@cdrini cdrini marked this pull request as ready for review August 21, 2025 10:30
    fields: str = '*',
    facet: bool = True,
    spellcheck_count: int | None = None,
    query_label: QueryLabel = 'UNLABELLED',
Member

UNLABELLED_WORK_SEARCH may help us at least differentiate between e.g. author, list, subject searches.
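
A small sketch of how a typed label with a default could work. Only `'UNLABELLED'` appears in the diff above and `UNLABELLED_WORK_SEARCH` is the suggestion in this comment; the `HOMEPAGE` value, the `build_solr_params` helper, and the `ol.label` parameter name are assumptions made for illustration.

```python
from typing import List, Literal, Tuple

# Hypothetical label set; see the lead-in for which values come from the thread.
QueryLabel = Literal["UNLABELLED", "UNLABELLED_WORK_SEARCH", "HOMEPAGE"]


def build_solr_params(q: str, query_label: QueryLabel = "UNLABELLED") -> List[Tuple[str, str]]:
    """Build request params, tagging the query so it can be grouped in the logs."""
    return [
        ("q", q),
        # Assumption: the label travels as an extra request parameter with this name.
        ("ol.label", query_label),
    ]
```

Using a `Literal` type means a type checker flags misspelled labels at call sites, and a more specific default per search function would let unlabelled callers still be grouped by search type.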

@mekarpeles mekarpeles merged commit 0c7a096 into internetarchive:master Aug 21, 2025
4 checks passed
@cdrini cdrini deleted the feature/monitor-empty-homepage branch August 21, 2025 20:58

Labels

Patch Deployed: This PR has been deployed to production independently, outside of the regular deploy cycle.
Theme: Monitoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dockerized Solr Performance Monitoring
Add ol-solr0 to deploy flow

2 participants