-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Improve solr + nginx performance monitoring #11156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve solr + nginx performance monitoring #11156
Conversation
6064271
to
5a76175
Compare
Patch deployed to monitor frequency of this issue. |
62da713
to
f988fba
Compare
a136444
to
dde5b2e
Compare
dde5b2e
to
652b10b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances monitoring capabilities for Solr and Nginx performance with more accurate metrics and query tracking. It adds specific monitoring for homepage book display and switches from fixed-size log sampling to time-based analysis for better accuracy.
- Introduces comprehensive Solr log monitoring with query labeling and performance tracking
- Refactors Nginx monitoring to use time-based sampling (
obfi_previous_minute
) instead of fixed entry counts - Adds homepage monitoring to track book display issues and query labeling throughout the search system
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
scripts/monitoring/solr_logs_monitor.py | New comprehensive Solr log parser and monitoring system |
scripts/monitoring/utils.sh | Refactored to use obfi_previous_minute for more accurate Nginx metrics |
scripts/monitoring/monitor.py | Added Solr and homepage monitoring jobs |
openlibrary/plugins/worksearch/code.py | Added query labeling system for tracking different search types |
scripts/deployment/deploy.sh | Extended deployment to include Solr servers |
compose.production.yaml | Added monitoring profile for Solr servers |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
def safe_parse_log_entry(log_line: str) -> RequestLogEntry | SolrLogEntry | None: | ||
try: | ||
return parse_log_entry(log_line) | ||
except Exception as e: # noqa: BLE001 |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bare Exception
catch is too broad and could hide important errors. Consider catching specific exceptions like ValueError
or ParseError
that are expected from parse_log_entry
.
except Exception as e: # noqa: BLE001 | |
except (ValueError, ParseError) as e: |
Copilot uses AI. Check for mistakes.
708d172
to
3b7d1e4
Compare
3b7d1e4
to
ca92ed4
Compare
fields: str = '*', | ||
facet: bool = True, | ||
spellcheck_count: int | None = None, | ||
query_label: QueryLabel = 'UNLABELLED', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNLABELLED_WORK_SEARCH
may help us at least differentiate between e.g. author, list, subject searches.
Closes #10290
Closes #10239
When this is 0, it means the homepage is empty.
Adds a
query_label
option to a few spots we make solr queries that we can use to monitor certain types of queriesAdd ol-solr0/ol-solr1 to our deploy flow. This lets these monitoring changes stay up-to-date
Switch nginx monitoring to use
obfi_previous_minute
instead of using the last 17500 entries. This results in more accurate numbers and lets use get absolute values, so we can now easily see if e.g a certain bot has seen a spike in traffic, as opposed to a bot looks like it has a spike in traffic because everything else went down.The flat line is the before since we were sampling a constant value. Now we get the actual previous minute in full.
Technical
Testing
https://grafana.us.archive.org/d/000000176/open-library-dev?orgId=1&refresh=1m&from=1755279851212&to=1755290651214&viewPanel=41
Screenshot
Stakeholders