User Details
- User Since
- Feb 27 2015, 10:47 PM (555 w, 5 d)
- Availability
- Available
- IRC Nick
- urandom
- LDAP User
- Eevans
- MediaWiki User
- EEvans (WMF)
Yesterday
Tue, Oct 21
Mon, Oct 20
Wed, Oct 15
Mon, Oct 13
Provided that the moves happen one at a time (probably goes without saying), the Cassandra hosts can be done at any time and without coordination. The Cassandra hosts here are aqs*, restbase*, and sessionstore*.
Fri, Oct 10
Thu, Oct 9
Tue, Oct 7
Mon, Oct 6
Thu, Oct 2
Wed, Oct 1
The RESTBase cluster has been upgraded to v1.29.12 (sorry for the delay, I was out all last week and missed the message).
Tue, Sep 30
Mon, Sep 29
Sep 16 2025
Sep 10 2025
Sep 5 2025
Sep 4 2025
So to (try to) make this a bit more concrete:
Sep 3 2025
Sep 2 2025
Aug 29 2025
Aug 28 2025
Aug 26 2025
Aug 21 2025
@elukey as I recall, you didn't want to go the IP SAN route, is that correct?
Aug 20 2025
Aug 19 2025
rsyslog is back up and running after clearing the queue (/var/spool/rsyslog/*), which apparently was corrupted.
Aug 18 2025
Aug 14 2025
The debug logs (text log files located on the Cassandra cluster nodes) currently cover a period spanning from about the middle of May (about May 20) to today (Aug 14). Across all nodes, I can find 290 examples of timeouts, 259 of which are for image_suggestions.suggestions. Of those 259 timeouts, there are only 93 unique wiki/page_id pairs (the partition key).
The 500s are indeed the result of query read timeouts at the coordinator nodes, and the queries in question all reliably time out even when run from a command shell:
Aug 13 2025
Aug 12 2025
Aug 7 2025
Aug 5 2025
Aug 4 2025
Oh good, so it's not just me. :)