
Analytics
Component · Archived · Public

Details

Description

ARCHIVED and superseded by Data-Engineering, see T287531: Create project tag for Data-Engineering.

This is the main place where the WMF Analytics Engineering team triages incoming work. If you tag a task with Analytics, it goes to the Incoming column, and we usually triage it within a week. For urgent problems, you can contact the team members on IRC or by email.

Recent Activity

Mon, Oct 20

Ottomata added a parent task for T160311: Sort inconsistency in AQS timestamp behavior: T342018: compile list of known issues for triage post AQS 2.0 launch.
Mon, Oct 20, 5:00 PM · Data-Engineering-Icebox, Data-Engineering, Analytics

Wed, Oct 15

Ottomata moved T198628: Count the number of video plays from Incoming (new tickets) to Backlog on the Data-Engineering board.
Wed, Oct 15, 4:08 PM · Experimentation Lab, Data-Engineering, Analytics

Wed, Oct 8

Milimetric raised the priority of T198628: Count the number of video plays from Low to Medium.

Hi @Ahoelzl. We have new interest in this work and volunteers to do it. I'm also tagging Experiment Platform because we can support this work with it. cc @phuedx.

Wed, Oct 8, 5:55 PM · Experimentation Lab, Data-Engineering, Analytics
Ahoelzl closed T376026: Update event-producing tools to overwrite `meta.dt`, a subtask of T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas, as Resolved.
Wed, Oct 8, 1:15 AM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics

Wed, Sep 24

Ahoelzl closed T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas as Resolved.
Wed, Sep 24, 9:39 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ahoelzl closed T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas, a subtask of T240460: Clients need to generate an ISO 8601 formatted timestamp, as Resolved.
Wed, Sep 24, 9:39 PM · Data-Engineering, MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), Analytics, Event-Platform, MW-1.35-notes (1.35.0-wmf.37; 2020-06-16), Better Use Of Data
Ahoelzl closed T395727: Sharp spike in unique devices for past month on all projects as Resolved.
Wed, Sep 24, 9:38 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Analytics-Data-Problem, Movement-Insights, Analytics, Data-Engineering-Wikistats
Maintenance_bot added a project to T272863: EventLogging PHP EventServiceClient should use EventBus->send().: Data-Engineering.
Wed, Sep 24, 4:31 PM · Data-Engineering, MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Config, ci-test-error, Patch-For-Review, Product-Data-Infrastructure, Event-Platform, Analytics, Better Use Of Data
Ottomata added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

@dcausse should we resolve this task?

Wed, Sep 24, 3:59 PM · Discovery-Search, Data-Engineering-Icebox, Data-Engineering, Epic, Event-Platform, Analytics, Wikidata, EventStreams

Tue, Sep 23

Maintenance_bot removed a project from T275674: MEP: Schema fragments shouldn't require fields: Patch-For-Review.
Tue, Sep 23, 8:33 PM · Data-Engineering, Analytics-Kanban, Product-Data-Infrastructure, Event-Platform, Analytics
CodeReviewBot added a comment to T275674: MEP: Schema fragments shouldn't require fields.

mediawiki/page/restrictions-change/2.0.0 - don't require rev_id, etc.

Tue, Sep 23, 7:49 PM · Data-Engineering, Analytics-Kanban, Product-Data-Infrastructure, Event-Platform, Analytics
Maintenance_bot added a project to T275674: MEP: Schema fragments shouldn't require fields: Data-Engineering.
Tue, Sep 23, 7:31 PM · Data-Engineering, Analytics-Kanban, Product-Data-Infrastructure, Event-Platform, Analytics
CodeReviewBot added a project to T275674: MEP: Schema fragments shouldn't require fields: Patch-For-Review.

otto opened https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-primary/-/merge_requests/26

Tue, Sep 23, 6:57 PM · Data-Engineering, Analytics-Kanban, Product-Data-Infrastructure, Event-Platform, Analytics

Sep 22 2025

gerritbot added a comment to T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

Change #1190301 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] EventSerializer - fix logic for setting of meta.dt

https://gerrit.wikimedia.org/r/1190301

Sep 22 2025, 6:51 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
gerritbot added a comment to T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

Change #1190301 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] EventSerializer - fix logic for setting of meta.dt

https://gerrit.wikimedia.org/r/1190301

Sep 22 2025, 3:51 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics

Sep 17 2025

Ottomata moved T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams from To be Estimated/To be discussed to Estimated/ Discussed on the Event-Platform board.
Sep 17 2025, 3:07 PM · Discovery-Search, Data-Engineering-Icebox, Data-Engineering, Epic, Event-Platform, Analytics, Wikidata, EventStreams
Ottomata moved T213561: Discovery for Kafka cluster brokers from Backlog to Components on the Event-Platform board.
Sep 17 2025, 3:07 PM · Data-Engineering-Radar, Data-Platform-SRE, Data-Engineering, SRE, Services (watching), Event-Platform, Analytics
Ottomata moved T268027: Automate EventGate validation error reporting from Backlog to Components on the Event-Platform board.
Sep 17 2025, 3:06 PM · Analytics, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Event-Platform
Ottomata moved T276955: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization from Backlog to Radar on the Event-Platform board.
Sep 17 2025, 3:06 PM · Analytics, Data-Engineering, Product-Analytics, Event-Platform, Better Use Of Data

Sep 8 2025

mforns moved T395727: Sharp spike in unique devices for past month on all projects from In progress to Done on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Sep 8 2025, 3:27 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Analytics-Data-Problem, Movement-Insights, Analytics, Data-Engineering-Wikistats

Aug 28 2025

Ottomata closed T240387: MW REST API Historical Data Endpoint Needs, a subtask of T258511: Data Lake incremental Data Updates, as Declined.
Aug 28 2025, 5:54 PM · Patch-For-Review, Analytics, Epic, Product-Analytics

Aug 27 2025

Maintenance_bot removed a project from T212778: Add is_pageview as a dimension to the 'webrequest_sampled_128' Druid dataset: Patch-For-Review.
Aug 27 2025, 5:31 PM · Analytics-Kanban, Analytics
Ottomata moved T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas from Next Up to Done on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 27 2025, 4:02 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata claimed T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.
Aug 27 2025, 4:02 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata edited projects for T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas, added: Data-Engineering (Q1 FY25/26 July 1st - September 30th); removed Data-Engineering.
Aug 27 2025, 4:02 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata updated the task description for T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.
Aug 27 2025, 2:31 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata triaged T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas as Medium priority.
Aug 27 2025, 2:29 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata updated the task description for T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.
Aug 27 2025, 2:29 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics

Aug 12 2025

AndrewTavis_WMDE added a comment to T198628: Count the number of video plays.

Thanks for the note here, @Milimetric! Please let us know when all of this is finished up and we'll switch our process over to the new metrics.

Aug 12 2025, 9:46 AM · Experimentation Lab, Data-Engineering, Analytics
Milimetric added a comment to T198628: Count the number of video plays.

Hi @AndrewTavis_WMDE, yes I'm supporting Yaron in his work here. He has a proof of concept metric implementation. We just have to get it incorporated and deployed, but that may take some doing.

Aug 12 2025, 9:43 AM · Experimentation Lab, Data-Engineering, Analytics

Aug 11 2025

MusikAnimal added a comment to T159046: Track page views by page ID rather than title (handles moved pages).

A related, older task: T121912: Better redirect handling for pageview API

Aug 11 2025, 6:36 PM · Data-Engineering-Radar, Data-Engineering, AQS2.0, Pageviews-API, Analytics
MusikAnimal merged T401475: Topviews Analysis does not update or merge pageviews after article is moved to a new title into T159046: Track page views by page ID rather than title (handles moved pages).
Aug 11 2025, 6:35 PM · Data-Engineering-Radar, Data-Engineering, AQS2.0, Pageviews-API, Analytics

Aug 7 2025

CDanis added a comment to T263049: Avoid extra HTTPS connections for most Event Platform beacons.

You got me curious, so I spent a little time digging on this -- as far as I can tell not much has changed in the real world since Daniel Stenberg's blog post.

Aug 7 2025, 3:14 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
dr0ptp4kt added a comment to T263049: Avoid extra HTTPS connections for most Event Platform beacons.

I should have put the comment on this here ticket instead of the old closed ticket, so doing so now (thanks again @Ottomata for heads up) - question for you @Krinkle . Any guidance appreciated if you've happened to be in the browser codebases or happened to be viewing their real world behaviors around this lately.

Aug 7 2025, 2:49 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform

Jul 29 2025

Doc_James added a comment to T198628: Count the number of video plays.

Okay thanks, we are funding Yaron to help with this work and hopefully make some headway with more video specific metrics.

Jul 29 2025, 2:54 PM · Experimentation Lab, Data-Engineering, Analytics
AndrewTavis_WMDE added a comment to T198628: Count the number of video plays.

Hey @Doc_James 👋 Would be nice if we could have these kinds of high level metrics, but that's out of the scope of the current project, which is just trying to get as good of metrics as possible within the current scope of the available data. Exact instrumentation of a video play including duration watched would likely require WMF data engineering to get involved. This is a known issue, but as WMDE understood it, it isn't something that can be prioritized right now, so we chose to go with the current approach.

Jul 29 2025, 9:17 AM · Experimentation Lab, Data-Engineering, Analytics

Jul 28 2025

Doc_James added a comment to T198628: Count the number of video plays.

This is something we at VideoWiki would love to see. Accurate metrics for number of plays of videos and duration of video played. Andrew do you plan to look at that second bit?

Jul 28 2025, 1:48 AM · Experimentation Lab, Data-Engineering, Analytics

Jul 27 2025

dr0ptp4kt updated subscribers of T263049: Avoid extra HTTPS connections for most Event Platform beacons.

@phuedx @dr0ptp4kt I assume the new beacon endpoint (/beacon/v2/events ?) exists now? If so, I believe this task is do-able by changing wgEventLoggingServiceUri to /beacon/v2/events?hasty=true ?

Jul 27 2025, 4:23 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform

Jul 25 2025

Ahoelzl edited projects for T395727: Sharp spike in unique devices for past month on all projects, added: Data-Engineering (Q1 FY25/26 July 1st - September 30th); removed Data-Engineering (Q4 2025 April 1st - June 30th).
Jul 25 2025, 10:56 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Analytics-Data-Problem, Movement-Insights, Analytics, Data-Engineering-Wikistats

Jul 24 2025

gerritbot added a comment to T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

Change #1171696 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] EventFactory - allow the intake service to set meta.dt

https://gerrit.wikimedia.org/r/1171696

Jul 24 2025, 1:49 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics

Jul 23 2025

Ottomata updated subscribers of T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

I'm not totally sure who to notify of this, so @Joe can maybe help?

Jul 23 2025, 7:56 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics

Jul 22 2025

gerritbot added a comment to T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas.

Change #1171696 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] createRecentChangeEvent - allow the intake service to set meta.dt

https://gerrit.wikimedia.org/r/1171696

Jul 22 2025, 7:04 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Ottomata updated subscribers of T263049: Avoid extra HTTPS connections for most Event Platform beacons.

@phuedx @dr0ptp4kt I assume the new beacon endpoint (/beacon/v2/events ?) exists now? If so, I believe this task is do-able by changing wgEventLoggingServiceUri to /beacon/v2/events?hasty=true ?

Jul 22 2025, 1:34 PM · Wikimedia-Performance-recommendation, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Analytics, Event-Platform
Ottomata lowered the priority of T290211: EventStreams sending same data over and over (page links change) from High to Low.
Jul 22 2025, 12:40 PM · Data-Engineering, Platform Engineering, Analytics, Event-Platform
AndrewTavis_WMDE updated subscribers of T198628: Count the number of video plays.

Thanks for all the information, @TheDJ :) Bringing @Ben.buchenau in as well as we've been discussing this. Does seem like we're reliant on preload=none for this. Is there a way for us to monitor if and when this change is made? I guess we'd see the spike in the data.

Jul 22 2025, 9:45 AM · Experimentation Lab, Data-Engineering, Analytics

Jul 21 2025

Ottomata moved T267648: Adopt conventions for server receive and client/event timestamps in non analytics event schemas from Backlog to Components on the Event-Platform board.
Jul 21 2025, 8:28 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review, MW-1.41-notes (1.41.0-wmf.15; 2023-06-27), Platform Team Workboards (Clinic Duty Team), Event-Platform, Better Use Of Data, Analytics
Mayakp.wiki updated the task description for T395727: Sharp spike in unique devices for past month on all projects.
Jul 21 2025, 7:08 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Analytics-Data-Problem, Movement-Insights, Analytics, Data-Engineering-Wikistats
Mayakp.wiki updated the task description for T395727: Sharp spike in unique devices for past month on all projects.
Jul 21 2025, 7:07 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Analytics-Data-Problem, Movement-Insights, Analytics, Data-Engineering-Wikistats

Jul 17 2025

TheDJ added a comment to T198628: Count the number of video plays.

Additional note: if you also count the thumbnail poster downloads of videos, then by combining these numbers you can give a very rough estimate of the click-through rate for the impressions (inclusions of videos in pages). In industry, this is called the "Play rate" metric.

Jul 17 2025, 12:35 PM · Experimentation Lab, Data-Engineering, Analytics
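
The ratio described in the comment above can be sketched as a small calculation. This is only an illustration: the function name `play_rate` and the two input counts are hypothetical, and it assumes that poster (thumbnail) downloads stand in for impressions, which the comment itself calls a very rough estimate.

```python
def play_rate(plays: int, poster_downloads: int) -> float:
    """Rough play rate: counted plays divided by poster (thumbnail) downloads.

    Both inputs are hypothetical aggregate counts for the same video and
    the same time window; poster downloads approximate impressions.
    """
    if poster_downloads <= 0:
        raise ValueError("poster_downloads must be positive")
    return plays / poster_downloads


# Example: 25 counted plays against 100 poster downloads.
rate = play_rate(25, 100)
print(f"{rate:.0%}")
```

Cached posters and pages that never load the poster would both skew this number, so it is best read as a trend indicator rather than an exact click-through rate.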
TheDJ added a comment to T198628: Count the number of video plays.

Can you give examples of these requests that we are filtering? If they are directly on the video, are you only checking requests that have either no Range header or a Range header that begins with 0-? Because videos are often progressive downloads, and you'd be counting each chunk being downloaded if you don't account for that, whereas Ranges that begin with 0 at least ensure you only count the first chunk of the download.

Jul 17 2025, 12:16 PM · Experimentation Lab, Data-Engineering, Analytics
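
The Range-header check raised in the comment above could look roughly like the sketch below. It is not the filter the team actually uses: the helper names `is_first_chunk` and `count_plays` are hypothetical, and it assumes each request's raw `Range` header value (or `None` when absent) is available as a string.

```python
import re
from typing import Iterable, Optional

# Captures the start offset of the first byte range, e.g. "bytes=0-" or "bytes=0-1023".
_FIRST_RANGE = re.compile(r"^\s*bytes\s*=\s*(\d*)-")


def is_first_chunk(range_header: Optional[str]) -> bool:
    """True when a request has no Range header or its range starts at byte 0."""
    if range_header is None or not range_header.strip():
        return True  # no Range header: a plain full download
    match = _FIRST_RANGE.match(range_header)
    if match is None:
        return False  # unparseable header: do not count it
    start = match.group(1)
    # An empty start means a suffix range such as "bytes=-500", not the first chunk.
    return start != "" and int(start) == 0


def count_plays(range_headers: Iterable[Optional[str]]) -> int:
    """Count only the requests that fetch the beginning of the file."""
    return sum(1 for header in range_headers if is_first_chunk(header))


# Four requests, one of which is a later chunk of a progressive download.
print(count_plays([None, "bytes=0-", "bytes=500-999", "bytes=0-1023"]))
```

Counting only ranges that start at 0 avoids counting every chunk of a progressive download, though a client that re-requests the first chunk (for example after seeking back to the start) would still be counted more than once.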