
Add hyphens to break temporary user names into groups of 5 digits
Closed, Resolved, Public

Description

Background

The format for temporary user names was specified in T349494: Update the format of temporary user names to include the year and hyphens.

This part was never implemented:

  • The identifying temporary user number n is broken into groups of 5 digits separated by hyphens. If the number of digits doesn't divide evenly by 5, the last group on the right can have fewer digits.
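The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MediaWiki's implementation: `hyphenate_serial` and `temp_user_name` are hypothetical helper names, and the `~YYYY-` prefix is taken from the example names in this task. Groups are cut from the left, so only the right-most group can be shorter.

```python
def hyphenate_serial(serial: str, group_size: int = 5) -> str:
    """Hypothetical helper: split a digit string into left-aligned groups of
    `group_size` joined by hyphens; only the right-most group may be shorter."""
    if not serial.isdigit():
        raise ValueError(f"expected only digits, got {serial!r}")
    groups = [serial[i:i + group_size] for i in range(0, len(serial), group_size)]
    return "-".join(groups)


def temp_user_name(year: int, n: int) -> str:
    """Hypothetical helper: build '~YYYY-<grouped n>' (prefix format assumed
    from the examples in this task)."""
    return f"~{year}-{hyphenate_serial(str(n))}"


print(temp_user_name(2025, 154749))       # ~2025-15474-9
print(temp_user_name(2025, 12345))        # ~2025-12345 (5 digits: no hyphen)
print(temp_user_name(2025, 12345678901))  # ~2025-12345-67890-1
```

Numbers of 5 digits or fewer are unchanged, which is why names created before the first 6-digit serial are unaffected by the rule.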
Acceptance criteria
  • Temporary user names with a serial number longer than 5 digits are broken up with hyphens, grouping 5 digits at a time.
Notes

At the time of writing, there appear to be no temporary users with 6-digit serial numbers on production*, so if we solve this soon, we shouldn't get any wrongly formatted names.

[ * ] At the time of writing, there are not even any names with 5-digit serial numbers starting with ~2024-3.

Update: This change was made to WMF production wikis just after names started to be created with 6 digits. So for a few temporary users created in 2025 we do have a mixture of formats, e.g. ~2025-154757 and ~2025-15474-9.

Event Timeline

Change #1101745 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/core@master] temp accounts: Break up temporary user names with hyphens

https://gerrit.wikimedia.org/r/1101745

Change #1101745 merged by jenkins-bot:

[mediawiki/core@master] temp accounts: Break up temporary user names with hyphens

https://gerrit.wikimedia.org/r/1101745

Djackson-ctr subscribed.

QA is complete: the new code changes have been implemented (temporary user names with a serial number longer than 5 digits are broken up with hyphens, grouping 5 digits at a time).


There is an internal discussion to consider changing this to chunks of 3 digits, but no consensus yet to change to that.

Note that if we do that, we will need to decide what to do with the usernames that have already been made in the format ~2024-n and ~2025-n, where n is a number with 4 or more digits.

We didn't hit a 6-digit number before this task was completed, so adding hyphens after every 5 digits didn't cause a mix of formats.


We had some further internal discussion, and some limited discussion with Stewards, but did not have a clear consensus on a proposed change. Consequently, we've decided to table this, and we could revisit later if there are stronger arguments for making a change.

@kostajh the following temporary account has been created today. Is it OK?
https://fr.wikipedia.org/wiki/Sp%C3%A9cial:CentralAuth/~2025-100618

Yes, 5 digits are expected. Per T381845#10505349, we did not reach agreement to change this to 3 or 4 digits, as some had proposed.

@kostajh points out that this change was made, but required a config change that was never done. The current config is:

$wgAutoCreateTempUser['serialMapping'] = [ 'type' => 'plain-numeric', 'offset' => 1500 ];

...where the type is plain-numeric but should be readable-numeric to get the hyphens.
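To illustrate what switching `'type'` changes, here is a hedged Python sketch of the two mappings. It assumes both mappings add the configured `offset` to the allocated index before formatting and that the `~YYYY-` prefix is added elsewhere; `plain_numeric`, `readable_numeric`, `SERIAL_MAPPINGS` and `map_serial` are hypothetical names, not MediaWiki's classes, and the index 99118 is made up purely so that the output matches the fr.wikipedia example above.

```python
from typing import Callable, Dict


def plain_numeric(index: int, offset: int) -> str:
    # Digits only, e.g. "100618" (assumes the offset is simply added).
    return str(index + offset)


def readable_numeric(index: int, offset: int) -> str:
    # Same digits, split into groups of 5 from the left, e.g. "10061-8".
    digits = str(index + offset)
    return "-".join(digits[i:i + 5] for i in range(0, len(digits), 5))


# Hypothetical registry keyed by the 'type' value used in the config array.
SERIAL_MAPPINGS: Dict[str, Callable[[int, int], str]] = {
    "plain-numeric": plain_numeric,
    "readable-numeric": readable_numeric,
}


def map_serial(index: int, config: dict) -> str:
    mapping = SERIAL_MAPPINGS[config["type"]]
    return mapping(index, config.get("offset", 0))


current = {"type": "plain-numeric", "offset": 1500}
fixed = {"type": "readable-numeric", "offset": 1500}
print(map_serial(99118, current))  # 100618
print(map_serial(99118, fixed))    # 10061-8
```

The same index yields a different string under each type, which is why only names created after the config change get hyphens.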

What should we do?

If we correct the config now, there will be some temp accounts with a format like ~2025-100618 and some like ~2025-10061-8. (The name is saved in database tables, so it won't update.) This might look a little confusing until the wrongly formatted accounts expire, after which it will become a historical quirk. Sort order will be affected, but otherwise I think this should be fairly harmless. The serial mapping is only used when creating the name, and nothing assumes that the correct name can later be derived from the index stored in the database using the serial mapping.
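A small Python sketch of how sort order is affected, assuming names are compared character by character (as Python's default string sort does); whether a given special page sorts exactly this way is an assumption. Because "-" sorts before any digit, a hyphenated name can land before a plain name with a smaller number. `serial_of` is a hypothetical helper that strips hyphens to recover a numeric sort key, and it only makes sense for names from the same year.

```python
def serial_of(name: str) -> int:
    """Hypothetical helper: '~2025-10061-9' -> 100619.
    Drops the '~YYYY' part (so only compare names from the same year)
    and removes any grouping hyphens from the rest."""
    _, _, rest = name.partition("-")  # split off '~2025' at the first hyphen
    return int(rest.replace("-", ""))


# One plain-format name and two hyphenated names with consecutive numbers.
names = ["~2025-100618", "~2025-10061-9", "~2025-10062-0"]

print(sorted(names))                 # character order: 100619 comes before 100618
print(sorted(names, key=serial_of))  # numeric order restored
```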

We could:

  • Fix the config now, and have a mixture of formats, but from now on all new temp accounts will have hyphens separating groups of 5 digits
  • Fix the config now and rename the users with the wrong style (not recommended as the work is not worth the gain)
  • Update the code to hyphenate in groups of 6, and set the config, thus avoiding a mix of formats

(Temp accounts do not have more than 6 digits at the time of writing: https://login.wikimedia.org/wiki/Special:ListUsers?username=&group=&temporaryAccountsOnly=1&creationSort=1&desc=1&wpsubmit=&wpFormIdentifier=mw-listusers-form&limit=50)

I think 6-digit numbers are as identifiable as 5-digit numbers, so you could avoid a mix of formats.

Change #1166791 had a related patch set uploaded (by Tchanders; author: Tchanders):

[operations/mediawiki-config@master] temp accounts: Separate digits in user names with hyphens

https://gerrit.wikimedia.org/r/1166791

It looks as though we should have a while until the first 7-digit name*, so we can use the normal deployment process (no emergency backports necessary).

[ * ] Roughly 3000 temporary accounts were created in the last 12 hours, based on loginwiki; the latest one is roughly ~2025-150000, and the first 7-digit name would be ~2025-1000000, so we probably won't reach a 7-digit name before the next rollout stage.
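The estimate in the footnote can be made explicit with a little arithmetic. This sketch assumes the creation rate stays at the quoted ~3000 per 12 hours and that serial numbers advance by about one per account created; a later rollout stage would raise the rate and shorten the estimate.

```python
# Approximate figures quoted in the footnote above.
created_last_12h = 3000
latest_serial = 150_000
first_7_digit_serial = 1_000_000

rate_per_day = created_last_12h * 2               # ~6000 names per day
remaining = first_7_digit_serial - latest_serial  # 850,000 serial numbers to go
days_left = remaining / rate_per_day              # ~141.7 days at a constant rate

print(f"{rate_per_day} per day, about {days_left:.0f} days until the first 7-digit name")
```

At the current rate that is several months, which is consistent with using the normal deployment process.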

@Tchanders Let's fix the format to be 5 digits as previously decided. It's okay to have a mix of digit sizes for a while until they expire. We aren't rolled out globally so the impact is somewhat limited at the moment.

Change #1166791 merged by jenkins-bot:

[operations/mediawiki-config@master] temp accounts: Separate digits in user names with hyphens

https://gerrit.wikimedia.org/r/1166791

Mentioned in SAL (#wikimedia-operations) [2025-07-08T07:03:42Z] <tchanders@deploy1003> Started scap sync-world: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]]

Mentioned in SAL (#wikimedia-operations) [2025-07-08T07:05:48Z] <tchanders@deploy1003> tchanders: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-07-08T07:14:44Z] <tchanders@deploy1003> Finished scap sync-world: Backport for [[gerrit:1166791|temp accounts: Separate digits in user names with hyphens (T381845)]] (duration: 11m 02s)

Tchanders renamed this task from "Add hyphens to break temporary user names into groups of <5 digits" to "Add hyphens to break temporary user names into groups of 5 digits". (Jul 8 2025, 7:19 AM)
Tchanders closed this task as Resolved.
Tchanders updated the task description.

Not sure if this is an actual issue, and @Niharika already explained to me that there are multiple reasons why new temporary accounts don't always appear with ascending names, but some of the differences seem quite large to me. For example, in https://login.wikimedia.org/w/index.php?title=Special:ListUsers&group=&username=~2025-6&wpFormIdentifier=mw-listusers-form&wpsubmit=&offset=~2025-6000&limit=5, ~2025-60000 was created five days after ~2025-60005. Similarly, current loginwiki creations of temp accounts with the new naming scheme surprise me: in https://login.wikimedia.org/w/index.php?title=Special:Log/newusers&type=newusers&user=&offset=20250708205618%7C44858512, the newest temporary account in the sample, ~2025-15564-8, has a much lower number than older temp accounts in the same sample, such as ~2025-15970-9.

The numbers will increase over time, and then be reset again at the start of the new calendar year (UTC). But the actual incrementing is not done by adding 1 to the previously created user, for technical reasons.

The name is generated from a shard ID (0-7) and an incrementing value (exact calculation here). This is done "to avoid acquiring a global lock when allocating IDs, at the expense of making the IDs be non-monotonic" (quoted from the code comments), essentially to make the chance of several processes trying to access and increment the same number at the same time (and therefore getting locked out) very low.
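A toy Python model of why sharding makes IDs non-monotonic. The exact calculation is not reproduced in this thread, so this sketch assumes one counter per shard and combines them as `counter * num_shards + shard`; `ShardedSerialAllocator` is a hypothetical class, not MediaWiki code. Each ID is unique because its remainder modulo `num_shards` identifies the shard and its quotient is that shard's counter, but IDs from different shards interleave out of order.

```python
import random
from typing import List, Optional


class ShardedSerialAllocator:
    """Toy model of sharded serial allocation (assumed formula:
    index = per-shard counter * num_shards + shard)."""

    def __init__(self, num_shards: int = 8) -> None:
        self.num_shards = num_shards
        self.counters: List[int] = [0] * num_shards  # one counter per shard

    def acquire(self, shard: Optional[int] = None) -> int:
        # This sketch picks a random shard so concurrent callers rarely
        # increment the same counter; a shard can be forced for testing.
        if shard is None:
            shard = random.randrange(self.num_shards)
        self.counters[shard] += 1
        return self.counters[shard] * self.num_shards + shard


alloc = ShardedSerialAllocator()
ids = [alloc.acquire(s) for s in [3, 0, 3, 3, 0, 7]]
print(ids)  # [11, 8, 19, 27, 16, 15]: unique, but not in creation order
```

In this model, a shard that happens to be picked less often lags behind the others, so a newer name can carry a lower number than an older one.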