
Multidirectional language conversion for content pages using LanguageConverter should be prevented on multilingual wikis
Open, Stalled, Needs Triage, Public

Description

Multidirectional language conversion for content pages using LanguageConverter should be prevented on multilingual wikis.

This was caused by T39338: $wgTranslateBlacklist of zh-* on metawiki.

This is not an issue for monodirectional language conversion; multidirectional conversion, however, has caused several issues:

  • Search engines didn't index pages in language variants
  • The {{ll template cannot recognize LanguageConverter syntax in the page display title
    • Especially for MediaWiki.org
  • LanguageConverter syntax is exposed on pages without the original converter, and the variant to transclude cannot easily be chosen
    • For example, we cannot choose to transclude the converted zh-hant version of a zh page into a non-zh page.
  • The Translate special page becomes a mess because:
    • Badly mixed translation memories (TM)
    • There are no proper places to place page-wide conversion rules
    • There are many manual conversion tags in the translation unit
    • The preview cannot handle LanguageConverter syntax properly
  • The current LC zh workaround makes zh translation suggestions "pollute" other languages that have zh as a fallback
    • For example, some nan and wuu translations misuse the LC zh template, picked up from zh translation suggestions
  • The newly introduced translation bundles cannot properly handle the LC zh template because the messages are passed through frame:preprocess every time, meaning we need to pass the language tag on every call to LC zh
  • T39557: Untranslated units should not be converted to script variants
  • (to be addressed)

Without separate /zh-hans, /zh-hant, and /zh-hk pages, we have to pass the language tag every time we use a message from a message bundle.

-- Wrapping all of them under /zh using {{LC zh|, without using /zh-hans, /zh-hant, /zh-hk
tmb.new( mb_page_title, lang_tag ):t( message_key ):params( lang_tag ):plain()
-- Using separated /zh-hans, /zh-hant, /zh-hk, we no longer need to pass the language tag :params( lang_tag ) every time
tmb.new( mb_page_title, lang_tag ):t( message_key ):plain()

With this change, every Lua module using translation bundles can be simplified:

- :t( message_key ):params( lang_tag ):plain()
+ :t( message_key ):plain()

Without this change, every Lua module using translation bundles needs to pass the language tag explicitly:

- :t( message_key ):plain()
+ :t( message_key ):params( lang_tag ):plain()
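
The difference between the two layouts can be illustrated with a small mock. This is only a sketch under stated assumptions: `MockBundle`, its `store` table, and the "wrapped"/"separate" layout names are hypothetical stand-ins invented for illustration, not the real tmb API, and the way LC zh selects a variant is simplified to a table lookup.

```lua
-- Hypothetical stand-in for a message bundle loader; NOT the real tmb API.
local MockBundle = {}
MockBundle.__index = MockBundle

local store = {
    -- Layout A (assumed): a single /zh page; each message carries every
    -- variant and needs the language tag as a parameter to pick one.
    wrapped = {
        zh = { greeting = { ["zh-hans"] = "欢迎", ["zh-hant"] = "歡迎" } },
    },
    -- Layout B (assumed): one page per variant; each message is plain text.
    separate = {
        ["zh-hans"] = { greeting = "欢迎" },
        ["zh-hant"] = { greeting = "歡迎" },
    },
}

function MockBundle.new( layout, lang_tag )
    return setmetatable( { layout = layout, lang_tag = lang_tag }, MockBundle )
end

function MockBundle:t( key )
    if self.layout == "wrapped" then
        self.message = store.wrapped.zh[ key ]
    else
        self.message = store.separate[ self.lang_tag ][ key ]
    end
    return self
end

function MockBundle:params( variant )
    self.variant = variant
    return self
end

function MockBundle:plain()
    if type( self.message ) == "table" then
        -- Wrapped layout: without a variant parameter nothing can be selected.
        return self.variant and self.message[ self.variant ] or nil
    end
    return self.message
end

-- Wrapped layout: the tag must be repeated in :params().
local a = MockBundle.new( "wrapped", "zh-hant" ):t( "greeting" ):params( "zh-hant" ):plain()
-- Separate layout: the tag given to new() is enough.
local b = MockBundle.new( "separate", "zh-hant" ):t( "greeting" ):plain()
```

In this mock both `a` and `b` resolve to the zh-hant text, but only the separate layout gets there without repeating the language tag at every call site.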

Former discussions:


Event Timeline

Change 889616 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[operations/mediawiki-config@master] Update $wgTranslateBlacklist

https://gerrit.wikimedia.org/r/889616

This should be declined.
The Translate extension and other components should address the issues you mentioned, rather than asking translators to translate into multiple variants of Chinese.
And the ancient discussion on babel doesn't make any sense here.

Func changed the task status from Open to Stalled. Feb 23 2023, 3:00 PM

The Translate extension and other components should address the issues you mentioned, rather than asking translators to translate into multiple variants of Chinese.

I think there should always be NO badly mixed translation memories (TM). It's always a bad practice for translators.

And the ancient discussion on babel doesn't make any sense here.

I was only pointing out that this is a long-standing issue.

Winston_Sung renamed this task from Language conversion for content pages using LanguageConverter should be prevented on multilingual wikis to Multidirectional language conversion for content pages using LanguageConverter should be prevented on multilingual wikis. Feb 23 2023, 3:15 PM
Winston_Sung updated the task description.

I think there should always be NO badly mixed translation memories (TM). It's always a bad practice for translators.

The Translate extension can try to categorize suggestions by guessing their variants, or maybe try to convert suggestions and label them with a "converted" warning.

guessing their variants

The guessVariant function is already considered bad practice and will not be implemented in Parsoid.

try to convert suggestions and label them with a "converted" warning.

Then there's a problem:

  • How to keep -{ tags from TM even after conversion to prevent missing context?

The guessVariant function is already considered bad practice and will not be implemented in Parsoid.

That is because the function has so far only been used in LanguageConverter itself.
And it was never implemented for ZhConverter.

But as an extension that manages the translation, I think it's acceptable to have some guessing feature if we decided to go that path.

Then there's a problem:

  • How to keep -{ tags from TM even after conversion to prevent missing context?

IMHO just skip converting it. Most translations didn't come with conversion markups.
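
The "convert and label, but skip anything with conversion markup" idea can be sketched as follows. This is a rough illustration only: `toy_convert`, `prepare_suggestion`, the label strings, and the assumption that each translation memory entry already records its variant are all hypothetical, and the two-character mapping table stands in for a real converter.

```lua
-- Toy zh-hans -> zh-hant table; a real conversion table is far larger.
local TOY_HANS_TO_HANT = { ["欢"] = "歡", ["软"] = "軟" }

-- Hypothetical converter: plain character substitution, one direction only.
local function toy_convert( text )
    for from, to in pairs( TOY_HANS_TO_HANT ) do
        text = text:gsub( from, to )
    end
    return text
end

-- suggestion: { text = string, variant = string } (assumed TM entry shape).
-- Returns the text to show plus a label describing what was done to it.
local function prepare_suggestion( suggestion, target_variant )
    if suggestion.variant == target_variant then
        return { text = suggestion.text, label = "exact" }
    end
    -- Entries containing manual conversion markup are shown untouched,
    -- so that the -{ ... }- context is not lost.
    if suggestion.text:find( "-{", 1, true ) then
        return { text = suggestion.text, label = "unconverted" }
    end
    return { text = toy_convert( suggestion.text ), label = "converted" }
end

local plain = prepare_suggestion( { text = "欢迎", variant = "zh-hans" }, "zh-hant" )
local marked = prepare_suggestion( { text = "-{软件}-", variant = "zh-hans" }, "zh-hant" )
```

Here `plain` comes back converted and labelled "converted", while `marked` keeps its original text and is labelled "unconverted". Whether an unconverted suggestion should be shown at all is the open question raised in the discussion.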

I think it's acceptable to have some guessing feature if we decided to go that path.

I believe having all of them marked as specific "variants" is a better approach.

In my opinion, we shouldn't "guess".

IMHO just skip converting it.

Then that's not something "converted".

Let's talk about the mechanism issues:

How would you deal with these problems, using only solutions that are not considered bad practice?

The {{ll template cannot recognize LanguageConverter syntax in the page display title

Introduce another PAGENAME parser function with language conversion? But how should it work in non-conversion pages?

LanguageConverter syntax exposed on pages without original converter and cannot easily choose the variant to transclude. For example, we cannot choose to transclude the converted zh-hant version from zh pages in non-zh pages.

I don't think passing a parameter would be the fix.

Badly mixed translation memories (TM)
try to categorize suggestions by guessing their variants
I think it's acceptable to have some guessing feature if we decided to go that path.

In my opinion, we shouldn't "guess".

There are no proper places to place page-wide conversion rules

Add a section for them?

The preview cannot handle LanguageConverter syntax properly

Add preview variant selection? List all variants at the same time?

Change #1143697 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[operations/mediawiki-config@master] Make Wikifunctions $wgTranslateDisabledTargetLanguages align with the translate target languages of ZObjects

https://gerrit.wikimedia.org/r/1143697

Change #1143697 merged by jenkins-bot:

[operations/mediawiki-config@master] Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages

https://gerrit.wikimedia.org/r/1143697

Mentioned in SAL (#wikimedia-operations) [2025-05-29T07:47:53Z] <dcausse@deploy1003> Started scap sync-world: Backport for [[gerrit:1143697|Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-29T07:50:16Z] <dcausse@deploy1003> wsung, dcausse: Backport for [[gerrit:1143697|Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Change #1152004 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[operations/mediawiki-config@master] Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages

https://gerrit.wikimedia.org/r/1152004

Mentioned in SAL (#wikimedia-operations) [2025-05-29T08:04:41Z] <dcausse@deploy1003> Finished scap sync-world: Backport for [[gerrit:1143697|Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838)]] (duration: 16m 47s)

Change #1152004 merged by jenkins-bot:

[operations/mediawiki-config@master] Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages

https://gerrit.wikimedia.org/r/1152004

Mentioned in SAL (#wikimedia-operations) [2025-05-29T08:08:28Z] <dcausse@deploy1003> Started scap sync-world: Backport for [[gerrit:1152004|Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-29T08:10:37Z] <dcausse@deploy1003> wsung, dcausse: Backport for [[gerrit:1152004|Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-05-29T08:18:37Z] <dcausse@deploy1003> Finished scap sync-world: Backport for [[gerrit:1152004|Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838)]] (duration: 10m 08s)

I would not support this unless there's a bot or something that automatically copies updates between Chinese language variants. It can skip terms (词汇) that are labelled inline as unique to each variant.

Otherwise, this more than quadruples the work of Chinese translators.

The {{ll template cannot recognize LanguageConverter syntax in the page display title

I think this has been fixed with LC zh, correct me if I'm wrong; I'm probably wrong.

Badly mixed translation memories (TM)

  1. Practically this seems like just an annoyance.
  2. As somewhat mentioned above, this can be fixed by having the translate extension convert everything to the user's preferred variant.

There are no proper places to place page-wide conversion rules

The preview cannot handle LanguageConverter syntax properly

I think it would be more rewarding to work on fixing these in the Translate extension.

There are many manual conversion tags in the translation unit

I don't see how this is a problem.

Search engines didn't index pages in language variants

This is a good point. I think it might be worth contacting some affected search engines but I can also see how that would be a tall ask.

Without using /zh-hans, /zh-hant, /zh-hk, we have to pass the language tag every time.

-- Wrapping all of them under /zh using {{LC zh|, without using /zh-hans, /zh-hant, /zh-hk
tmb.new( mb_page_title, lang_tag ):t( message_key ):params( lang_tag ):plain()
-- Using separated /zh-hans, /zh-hant, /zh-hk, we no longer need to pass the language tag :params( lang_tag ) every time
tmb.new( mb_page_title, lang_tag ):t( message_key ):plain()

With this change, every Lua module using translation bundles can be simplified:

- :t( message_key ):params( lang_tag ):plain()
+ :t( message_key ):plain()

Without this change, every Lua module using translation bundles needs to pass the language tag explicitly:

- :t( message_key ):plain()
+ :t( message_key ):params( lang_tag ):plain()

It does seem frustrating technically, but from a volunteer's perspective it is an immense positive to have a good default translation that's converted between the variants. That is why I think the best way to move this forward is to first have some sort of technology that can automatically copy updates between the variant versions.

I would suggest implementing a pretranslate-by-convert feature in the Translate extension, something similar to Wikibase:

[Screenshot: 圖片.png]

What I mentioned would need to do more than the Wikibase feature. IIRC that only synchronizes the variants if the translation for that variant is empty.
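
The two synchronization behaviours contrasted in this thread (fill a variant only when it is empty, versus copy every update while skipping units marked as variant-specific) can be sketched like this. Everything here is hypothetical: the function names, the `variant_specific` flag, the shape of `units`, and the stub converter are assumptions for illustration, not existing Translate or Wikibase code.

```lua
-- units: map of variant code -> { text = string or nil, variant_specific = boolean }
-- convert: function( text, target_variant ) -> string (assumed to exist)

-- Fill-empty policy: only variants without a translation receive the converted text.
local function sync_fill_empty( units, source_variant, convert )
    local src = units[ source_variant ]
    if src == nil or src.text == nil then return end
    for variant, unit in pairs( units ) do
        if variant ~= source_variant and unit.text == nil then
            unit.text = convert( src.text, variant )
        end
    end
end

-- Propagate policy: every update is copied, except into units that are
-- flagged as intentionally different for that variant.
local function sync_propagate( units, source_variant, convert )
    local src = units[ source_variant ]
    if src == nil or src.text == nil then return end
    for variant, unit in pairs( units ) do
        if variant ~= source_variant and not unit.variant_specific then
            unit.text = convert( src.text, variant )
        end
    end
end

-- Stub converter for demonstration: tags the text with the target variant.
local function stub_convert( text, variant )
    return text .. "@" .. variant
end

local units = {
    ["zh-hans"] = { text = "new" },
    ["zh-hant"] = { text = "old" },
    ["zh-hk"] = { text = "local wording", variant_specific = true },
}
sync_propagate( units, "zh-hans", stub_convert )
```

After the call, the zh-hant unit is overwritten with the converted update, while the zh-hk unit keeps its variant-specific wording. With `sync_fill_empty` neither would change, because both already have text.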