Issue36239
Created on 2019-03-08 13:58 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| parse.py | vstinner, 2019-03-08 13:58 | |
| comments.po | vstinner, 2019-03-08 13:58 | |
| messages.mo | vstinner, 2019-03-08 13:59 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 12255 | merged | mdk, 2019-03-09 22:53 |
| PR 13218 | closed | miss-islington, 2019-05-09 14:23 |
| Messages (12) | |||
|---|---|---|---|
| msg337476 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 13:58 | |
When a translation .po file contains a comment in headers, it's kept when compiled as .mo by msgfmt.
Example with test.po:
---
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#\n"
---
Compile it with "msgfmt". Parse the output file messages.mo using test.py script:
---
import gettext, pprint
with open("messages.mo", "rb") as fp:
    t = gettext.GNUTranslations()
    t._parse(fp)
pprint.pprint(t._info)
---
Output on Python 3.7.2:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);\n'
                 '#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#'}
---
Output of Fedora Python 2.7.15 which contains a fix:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);'}
---
I'm not sure that keeping the comment as part of plural forms is correct. Shouldn't comments be ignored?
I made my test on Fedora 29: msgfmt 0.19.8.1, Python 3.7.2.
Links:
* https://bugs.python.org/issue1448060#msg27754
* https://bugs.python.org/issue1475523
* https://bugzilla.redhat.com/show_bug.cgi?id=252136
Fedora has a patch since 2007 to ignore comments:
https://src.fedoraproject.org/rpms/python2/blob/master/f/python-2.5.1-plural-fix.patch
I can easily convert the patch to a PR, maybe with a test. The real question is whether the fix is correct.
|
|||
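The report above can also be reproduced without `msgfmt` by assembling a tiny `.mo` file in memory. The following is a minimal sketch, assuming the little-endian GNU `.mo` layout (a 28-byte file header, one index entry each for the msgid and the msgstr, then NUL-terminated strings). `build_mo` is a hypothetical helper written for this illustration, not part of the `gettext` module. Whether the `#-#-#-#-#` line shows up inside `plural-forms` depends on the Python version that runs it.

```python
import gettext
import io
import pprint
import struct

# Header entry from the report, as it would be stored in the .mo file.
HEADER = (
    b"Content-Type: text/plain; charset=UTF-8\n"
    b"Plural-Forms: nplurals=2; plural=(n != 1);\n"
    b"#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#\n"
)


def build_mo(header: bytes) -> bytes:
    """Hypothetical helper: build a .mo file that holds only the header entry."""
    strings_start = 7 * 4 + 2 * 8  # 28-byte file header + two 8-byte index entries
    out = struct.pack(
        "<Iiiiiii",
        0x950412DE,  # magic number (little-endian)
        0,           # format revision
        1,           # number of entries
        28,          # offset of the msgid index
        36,          # offset of the msgstr index
        0, 0,        # hash table size and offset (unused here)
    )
    out += struct.pack("<ii", 0, strings_start)                # msgid "": length, offset
    out += struct.pack("<ii", len(header), strings_start + 1)  # msgstr: length, offset
    out += b"\0" + header + b"\0"
    return out


# Parse through the public API instead of the private _parse()/_info names.
translations = gettext.GNUTranslations(io.BytesIO(build_mo(HEADER)))
pprint.pprint(translations.info())
```

Using `GNUTranslations(fp)` and `info()` avoids relying on private attributes, at the cost of not being able to inspect intermediate parser state.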
| msg337477 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 13:59 | |
Attached files:
* comments.po: PO file with a comment in headers
* messages.mo: comments.po compiled with msgfmt
* parse.py: Python script to parse messages.mo
|
|||
| msg337486 | Author: Julien Palard (mdk) |
Date: 2019-03-08 14:43 | |
After some research I found a few mentions of comments being delimited by a leading #-#-#-#-# and a trailing #-#-#-#-#, not just a leading #.
In gettext-0.19.8.1 sources for example:
$ grep -r '#-#-#-#-' | head
gettext-tools/misc/po-mode.el:#-#-#-#-# file name reference #-#-#-#-#
gettext-tools/misc/po-mode.el: (let* ((marker-regex "^#-#-#-#-# \\(.*\\) #-#-#-#-#\n")
gettext-tools/src/msgl-cat.c: char *id = xasprintf ("#-#-#-#-# %s #-#-#-#-#",
Or more precisely in `gettext-tools/tests/msgcat-10`:
# Verify msgcat of two files, when the header entries have different comments
# but the same contents. The resulting header entry is not marked fuzzy,
# because the #-#-#-#-# are only in comments and do not necessarily require
# translator attention; in other words, an msgstr which is valid in both input
# files is also valid in the result.
I'm however surprised not to find much "#-#-#-#-#" in the source code, as if they were just looking for a single # like you do here.
Not sure which one is better, eliminating lines with a pair of #-#-#-#-# or lines starting with a #; both look OK to me (we're only speaking about the header here, not the msgstr, so it won't have much impact).
Personally I'd go for eliminating #-#-#-#-# as this is the only case we've seen, and is the "documented" one in the GNU gettext test cases.
|
|||
| msg337490 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 15:20 | |
I found a .po file with "#" in headers on the Internet, Sympa mailing list project: https://www.sympa.org/distribution/sympa-6.0.10/po-wwsympa/et.po:

# #-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#
# Sympa online help internationalisation.
# Copyright (C) 2007
# This file is distributed under the same license as Sympa.
# FIRST AUTHOR <david.verdin@cru.fr>, 2007.
#
# #-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#
# translation of et.po to
# translation of et.po to
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# Copyright (C) 2005 Free Software Foundation, Inc.
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# #-#-#-#-# et.po (PACKAGE VERSION) #-#-#-#-#
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL>, YEAR.
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR <EMAIL>, YEAR.#.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER.
# root <root@vykk.vil.ee>, 2005.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: et\n"
"POT-Creation-Date: 2007-11-13 14:50+0200\n"
"PO-Revision-Date: 2007-10-22 00:03+0200\n"
"Last-Translator: Alar Sing <alar.sing@etv.ee>\n"
"Language-Team: Estonian\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"#-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#\n"
"X-Generator: Pootle 1.0.2\n"

There are 2 header lines starting with >"#-#-#-#-# < and ending with > #-#-#-#-#\n"<.
|
|||
| msg337491 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 15:20 | |
I hacked gettext.py to parse all files of my system. I found 3 .mo files which contain "#" in headers:
/usr/share/locale/fa/LC_MESSAGES/digikam.mo:
{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-# digikamimageplugin_channelmixer.po '
                              '(digikamimageplugin_channelmixer) #-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'fa',
 'language-team': 'Farsi (Persian) <>',
 'last-translator': 'Mohammad Reza Mirdamadi <mohi@ubuntu.ir>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=1; plural=0;',
 'po-revision-date': '2012-01-13 15:00+0330',
 'pot-creation-date': '2018-03-18 03:11+0100',
 'project-id-version': 'digikam',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'KBabel 1.11.4'}
/usr/share/locale/ia/LC_MESSAGES/akonadicontact5-serializer.mo:
{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-# akonadi_kalarm_resource.po '
                              '#-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ia',
 'language-team': 'Interlingua <kde-i18n-it@kde.org>',
 'last-translator': 'g.sora <g.sora@tiscali.it>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=2; plural=n != 1;',
 'po-revision-date': '2011-11-29 19:38+0100',
 'pot-creation-date': '2018-11-12 06:56+0100',
 'project-id-version': '',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'Lokalize 1.2'}
/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:
{'content-transfer-encoding': '8bit',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ml',
 'language-team': 'Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം '
                  'Computing|കമ്പ്യൂട്ടിങ്ങ് <smc-discuss@googlegroups.com>',
 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=2; plural=(n != 1);',
 'po-revision-date': '2008-07-10 22:04+0530',
 'pot-creation-date': '2018-09-14 06:47+0200',
 'project-id-version': 'ktraderclient',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'Lokalize 1.2'.replace('Lokalize 1.2', 'KBabel 1.11.4')}
|
|||
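A system-wide survey like the one above could be approximated without patching gettext.py. This is a rough sketch using only the public `gettext.GNUTranslations` API; `hash_values` and `scan_mo_files` are hypothetical names introduced here, and the script actually used for the message above is not shown in the thread. On a Python version that already skips the `#-#-#-#-#` lines, only other occurrences of "#" would be reported.

```python
import gettext


def hash_values(info):
    """Return the header fields whose value contains a '#' character."""
    return {key: value for key, value in info.items() if "#" in value}


def scan_mo_files(paths):
    """Yield (path, fields) for each .mo file with '#' in its header values."""
    for path in paths:
        try:
            with open(path, "rb") as fp:
                info = gettext.GNUTranslations(fp).info()
        except Exception:
            # Unreadable or malformed catalogs are skipped: this is a survey,
            # not a validator.
            continue
        fields = hash_values(info)
        if fields:
            yield path, fields
```

To scan a locale tree, one could pass `pathlib.Path("/usr/share/locale").rglob("*.mo")` as `paths` and print each result.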
| msg337492 | Author: Julien Palard (mdk) |
Date: 2019-03-08 15:27 | |
The 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>', case does not look like an issue: it does *not* start with #. The # is in the middle of the line, and the line starts with "Last-Translator". |
|||
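The two candidate rules discussed so far (drop any line starting with "#", or drop only lines that start and end with "#-#-#-#-#") can be compared side by side. A small sketch; the function names are made up for illustration, the second sample line is a hypothetical comment, and the Last-Translator line is abbreviated from the ktraderclient5 header.

```python
MARKER = "#-#-#-#-#"


def starts_with_hash(line: str) -> bool:
    """Broad rule: treat any line beginning with '#' as a comment."""
    return line.startswith("#")


def is_marker_line(line: str) -> bool:
    """Narrow rule: only lines that both start and end with the marker."""
    return line.startswith(MARKER) and line.endswith(MARKER)


samples = [
    "#-#-#-#-# plo.po (PACKAGE VERSION) #-#-#-#-#",
    "# free-form comment (hypothetical example)",
    "Last-Translator: # ANI PETER <peter.ani@gmail.com>",  # abbreviated
]

# Columns: broad rule, narrow rule, header line.
for line in samples:
    print(f"{starts_with_hash(line)!s:5} {is_marker_line(line)!s:5} {line}")
```

Neither rule touches the Last-Translator line, since both only look at how the line begins.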
| msg337493 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 15:30 | |
/usr/share/locale/fa/LC_MESSAGES/digikam.mo: I downloaded the .po file using:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/fa/messages/extragear-graphics/digikam.po > fa_digikam.po

It contains many comments in headers. Extract:

(...)
# MaryamSadat Razavi <razavi@itland.ir>, 2007.
# Nasim Daniarzadeh <daniarzadeh@itland.ir>, 2007.
# Nazanin Kazemi <kazemi@itland.ir>, 2007.
# Mohammad Reza Mirdamadi <mohi@ubuntu.ir>, 2011, 2012.
msgid ""
msgstr ""
"Project-Id-Version: digikam\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2019-03-08 03:08+0100\n"
"PO-Revision-Date: 2012-01-13 15:00+0330\n"
"Last-Translator: Mohammad Reza Mirdamadi <mohi@ubuntu.ir>\n"
"Language-Team: Farsi (Persian) <>\n"
"Language: fa\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-# digikamimageplugin_channelmixer.po "
"(digikamimageplugin_channelmixer) #-#-#-#-#\n"
"X-Generator: Lokalize 1.2\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_refocus.po (digikamimageplugin_refocus) #-#-#-"
"#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_oilpaint.po (digikamimageplugin_oilpaint) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_perspective.po "
"(digikamimageplugin_perspective) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_freerotation.po "
"(digikamimageplugin_freerotation) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugins.po (digikamimageplugins) #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_blowup.po (digikamimageplugin_blowup) #-#-#-#-"
"#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_charcoal.po (digikamimageplugin_charcoal) #-#-"
"#-#-#\n"
(...)
|
|||
| msg337494 | Author: STINNER Victor (vstinner) |
Date: 2019-03-08 15:38 | |
/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/ml/messages/kde-workspace/ktraderclient5.po > ml_ktraderclient5.po

Extract:

msgid ""
msgstr ""
"Project-Id-Version: ktraderclient\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2018-08-16 09:14+0200\n"
"PO-Revision-Date: 2008-07-10 22:04+0530\n"
"Last-Translator: # ANI PETER|അനി പീറ്റര്<200d> <peter.ani@gmail.com>\n"
"Language-Team: Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം Computing|കമ്പ്യൂട്ടിങ്ങ് <smc-"
"discuss@googlegroups.com>\n"
"Language: ml\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
|
|||
| msg337495 | Author: Julien Palard (mdk) |
Date: 2019-03-08 15:38 | |
That's literally sick þ

Looks like we have to trust the "\n", not the file wrapping, but this means that:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"

is valid, too? I have to try it! HAHA it is:

$ cat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Pr"
"oj"
"ec"
"t-"
"Id"
"-V"
"er"
"si"
"on"
":"
" P"
"ython 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"

$ msgcat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"
"Last-Translator: Jules Lasne <jules.lasne@gmail.com>\n"
"Language-Team: FRENCH <traductions@lists.afpy.org>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.0.2\n"
"# Pouette\n"
|
|||
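The experiment above suggests that adjacent quoted fragments are simply concatenated, and that only the "\n" escape ends a header line. A minimal sketch of that idea follows; `join_po_strings` is a hypothetical helper that handles only the `\n` escape, whereas real PO files allow more escape sequences and this is not how msgfmt or msgcat are implemented.

```python
def join_po_strings(lines):
    """Concatenate adjacent quoted PO string fragments into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            parts.append(line[1:-1])  # drop the surrounding quotes
    # Only the \n escape is handled in this sketch.
    return "".join(parts).replace("\\n", "\n")


# The deliberately mangled msgstr from the message above.
fragments = [
    '""', '"Pro"', '"jec"', '"t-I"', '"d-V"', '"ers"', '"ion"', '": "',
    '"dig"', '"ika"', '"m\\n"',
    '"Report-Msgid-Bugs-To: http://bugs.kde.org\\n"',
]
joined = join_po_strings(fragments)
print(joined.splitlines())
```

The physical wrapping disappears after concatenation; the logical header lines are whatever lies between the "\n" escapes.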
| msg337497 | Author: Julien Palard (mdk) |
Date: 2019-03-08 15:56 | |
I tested further, and when we have this horrible mess in the po files:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"

We have a clean string in the .mo file. So there is nothing to fear from:

"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"

It will be nicely stored in the mo as:

Plural-Forms: nplurals=1; plural=0;
#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-#-#-#
X-Generator: KBabel 1.11.4

So you can safely remove lines starting and ending with #-#-#-#-#.
|
|||
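The conclusion above (drop header lines that start and end with `#-#-#-#-#`) can be sketched as a standalone parser. This is a simplified stand-in for a header-parsing loop with continuation lines, not the code merged for this issue; `parse_headers` is a hypothetical name.

```python
MARKER = "#-#-#-#-#"


def parse_headers(raw: str) -> dict:
    """Parse 'Key: value' header lines, skipping msgcat marker comments."""
    info = {}
    last_key = None
    for line in raw.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(MARKER) and line.endswith(MARKER):
            continue  # marker comment: not part of any header field
        if ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip().lower()
            info[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: append it to the previous field.
            info[last_key] += "\n" + line
    return info


raw = (
    "Plural-Forms: nplurals=1; plural=0;\n"
    "#-#-#-#-# digikamimageplugin_raindrop.po (digikamimageplugin_raindrop) #-#-#-#-#\n"
    "X-Generator: KBabel 1.11.4\n"
)
print(parse_headers(raw))
```

Without the marker check, the comment line would fall through to the continuation branch and be appended to `plural-forms`, which is the behaviour originally reported.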
| msg341981 | Author: Julien Palard (mdk) |
Date: 2019-05-09 14:22 | |
New changeset afd1e6d2f0f5aaf4030d13342809ec0915dedf81 by Julien Palard in branch 'master':
bpo-36239: Skip comments in gettext infos (GH-12255)
https://github.com/python/cpython/commit/afd1e6d2f0f5aaf4030d13342809ec0915dedf81
|
|||
| msg342002 | Author: STINNER Victor (vstinner) |
Date: 2019-05-09 22:24 | |
Julien: Why not fix Python 3.7? You approved https://github.com/python/cpython/pull/13218 (Python 3.7 backport) but then you closed it. Only the Azure Pipelines PR job failed, on "ERROR: test_drain_raises (test.test_asyncio.test_streams.StreamTests)", which is unrelated. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:12 | admin | set | github: 80420 |
| 2019-05-09 22:24:38 | vstinner | set | messages: + msg342002 |
| 2019-05-09 19:31:19 | mdk | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2019-05-09 14:23:00 | miss-islington | set | pull_requests: + pull_request13129 |
| 2019-05-09 14:22:33 | mdk | set | messages: + msg341981 |
| 2019-03-10 12:50:25 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2019-03-09 22:53:15 | mdk | set | keywords: + patch; stage: patch review; pull_requests: + pull_request12241 |
| 2019-03-08 15:56:14 | mdk | set | messages: + msg337497 |
| 2019-03-08 15:38:22 | mdk | set | messages: + msg337495 |
| 2019-03-08 15:38:08 | vstinner | set | messages: + msg337494 |
| 2019-03-08 15:30:19 | vstinner | set | messages: + msg337493 |
| 2019-03-08 15:27:47 | mdk | set | messages: + msg337492 |
| 2019-03-08 15:20:58 | vstinner | set | messages: + msg337491 |
| 2019-03-08 15:20:04 | vstinner | set | messages: + msg337490 |
| 2019-03-08 14:43:54 | mdk | set | messages: + msg337486 |
| 2019-03-08 13:59:53 | vstinner | set | messages: + msg337477 |
| 2019-03-08 13:59:12 | vstinner | set | files: + messages.mo |
| 2019-03-08 13:58:57 | vstinner | set | files: + comments.po |
| 2019-03-08 13:58:51 | vstinner | set | files: + parse.py |
| 2019-03-08 13:58:19 | vstinner | create | |
