Skip to content

gh-138004: fix threadmodule ascii and make thread naming test more lenient #138017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

jadonduff
Copy link

@jadonduff jadonduff commented Aug 21, 2025

Fallback to ASCII in _thread.set_name() when pthread_setname_np() rejects UTF-8

Summary

Issue: test_set_name fails on OpenIndiana (and Solaris descendants).
It seems that pthread_setname_np() only accepts ASCII-only names there:

>>> import _thread
>>> _thread.set_name('€')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Fix

1. _threadmodule.c

  • Attempt to encode the thread name in UTF-8 and call pthread_setname_np() as usual.
  • If the call fails with EINVAL, fall back to ASCII encoding using "replace" (non-ASCII → ?).
  • On macOS and other platforms that already accept UTF-8, the original UTF-8 succeeds and the fallback is never taken. ASCII names continue to work unchanged.
int rc;

// Fallback: If EINVAL, try ASCII encoding with "replace"
if (rc == EINVAL) {
    name_encoded = PyUnicode_AsEncodedString(name_obj, "ascii", "replace");
    if (name_encoded == NULL) {
        return NULL;
    }
#ifdef _PYTHREAD_NAME_MAXLEN
    if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
        PyObject *truncated;
        truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded),
                                              _PYTHREAD_NAME_MAXLEN);
        if (truncated == NULL) {
            Py_DECREF(name_encoded);
            return NULL;
        }
        Py_SETREF(name_encoded, truncated);
    }
#endif
    name = PyBytes_AS_STRING(name_encoded);
#ifdef __APPLE__
    rc = pthread_setname_np(name);
#elif defined(__NetBSD__)
    thread = pthread_self();
    rc = pthread_setname_np(thread, "%s", (void *)name);
#elif defined(HAVE_PTHREAD_SETNAME_NP)
    thread = pthread_self();
    rc = pthread_setname_np(thread, name);
#else /* defined(HAVE_PTHREAD_SET_NAME_NP) */
    thread = pthread_self();
    rc = 0;
    pthread_set_name_np(thread, name);
#endif
    Py_DECREF(name_encoded);
}

Explanation:

  • Keeps current behavior everywhere UTF-8 is supported.
  • Provides a fallback for platforms that only accept ASCII (e.g. OpenIndiana).
  • Avoids raising OSError(22) for non-ASCII names.

2. test_thread.py

  • Updated test_set_name to be lenient on platforms that reject non-ASCII names.
  • If OSError(22) occurs and the attempted name contained non-ASCII, the test is skipped instead of failing.
try:
    thread.start()
    thread.join()
    self.assertEqual(work_name, expected,
                     f"{len(work_name)=} and {len(expected)=}")
except OSError as exc:
    # Accept EINVAL (22) for non-ASCII names on platforms that do not support them
    if getattr(exc, 'errno', None) == 22 and any(ord(c) > 127 for c in name):
        self.skipTest(f"Platform does not support non-ASCII thread names: {exc}")
    else:
        raise

Explanation:

  • On macOS/Linux: runs as before, asserts the name round-trips correctly.
  • On OpenIndiana: skips the test if non-ASCII is not supported.

Verification

  • Ran locally on macOS (UTF-8 names continue to work as before).
  • Did not verify on OpenIndiana (cannot reproduce locally). Fix based on reports.
  • All tests passed on macOS ARM.

@python-cla-bot
Copy link

python-cla-bot bot commented Aug 21, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Aug 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Comment on lines 2632 to 2662
name_encoded = PyUnicode_AsEncodedString(name_obj, "ascii", "replace");
if (name_encoded == NULL) {
return NULL;
}
#ifdef _PYTHREAD_NAME_MAXLEN
if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
PyObject *truncated;
truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded),
_PYTHREAD_NAME_MAXLEN);
if (truncated == NULL) {
Py_DECREF(name_encoded);
return NULL;
}
Py_SETREF(name_encoded, truncated);
}
#endif
name = PyBytes_AS_STRING(name_encoded);
#ifdef __APPLE__
rc = pthread_setname_np(name);
#elif defined(__NetBSD__)
thread = pthread_self();
rc = pthread_setname_np(thread, "%s", (void *)name);
#elif defined(HAVE_PTHREAD_SETNAME_NP)
thread = pthread_self();
rc = pthread_setname_np(thread, name);
#else /* defined(HAVE_PTHREAD_SET_NAME_NP) */
thread = pthread_self();
rc = 0;
pthread_set_name_np(thread, name);
#endif
Py_DECREF(name_encoded);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not duplicate such complex code. Move it to function and reuse. Or use a loop (but this may be more complicated).

Copy link
Author

@jadonduff jadonduff Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved to a function and reused. Despite most tests passing, I am receiving the following for the "Check if generated files are up to date" test:

 /usr/bin/ld: Modules/_threadmodule.o: in function `_thread_set_name_impl':
/home/runner/work/cpython/cpython/./Modules/_threadmodule.c:2617:(.text+0x2ed5): undefined reference to `_set_thread_name'
/usr/bin/ld: /home/runner/work/cpython/cpython/./Modules/_threadmodule.c:2637:(.text+0x2f99): undefined reference to `_set_thread_name'
collect2: error: ld returned 1 exit status
make: *** [Makefile:1927: Programs/_freeze_module] Error 1
Error: Process completed with exit code 2.

Please let me know if you see the issue. It appeared to be working before changing to function.

thread.join()
self.assertEqual(work_name, expected,
f"{len(work_name)=} and {len(expected)=}")
except OSError as exc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is OSError even raised here? The test failure was different -- that work_name was unexpectedly empty.

Copy link
Author

@jadonduff jadonduff Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

                    if any(ord(c) > 127 for c in name) and (not work_name or work_name == ""):
                        self.skipTest(f"Platform does not support non-ASCII thread names: got empty name for {name!r}")
                    self.assertEqual(work_name, expected,
                                     f"{len(work_name)=} and {len(expected)=}")

Kept OSError fallback because it was shown in original issue, but can remove if needed.

>>> import _thread
>>> _thread.set_name('€')
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    _thread.set_name('€')
    ~~~~~~~~~~~~~~~~^^^^^
OSError: [Errno 22] Invalid argument

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was shown that _thread.set_name() fails, but that OSError is not leaked from the Thread code.

@bedevere-app
Copy link

bedevere-app bot commented Aug 21, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@jadonduff
Copy link
Author

Please see comments regarding the requested changes. OS tests passed, but a function not found error occurred in "Tests / Check if generated files are up to date (pull_request)", which I am having trouble fixing. Apologies - this is my first attempt at contributing to Python.

@jadonduff
Copy link
Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Aug 22, 2025

Thanks for making the requested changes!

@serhiy-storchaka: please review the changes made to this pull request.

Comment on lines 2651 to 2660
#ifdef _PYTHREAD_NAME_MAXLEN
if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
PyObject *truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded), _PYTHREAD_NAME_MAXLEN);
if (truncated == NULL) {
Py_DECREF(name_encoded);
return NULL;
}
Py_SETREF(name_encoded, truncated);
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please try to avoid duplicating code. This should be factored out into its own function. Here's an outline:

static PyObject *
get_truncated(PyObject *name_encoded /* stolen */)
{
#ifdef _PYTHREAD_NAME_MAXLEN
    if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
        PyObject *truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded), _PYTHREAD_NAME_MAXLEN);
        if (truncated == NULL) {
            Py_DECREF(name_encoded);
            return NULL;
        }
        Py_SETREF(name_encoded, truncated);
    }
#endif
    return name_encoded;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply include the encoding and truncating code in _set_thread_name(). BTW, there is no need to use underscored names here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created function encode_thread_name and used to replace duplicated code

@@ -2241,6 +2241,7 @@ def __init__(self, a, *, b) -> None:

with warnings.catch_warnings(record=True) as warnings_log:
CustomRLock(1, b=2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray newline change:

Suggested change

jadonduff and others added 3 commits August 22, 2025 13:28
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@jadonduff
Copy link
Author

Pulled _threadmodule.c from main and re-wrote functions. Added two functions - set_native_thread_name and encode_thread_name (using ZeroIntensity's outline). Also removed OSError exception handing in test_threading.py (serhiy-storchaka)

Comment on lines 2646 to 2651
name_encoded = encode_thread_name(name_obj, "ascii");
if (name_encoded == NULL) {
return NULL;
}
name = PyBytes_AS_STRING(name_encoded);
rc = set_native_thread_name(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can merge all this (and the following Py_DECREF(name_encoded)) in a single function.

@@ -2360,6 +2360,9 @@ def work():
thread = threading.Thread(target=work, name=name)
thread.start()
thread.join()
# If the name is non-ASCII and the result is empty, skip (platform limitation)
if any(ord(c) > 127 for c in name) and (not work_name or work_name == ""):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the isascii() method. And why not work_name or work_name == ""?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used isascii() method, and changed to:

# If the name is non-ASCII and the result is empty, skip (platform limitation)
if not name.isascii() and not work_name:
    self.skipTest(f"Platform does not support non-ASCII thread names: got empty name for {name!r}")
self.assertEqual(work_name, expected,
                 f"{len(work_name)=} and {len(expected)=}")

Comment on lines 2584 to 2589
#ifdef __sun
// Solaris always uses UTF-8
const char *encoding = "utf-8";
#else
// Encode the thread name to the filesystem encoding using the "replace"
// error handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove all this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to encode_thread_name

@@ -2894,4 +2924,4 @@ PyMODINIT_FUNC
PyInit__thread(void)
{
return PyModuleDef_Init(&thread_module);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change (newline?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May have accidentally added a newline in a previous commit. It now matches the current _threadmodule.c in the main branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change.

Py_RETURN_NONE;
#else
// Windows implementation
assert(pSetThreadDescription != NULL);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those blank lines could be kept as this part of the code is not touched.

jadonduff and others added 3 commits August 22, 2025 16:58
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@@ -2360,6 +2361,9 @@ def work():
thread = threading.Thread(target=work, name=name)
thread.start()
thread.join()
# If the name is non-ASCII and the result is empty, skip (platform limitation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not repeat the code in a comment.

@@ -2894,4 +2924,4 @@ PyMODINIT_FUNC
PyInit__thread(void)
{
return PyModuleDef_Init(&thread_module);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change.

if (rc) {
errno = rc;
int err = rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it was needed to introduce the err variable?

Comment on lines 116 to 117
Py_DECREF(name_encoded);
return truncated;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code.

Comment on lines 127 to 134
PyObject *name_encoded = encode_thread_name(name_obj, encoding);
if (name_encoded == NULL) {
return -1; // error, exception set
}
const char *name = PyBytes_AS_STRING(name_encoded);
int rc = set_native_thread_name(name);
Py_DECREF(name_encoded);
return rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can now inline set_native_thread_name() and encode_thread_name() as it was in the original code.

{
PyObject *name_encoded = encode_thread_name(name_obj, encoding);
if (name_encoded == NULL) {
return -1; // error, exception set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens at the caller place when set_thread_name_with_encoding() returns -1?

@jadonduff
Copy link
Author

Update

Rewrote functions in _threadmodule.c for clarity. Now adds minimal code to address issue, including two helpers. These helpers are used more than once, so no redundancy with inline code is present. They are positioned closer to the _thread_set_name_impl function in the previous code.


threadmodule.c

  • Added/include guards: pthread.h, errno.h, string.h.

  • Added two helpers: set_native_thread_name() and encode_thread_name().

  • Updated _thread_set_name_impl() to:

    1. pick encoding (fs codec or "utf-8" on __sun),
    2. call encode_thread_name(...), then set_native_thread_name(...),
    3. if rc == EINVAL retry with encode_thread_name(..., "ascii"),
    4. propagate other errors.

threading_test.py

  • Made the test tolerant: if platform returns empty name for non-ASCII input, the test skips instead of failing (avoids false negatives on ASCII-only platforms).
  • Skips for empty work_name (checks if name is non-ascii and not workname) as specified by serhiy-storchaka.

Helper functions

set_native_thread_name(const char *name)

  • Moved pthread_* name APIs (macOS/Linux/NetBSD variants) from _thread_set_name_impl, since we are implementing a fallback (removing duplicated code as in original commit)
  • Returns 0 on success or a POSIX error code like EINVAL.
  • Purpose: return success status for _thread_set_name_impl(), so it can decide to retry on EINVAL if needed.

encode_thread_name(PyObject *name_obj, const char *encoding)

  • Encodes Python strPyBytes using PyUnicode_AsEncodedString(..., "replace").
  • Truncates to _PYTHREAD_NAME_MAXLEN if defined.
  • Allows us to specify encoding. So, if an EINVAL error is given and ASCII encoding is chosen, this function will encode with that encoding.
  • Returns new PyBytes* or NULL on error.
  • Since we use PyUnicode_AsEncodedString,"replace" parameter ensures ASCII fallback yields visible placeholders (e.g. ?).

New Flow OpenIndiana / Solaris (what happens now, from original issue)

  1. User calls _thread.set_name('€').
  2. _thread_set_name_impl() picks encoding: fs codec or "utf-8" on Solaris.
  3. encode_thread_name(..., "utf-8") → yields UTF-8 bytes for .
  4. set_native_thread_name(utf8_bytes) calls pthread_setname_np()returns EINVAL (fail: non-ASCII rejected).
  5. Code sees rc == EINVAL → retries: encode_thread_name(..., "ascii") with "replace" → yields ASCII bytes like "?".
  6. set_native_thread_name("?" ) → native call succeeds; _thread.set_name() returns None.
  7. If any other error occurs (e.g. not EINVAL), it is propagated as a Python OSError.

Notes

  • Uses "replace" so non-ASCII becomes visible placeholders (?) rather than silently dropped. Can use "ignore" if you do not want ?s replacing the symbols

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you just reverted the recent changes?

Please extract all duplicate code that calls encode_thread_name() and set_native_thread_name() into a separate function, as was the case in the previous version. Then inline the encode_thread_name() and set_native_thread_name() code, which will only be used once. If you did this to handle errors in PyUnicode_AsEncodedString() and PyBytes_FromStringAndSize(), you can simply check if the returned code is -1 and PyErr_Occurred() is true.

Also, remove all unrelated changes.

There is also a problem with tests, but we will get to that when we finish with the code.

@jadonduff
Copy link
Author

jadonduff commented Aug 24, 2025

Update

  • I moved all duplicate code into a single function called set_encoded_thread_name(). So, now there is only 1 total new function that addresses the duplicated code.
  • Inlined/removed the old encode_thread_name() and set_native_thread_name() logic directly into that helper (both functions removed).
  • The helper now returns 0 on success, positive errno-style codes on native errors, and -1 on Python errors.
  • At the call site, _thread_set_name_impl checks for -1 and uses PyErr_Occurred() to handle Python exceptions as you suggested.
  • Removed all unrelated changes, so we only address the issue.
  • Note that you may see some newline/comment changes throughout the file. These are some stray changes that occurred throughout the file in previous commits for this pull request. I compared to the current version in main branch and ensured that my file matched.

I have made the requested changes; please review again.

Comment on lines 2653 to 2659
/* Confirm a Python exception was set by the helper.
If not, convert to a runtime error (defensive). */
if (PyErr_Occurred()) {
return NULL;
}
Py_SETREF(name_encoded, truncated);
PyErr_SetString(PyExc_RuntimeError, "internal error in set_encoded_thread_name");
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If errno=-1 was set by pthread_setname_np() or other functions (very unlikely), it should be treated as any other errno value. I.e. fallback to raising OSError instead of RuntimeError.

So just write

    if (rc == -1 && PyErr_Occurred()) {
        return NULL;
    }

Most likely all errno values set by library functions are positive, so PyErr_Occurred() may be not needed, but it will not harm to check.

Comment on lines 2667 to 2671
if (PyErr_Occurred()) {
return NULL;
}
PyErr_SetString(PyExc_RuntimeError, "internal error in set_encoded_thread_name");
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@jadonduff
Copy link
Author

I have made the requested changes; please review again.
(Also fixed a minor lint issue from the checks)

@bedevere-app
Copy link

bedevere-app bot commented Aug 24, 2025

Thanks for making the requested changes!

@serhiy-storchaka: please review the changes made to this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants