CI: update sanitizer CI to use python compiled with ASAN and TSAN #28273

Merged: 3 commits, Feb 5, 2025
123 changes: 123 additions & 0 deletions .github/workflows/compiler_sanitizers.yml
@@ -0,0 +1,123 @@
name: Test with compiler sanitizers

on:
push:
branches:
- main
pull_request:
branches:
- main
- maintenance/**

defaults:
run:
shell: bash

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

permissions:
contents: read # to fetch code (actions/checkout)

jobs:
clang_ASAN:
# To enable this workflow on a fork, comment out:
if: github.repository == 'numpy/numpy'
runs-on: macos-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: recursive
fetch-tags: true
persist-credentials: false
- name: Set up pyenv
run: |
git clone https://github.com/pyenv/pyenv.git "$HOME/.pyenv"
PYENV_ROOT="$HOME/.pyenv"
PYENV_BIN="$PYENV_ROOT/bin"
PYENV_SHIMS="$PYENV_ROOT/shims"
echo "$PYENV_BIN" >> $GITHUB_PATH
echo "$PYENV_SHIMS" >> $GITHUB_PATH
echo "PYENV_ROOT=$PYENV_ROOT" >> $GITHUB_ENV
- name: Check pyenv is working
run:
pyenv --version
- name: Set up LLVM
run: |
brew install llvm@19
LLVM_PREFIX=$(brew --prefix llvm@19)
echo CC="$LLVM_PREFIX/bin/clang" >> $GITHUB_ENV
echo CXX="$LLVM_PREFIX/bin/clang++" >> $GITHUB_ENV
echo LDFLAGS="-L$LLVM_PREFIX/lib" >> $GITHUB_ENV
echo CPPFLAGS="-I$LLVM_PREFIX/include" >> $GITHUB_ENV
- name: Build Python with address sanitizer
run: |
CONFIGURE_OPTS="--with-address-sanitizer" pyenv install 3.13
pyenv global 3.13
- name: Install dependencies
run: |
pip install -r requirements/build_requirements.txt
pip install -r requirements/ci_requirements.txt
pip install -r requirements/test_requirements.txt
- name: Build
run:
python -m spin build -j2 -- -Db_sanitize=address
- name: Test
run: |
# pass -s to pytest to see ASAN errors and warnings, otherwise pytest captures them
ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true:allocator_may_return_null=1:halt_on_error=1 \
python -m spin test -- -v -s --timeout=600 --durations=10

clang_TSAN:
# To enable this workflow on a fork, comment out:
if: github.repository == 'numpy/numpy'
runs-on: macos-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: recursive
fetch-tags: true
persist-credentials: false
- name: Set up pyenv
run: |
git clone https://github.com/pyenv/pyenv.git "$HOME/.pyenv"
PYENV_ROOT="$HOME/.pyenv"
PYENV_BIN="$PYENV_ROOT/bin"
PYENV_SHIMS="$PYENV_ROOT/shims"
echo "$PYENV_BIN" >> $GITHUB_PATH
echo "$PYENV_SHIMS" >> $GITHUB_PATH
echo "PYENV_ROOT=$PYENV_ROOT" >> $GITHUB_ENV
- name: Check pyenv is working
run:
pyenv --version
- name: Set up LLVM
run: |
brew install llvm@19
LLVM_PREFIX=$(brew --prefix llvm@19)
echo CC="$LLVM_PREFIX/bin/clang" >> $GITHUB_ENV
echo CXX="$LLVM_PREFIX/bin/clang++" >> $GITHUB_ENV
echo LDFLAGS="-L$LLVM_PREFIX/lib" >> $GITHUB_ENV
echo CPPFLAGS="-I$LLVM_PREFIX/include" >> $GITHUB_ENV
- name: Build Python with thread sanitizer support
run: |
# free-threaded Python is much more likely to trigger races
CONFIGURE_OPTS="--with-thread-sanitizer" pyenv install 3.13t
Review comment (Member Author): This takes about 10 minutes, and Python releases happen infrequently enough that I think we'd benefit from using a cache. Is it OK to add a new GitHub Actions cache entry for this step? I don't think the Python installation is all that big on disk, but I can double-check.

Review comment (Member Author): Only five minutes on the CI run for this PR 🤷‍♂️

Review comment (Member): That seems fine to me. Not essential though if it's indeed 5 minutes - whatever you prefer.
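A minimal sketch of the kind of cache step being discussed (hypothetical, not part of this PR's diff, which resumes below). The step name, cache key, and path are illustrative assumptions; such a step would sit before the pyenv install step, and pyenv install --skip-existing would then reuse a cached build:

      - name: Cache sanitizer-instrumented CPython
        uses: actions/cache@v4
        with:
          # restore a previously built TSAN CPython so the multi-minute
          # pyenv build can be skipped on cache hits
          path: ~/.pyenv/versions
          key: pyenv-${{ runner.os }}-cpython-3.13t-tsan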

pyenv global 3.13t
- name: Install dependencies
run: |
# TODO: remove when a released cython supports free-threaded python
pip install -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython
pip install -r requirements/build_requirements.txt
pip install -r requirements/ci_requirements.txt
pip install -r requirements/test_requirements.txt
- name: Build
run:
python -m spin build -j2 -- -Db_sanitize=thread
Review comment (Member Author): this might also benefit from setting up ccache.
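A rough sketch of how ccache could be wired in (again hypothetical, not part of this PR's diff). Meson generally picks up ccache automatically when it is on PATH, so installing it and persisting its cache directory between runs is usually enough; the step names and cache key here are illustrative:

      - name: Set up ccache
        run: |
          brew install ccache
          echo "CCACHE_DIR=$HOME/.ccache" >> $GITHUB_ENV
      - name: Cache compiler output
        uses: actions/cache@v4
        with:
          path: ~/.ccache
          key: ccache-${{ runner.os }}-tsan-${{ github.sha }}
          restore-keys: ccache-${{ runner.os }}-tsan-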

- name: Test
run: |
# These tests are slow, so only run tests in files that do "import threading" to make them count
TSAN_OPTIONS=allocator_may_return_null=1:halt_on_error=1 \
python -m spin test \
`find numpy -name "test*.py" | xargs grep -l "import threading" | tr '\n' ' '` \
-- -v -s --timeout=600 --durations=10
59 changes: 0 additions & 59 deletions .github/workflows/linux_compiler_sanitizers.yml

This file was deleted.

5 changes: 4 additions & 1 deletion numpy/_core/src/umath/ufunc_object.c
@@ -5941,7 +5941,6 @@ ufunc_at(PyUFuncObject *ufunc, PyObject *args)
NPY_AUXDATA_FREE(auxdata);

Py_XDECREF(op2_array);
Py_XDECREF(iter);
Py_XDECREF(iter2);
for (int i = 0; i < nop; i++) {
Py_XDECREF(operation_descrs[i]);
@@ -5957,9 +5956,13 @@ ufunc_at(PyUFuncObject *ufunc, PyObject *args)
if (PyArray_FLAGS(op1_array) & NPY_ARRAY_WRITEBACKIFCOPY) {
PyArray_DiscardWritebackIfCopy(op1_array);
}
// iter might own the last reference to op1_array,
// so it must be decref'd second
Py_XDECREF(iter);
return NULL;
}
else {
Py_XDECREF(iter);
Py_RETURN_NONE;
}
}
26 changes: 0 additions & 26 deletions numpy/_core/tests/test_indexing.py
@@ -590,32 +590,6 @@ def test_too_many_advanced_indices(self, index, num, original_ndim):
with pytest.raises(IndexError):
arr[(index,) * num] = 1.

@pytest.mark.skipif(IS_WASM, reason="no threading")
def test_structured_advanced_indexing(self):
# Test that copyswap(n) used by integer array indexing is threadsafe
# for structured datatypes, see gh-15387. This test can behave randomly.
from concurrent.futures import ThreadPoolExecutor

# Create a deeply nested dtype to make a failure more likely:
dt = np.dtype([("", "f8")])
dt = np.dtype([("", dt)] * 2)
dt = np.dtype([("", dt)] * 2)
# The array should be large enough to likely run into threading issues
arr = np.random.uniform(size=(6000, 8)).view(dt)[:, 0]

rng = np.random.default_rng()

def func(arr):
indx = rng.integers(0, len(arr), size=6000, dtype=np.intp)
arr[indx]

tpe = ThreadPoolExecutor(max_workers=8)
futures = [tpe.submit(func, arr) for _ in range(10)]
for f in futures:
f.result()

assert arr.dtype is dt

def test_nontuple_ndindex(self):
a = np.arange(25).reshape((5, 5))
assert_equal(a[[0, 1]], np.array([a[0], a[1]]))
89 changes: 89 additions & 0 deletions numpy/_core/tests/test_multithreading.py
@@ -1,4 +1,6 @@
import concurrent.futures
import threading
import string

import numpy as np
import pytest
@@ -165,3 +167,90 @@ def closure(b):
x = np.repeat(x0, 2, axis=0)[::2]

run_threaded(closure, max_workers=10, pass_barrier=True)


def test_structured_advanced_indexing():
# Test that copyswap(n) used by integer array indexing is threadsafe
# for structured datatypes, see gh-15387. This test can behave randomly.

# Create a deeply nested dtype to make a failure more likely:
dt = np.dtype([("", "f8")])
dt = np.dtype([("", dt)] * 2)
dt = np.dtype([("", dt)] * 2)
# The array should be large enough to likely run into threading issues
arr = np.random.uniform(size=(6000, 8)).view(dt)[:, 0]

rng = np.random.default_rng()

def func(arr):
indx = rng.integers(0, len(arr), size=6000, dtype=np.intp)
arr[indx]

tpe = concurrent.futures.ThreadPoolExecutor(max_workers=8)
futures = [tpe.submit(func, arr) for _ in range(10)]
for f in futures:
f.result()

assert arr.dtype is dt


def test_structured_threadsafety2():
# Nonzero (and some other functions) should be threadsafe for
# structured datatypes, see gh-15387. This test can behave randomly.
from concurrent.futures import ThreadPoolExecutor

# Create a deeply nested dtype to make a failure more likely:
dt = np.dtype([("", "f8")])
dt = np.dtype([("", dt)])
dt = np.dtype([("", dt)] * 2)
# The array should be large enough to likely run into threading issues
arr = np.random.uniform(size=(5000, 4)).view(dt)[:, 0]

def func(arr):
arr.nonzero()

tpe = ThreadPoolExecutor(max_workers=8)
futures = [tpe.submit(func, arr) for _ in range(10)]
for f in futures:
f.result()

assert arr.dtype is dt


def test_stringdtype_multithreaded_access_and_mutation(
dtype, random_string_list):
# this test uses an RNG and may crash or cause deadlocks if there is a
# threading bug
rng = np.random.default_rng(0x4D3D3D3)

chars = list(string.ascii_letters + string.digits)
chars = np.array(chars, dtype="U1")
ret = rng.choice(chars, size=100 * 10, replace=True)
random_string_list = ret.view("U100")

def func(arr):
rnd = rng.random()
# either write to random locations in the array, compute a ufunc, or
# re-initialize the array
if rnd < 0.25:
num = np.random.randint(0, arr.size)
arr[num] = arr[num] + "hello"
elif rnd < 0.5:
if rnd < 0.375:
np.add(arr, arr)
else:
np.add(arr, arr, out=arr)
elif rnd < 0.75:
if rnd < 0.875:
np.multiply(arr, np.int64(2))
else:
np.multiply(arr, np.int64(2), out=arr)
else:
arr[:] = random_string_list

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as tpe:
arr = np.array(random_string_list, dtype=dtype)
futures = [tpe.submit(func, arr) for _ in range(500)]

for f in futures:
f.result()
2 changes: 0 additions & 2 deletions numpy/_core/tests/test_nep50_promotions.py
@@ -5,8 +5,6 @@
"""

import operator
import threading
import warnings

import numpy as np

23 changes: 0 additions & 23 deletions numpy/_core/tests/test_numeric.py
@@ -1956,29 +1956,6 @@ def __bool__(self):
a = np.array([[ThrowsAfter(15)]] * 10)
assert_raises(ValueError, np.nonzero, a)

@pytest.mark.skipif(IS_WASM, reason="wasm doesn't have threads")
def test_structured_threadsafety(self):
# Nonzero (and some other functions) should be threadsafe for
# structured datatypes, see gh-15387. This test can behave randomly.
from concurrent.futures import ThreadPoolExecutor

# Create a deeply nested dtype to make a failure more likely:
dt = np.dtype([("", "f8")])
dt = np.dtype([("", dt)])
dt = np.dtype([("", dt)] * 2)
# The array should be large enough to likely run into threading issues
arr = np.random.uniform(size=(5000, 4)).view(dt)[:, 0]

def func(arr):
arr.nonzero()

tpe = ThreadPoolExecutor(max_workers=8)
futures = [tpe.submit(func, arr) for _ in range(10)]
for f in futures:
f.result()

assert arr.dtype is dt


class TestIndex:
def test_boolean(self):