In this post, I’ll walk through a couple of issues we ran into in the early stages of testing our PyTorch builds caused by runtime conflicts.
I come from a classic Linux distribution background, where developers put a lot of effort into avoiding duplication of runtime libraries. As much as possible, everything in the distribution uses the same C and C++ libraries as everything else. General-purpose Python libraries don’t have that luxury: they need to work across a wide range of distributions and versions, relying on no more than a prescribed baseline version of the C and C++ standard libraries. And they also need to work with each other. Producing our own PyTorch wheels gave me the opportunity to learn first-hand about these kinds of issues, and about what it takes to make a (uv) pip install that Just Works(TM).
Segfault #1: OpenMP/OpenBLAS
When we began validation on our initial set of PyTorch wheels, we were greeted with a segfault. Surprisingly, this happened before any test case had actually started – it was happening while initializing the test script itself:
$ python ./test/run_test.py -h
import pkg_resources
Segmentation fault (core dumped)
Which we further narrowed down to just:
$ python -c 'import torch._dynamo'
Segmentation fault (core dumped)
When other users online had experienced this, the most frequently cited cause was conflicting OpenMP runtimes. OpenMP defines an API – but there are multiple implementations. In the free software world, GCC’s libgomp and LLVM’s libomp are the most widely used. An application may use either one, but it mustn’t use both. Well, our torch wheel was vendoring both!
$ find guarded/ | grep -e libgomp -e libomp
guarded/lib/python3.13/site-packages/torch/lib/libgomp.so.1
guarded/lib/python3.13/site-packages/torch.libs/libomp-c3bfb36f.so
Smoking gun? Maybe.
Upstream only vendors libgomp:
$ find upstream | grep -e libgomp -e libomp
upstream/lib/python3.13/site-packages/torch/lib/libgomp.so.1
Smoking gun now?
Not quite. It turns out we were building only against libomp-dev. And using LD_DEBUG=files, we could see that only libomp.so was being initialized; libgomp.so was vendored, but not being accessed. So we dug deeper with gdb. A backtrace of the segfault showed us initializing the optree C++ module:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7ca1ed3 in __pthread_once_slow (once_control=0x7fff24bfeaf8 <optree::GetCxxModule(std::optional<pybind11::module_> const&)::storage+8>, init_routine=0x7fff24ceb420 <__once_proxy>)
at ./nptl/pthread_once.c:116
#2 0x00007fff24b35274 in __gthread_once (__func=<optimized out>, __once=0x7fff24bfeaf8 <optree::GetCxxModule(std::optional<pybind11::module_> const&)::storage+8>)
at /usr/include/x86_64-linux-gnu/c++/13/bits/gthr-default.h:700
#3 std::call_once<pybind11::gil_safe_call_once_and_store<pybind11::module_>::call_once_and_store_result<optree::GetCxxModule(const std::optional<pybind11::module_>&)::<lambda()> >(optree::GetCxxModule(const std::optional<pybind11::module_>&)::<lambda()>&&)::<lambda()> >(std::once_flag &, struct {...} &&) (__once=..., __f=...) at /usr/include/c++/13/mutex:907
#4 0x00007fff24b36d45 in pybind11::gil_safe_call_once_and_store<pybind11::module_>::call_once_and_store_result<optree::GetCxxModule(const std::optional<pybind11::module_>&)::<lambda()> >(struct {...} &&) (this=this@entry=0x7fff24bfeaf0 <optree::GetCxxModule(std::optional<pybind11::module_> const&)::storage>, fn=...)
at /home/dann-frazier/noomp/lib/python3.13/site-packages/pybind11/include/pybind11/gil_safe_call_once.h:61
#5 0x00007fff24b36daf in optree::GetCxxModule (module=std::optional = {...}) at /home/dann-frazier/optree/src/optree.cpp:37
#6 0x00007fff24b37651 in optree::BuildModule (mod=...) at /home/dann-frazier/optree/src/optree.cpp:45
#7 0x00007fff24b3a061 in pybind11_init__C (mod=...) at /home/dann-frazier/optree/src/optree.cpp:362
#8 0x00007fff24b3a125 in pybind11_exec__C (pm=<optimized out>) at /home/dann-frazier/optree/src/optree.cpp:362
#9 0x0000000001994256 in PyModule_ExecDef ()
optree itself seemed unlikely to be the cause – the interpreter was just initializing this module and, for some reason, tried to execute code at address 0x0. This suggested stack corruption. The backtraces of the other threads were more illuminating:
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007ffedacf1b8b in blas_thread_server () from /home/dann-frazier/noomp/lib/python3.13/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
#5 0x00007fffa25514fb in ?? () from /home/dann-frazier/noomp/lib/python3.13/site-packages/torch/lib/../../torch.libs/libopenblasp-r0-2eadb500.3.30.so
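As a quicker check than gdb (in a build where the import no longer crashes outright), /proc/self/maps shows which BLAS and OpenMP libraries actually end up mapped into the interpreter. A minimal sketch, Linux-only and purely illustrative:

# List the BLAS/OpenMP shared objects mapped into this process.
import numpy   # pulls in numpy's vendored OpenBLAS
import torch   # pulls in whatever BLAS/OpenMP libraries torch links against

keywords = ("openblas", "libomp", "libgomp")
mapped = set()
with open("/proc/self/maps") as maps:
    for line in maps:
        parts = line.split()
        if parts and any(k in parts[-1] for k in keywords):
            mapped.add(parts[-1])

for path in sorted(mapped):
    print(path)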
Interesting – two different copies of OpenBLAS in the same runtime. One is vendored by numpy (the scipy_openblas build), and the other is vendored by our torch build – perhaps they were stepping on each other. This didn’t occur with upstream’s build because upstream does not link against OpenBLAS. Our linking against OpenBLAS was unintentional: our build environment happened to have it installed, and PyTorch opportunistically uses it if found. Removing it from our build environment resolved the issue, but uncovered a new one:
Segfault #2: libstdc++
Our test case was still segfaulting, but we’d moved forward in the initialization process:
Thread 1 (Thread 0x7ffff7eb3740 (LWP 102148) "python"):
#0 0x00007fffe98a0c24 in std::codecvt<char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&, char*, char*, char*&) const () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch.so
#1 0x00007fffe9907d1d in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<long>(long) () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch.so
#2 0x00007fffc68a126f in ?? () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so
#3 0x00007fffc68a19f1 in c10::operator<<(std::basic_ostream<char, std::char_traits<char> >&, c10::FunctionSchema const&) () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so
#4 0x00007fffd4fa2763 in ?? () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch_python.so
#5 0x00007fffd4fa2ef5 in ?? () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch_python.so
#6 0x00007fffd4a22915 in ?? () from /home/dann-frazier/blas/lib/python3.13/site-packages/torch/lib/libtorch_python.so
#7 0x0000000001819278 in cfunction_call ()
#8 0x0000000001821765 in _PyEval_EvalFrameDefault ()
#9 0x000000000188e695 in PyObject_Vectorcall ()
#10 0x0000000001aa30d2 in call_attribute ()
#11 0x0000000001be90ea in _Py_slot_tp_getattr_hook.cold ()
#12 0x00000000018230f2 in _PyEval_EvalFrameDefault ()
#13 0x00000000018bf494 in PyEval_EvalCode ()
These std:: calls are all part of the C++ standard library, but initially nothing looked odd about this call path. We ran a bunch of experiments to narrow it down, and discovered:
- It does not crash when we build it in a normal OS environment – only the builds from our manylinux_2_28 environment crash.
- It does not crash when we manually load libstdc++.so.6 between numpy and torch:
Python 3.13.7 (tags/v3.13.7-0-gbcee1c3-dirty:bcee1c3, Sep 17 2025, 19:22:59) [GCC 15.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes, os, sys
>>> import numpy
>>> ctypes.CDLL('/usr/lib/libstdc++.so.6', os.RTLD_GLOBAL)
<CDLL '/usr/lib/libstdc++.so.6', handle 5646acfeb3e0 at 0x7f2936881010>
>>> import torch
>>>
- Nor does it crash if we LD_PRELOAD libstdc++.so.6
What I hadn’t noticed, but in retrospect is clear from the initial backtrace, is that we were statically linking libstdc++. Why?
Digression: one of the problems that “manylinux” wheel variants are intended to solve is that the C++ ABI is constantly evolving. GCC is great about maintaining libstdc++ backwards compatibility, which lets users run decades-old C++ binaries on the latest Linux distributions. But it does not provide forward compatibility: if you build a C++ application with GCC 15, it is not guaranteed to work with GCC 14’s libstdc++. One solution is simply to build with the lowest-common-denominator version of GCC, but there are problems with that:
- manylinux_2_28 is our target compatibility level, and it restricts C++ applications to the ABI of GCC 8, which was released back in 2018.
- Packages may use compiler features that are only available in newer versions of GCC. If we tried to build it all with GCC 8, there would be a lot of things we could not build.
- Compilers have gained a number of hardening features since 2018 – many of which Chainguard enables in our toolchain by default. Building with older compilers would throw that away.
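This forward-compatibility constraint is visible directly in the binaries: every versioned libstdc++ symbol a library references must be provided by the libstdc++ on the target system. A minimal sketch that lists the GLIBCXX versions a shared object requires (the path is illustrative):

import re
import subprocess

# Point this at any C++ shared object from a wheel; the path below is made up.
lib = "lib/python3.13/site-packages/torch/lib/libtorch_cpu.so"

# objdump -T prints the dynamic symbol table, including the symbol version
# (e.g. GLIBCXX_3.4.29) attached to each reference.
out = subprocess.run(["objdump", "-T", lib], capture_output=True, text=True).stdout
print(sorted(set(re.findall(r"GLIBCXX_[0-9.]+", out))))
# Every version printed here must exist in the target system's libstdc++.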
Early on, we had considered different approaches to this problem. It ultimately came down to two options:
- The manylinux community provides reference build container images that can produce manylinux_2_28-compatible wheels with modern versions of GCC. They use a clever scheme: the system libstdc++ provides the GCC-8-era symbols, and any newer symbols a binary needs are statically linked into the binary itself. However, this requires a heavily patched toolchain, while we strive to ship unmodified upstream releases as much as possible.
- Statically link all of libstdc++. If every C++ library carried a copy of the libstdc++ it was built against, we would gain independence from the system libstdc++. Of course, we were aware this came with tradeoffs: our libraries would be larger, and a security issue in libstdc++ itself would require a massive rebuild of our wheels.
We opted for option #2. When building manylinux binaries, we would pass `-static-libstdc++` into the build. Our segfaults were correlated with our static manylinux builds, and injecting a shared libstdc++ using ctypes.CDLL() or LD_PRELOAD avoided the crash. Once again, using PyTorch as an early test was going to prove educational.
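For reference, whether a given shared object still depends on the system libstdc++ can be read straight off its dynamic section: dynamically linked libraries list libstdc++.so.6 as a NEEDED entry, while -static-libstdc++ builds do not. A minimal sketch (the wheel path is illustrative):

import subprocess
from pathlib import Path

# Adjust to wherever the wheel's native libraries are installed.
libdir = Path("lib/python3.13/site-packages/torch/lib")

for so in sorted(libdir.glob("*.so*")):
    # readelf -d prints the dynamic section, including the NEEDED entries.
    dyn = subprocess.run(["readelf", "-d", str(so)], capture_output=True, text=True).stdout
    if "libstdc++.so.6" in dyn:
        print(f"{so.name}: links libstdc++ dynamically")
    else:
        print(f"{so.name}: no libstdc++ dependency (statically linked, or not C++ at all)")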
Our hypothesis was that it simply isn’t safe to mix multiple libstdc++ implementations in the same runtime environment – regardless of how they were linked. To test this, I sat down with Claude Code and had it generate a simple test tool: two simple C++ modules that could be loaded into the same application, where either one or both could be configured to statically link libstdc++. While I thought this would just serve as initial scaffolding for developing a reproducer, it turned out that simply printing status messages from those modules was sufficient to reproduce the crash.
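The generated tool isn’t worth reproducing verbatim here, but its shape was roughly the following: two trivial C++ shared objects, each built with -static-libstdc++ so that each carries its own copy of the runtime, loaded into a single Python process. The module names, build commands, and entry point below are illustrative rather than the actual generated code:

import ctypes

# Illustrative build of the two modules, each mod_X.cpp printing a status
# message via std::cout from an extern "C" function named report():
#   g++ -shared -fPIC -static-libstdc++ mod_a.cpp -o mod_a.so
#   g++ -shared -fPIC -static-libstdc++ mod_b.cpp -o mod_b.so

mod_a = ctypes.CDLL("./mod_a.so")  # first embedded libstdc++ copy comes up
mod_b = ctypes.CDLL("./mod_b.so")  # second copy comes up in the same process

mod_a.report()  # status message through the first copy's iostream machinery
mod_b.report()  # and through the second copy's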
What we learned is that libstdc++ has global structures that must be managed by a single runtime context. Initializing a second runtime will corrupt the structures of the original. Statically linking libstdc++ may be OK if only one static copy is loaded, but it is a recipe for disaster to load multiple copies into a single application. Statically linking a full libstdc++ into each individual C++ Python wheel was not going to work – we would need to fall back to the more complex option of maintaining patches against the upstream toolchain, which we have done using the patchset provided by the manylinux tooling maintainers as a basis.
Conclusion
PyTorch was a complicated package to tackle early on in our native Python library work – but it likely ended up accelerating our development by providing early demonstrations of pitfalls that might only show up with non-trivial library stacks. These are complications that many of us with traditional Linux distribution backgrounds have rarely had to think about, because the distribution takes care of them for us.