{"id":395,"date":"2026-02-03T17:29:28","date_gmt":"2026-02-04T00:29:28","guid":{"rendered":"https:\/\/bloggf.dannf.org\/?p=395"},"modified":"2026-02-03T17:29:28","modified_gmt":"2026-02-04T00:29:28","slug":"building-pytorch-runtime-collisions","status":"publish","type":"post","link":"https:\/\/bloggf.dannf.org\/index.php\/2026\/02\/03\/building-pytorch-runtime-collisions\/","title":{"rendered":"Building PyTorch: Runtime Collisions"},"content":{"rendered":"\n<p>In this post, I\u2019ll walk through a couple of runtime-conflict issues we ran into in the early stages of testing our PyTorch builds.<\/p>\n\n\n\n<p>I come from a classic Linux distribution background, where the developers put a lot of effort into avoiding duplication of runtime libraries. As much as possible, everything in the distribution uses the same C and C++ libraries as everything else. But general-purpose Python libraries don\u2019t have that luxury. They need to work across a wide range of distributions and versions, relying only on a prescribed baseline version of the C and C++ standard libraries. And they also need to work <em>with each other<\/em>. Producing our own PyTorch wheels gave me the opportunity to learn first-hand about these types of issues, and what it takes to make a <code>(uv) pip install<\/code> that Just Works<sup>(TM)<\/sup>.\u00a0<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Segfault #1: OpenMP\/OpenBLAS<\/h1>\n\n\n\n<p>When we began validation on our initial set of PyTorch wheels, we were greeted with a segfault. 
Surprisingly, this happened before any test case had actually started &#8211; it was happening while initializing the test script itself:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ python .\/test\/run_test.py -h\n  import pkg_resources\nSegmentation fault (core dumped)<\/code><\/pre>\n\n\n\n<p>Which we further narrowed down to just:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ python -c 'import torch._dynamo'\nSegmentation fault (core dumped)<\/code><\/pre>\n\n\n\n<p>When other users online had experienced this, the most frequently cited cause was conflicting OpenMP runtimes. <a href=\"https:\/\/www.openmp.org\/\">OpenMP<\/a> defines an API &#8211; but there are multiple implementations. In the free software world, GCC\u2019s <code>libgomp<\/code> and LLVM\u2019s <code>libomp<\/code> are the most widely used. An application may use either one, but it mustn\u2019t use both. Well, our torch wheel was vendoring both!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ find guarded\/ | grep -e libgomp -e libomp\nguarded\/lib\/python3.13\/site-packages\/torch\/lib\/libgomp.so.1\nguarded\/lib\/python3.13\/site-packages\/torch.libs\/libomp-c3bfb36f.so<\/code><\/pre>\n\n\n\n<p>Smoking gun? Maybe.<\/p>\n\n\n\n<p>Upstream only vendors <code>libgomp<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ find upstream | grep -e libgomp -e libomp\nupstream\/lib\/python3.13\/site-packages\/torch\/lib\/libgomp.so.1<\/code><\/pre>\n\n\n\n<p>Smoking gun now?<\/p>\n\n\n\n<p>Not quite. Turns out we were building only against <code>libiomp-dev<\/code>. And using <code>LD_DEBUG=files<\/code>, we could see that only <code>libomp.so<\/code> was being initialized. <code>libgomp.so<\/code> was vendored, but not being accessed. So we dug deeper with <code>gdb<\/code>. 
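<\/p>\n\n\n\n<p>Alongside <code>gdb<\/code>, Python\u2019s built-in <code>faulthandler<\/code> module can give a first, Python-level view of a hard crash. A minimal sketch &#8211; the crashing import is commented out here, since it needs our broken wheel:<\/p>

```python
# faulthandler dumps the Python-level traceback when the process receives a
# fatal signal such as SIGSEGV -- often enough to see which import is at fault.
# It can also be enabled without code changes: python -X faulthandler ...
import faulthandler

faulthandler.enable()
# import torch._dynamo  # the crashing import in our case
print(faulthandler.is_enabled())  # -> True
```

<p>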
A backtrace of the segfault showed us initializing the <code>optree<\/code> C++ module:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Thread 1 \"python\" received signal SIGSEGV, Segmentation fault.\n0x0000000000000000 in ?? ()\n(gdb) bt\n#0  0x0000000000000000 in ?? ()\n#1  0x00007ffff7ca1ed3 in __pthread_once_slow (once_control=0x7fff24bfeaf8 &lt;optree::GetCxxModule(std::optional&lt;pybind11::module_> const&amp;)::storage+8>, init_routine=0x7fff24ceb420 &lt;__once_proxy>)\n    at .\/nptl\/pthread_once.c:116\n#2  0x00007fff24b35274 in __gthread_once (__func=&lt;optimized out>, __once=0x7fff24bfeaf8 &lt;optree::GetCxxModule(std::optional&lt;pybind11::module_> const&amp;)::storage+8>)\n    at \/usr\/include\/x86_64-linux-gnu\/c++\/13\/bits\/gthr-default.h:700\n#3  std::call_once&lt;pybind11::gil_safe_call_once_and_store&lt;pybind11::module_>::call_once_and_store_result&lt;optree::GetCxxModule(const std::optional&lt;pybind11::module_>&amp;)::&lt;lambda()> >(optree::GetCxxModule(const std::optional&lt;pybind11::module_>&amp;)::&lt;lambda()>&amp;&amp;)::&lt;lambda()> >(std::once_flag &amp;, struct {...} &amp;&amp;) (__once=..., __f=...) at \/usr\/include\/c++\/13\/mutex:907\n#4  0x00007fff24b36d45 in pybind11::gil_safe_call_once_and_store&lt;pybind11::module_>::call_once_and_store_result&lt;optree::GetCxxModule(const std::optional&lt;pybind11::module_>&amp;)::&lt;lambda()> >(struct {...} &amp;&amp;) (this=this@entry=0x7fff24bfeaf0 &lt;optree::GetCxxModule(std::optional&lt;pybind11::module_> const&amp;)::storage>, fn=...)\n    at \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/pybind11\/include\/pybind11\/gil_safe_call_once.h:61\n#5  0x00007fff24b36daf in optree::GetCxxModule (module=std::optional = {...}) at \/home\/dann-frazier\/optree\/src\/optree.cpp:37\n#6  0x00007fff24b37651 in optree::BuildModule (mod=...) at \/home\/dann-frazier\/optree\/src\/optree.cpp:45\n#7  0x00007fff24b3a061 in pybind11_init__C (mod=...) 
at \/home\/dann-frazier\/optree\/src\/optree.cpp:362\n#8  0x00007fff24b3a125 in pybind11_exec__C (pm=&lt;optimized out>) at \/home\/dann-frazier\/optree\/src\/optree.cpp:362\n#9  0x0000000001994256 in PyModule_ExecDef ()<\/code><\/pre>\n\n\n\n<p><code>optree<\/code> itself seemed unlikely to be the cause &#8211; the interpreter was just initializing this module and, for some reason, tried to execute code at address <code>0x0<\/code>. This suggested stack corruption. The backtraces of the other threads were more illuminating:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/<strong>libscipy_openblas64_-ff651d7f.so<\/strong>\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/<strong>libscipy_openblas64_-ff651d7f.so<\/strong>\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/<strong>libscipy_openblas64_-ff651d7f.so<\/strong>\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/libscipy_openblas64_-ff651d7f.so\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/libscipy_openblas64_-ff651d7f.so\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/libscipy_openblas64_-ff651d7f.so\n#5  0x00007ffedacf1b8b in blas_thread_server () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/numpy\/_core\/..\/..\/numpy.libs\/libscipy_openblas64_-ff651d7f.so\n#5  0x00007fffa25514fb in ?? 
() from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/<strong>libopenblasp-r0-2eadb500.3.30.so<\/strong>\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/<strong>libopenblasp-r0-2eadb500.3.30.so<\/strong>\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/<strong>libopenblasp-r0-2eadb500.3.30.so<\/strong>\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/libopenblasp-r0-2eadb500.3.30.so\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/libopenblasp-r0-2eadb500.3.30.so\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/libopenblasp-r0-2eadb500.3.30.so\n#5  0x00007fffa25514fb in ?? () from \/home\/dann-frazier\/noomp\/lib\/python3.13\/site-packages\/torch\/lib\/..\/..\/torch.libs\/libopenblasp-r0-2eadb500.3.30.so<\/code><\/pre>\n\n\n\n<p>Interesting &#8211; two different copies of OpenBLAS in the same runtime. One is the <code>scipy-openblas<\/code> build vendored by <code>numpy<\/code>, and the other is the copy vendored by our torch build &#8211; perhaps they were stepping on each other. This didn\u2019t occur with upstream\u2019s build because it doesn\u2019t link against OpenBLAS at all. Our linking against <code>OpenBLAS<\/code> was unintentional &#8211; our build environment happened to have it installed, and PyTorch opportunistically uses it if found. 
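<\/p>\n\n\n\n<p>This class of collision can also be caught statically, before running anything, by scanning a virtualenv for multiple vendored copies of the same runtime family. A sketch of such a check &#8211; the family patterns are illustrative, based on the filenames above:<\/p>

```python
import re
import sys
from pathlib import Path

# Families of runtime libraries that must not be loaded twice in one process.
# The patterns are illustrative; they match the vendored filenames seen above.
RUNTIME_FAMILIES = {
    "openmp": re.compile(r"^lib(gomp|i?omp)"),
    "openblas": re.compile(r"openblas"),
}

def find_runtime_collisions(site_packages):
    """Return {family: [paths]} for any family vendored more than once."""
    hits = {name: [] for name in RUNTIME_FAMILIES}
    for path in Path(site_packages).rglob("*.so*"):
        for name, pattern in RUNTIME_FAMILIES.items():
            if pattern.search(path.name):
                hits[name].append(str(path))
    return {name: sorted(paths) for name, paths in hits.items() if len(paths) > 1}

if __name__ == "__main__" and len(sys.argv) > 1:
    for family, paths in find_runtime_collisions(sys.argv[1]).items():
        print(f"{family}: {len(paths)} copies vendored")
```

<p>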
Removing it from our build environment resolved the issue, but uncovered a new one:<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Segfault #2: libstdc++<\/h1>\n\n\n\n<p>Our test case was still segfaulting, but we\u2019d moved forward in the initialization process:<br><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Thread 1 (Thread 0x7ffff7eb3740 (LWP 102148) \"python\"):\n#0  0x00007fffe98a0c24 in std::codecvt&lt;char16_t, char, __mbstate_t>::do_unshift(__mbstate_t&amp;, char*, char*, char*&amp;) const () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch.so\n#1  0x00007fffe9907d1d in std::basic_ostream&lt;char, std::char_traits&lt;char> >&amp; std::basic_ostream&lt;char, std::char_traits&lt;char> >::_M_insert&lt;long>(long) () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch.so\n#2  0x00007fffc68a126f in ?? () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch_cpu.so\n#3  0x00007fffc68a19f1 in c10::operator&lt;&lt;(std::basic_ostream&lt;char, std::char_traits&lt;char> >&amp;, c10::FunctionSchema const&amp;) () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch_cpu.so\n#4  0x00007fffd4fa2763 in ?? () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch_python.so\n#5  0x00007fffd4fa2ef5 in ?? () from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch_python.so\n#6  0x00007fffd4a22915 in ?? 
() from \/home\/dann-frazier\/blas\/lib\/python3.13\/site-packages\/torch\/lib\/libtorch_python.so\n#7  0x0000000001819278 in cfunction_call ()\n#8  0x0000000001821765 in _PyEval_EvalFrameDefault ()\n#9  0x000000000188e695 in PyObject_Vectorcall ()\n#10 0x0000000001aa30d2 in call_attribute ()\n#11 0x0000000001be90ea in _Py_slot_tp_getattr_hook.cold ()\n#12 0x00000000018230f2 in _PyEval_EvalFrameDefault ()\n#13 0x00000000018bf494 in PyEval_EvalCode ()\n<\/code><\/pre>\n\n\n\n<p>These <code>std::<\/code> calls are all part of the C++ standard library. But initially, nothing looked odd about this call path. We ran a bunch of experiments to diagnose it, and discovered:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It does not crash when we build it in a normal OS environment &#8211; only the builds from our manylinux_2_28 environment crash.<\/li>\n\n\n\n<li>It does not crash when we manually load <code>libstdc++.so.6<\/code> between <code>numpy<\/code> and <code>torch<\/code>:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>Python 3.13.7 (tags\/v3.13.7-0-gbcee1c3-dirty:bcee1c3, Sep 17 2025, 19:22:59) &#91;GCC 15.2.0] on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import ctypes, os, sys\n>>> import numpy\n>>> ctypes.CDLL('\/usr\/lib\/libstdc++.so.6', os.RTLD_GLOBAL)\n&lt;CDLL '\/usr\/lib\/libstdc++.so.6', handle 5646acfeb3e0 at 0x7f2936881010>\n>>> import torch\n>>><\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Nor does it crash if we <code>LD_PRELOAD<\/code> <code>libstdc++.so.6<\/code><\/li>\n<\/ul>\n\n\n\n<p>What I hadn\u2019t noticed, but in retrospect is clear from the initial backtrace, is that we were <em>statically linking<\/em> libstdc++ &#8211; the <code>std::<\/code> frames resolve inside <code>libtorch.so<\/code> itself, not in a shared <code>libstdc++.so.6<\/code>. 
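<\/p>\n\n\n\n<p>The number of libstdc++ copies actually mapped into a running process can be confirmed from <code>\/proc\/self\/maps<\/code>. A Linux-only sketch &#8211; a <code>-static-libstdc++<\/code> build shows no entry at all, because the <code>std::<\/code> code lives inside the extension itself:<\/p>

```python
def mapped_libstdcxx():
    """Return the distinct libstdc++ paths mapped into this process (Linux)."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split()
            # The sixth field, when present, names the file backing the mapping.
            if len(fields) >= 6 and "libstdc++" in fields[5]:
                paths.add(fields[5])
    return sorted(paths)

if __name__ == "__main__":
    # More than one copy -- shared or vendored -- is a red flag for C++ code.
    print(f"libstdc++ copies mapped: {len(mapped_libstdcxx())}")
```

<p>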
Why?<\/p>\n\n\n\n<div class=\"wp-block-cover\"><span aria-hidden=\"true\" class=\"wp-block-cover__background has-background-dim\"><\/span><div class=\"wp-block-cover__inner-container is-layout-flow wp-block-cover-is-layout-flow\">\n<div class=\"wp-block-group has-global-padding is-layout-constrained wp-block-group-is-layout-constrained\">\n<p>Digression: one of the problems that \u201cmanylinux\u201d wheel variants are intended to solve is that the C++ ABI is constantly evolving. GCC is great about maintaining libstdc++ backwards compatibility. This lets users run decades-old C++ binaries on the latest Linux distributions. But it does not provide <em>forward<\/em> compatibility. If you build a C++ application with GCC-15, it is not guaranteed to work with GCC-14\u2019s libstdc++. One solution here is simply to build with the lowest-common-denominator version of GCC. But there are problems with that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>manylinux_2_28 is our target compatibility level, and it restricts C++ applications to the ABI of GCC 8. GCC 8 came out in 2018 &#8211; nearly 10 years ago.<\/li>\n\n\n\n<li>Packages may use compiler features that are only available in newer versions of GCC. If we tried to build it all with GCC 8, there would be a lot of things we could not build.<\/li>\n\n\n\n<li>Compilers have gained a number of hardening features since 2018 &#8211; many of which Chainguard enables in our toolchain by default. Building with older compilers would throw that away.<\/li>\n<\/ul>\n<\/div>\n<\/div><\/div>\n\n\n\n<p>Early on, we had considered different approaches for handling this. It ultimately came down to two options:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The manylinux community provides reference build container images that can build manylinux_2_28-compatible wheels with modern versions of GCC. 
They use a clever scheme: the system libstdc++ provides the GCC-8-era symbols, while any newer symbols the binary needs are statically linked into the binary itself. However, this requires a heavily patched toolchain, while we strive to ship unmodified upstream releases as much as possible.<\/li>\n\n\n\n<li>Statically link all of libstdc++. If every C++ library came with a copy of the libstdc++ version it was built against, we\u2019d gain independence from the system libstdc++. Of course, we were aware that this came with tradeoffs. Our libraries would be larger, and a security issue in libstdc++ itself would require a massive rebuild of our wheels.<\/li>\n<\/ol>\n\n\n\n<p>We opted for option #2. When building manylinux binaries, we would pass <code>-static-libstdc++<\/code> into the build. Our segfaults were correlated with our static manylinux builds, and injecting a shared libstdc++ using <code>ctypes.CDLL()<\/code> or <code>LD_PRELOAD<\/code> would avoid them. Once again, using PyTorch as an early test was going to prove educational.<\/p>\n\n\n\n<p>Our hypothesis was that it simply isn\u2019t safe to mix multiple libstdc++ implementations in the same runtime environment &#8211; regardless of how they were linked. To test this, I sat down with Claude Code and had it generate a simple test tool. I asked it to build two simple C++ modules that could be loaded into the same application, where either one or both could be configured to statically link libstdc++. 
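<\/p>\n\n\n\n<p>The Python side of that harness can be as small as a pair of <code>ctypes<\/code> loads. A sketch, using <code>mod_static.so<\/code> and <code>mod_shared.so<\/code> as hypothetical names for the two C++ modules (one built with <code>-static-libstdc++<\/code>, one against the shared library):<\/p>

```python
import ctypes
import os

# Hypothetical artifacts: two tiny C++ shared objects that each print a status
# message from a static initializer. mod_static.so would be linked with
# -static-libstdc++; mod_shared.so would use the system libstdc++.so.6.
MODULES = ("./mod_static.so", "./mod_shared.so")

def load_modules(paths):
    """dlopen each module with RTLD_GLOBAL; return a status line per module."""
    statuses = []
    for module in paths:
        if not os.path.exists(module):
            statuses.append(f"skipping {module} (not built)")
            continue
        # RTLD_GLOBAL mirrors how symbols from one Python extension can become
        # visible to the next -- the condition that triggered our corruption.
        ctypes.CDLL(module, mode=ctypes.RTLD_GLOBAL)
        statuses.append(f"loaded {module}")
    return statuses

if __name__ == "__main__":
    print("\n".join(load_modules(MODULES)))
```

<p>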
While I thought this would just serve as initial scaffolding for developing a reproducer, it turned out that just printing status messages from those modules was sufficient to reproduce the crash.<\/p>\n\n\n\n<p>What we learned is that libstdc++ has global structures that must be managed by a single runtime context. Initializing a second runtime will corrupt the structures of the original. Statically linking libstdc++ may be OK if only one static copy is loaded, but it is a recipe for disaster to load multiple copies into a single application. Statically linking a full libstdc++ library into each individual C++ Python wheel was not going to work &#8211; we would need to fall back to the more complex option of maintaining patches against the upstream toolchain, which we have done using the patchset provided by the manylinux toolset maintainers as a basis.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Conclusion<\/h1>\n\n\n\n<p>PyTorch was a complicated package to tackle early on in our native Python library work &#8211; but it likely ended up accelerating our development by providing early demonstrations of some of the pitfalls that might only be observed with non-trivial library stacks. These are complications that a traditional Linux distribution quietly solves for its users &#8211; something many of us with distro backgrounds had been able to take for granted.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, I\u2019ll walk through a couple of runtime-conflict issues we ran into in the early stages of testing our PyTorch builds. 
I come from a classic Linux distribution background, where the developers put a lot of effort into avoiding duplication of runtime libraries. As much as possible, everything in the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22],"tags":[19,20,21],"class_list":["post-395","post","type-post","status-publish","format-standard","hentry","category-software","tag-chainguard","tag-python","tag-pytorch"],"_links":{"self":[{"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/posts\/395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/comments?post=395"}],"version-history":[{"count":1,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/posts\/395\/revisions"}],"predecessor-version":[{"id":396,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/posts\/395\/revisions\/396"}],"wp:attachment":[{"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/media?parent=395"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/categories?post=395"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bloggf.dannf.org\/index.php\/wp-json\/wp\/v2\/tags?post=395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}