Build: Link CUDA runtime library statically by default#3867
Build: Link CUDA runtime library statically by default#3867maxhgerlach merged 3 commits intomasterfrom
Conversation
- setting is only effective with CMake 3.17+ (and there is no Horovod build environment variable to control it at this time); with CMake 3.13 the CUDA runtime library is always linked statically on my system - related code in add_cuda macro that adds libraries may not actually have any effect - added warning if dynamic CUDA RT is combined with static NCCL (default) as this can lead to NCCL error "Cuda failure 'CUDA driver is a stub library'" - added CMake status message printing full NCCL_VERSION_CODE (e.g., 21505) - made build environment variable HOROVOD_NCCL_LINK case-insensitive Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
|
Thanks for picking this up, @maxhgerlach ! |
|
Linker observation on master (with -> This adds both dynamic -- The same observation for the CI run for this PR (with -> This adds -- On CMake 3.13 (the minimum currently supported by Horovod) it doesn't seem to be possible to use I think the code in |
…aneous and on CMake < 3.17 may contradict the setting imposed by `enable_language(CUDA)` Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Unit Test Results 1 047 files - 70 1 047 suites - 70 12h 55m 55s ⏱️ - 48m 39s Results for commit da3c5c9. ± Comparison against base commit b1d0ce8. This pull request skips 62 tests.♻️ This comment has been updated with latest results. |
Unit Test Results (with flaky tests) 1 210 files - 21 1 210 suites - 21 14h 22m 22s ⏱️ + 1m 20s For more details on these failures, see this check. Results for commit da3c5c9. ± Comparison against base commit b1d0ce8. This pull request skips 62 tests.♻️ This comment has been updated with latest results. |
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
There was a problem hiding this comment.
@maxhgerlach Thanks a lot for investigating the cudart/nccl linking and making the changes!
Open question: Should we add a build environment variable HOROVOD_CUDA_RUNTIME_LIBRARY so users can easily change this setting when pip installing Horovod? Caveat is that it would only be effective with CMake 3.17+.
That is a good question. Since it would only work for specific versions of cmake, I would not worry about it for now except if we have a real use case for it.
romerojosh
left a comment
There was a problem hiding this comment.
LGTM! Thanks for picking this up @maxhgerlach.
|
Thanks for the quick review!
OK, if we don't see a pressing use case for linking the shared runtime library now, we can leave it as it is. If we want to introduce the option eventually, it might make sense to just bump the CMake requirement to 3.17. -- For completeness: After the last changes on this PR, the library appears only one time in the linker command line: |
Checklist before submitting
Description
This change follows the discussion in #3831 to avoid the NCCL error "Cuda failure 'CUDA driver is a stub library'".
CMAKE_CUDA_RUNTIME_LIBRARYsetting is only effective with CMake 3.17+ (and there is no Horovod build environment variable to control it at this time); with CMake 3.13 the CUDA runtime library is always linked statically on my system. https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_RUNTIME_LIBRARY.htmlRelated code in
add_cudamacro that is meant to explicitly add libraries depending onCMAKE_CUDA_RUNTIME_LIBRARYmay not actually have the intended effect (horovod/CMakeLists.txt
Lines 183 to 188 in b1d0ce8
LINKER_LIBSthere. Edit: Removed this in follow-up commit, see below.Open question: Should we add a build environment variable
HOROVOD_CUDA_RUNTIME_LIBRARYso users can easily change this setting whenpipinstalling Horovod? Caveat is that it would only be effective with CMake 3.17+.Fixes #3831, and hopefully should also help with CUDA 11.8 build problems seen in #3845