Releases
v0.28.0: Keras 2.11+ optimizers, faster reducescatter, fixes for latest TensorFlow, CUDA, NCCL
Added
TensorFlow: Added a new get_local_and_global_gradients method to PartialDistributedGradientTape to retrieve local and non-local gradients separately. (#3859)
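A minimal sketch of how this might be used. The local_layers argument and the method signature shown below are assumptions based on the existing PartialDistributedGradientTape API; the exact signature of get_local_and_global_gradients is not stated in this entry.

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(4,)),
                             tf.keras.layers.Dense(1)])
inputs = tf.random.uniform((16, 4))
labels = tf.random.uniform((16, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(inputs) - labels))

# Gradients of layers listed in local_layers stay worker-local; all other
# gradients are allreduced across workers.
tape = hvd.PartialDistributedGradientTape(tape, local_layers=[model.layers[0]])

# Assumed signature: returns the worker-local and the globally reduced
# gradients as two separate lists instead of one merged list.
local_grads, global_grads = tape.get_local_and_global_gradients(
    loss, model.trainable_variables)
```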
Changed
Improved reducescatter performance by allocating output tensors before enqueuing the operation. (#3824)
TensorFlow: Ensured that tf.logical_and within the allreduce tf.cond runs on CPU. (#3885)
TensorFlow: Added support for Keras 2.11+ optimizers; see the sketch after this list. (#3860)
The CUDA_VISIBLE_DEVICES environment variable is no longer passed to remote nodes. (#3865)
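As referenced above, a minimal sketch of wrapping an optimizer with hvd.DistributedOptimizer; with this release the wrapped optimizer may be a Keras 2.11+ optimizer. The toy model and hyperparameters are illustrative only.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker to one GPU based on its local rank.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(4,)),
                             tf.keras.layers.Dense(1)])

# Scale the learning rate by the number of workers, then wrap the optimizer.
opt = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(optimizer=opt, loss='mse')
```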
Fixed
Fixed the build with ROCm. (#3839, #3848)
Fixed the build of the horovod-nvtabular Docker image. (#3851)
Fixed linking against recent NCCL versions by defaulting CUDA runtime library linkage to static and ensuring that weak symbols are overridden. (#3867, #3846)
Fixed compatibility with TensorFlow 2.12 and recent nightly versions. (#3864, #3894, #3906, #3907)
Fixed missing arguments of the Keras allreduce function; see the sketch after this list. (#3905)
Updated the with_device functions in MXNet and PyTorch to skip unnecessary cudaSetDevice calls. (#3912)
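As noted in the Keras allreduce item above, a minimal sketch of the wrapper that was fixed; which keyword arguments were previously missing is not stated in this entry, so only the basic call is shown.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Each worker contributes a different tensor; allreduce averages them
# across all workers by default.
value = tf.constant([1.0, 2.0, 3.0]) * float(hvd.rank() + 1)
averaged = hvd.allreduce(value)
```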