r1.15.5-deeprec2212

liutongxuan released this 24 Jan 11:25

Major Features and Improvements

Embedding

Refactor GPU Embedding Variable storage layer.
Remove TENSORFLOW_USE_GPU_EV macro from embedding storage layer.
Refactor KvResourceGather GPU Op.
Add embedding memory pool for HBM storage of EmbeddingVariable.
Refine the HBM storage code of EmbeddingVariable.
Reuse the embedding files on SSD generated by EmbeddingVariable when saving and restoring checkpoints.
Integrate single HBM EV into multi_tier EmbeddingVariable.

Graph & Grappler Optimization

Filter out the 'stream_id' attribute in arithmetic optimizer.
Add SimplifyEmbeddingLookupStage optimizer.
Add ForwardBackwardJointOptimizationPass to eliminate duplicate hash in Gather and Apply ops for Embedding Variable.
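
The idea behind removing the duplicate hash can be illustrated with a toy model. The sketch below is not DeepRec's pass or its API: `ToyEmbeddingTable` and all of its methods are hypothetical names, and it only assumes that the forward Gather and the backward Apply would otherwise each hash the same keys. Reusing the slot indices found in the forward step lets the update skip the second round of hash lookups.

```python
class ToyEmbeddingTable:
    """Hypothetical key -> slot table that counts hash lookups (illustration only)."""

    def __init__(self, dim):
        self.dim = dim
        self.slots = {}        # key -> row index
        self.rows = []         # row index -> embedding vector
        self.hash_lookups = 0  # number of key lookups performed

    def find_slot(self, key):
        # Every call stands in for one hash probe on the key.
        self.hash_lookups += 1
        if key not in self.slots:
            self.slots[key] = len(self.rows)
            self.rows.append([0.0] * self.dim)
        return self.slots[key]

    def gather(self, keys):
        # Forward step: hash each key once, return copies of the rows and the slots.
        slots = [self.find_slot(k) for k in keys]
        return [list(self.rows[s]) for s in slots], slots

    def apply_by_key(self, keys, grads, lr):
        # Unoptimized backward step: hashes every key a second time.
        slots = [self.find_slot(k) for k in keys]
        self.apply_by_slot(slots, grads, lr)

    def apply_by_slot(self, slots, grads, lr):
        # Joint variant: reuses the slots from gather, so no hashing happens here.
        for s, g in zip(slots, grads):
            row = self.rows[s]
            for i in range(self.dim):
                row[i] -= lr * g[i]


if __name__ == "__main__":
    table = ToyEmbeddingTable(dim=2)
    keys = [5, 9, 5]
    grads = [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
    _, slots = table.gather(keys)
    table.apply_by_slot(slots, grads, lr=0.5)
    print("hash lookups with slot reuse:", table.hash_lookups)
```

In this toy, calling `apply_by_key` instead of `apply_by_slot` would double the lookup count for the same batch.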

Runtime Optimization

Add allocators for each stream_executor in multi-context mode.
Set multi-gpu devices in session_group mode.
Add blacklist and whitelist to JitCugraph.
Optimize CPU EVAllocator to speed up EmbeddingVariable performance.
Support independent GPU host allocator for each session.
Add GPU EVAllocator to speed up EmbeddingVariable on GPU.

Ops & Hardware Acceleration

Add GPU implementation for Unique.
Support indices type with DT_INT64 in sparse segment ops.
Add gradient implementations for the following ops: SplitV, ConcatV2, BroadcastTo, Tile, GatherV2, Cumsum, Cast.
Add C++ gradient op for Select.
Add gradient implementation for SelectV2.
Add C++ gradient op for Atan2.
Add C++ gradients for UnsortedSegmentMin/Max/Sum.
Refactor KvSparseApplyAdagrad GPU Op.
Merge NV-TF r1.15.5+22.12.
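
For readers unfamiliar with what such gradient ops compute, here is a minimal pure-Python sketch of the standard math for two of them, Atan2 and UnsortedSegmentSum. It is not the C++ code added in this release; the function names are hypothetical and chosen for illustration only.

```python
import math


def atan2_grad(y, x, upstream=1.0):
    """Gradient of z = atan2(y, x): dz/dy = x / (x^2 + y^2), dz/dx = -y / (x^2 + y^2)."""
    denom = x * x + y * y
    return upstream * x / denom, upstream * -y / denom


def unsorted_segment_sum(data, segment_ids, num_segments):
    """Sum the entries of data that share a segment id (1-D case)."""
    out = [0.0] * num_segments
    for value, seg in zip(data, segment_ids):
        out[seg] += value
    return out


def unsorted_segment_sum_grad(upstream, segment_ids):
    """Each input element receives the upstream gradient of its own segment."""
    return [upstream[seg] for seg in segment_ids]


def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic gradient."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


if __name__ == "__main__":
    y, x = 0.7, -1.3
    dy, dx = atan2_grad(y, x)
    print("analytic dz/dy:", dy, "numeric:", numeric_grad(lambda v: math.atan2(v, x), y))
    print("analytic dz/dx:", dx, "numeric:", numeric_grad(lambda v: math.atan2(y, v), x))
    print(unsorted_segment_sum([1.0, 2.0, 3.0], [1, 0, 1], 2))
    print(unsorted_segment_sum_grad([10.0, 20.0], [1, 0, 1]))
```

The segment-sum gradient is just a gather of the upstream gradient by segment id, which is why it pairs naturally with the GatherV2 gradient listed above.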

Distributed

Update seastar to control SDT by macro HAVE_SDT.
Update the default values of WORKER_DEFAULT_CORE_NUM (8) and PS_DEFAULT_CORE_NUM (2).

Serving

Support multi-model deployment in SessionGroup.
Support user-configured CPU sets for each session_group.
Support loading multiple models in processor.
Support GPU compilation in processor.
Optimize independent GPU host allocator for each session.

Environment & Build

Update systemtap to a valid source address.
Support ABI compatibility with TensorFlow 1.15 by configuring TF_API_COMPATIBLE_1150.
Upgrade base Docker images to Ubuntu 20.04 and Python 3.8.10.
Update pcre-8.44 urls.
Remove systemtap from third party and related dependency.
Enable gcc optimization option -O3 by default.

BugFix

Fix function definition issue in processor.
Fix the hang when inserting an item into the lockless hash map.
Fix EmbeddingVariable hang/coredump in GPU mode.
Fix memory leak in CUDA multi-stream when merging the compute and copy streams.
Fix wrong session devices order.
Fix hwloc build error on alinux3.
Fix the bug where resource_mgr is cleared twice when using SessionGroup.
Fix incorrect Shrink behavior that caused unit tests to fail randomly.
Fix the conflict when EmbeddingVariable and embedding fusion are enabled simultaneously.
Fix EmbeddingVarGPU coredump in destructor.

More details on these features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2212-cpu-py38-ubuntu20.04

GPU Image

alideeprec/deeprec-release:deeprec2212-gpu-py38-cu116-ubuntu20.04
