r1.15.5-deeprec2212

liutongxuan released this on 24 Jan 11:25

Major Features and Improvements

Embedding

  • Refactor GPU Embedding Variable storage layer.
  • Remove TENSORFLOW_USE_GPU_EV macro from embedding storage layer.
  • Refactor KvResourceGather GPU Op.
  • Add an embedding memory pool for the HBM storage of EmbeddingVariable.
  • Refine the HBM storage code of EmbeddingVariable.
  • Reuse the embedding files that EmbeddingVariable generates on SSD when saving and restoring checkpoints.
  • Integrate the single-HBM EV into the multi_tier EmbeddingVariable.
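
The general idea behind an embedding memory pool can be illustrated with a small host-side sketch. Assumptions: the class `EmbeddingMemoryPool` and its methods are hypothetical, the buffer lives in ordinary CPU memory rather than HBM, and this is not DeepRec's implementation. It only shows the common technique of preallocating one contiguous buffer of fixed-width rows and recycling freed rows through a free list instead of allocating memory for each embedding separately.

```python
from array import array


class EmbeddingMemoryPool:
    """Hypothetical fixed-slot pool: one contiguous float32 buffer split into rows.

    Illustrative sketch only; not DeepRec's HBM memory pool.
    """

    def __init__(self, capacity: int, dim: int):
        self.dim = dim
        # A single up-front allocation holding capacity * dim float32 values.
        self._buf = array("f", [0.0]) * (capacity * dim)
        self._mv = memoryview(self._buf)
        # Free rows kept as a stack; popping yields row 0 first.
        self._free = list(range(capacity - 1, -1, -1))

    def allocate(self) -> int:
        """Hand out a free row index in O(1)."""
        if not self._free:
            raise MemoryError("embedding pool exhausted")
        return self._free.pop()

    def view(self, row: int) -> memoryview:
        """Writable view of one row; it shares memory with the pool buffer."""
        start = row * self.dim
        return self._mv[start:start + self.dim]

    def release(self, row: int) -> None:
        """Zero the row and push it back onto the free list for reuse."""
        self.view(row)[:] = array("f", [0.0]) * self.dim
        self._free.append(row)
```

Because every row has the same width, allocation and release are constant-time stack operations, and no allocator call is needed per embedding key.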

Graph & Grappler Optimization

  • Filter out the 'stream_id' attribute in the arithmetic optimizer.
  • Add SimplifyEmbeddingLookupStage optimizer.
  • Add ForwardBackwardJointOptimizationPass to eliminate duplicate hashing in the Gather and Apply ops of EmbeddingVariable.
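
The duplicate-hashing problem that a forward/backward joint optimization targets can be sketched in plain Python. Assumptions: `ToyEmbeddingTable`, `train_step_separate`, and `train_step_joint` are hypothetical names, the table is a Python dict rather than DeepRec's hash map, and the real pass rewrites the TensorFlow graph rather than Python functions. The sketch only shows the idea: if the forward Gather and the backward Apply each resolve the same keys, every key is hashed twice, whereas resolving the keys once and passing the slot indices on to Apply removes the second round of probes.

```python
from typing import Dict, List, Sequence


class ToyEmbeddingTable:
    """Hypothetical hash-based embedding table that counts hash probes."""

    def __init__(self, dim: int):
        self.dim = dim
        self.key_to_slot: Dict[int, int] = {}
        self.values: List[List[float]] = []
        self.hash_lookups = 0

    def find_or_create(self, keys: Sequence[int]) -> List[int]:
        """Resolve each key to a slot, creating a zero row for unseen keys."""
        slots = []
        for k in keys:
            self.hash_lookups += 1  # one hash probe per key
            slot = self.key_to_slot.get(k)
            if slot is None:
                slot = len(self.values)
                self.key_to_slot[k] = slot
                self.values.append([0.0] * self.dim)
            slots.append(slot)
        return slots

    def gather(self, slots: Sequence[int]) -> List[List[float]]:
        return [list(self.values[s]) for s in slots]

    def apply_sgd(self, slots: Sequence[int], grads, lr: float) -> None:
        for s, g in zip(slots, grads):
            row = self.values[s]
            for i in range(self.dim):
                row[i] -= lr * g[i]


def train_step_separate(table, keys, grads, lr):
    # Unoptimized: forward Gather and backward Apply each hash the keys.
    table.gather(table.find_or_create(keys))  # embeddings would feed the model
    table.apply_sgd(table.find_or_create(keys), grads, lr)


def train_step_joint(table, keys, grads, lr):
    # Joint: hash once in the forward pass and reuse the slots in Apply.
    slots = table.find_or_create(keys)
    table.gather(slots)  # embeddings would feed the model
    table.apply_sgd(slots, grads, lr)
```

For keys `[7, 42, 7]`, the separate version performs six hash probes and the joint version three, while both leave the table in the same state.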

Runtime Optimization

  • Add allocators for each stream_executor in multi-context mode.
  • Set multiple GPU devices in session_group mode.
  • Add blacklist and whitelist to JitCugraph.
  • Optimize the CPU EVAllocator to speed up EmbeddingVariable.
  • Support an independent GPU host allocator for each session.
  • Add a GPU EVAllocator to speed up EmbeddingVariable on GPU.

Ops & Hardware Acceleration

  • Add GPU implementation for Unique.
  • Support DT_INT64 indices in sparse segment ops.
  • Add gradient implementations for the following ops: SplitV, ConcatV2, BroadcastTo, Tile, GatherV2, Cumsum, and Cast.
  • Add C++ gradient op for Select.
  • Add gradient implementation for SelectV2.
  • Add C++ gradient op for Atan2.
  • Add C++ gradients for UnsortedSegmentMin/Max/Sum.
  • Refactor KvSparseApplyAdagrad GPU Op.
  • Merge NV-TF r1.15.5+22.12.
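
For reference, the math behind several of the gradients listed above can be written as small pure-Python functions on 1-D lists. These are illustrative sketches, not DeepRec's C++ gradient ops, and all function names are hypothetical. `select_grad` covers only the same-shape case (no broadcasting), and `cumsum_grad` covers only the default non-exclusive, forward cumsum. The even split among tied maxima in `unsorted_segment_max_grad` follows the convention of TensorFlow's Python gradient for UnsortedSegmentMax; these notes do not state which tie convention the new C++ version uses.

```python
import math
from typing import List, Sequence, Tuple


def select_grad(cond: Sequence[bool], grad: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Select(cond, x, y), same shapes: grad goes to x where cond is True, else to y."""
    gx = [g if c else 0.0 for c, g in zip(cond, grad)]
    gy = [0.0 if c else g for c, g in zip(cond, grad)]
    return gx, gy


def atan2_grad(y: float, x: float, grad: float = 1.0) -> Tuple[float, float]:
    """d/dy atan2(y, x) = x / (x^2 + y^2); d/dx atan2(y, x) = -y / (x^2 + y^2)."""
    denom = x * x + y * y
    return grad * x / denom, -grad * y / denom


def unsorted_segment_sum_grad(grad: Sequence[float], segment_ids: Sequence[int]) -> List[float]:
    """Each input element receives the gradient of the segment it was summed into."""
    return [grad[s] for s in segment_ids]


def unsorted_segment_max_grad(
    data: Sequence[float], segment_ids: Sequence[int], num_segments: int, grad: Sequence[float]
) -> List[float]:
    """Route each segment's gradient to the element(s) that attained the max,
    split evenly among ties; all other elements get zero."""
    maxes = [-math.inf] * num_segments
    for v, s in zip(data, segment_ids):
        maxes[s] = max(maxes[s], v)
    counts = [0] * num_segments
    for v, s in zip(data, segment_ids):
        if v == maxes[s]:
            counts[s] += 1
    return [grad[s] / counts[s] if v == maxes[s] else 0.0 for v, s in zip(data, segment_ids)]


def cumsum_grad(grad: Sequence[float]) -> List[float]:
    """For y = cumsum(x), dL/dx_i = sum over j >= i of dL/dy_j (a reversed cumsum)."""
    out, running = [], 0.0
    for g in reversed(grad):
        running += g
        out.append(running)
    return out[::-1]
```

The Atan2 formulas can be checked against a central finite difference of `math.atan2`, which is a cheap way to validate any hand-written gradient.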

Distributed

  • Update seastar to control SDT via the HAVE_SDT macro.
  • Update the default values of WORKER_DEFAULT_CORE_NUM (8) and PS_DEFAULT_CORE_NUM (2).

Serving

  • Support multi-model deployment in SessionGroup.
  • Support user-specified CPU sets for each session_group.
  • Support loading multiple models in the processor.
  • Support GPU compilation in processor.
  • Optimize independent GPU host allocator for each session.

Environment & Build

  • Update systemtap to a valid source address.
  • Make DeepRec's ABI compatible with TensorFlow 1.15 when configured with TF_API_COMPATIBLE_1150.
  • Upgrade the base Docker images to Ubuntu 20.04 and Python 3.8.10.
  • Update the pcre-8.44 URLs.
  • Remove systemtap and its related dependencies from the third-party libraries.
  • Enable gcc optimization option -O3 by default.

BugFix

  • Fix a function definition issue in the processor.
  • Fix a hang when inserting items into the lockless hash map.
  • Fix EmbeddingVariable hang/coredump in GPU mode.
  • Fix a memory leak in CUDA multi-stream mode when merging the compute and copy streams.
  • Fix incorrect session device order.
  • Fix hwloc build error on alinux3.
  • Fix a bug where resource_mgr was cleared twice when using SessionGroup.
  • Fix incorrect Shrink behavior that caused unit tests to fail randomly.
  • Fix the conflict when EmbeddingVariable and embedding fusion are enabled simultaneously.
  • Fix EmbeddingVarGPU coredump in destructor.

More details about these features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2212-cpu-py38-ubuntu20.04

GPU Image

alideeprec/deeprec-release:deeprec2212-gpu-py38-cu116-ubuntu20.04