Releases: Gavin-Development/GavinBackendDatasetUtils
WordPiece
New Features
- Completely reworked Tokenizer, now based on WordPiece
- More Pythonic interface for Tokenizer
Important Info
- This build is considered stable enough for a minor release, but it is the first implementation of the WordPiece algorithm. It won't be fast and is probably not efficient, but it should get the job done with little issue.
- GPU acceleration has been temporarily removed from the Tokenizer in this release due to the ground-up rework.
- This build is Windows only and requires Intel Python 3.9.15 or newer.
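For readers unfamiliar with WordPiece, the sketch below shows the greedy longest-match-first splitting commonly associated with it (as used in BERT-style tokenizers). It is a generic illustration of the technique, not this module's implementation or API: the function name, the "##" continuation prefix, the "[UNK]" token and the toy vocabulary are all assumptions made for the example.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it is found in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # mark non-initial pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, made up for this example.
vocab = {"token", "##izer", "##ize", "##r", "un", "##known"}
print(wordpiece_tokenize("tokenizer", vocab))
print(wordpiece_tokenize("xyz", vocab))
```

The inner loop rescans the word from the longest candidate downwards, which is simple but not fast. That is in keeping with the performance caveat above, although the real implementation may work differently.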
Latest Release
Quick release with the latest binaries, x64 only.
Windows: .pyd File
Linux: .so File
Ensure the directory containing these libraries is on your PYTHONPATH so that Python can load them.
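If setting PYTHONPATH is inconvenient, the same effect can usually be had at runtime by adding the folder to sys.path before importing. This is a minimal sketch: the helper name and the "binaries" folder are placeholders, not names from this release.

```python
import sys
from pathlib import Path


def add_binary_dir(directory):
    """Put `directory` on sys.path if absent; return True when it was added."""
    resolved = str(Path(directory).resolve())
    if resolved in sys.path:
        return False
    sys.path.insert(0, resolved)
    return True


# "binaries" is a placeholder folder name, not a path from this release.
add_binary_dir("binaries")
```

After this call, an import of the .pyd or .so module placed in that folder should resolve in the usual way.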
A second Release it seems...
Gavin Backend Dataset Utils Release 28/07/2022 Intel LLVM compiler SYCL / CUDA support.
New Features
- BIN file class with methods for creating, modifying & reading BIN files.
- Tokenizer class to create, build, save, load and use a GPU-accelerated BPE tokenization algorithm.
- Some performance tweaks?
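As background on the BPE (byte-pair encoding) algorithm mentioned above, the sketch below shows one training step on the CPU: count adjacent symbol pairs, then merge the most frequent pair everywhere. It illustrates the general technique only. The function names and the toy corpus are assumptions, and it says nothing about how this module's GPU-accelerated Tokenizer is written.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most common adjacent symbol pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


# Toy corpus, made up for this example: each word starts as single characters.
corpus = [list("abab"), list("abc")]
pair = most_frequent_pair(corpus)
corpus = [merge_pair(seq, pair) for seq in corpus]
print(pair, corpus)
```

Building a full vocabulary repeats this step until the desired number of merges is reached.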
Important Info
This build was compiled with the Intel LLVM compiler, which lets us build the module with support for Intel and Nvidia GPUs via SYCL for GPU-accelerated functions. Be warned: this will have adverse effects on performance on AMD systems, as the Intel compiler deliberately produces subpar code for AMD systems.
NOTE: If you choose to use the new CUDA version, you will need the DLLs included in the zip file. If you choose to stick with the SYCL version, you will need the oneAPI Base Toolkit from Intel installed and will need to use Intel Python.
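On Windows with Python 3.8 or newer, DLLs that an extension module depends on are not looked up via PATH, so if the bundled DLLs are kept in a separate folder from the .pyd, that folder may need to be registered before importing. The sketch below uses the standard os.add_dll_directory call. The helper name is hypothetical, and whether this step is needed at all depends on where you unpack the zip; DLLs placed next to the .pyd are normally found without it.

```python
import os
import sys


def register_dll_dir(directory):
    """Add `directory` to the Windows DLL search path; do nothing elsewhere."""
    if sys.platform == "win32" and hasattr(os, "add_dll_directory"):
        # Raises an error if the directory does not exist.
        # The returned handle can be closed later to undo the registration.
        return os.add_dll_directory(os.path.abspath(directory))
    return None


# The current directory stands in for wherever the DLLs were unpacked.
handle = register_dll_dir(os.getcwd())
```

Call this before importing the module so the DLLs can be resolved when the .pyd is loaded.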
A release? I guess so...
Gavin Backend Dataset Utils Release 04/02/2022 VS 2022 preview build
Features
- BIN file format for storing tokenized data
- Multithreaded loading of BIN files
- Single-threaded loading of BIN files
- Transcoding of old file format to BIN
- Data generator to stream data into RAM
- Old file type legacy support
Important info
This build was compiled using MSVC with the C++17 standard on VS 2022 preview. It has passed basic testing and is ready for use. It also contains incomplete features which, if used, could result in instability. Refer to the included README to see what is and is not available in this build: README.md