① A toy large model for recommender systems based on LLaMA2, SASRec, and Meta's generative recommenders. ② Notes and experiments on the official implementation of Meta's generative recommenders.
Toy RecLM includes a DIY model built from a LLaMA2 (or HSTU) backbone plus a SASRec prediction layer.
-
Version 1: Basic Implementation -- Combination of LLaMA2 and SASRec.
The LLaMA2 backbone is based on baby-llama2-chinese, and the prediction part follows SASRec (SASRec.pytorch).
- Part 1: First, we stack LLaMA2's Transformer blocks. Note that since LLaMA2 is decoder-only, it uses the same causal mask as SASRec.
- Part 2.1: After the Transformer blocks, we implement the prediction layer from SASRec. Specifically, we adopt an MF layer that predicts the relevance of item $i$ with a shared item embedding (see the sketch after this list).
- Part 2.2: Following SASRec, we generate the user embedding by considering all of the user's actions.
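As a rough sketch (not the exact module in this repo; names such as SharedEmbeddingPredictionHead are illustrative), the shared-embedding MF prediction can be written in PyTorch as a dot product between each step's hidden state and the item embedding table:

import torch
import torch.nn as nn

class SharedEmbeddingPredictionHead(nn.Module):
    """SASRec-style MF prediction layer that reuses the item embedding table."""
    def __init__(self, num_items: int, hidden_dim: int):
        super().__init__()
        # the same table embeds input items and scores candidate items
        self.item_emb = nn.Embedding(num_items + 1, hidden_dim, padding_idx=0)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from the Transformer blocks
        # relevance of item i at step t is the dot product F_t . M_i
        return hidden_states @ self.item_emb.weight.T  # (batch, seq_len, num_items + 1)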
-
Version 2: Actions Speak Louder than Words' modification to the model -- Hierarchical Sequential Transduction Unit (HSTU).
Note that HSTU adopts a new pointwise aggregated attention mechanism in place of the softmax attention used in Transformers (similar in spirit to Deep Interest Network).
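For intuition, here is a minimal PyTorch sketch of pointwise aggregated attention, assuming SiLU as the elementwise nonlinearity and sequence-length normalization as in the HSTU paper; the relative attention bias and output gating of the real HSTU layer are omitted, and all names are illustrative:

import torch
import torch.nn.functional as F

def pointwise_causal_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    n = q.size(-2)
    # elementwise SiLU on raw scores replaces the softmax; normalize by sequence length
    scores = F.silu(q @ k.transpose(-2, -1)) / n
    # causal mask: each position may only aggregate over itself and earlier positions
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~causal, 0.0)
    return scores @ v  # (batch, heads, seq_len, head_dim)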
-
HSTU
-
We convert each user sequence (excluding the last action)
$$
(\mathcal{S}_{1}^{u},\mathcal{S}_{2}^{u},\cdots,\mathcal{S}_{|\mathcal{S}^{u}|-1}^{u})
$$
to a fixed-length sequence via truncation or padding.
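A small sketch of this conversion, assuming SASRec-style left-padding with id 0 and a hypothetical max_len (the actual preprocessing in this repo may differ):

def to_fixed_length(item_ids, max_len, pad_id=0):
    """Truncate to the most recent max_len actions, or left-pad with pad_id."""
    item_ids = item_ids[-max_len:]  # keep the latest actions
    return [pad_id] * (max_len - len(item_ids)) + item_ids

# e.g. to_fixed_length([3, 7, 9], max_len=5) -> [0, 0, 3, 7, 9]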
Actions Speak Louder than Words' design for Matching
The input is a dataset of user historical behavior sequences, with each sample formatted as follows.
<user_1 profile> <item_1 id features> ... <item_n id features>
<user_2 profile> <item_1 id features> ... <item_n id features>
Moreover, auxiliary time-series tokens can be added to the sequences above if available.
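Purely for illustration, one sample could be laid out as a flat id sequence like the following (the profile id, item ids, and time tokens are made-up placeholders, not this repo's actual vocabulary):

# one user's history as a flat id sequence (illustrative values only)
user_profile_id = 10001                         # <user_1 profile>
item_ids = [501, 502, 777]                      # <item_1 id features> ... <item_n id features>
timestamps = [20240101, 20240105, 20240112]     # optional auxiliary time-series tokens

sample = [user_profile_id] + item_ids
# with time tokens interleaved, if available:
sample_with_time = [user_profile_id]
for t, i in zip(timestamps, item_ids):
    sample_with_time += [t, i]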
Use the following commands to set up the environment:
# add conda-forge
conda config --add channels conda-forge
# install env
conda env create -f env.yml -n reclm
conda activate reclm
Predict
To train the model, use one of the following commands (set the mode with $eval_only$ and choose the backbone with $model_name$):
# LLaMA2 as backbone
torchrun --standalone --nproc_per_node=2 main.py --eval_only=false --model_name='llama'
# HSTU as backbone
torchrun --standalone --nproc_per_node=2 main.py --eval_only=false --model_name='hstu'
We use NDCG@10 and HR@10 to evaluate performance on the whole dataset.
To evaluate the model by NDCG and hit ratio, use one of the following commands (set the mode with $eval_only$, load a checkpoint with $ckpt_name$, and choose the backbone with $model_name$):
# LLaMA2 as backbone
torchrun --standalone --nproc_per_node=2 main.py --eval_only=true --ckpt_name='epoch_15.pth' --model_name='llama'
# HSTU as backbone
torchrun --standalone --nproc_per_node=2 main.py --eval_only=true --ckpt_name='epoch_15.pth' --model_name='hstu'
For details of the evaluation function, please refer to https://pmixer.github.io/posts/Argsort.
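For reference, a sketch of that argsort-based metric computation in the style of SASRec.pytorch (the 100-negative sampling protocol and names below are assumptions, not necessarily what main.py does):

import numpy as np

def ndcg_hr_at_10(scores, pos_index=0):
    """scores: model scores for [ground-truth item] + [sampled negative items]."""
    # rank of the positive item among all candidates (0 = best)
    rank = (-np.asarray(scores)).argsort().argsort()[pos_index]
    if rank < 10:
        return 1.0 / np.log2(rank + 2), 1.0  # (NDCG@10, HR@10) contribution
    return 0.0, 0.0

# typical protocol: score the ground-truth item against ~100 sampled negatives
# per user, then average the per-user values over all evaluated users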
Dataset:
To accelerate training with DeepSpeed, use the following commands:
# pip install deepspeed
# recommended local compilation
git clone https://github.com/microsoft/DeepSpeed/
cd DeepSpeed
rm -rf build
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install . \
--global-option="build_ext" --global-option="-j8" --no-cache -v \
--disable-pip-version-check 2>&1 | tee build.log
For the multi-node resource configuration, we need to generate an ssh key with ssh-keygen and copy it to the other nodes with ssh-copy-id. Here we have one node with 2 GPUs (NVIDIA RTX A6000); we still set up a multi-node configuration for guidance, and in this case the host only needs to communicate with itself.
- First, generate an ssh key
ssh-keygen
- Write ~/.ssh/config to set the nickname and identity file of the host, as follows.
Host host1
User guyr
Hostname 127.0.0.1
Port 22
IdentityFile ~/.ssh/id_rsa
- Write a hostfile so DeepSpeed knows the available nodes and slots.
host1 slots=2
- Use ssh-copy-id to copy the key to the other nodes
ssh-copy-id -i ~/.ssh/id_rsa host1
# test
ssh host1
To run DeepSpeed successfully, we need to add the local_rank and deepspeed arguments to the argparser, since DeepSpeed passes these arguments to each process when launching tasks.
# required when using deepspeed
parser.add_argument("--local_rank", type=int, default=0)
parser.add_argument("--deepspeed", type=str, default="ds_config.json")
Finally, launch DeepSpeed with the following command:
deepspeed --hostfile ./hostfile --master_port 12345 --include="host1:0,1" main.py --eval_only=false --model_name='llama' --deepspeed ds_config.json
Add setup.py to define the package configuration and required dependencies.
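For reference, a minimal sketch of what such a setup.py could look like, with a [deepspeed] extra matching the $pip install -e .[deepspeed]$ command used below (the package name and dependency lists are illustrative):

from setuptools import setup, find_packages

setup(
    name="toy_reclm",                      # illustrative package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["torch", "numpy"],   # core dependencies (illustrative)
    extras_require={
        # enables `pip install -e .[deepspeed]`
        "deepspeed": ["deepspeed"],
    },
)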
- Install Docker on Ubuntu and test the installation with the command
docker run hello-world
- Add Dockerfile
Add a Dockerfile to build the Docker environment. Specifically, we use the command $pip install -e .[deepspeed]$ to build the environment from setup.py in './' and install this package in editable mode. Then use the command
# build docker
docker build -t toyreclm .
- Add GPU configuration to Docker
Use the following commands to install the NVIDIA container runtime and toolkit:
sudo sh nvidia-container-runtime-script.sh
sudo apt-get install nvidia-container-runtime
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# check
which nvidia-container-runtime # /usr/bin/nvidia-container-runtime
- Run docker
Use the following commands to run the Docker container:
# run the container named $reclm$ and mount the corresponding data folder
docker run -it --gpus all --name reclm -e CONTAINER_NAME=reclm -v ./data:/app/data toyreclm /bin/bash
# # to copy data files into container
# docker cp ./data $CONTAINER_NAME:/app/
# check container name
echo $CONTAINER_NAME
# run ddp
torchrun --standalone --nproc_per_node=2 main.py --eval_only=false --model_name='llama'
WHY NOT DEEPSPEED?
Installing SSH inside Docker is not recommended, since it conflicts with the Docker philosophy that each container runs only one process.
-
[2024.4.4]: Support Deepspeed.
-
[2024.4.7]: Support setuptools and Docker.
Facebookresearch has released the official implementation of generative recommenders; here we analyze it and carry out additional experiments for better comprehension. This part is placed in the folder './analysis'.
analysis
├── note
│   ├── details.md
│   └── neural-retrieval-accelerator.md
├── exp
└── src
In details.md, we discuss the details of the generative-recommenders codebase, e.g.,
- data preprocess
- training process
- model architecture
Notably, this work inherits many ideas from Revisiting Neural Retrieval on Accelerators, and the two works are by the same authors.
So, in neural-retrieval-accelerator.md, we discuss the details of this related work.
- LM new features
- parameter-efficient fine-tuning
- quantization
- model size
- hyper-parameter