Add Sequential Recommendation models #543

Closed
hieuddo opened this issue Nov 1, 2023 · 14 comments

@hieuddo
Member

hieuddo commented Nov 1, 2023

Description

Most of the models currently supported in Cornac are general recommenders. Recently, sequential recommendation has gained increasing attention (e.g., it was the most popular topic at RecSys'23). It would be nice if Cornac were extended to support more recommendation tasks, especially sequential/session-based recommendation (next item(s), next basket).

Expected behavior with the suggested feature

  • Add a generic data pipeline (i.e., parser, eval_method, evaluation) for next-basket, next-items, and next-item recommendation.
  • Add some fundamental and popular models (e.g., RNN, GRU4Rec).
  • Adapt general recommender models to the sequential context, for example:
    • kNN: items nearest to the current session's items
    • BPR: aggregate the current session's items (e.g., average, weighted) into the "user" representation
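The two adaptations above can be sketched in plain Python. This is a minimal illustration, not Cornac code: the similarity table, factor vectors, and function names are hypothetical, and weighting the more recent session item more heavily is just one possible aggregation.

```python
from collections import defaultdict


def knn_session_scores(session_items, item_sim, exclude_seen=True):
    """Item-kNN for sessions: score each candidate by summing its
    similarity to every item in the current session."""
    scores = defaultdict(float)
    for s in session_items:
        for cand, sim in item_sim.get(s, {}).items():
            scores[cand] += sim
    if exclude_seen:
        # Drop items already seen in this session.
        for s in session_items:
            scores.pop(s, None)
    return dict(scores)


def bpr_session_scores(session_items, item_factors, item_bias, weights=None):
    """BPR-style scoring where the "user" vector is the weighted average
    of the latent factors of the items in the current session."""
    if weights is None:
        weights = [1.0] * len(session_items)  # plain average
    total = sum(weights)
    dim = len(item_factors[session_items[0]])
    user_vec = [0.0] * dim
    for item, w in zip(session_items, weights):
        for d, v in enumerate(item_factors[item]):
            user_vec[d] += w * v / total
    # Score = <user_vec, item_factor> + item_bias, as in matrix-factorization BPR.
    return {
        item: sum(u * v for u, v in zip(user_vec, vec)) + item_bias.get(item, 0.0)
        for item, vec in item_factors.items()
    }


# Toy data (hypothetical).
item_sim = {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 0.9, "c": 0.5, "d": 0.2}}
item_factors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

print(knn_session_scores(["a", "b"], item_sim))
# Weight the more recent item "b" higher than "a".
print(bpr_session_scores(["a", "b"], item_factors, {}, weights=[1.0, 3.0]))
```

Neither variant needs retraining per session: kNN reuses a precomputed item-item similarity table, and the BPR variant reuses trained item factors, which is what makes them cheap sequential baselines.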
@hieuddo
Member Author

hieuddo commented Nov 1, 2023

I will try to integrate GRU4Rec myself if this idea aligns with Cornac's scope.

Interesting note: it's important to implement the algorithms fully and correctly. The GRU4Rec authors recently assessed several third-party re-implementations and found that most (if not all) of them are partially flawed or miss some key features, even those in RecSys-endorsed frameworks such as Microsoft's Recommenders and RecPack.
Reference: The Effect of Third Party Implementations on Reproducibility

@tqtg
Member

tqtg commented Nov 1, 2023

This is awesome!
We were thinking about the family of sequential models when Trong was still with us, though we didn't have enough capacity to put them inside Cornac. If you're interested in doing this, let's chat more and see how we can organize Cornac to better support the models. I believe this will be a big enough change, together with graph-based models, to release Cornac version 2.

@lthoang
Member

lthoang commented Nov 2, 2023

For the next-basket recommendation (NBR) task, I found an interesting paper, "A Next Basket Recommendation Reality Check", which specifies some basic baselines and describes how to evaluate NBR models thoroughly.
Source code: https://github.com/liming-7/A-Next-Basket-Recommendation-Reality-Check
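As a rough illustration of the kind of simple baseline such comparisons include, here is a sketch of a personal top-frequency recommender that backfills with globally frequent items. The function name and tie-breaking (first-seen order among equal counts) are our own assumptions; the paper's exact baseline definitions may differ.

```python
from collections import Counter


def top_freq_next_basket(user_baskets, all_baskets, k):
    """Recommend the k items this user bought most often, then fill the
    remaining slots with globally frequent items the user hasn't bought."""
    personal = Counter(item for basket in user_baskets for item in basket)
    global_counts = Counter(item for basket in all_baskets for item in basket)
    recs = [item for item, _ in personal.most_common(k)]
    for item, _ in global_counts.most_common():
        if len(recs) >= k:
            break
        if item not in recs:
            recs.append(item)
    return recs


history = {
    "u1": [["milk", "bread"], ["milk", "eggs"], ["milk"]],
    "u2": [["beer", "chips"], ["beer"]],
}
all_baskets = [b for baskets in history.values() for b in baskets]
print(top_freq_next_basket(history["u1"], all_baskets, k=4))
```

Baselines like this only recommend repeat or popular items, which is why splitting evaluation into repeat vs. new items is useful when judging more complex NBR models.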

@hieuddo
Member Author

hieuddo commented Nov 2, 2023

Some more references:

Two sequential recommendation frameworks endorsed by ACM RecSys:

Frameworks from some published papers, e.g.:

Let's take some time and later discuss our pipeline for generic next-basket/item(s) tasks.

@tqtg
Member

tqtg commented Nov 2, 2023

A few questions to start with:

  1. How should we load data? Reader supports the UIRT data format to handle timestamps. Do we need more than that?
  2. How should we implement the training loop? Dataset provides user_iter and item_iter, and we can also retrieve chrono_user_data/chrono_item_data for training/evaluation.
  3. How should we evaluate model performance? I suppose we still follow the standard ranking evaluation scheme? Are there additional evaluation approaches (maybe for next basket)? If so, let's think them through against the current evaluation scheme in Cornac. It might be easier to start with next-item recommendation first.
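For questions 1 and 2, one way to think about it: assuming the data is already a list of (user, item, rating, timestamp) tuples as in the UIRT format, per-user chronological sequences and a leave-last-out split take only a few lines. The helper names below are hypothetical and not part of Cornac's API.

```python
from collections import defaultdict


def build_sequences(uirt):
    """Group (user, item, rating, timestamp) tuples into per-user item
    sequences ordered by timestamp."""
    by_user = defaultdict(list)
    for user, item, _rating, ts in uirt:
        by_user[user].append((ts, item))
    return {
        user: [item for _, item in sorted(events, key=lambda e: e[0])]
        for user, events in by_user.items()
    }


def leave_last_out(sequences, min_len=2):
    """Hold out each user's most recent item as the test target; users with
    fewer than min_len interactions stay in training only."""
    train, test = {}, {}
    for user, seq in sequences.items():
        if len(seq) < min_len:
            train[user] = seq
            continue
        train[user] = seq[:-1]
        test[user] = (seq[:-1], seq[-1])  # (input sequence, target item)
    return train, test


data = [("u1", "b", 1.0, 20), ("u1", "a", 1.0, 10), ("u1", "c", 1.0, 30), ("u2", "x", 1.0, 5)]
seqs = build_sequences(data)
train, test = leave_last_out(seqs)
print(seqs, train, test)
```

The sort uses only the timestamp as key, so ties keep their input order rather than being reordered by item id.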

@lthoang
Member

lthoang commented Nov 3, 2023

Also note that the current Dataset does not support repeated items (the same user-item pair occurring more than once), which matter for next-item/basket recommendation.

@tqtg
Member

tqtg commented Nov 8, 2023

Let's add an option to keep all interactions for a user-item pair when timestamps are provided.
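A sketch of what such an option could look like, assuming UIRT tuples. The keep_repeats flag and the "keep the latest timestamp" rule used when deduplicating are illustrative assumptions, not the current behavior of Cornac's Dataset.

```python
def collect_interactions(uirt, keep_repeats=False):
    """Return (user, item, timestamp) events ordered by user, then time.

    keep_repeats=False: keep one event per user-item pair (the latest one).
    keep_repeats=True: keep every timestamped interaction, so repeat
    consumption survives for next-item/basket models.
    """
    if keep_repeats:
        events = [(u, i, ts) for u, i, _r, ts in uirt]
    else:
        latest = {}
        for u, i, _r, ts in uirt:
            if (u, i) not in latest or ts > latest[(u, i)]:
                latest[(u, i)] = ts
        events = [(u, i, ts) for (u, i), ts in latest.items()]
    return sorted(events, key=lambda e: (e[0], e[2]))


data = [("u1", "a", 1.0, 10), ("u1", "b", 1.0, 20), ("u1", "a", 1.0, 30)]
print(collect_interactions(data))
print(collect_interactions(data, keep_repeats=True))
```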

@tqtg
Member

tqtg commented Nov 18, 2023

A few things to note:

  • 'USIT' data format for sequential recs
  • 'UBIT' data format for basket recs
  • Consider using json to represent extras (e.g., order quantity, price) for each interaction in the basket data.

@lthoang @hieuddo
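A minimal parser sketch for the two proposed formats, assuming tab-separated columns in the order user, session/basket, item, timestamp, plus an optional fifth column holding the JSON extras. The column order, the optional JSON column, and parse_grouped are assumptions for illustration; the final format spec may differ.

```python
import json


def parse_grouped(lines, sep="\t"):
    """Parse 'user, group, item, timestamp[, json_extras]' lines, where group
    is a session id (USIT) or a basket id (UBIT). Returns
    {group_id: {"user": ..., "items": [(ts, item, extras), ...]}} with the
    items of each group sorted by timestamp."""
    groups = {}
    for line in lines:
        # maxsplit=4 keeps the JSON column intact as a single field.
        parts = line.rstrip("\n").split(sep, 4)
        user, group, item, ts = parts[:4]
        extras = json.loads(parts[4]) if len(parts) > 4 else {}
        g = groups.setdefault(group, {"user": user, "items": []})
        g["items"].append((float(ts), item, extras))
    for g in groups.values():
        g["items"].sort(key=lambda x: x[0])
    return groups


ubit_lines = [
    'u1\tb1\tmilk\t100\t{"quantity": 2, "price": 1.5}',
    'u1\tb1\tbread\t100\t{"quantity": 1, "price": 2.0}',
    "u1\tb2\teggs\t200",
]
usit_lines = ["u1\ts1\ta\t1", "u1\ts1\tb\t2", "u2\ts2\tc\t5"]
baskets = parse_grouped(ubit_lines)
sessions = parse_grouped(usit_lines)
print(baskets)
print(sessions)
```

Keeping the extras as a free-form JSON object avoids fixing the set of per-interaction attributes (quantity, price, ...) in the format itself.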

@lthoang
Member

lthoang commented Dec 1, 2023

We should consider supporting some augmentation strategies (e.g., sliding windows) so that users can increase their training data.
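One common augmentation of this kind turns each training sequence into several (input, target) pairs by sliding over it, truncating each input to a fixed window. The sketch below assumes that interpretation; window and min_input are hypothetical parameters.

```python
def sliding_window_samples(sequence, window=3, min_input=1):
    """Turn one sequence into (input, target) training pairs: every prefix,
    truncated to its last `window` items, predicts the item that follows."""
    samples = []
    for t in range(min_input, len(sequence)):
        start = max(0, t - window)
        samples.append((sequence[start:t], sequence[t]))
    return samples


print(sliding_window_samples(list("abcde"), window=2))
```

A sequence of length n yields up to n - 1 training pairs instead of one, at the cost of strongly correlated samples from the same user.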

hieuddo mentioned this issue Dec 8, 2023
@lthoang
Member

lthoang commented Dec 18, 2023

We currently consider the last item in a sequence as the target test instance. For example, for a sequence a b c d, the first three items a b c are the input and the last item d is the output. Consequently, the total number of test instances equals the total number of test sequences.

Looking at the source code of GRU4Rec https://github.com/hidasib/GRU4Rec_PyTorch_Official/, I find that they consider every next item as ground truth for evaluation. For example, for a test sequence a b c d, the ground-truth items are b, c, d. The corresponding inputs are a, a b, a b c, or, for GRU4Rec, just a, b, c fed one at a time.

Should we also support the above scenario? @tqtg @hieuddo
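The two evaluation schemes differ only in how test instances are generated from a sequence. A minimal sketch, with hypothetical helper names; the toy recommender exists only to exercise hit_rate_at_k:

```python
def last_item_instances(test_sequences):
    """One instance per sequence: all items but the last -> the last item."""
    return [(seq[:-1], seq[-1]) for seq in test_sequences if len(seq) >= 2]


def next_item_instances(test_sequences):
    """One instance per position: each prefix -> the item that follows it."""
    return [(seq[:t], seq[t]) for seq in test_sequences for t in range(1, len(seq))]


def hit_rate_at_k(instances, recommend, k=10):
    """Fraction of instances whose target appears in the top-k list."""
    if not instances:
        return 0.0
    hits = sum(target in recommend(inputs)[:k] for inputs, target in instances)
    return hits / len(instances)


def popular(inputs):
    """Toy recommender that ignores its input."""
    return ["b", "c", "d"]


seqs = [["a", "b", "c", "d"]]
print(last_item_instances(seqs))
print(next_item_instances(seqs))
print(hit_rate_at_k(next_item_instances(seqs), popular, k=1))
```

With the first scheme, a b c d yields one instance; with the second, it yields three. A stateful model like GRU4Rec can produce all three predictions in a single pass by feeding a, b, c one at a time and carrying the prefix in its hidden state.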

@lthoang
Member

lthoang commented Dec 22, 2023

For user-based evaluation, take HGRU4Rec https://arxiv.org/pdf/1706.04148.pdf as an example: besides user_idx, it also needs the user's sessions (sorted chronologically) to construct the user's hidden state across sessions.

In https://github.com/mquad/hgru4rec/, although every user representation is initialized as a zero vector, the user's history sequences still affect the final user representation.
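To illustrate why the zero initialization does not make history irrelevant, here is a toy sketch in which the user state starts at zeros and is updated after each chronologically ordered session. The mean-of-items session encoder and the interpolation update are deliberate simplifications standing in for HGRU4Rec's session-level and user-level GRUs; they are not the model's actual equations.

```python
def session_repr(session, item_vecs):
    """Toy session encoder: mean of the session's item vectors."""
    dim = len(next(iter(item_vecs.values())))
    out = [0.0] * dim
    for item in session:
        for d, v in enumerate(item_vecs[item]):
            out[d] += v / len(session)
    return out


def user_state_over_sessions(sessions, item_vecs, alpha=0.5):
    """Start the user state at zeros and update it after each session, in
    chronological order. A simple interpolation stands in for the
    user-level GRU."""
    dim = len(next(iter(item_vecs.values())))
    user = [0.0] * dim
    history = [list(user)]
    for session in sessions:
        s = session_repr(session, item_vecs)
        user = [(1 - alpha) * u + alpha * x for u, x in zip(user, s)]
        history.append(list(user))
    return user, history


item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
user, history = user_state_over_sessions([["a"], ["b"]], item_vecs)
reordered, _ = user_state_over_sessions([["b"], ["a"]], item_vecs)
print(user, history, reordered)
```

The same sessions in a different order give a different final state, which is why the evaluation has to pass each user's sessions in chronological order, not just user_idx.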

@tqtg
Member

tqtg commented Dec 24, 2023

Re: @lthoang's question above about using every next item in a test sequence as ground truth (rather than only the last item):

Yes, we should definitely support this. Do we already have a solution?

@tqtg
Member

tqtg commented Dec 24, 2023

@lthoang Let's open separate issues for the suggestions you raised above, so we can address them independently of this general feature.

@lthoang
Member

lthoang commented Dec 25, 2023

Re: the two points raised above (using every next item in a test sequence as ground truth, and passing users' chronologically sorted sessions for user-based evaluation as in HGRU4Rec):

Let's move these two into new feature requests.
