Feature/add dmrl: Add DMRL Model #597

Merged (51 commits) on Mar 20, 2024

mabeckers
Contributor

Description

I added Disentangled Multimodal Representation Learning (https://arxiv.org/pdf/2203.05406.pdf) as the DMRL model to cornac.
In the context of this addition I had to:

  1. add the PWLearningSampler: a sampler wrapped around PyTorch's DataLoader class that loads the data the way the DMRL training mechanism requires it (a pairwise ranking approach)
  2. add the TransformersTextModality (as described in the paper) to encode textual features into latent space
  3. add the TransformersVisionModality (as described in the paper) to encode visual features into latent space
  4. add the DistanceCorrelationCalculator, which can be used to calculate the disentangled loss (as described in the paper)
  5. add dmrl and recom_dmrl, following the paper and the Cornac framework conventions
  6. add tests for all of the modules described above
  7. modify BaseMethod to include TransformersTextModality as an allowed text modality
  8. Note: TransformersVisionModality is not yet included in BaseMethod because it is not used in the DMRL examples (pre-encoded Cornac vision features were used). Please add it if wanted.
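
For readers unfamiliar with the quantity behind item 4, here is a minimal standard-library sketch of the usual sample distance correlation between two sets of vectors. It is only an illustration under stated assumptions: the function names are hypothetical, and the PR's DistanceCorrelationCalculator may well differ in details (for example, it presumably works on tensors rather than Python lists).

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix of `points`, double-centered."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _dcov2(a, b):
    """Squared sample distance covariance from two centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equally long lists of vectors.

    Returns 0.0 when either input has zero distance variance.
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    denom = math.sqrt(_dcov2(a, a) * _dcov2(b, b))
    if denom == 0.0:
        return 0.0
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(_dcov2(a, b), 0.0) / denom)
```

A value near 0 suggests the two representations are close to independent, which is why a disentangling loss would try to push it down between factor pairs.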

Checklist:

  • I have added tests.
  • I have updated the documentation accordingly.
  • I have updated README.md (if you are adding a new model).
  • I have updated examples/README.md (if you are adding a new example).
  • I have updated datasets/README.md (if you are adding a new dataset).

@tqtg
Member

tqtg commented Mar 12, 2024

Hi @mabeckers, thanks for the contribution. It's great to see DMRL being added to Cornac. However, there are a few things that we might need to reconsider. First, each model in Cornac is very self-contained, and model dependencies should not be added as global requirements. This minimizes the maintenance effort for the core functions and facilitates a wider range of model implementations. With that said, there are two directions to proceed with the DMRL model:

  1. Have the model take in raw text and raw video. With this approach, we should bring the text/video transformer-based encoders in as part of the model implementation. For example, the CDL model has an autoencoder for text.
  2. Have the model take in text/video features. In this case, we should do the text/video encoding separately (possibly as part of the example) prior to model training in Cornac. With the embeddings ready, we simply employ the FeatureModality for either text or video. We can consider whether an additional VideoModality is needed or whether ImageModality could be used as an alternative for data input.

Hope that my explanation is clear enough. Happy to chat more.

@mabeckers
Contributor Author

Hi @tqtg, thanks for taking a look at my PR. Yeah, I noticed that the two new transformer modalities introduced more general dependencies. I can go ahead and move those modules inside recom_dmrl, so that the model receives basic text and images as input. I will make it general, so that if one already has encoded features from somewhere (say they come with the example), the model can take those in as well and will not run another layer of encoding on top of that feature set.
Does that sound good to you?

Thanks,
Max

@tqtg
Member

tqtg commented Mar 13, 2024

sounds good to me. Let's do that and see how it goes.

@mabeckers
Contributor Author

Made the requested changes, please let me know if there's anything else I can change for this PR. Also remerged with the latest cornac master.
Thanks!

@tqtg
Member

tqtg commented Mar 16, 2024

@mabeckers I made some changes to get the tests working and also did some refactoring. Please have a look and see if they make sense to you.

@darrylong added the "models" label (New models, changes to models) on Mar 18, 2024
@mabeckers
Contributor Author

Everything looks great to me!

@tqtg
Member

tqtg commented Mar 18, 2024

Hey @mabeckers, there is something that we need to modify about the model input (text and image modalities). By design, we don't input the modalities directly to the model, but we input them to an evaluation method (e.g., RatioSplit). The reason is that the modalities will be aligned with user/item data splitting and user/item ID being mapped properly. Taking CDL model as an example, we input text modality to the RatioSplit eval method (here) and we can access the text modality inside the model implementation via the train_set (here). Can we work on this last change before we merge the model into Cornac?
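
To make the alignment argument concrete, here is a small standard-library sketch of why per-item data needs to pass through the same ID mapping as the feedback. This is not Cornac's code: `build_id_map` and `align_modality` are hypothetical helpers, and the first-seen ordering of internal indices is an assumption made purely for illustration.

```python
def build_id_map(raw_ids):
    """Map raw IDs to contiguous internal indices in first-seen order."""
    id_map = {}
    for raw in raw_ids:
        id_map.setdefault(raw, len(id_map))
    return id_map


def align_modality(raw_item_ids, item_data, iid_map):
    """Reorder per-item data so position i holds the data of internal item i.

    Items that never appear in the feedback are dropped; items without
    modality data are left as None.
    """
    by_raw = dict(zip(raw_item_ids, item_data))
    aligned = [None] * len(iid_map)
    for raw, idx in iid_map.items():
        aligned[idx] = by_raw.get(raw)
    return aligned


# Toy feedback triples (user, item, rating) and a toy corpus.
feedback = [("u1", "i3", 1.0), ("u2", "i1", 1.0), ("u1", "i1", 1.0)]
iid_map = build_id_map(item for _, item, _ in feedback)
docs = align_modality(["i1", "i2", "i3"], ["doc one", "doc two", "doc three"], iid_map)
```

In this toy run, "i3" becomes internal item 0 and "i1" internal item 1, so a model indexing `docs` by internal item index only gets the right document after alignment. A modality handed directly to the model would skip this step.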

@mabeckers
Contributor Author

Hey @tqtg, yeah, I understand that's how the Cornac framework works with modalities, which is why until commit 9fc96b3 I had it that way and was feeding modalities from the outside to the RatioSplit instance. I only changed it and moved them inside the model because you mentioned you didn't want to add any general dependencies (such as new TransformerModalities) to the Cornac core, but rather move them into the DMRL folder and have the model accept raw text and images. That's why I moved them out of RatioSplit. I am happy to revert to the earlier version and introduce TransformersVisionModality and TransformersTextModality as new general modality encoders. I can of course also just keep them in the DMRL folder and still use them as normal modalities that are input to the RatioSplit instance. Just let me know which way you would prefer.
Thanks!

@tqtg
Member

tqtg commented Mar 18, 2024

My point is that you can reuse TextModality and ImageModality to hold the image/text corpus and input them into RatioSplit to perform data splitting. The only part we want to move inside model implementation is where we use Transformers to encode the raw data. Does that make sense to you?

@mabeckers
Contributor Author

OK, I see. So if I am understanding you correctly, you would want the DMRL example file to look something like this?

"""Example for Disentangled Multimodal Recommendation, with only feedback and textual modality.
For an example including image modality please see dmrl_clothes_example.py"""

import cornac
from cornac.data import Reader
from cornac.datasets import citeulike
from cornac.eval_methods import RatioSplit
from cornac.models.dmrl.recom_dmrl import TextModalityInput

# The necessary data can be loaded as follows
docs, item_id_ordering_text = citeulike.load_text()
feedback = citeulike.load_feedback(reader=Reader(item_set=item_id_ordering_text))

text_modality_input = TextModalityInput(item_id_ordering_text, docs)

# Instantiate DMRL recommender
dmrl_recommender = cornac.models.dmrl.DMRL(
    batch_size=4096,
    epochs=20,
    log_metrics=False,
    learning_rate=0.01,
    num_factors=2,
    decay_r=0.5,
    decay_c=0.01,
    num_neg=3,
    embedding_dim=100,
    text_features=text_modality_input,
)

# NEW METHOD THAT HOLDS THE TRANSFORMER ENCODING WITHIN DMRL MODEL:
# returns a generic feature modality (or even a TextModality) where the
# pre-encoded text is given in the .features attribute; uses Transformer internally.
item_text_modality = dmrl_recommender.encode_text()

# Define an evaluation method to split feedback into train and test sets
ratio_split = RatioSplit(
    data=feedback,
    test_size=0.2,
    exclude_unknowns=True,
    verbose=True,
    seed=123,
    rating_threshold=0.5,
    item_text=item_text_modality,
)

# Use Recall@300 and Precision@30 for evaluation
rec_300 = cornac.metrics.Recall(k=300)
prec_30 = cornac.metrics.Precision(k=30)

# Put everything together into an experiment and run it
cornac.Experiment(eval_method=ratio_split, models=[dmrl_recommender], metrics=[prec_30, rec_300]).run()

@tqtg
Member

tqtg commented Mar 19, 2024

@mabeckers I made some changes to illustrate my idea. Please have a look and let me know if they make sense to you. We can further refactor the code to remove some unused parts.

@mabeckers
Contributor Author

mabeckers commented Mar 19, 2024

@tqtg I had to set preencode=True (preencode means the data is encoded up front as part of the TransformersModality init, whereas preencoded means it has already been encoded outside), but other than that it looks very good! I understand now what you meant: we use TextModality for data splitting and ID mapping on the outside and "overwrite" it on the inside with the TransformerModalities. I'm just running some final checks and will then commit. Thanks for showing me this way of doing it. The only downside is that we call vectorizer.fit_transform(self.corpus) in _build_text() of the TextModality when all we want is _swap_text(), so there is a little overhead, but I'm fine doing it that way :)

@tqtg
Member

tqtg commented Mar 19, 2024

@mabeckers OK, I thought the data should be encoded batch by batch during training, hence preencode=False. Anyway, please help check, because I may have misinterpreted your implementation.

As for the basic text transformation overhead, I'm aware of it, and it might be an issue with a big text corpus. I was thinking of using the tokenizer as an indicator of whether we want to do any transformation at all. If no tokenizer is provided when the TextModality is initialized, we just bypass the _build_text() call. What do you think?
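
The tokenizer-as-switch idea can be sketched with a toy class. This is a hypothetical stand-in written for illustration, not cornac.data.TextModality: the class name, attributes, and the bag-of-words counting are all assumptions, and only the `_build_text()` name is borrowed from the discussion.

```python
from collections import Counter


class SketchTextModality:
    """Toy text holder that only vectorizes when a tokenizer is given."""

    def __init__(self, corpus, ids, tokenizer=None):
        self.corpus = list(corpus)
        self.ids = list(ids)
        self.tokenizer = tokenizer  # callable: str -> list of tokens
        self.vocab = None
        self.count_matrix = None

    def build(self):
        # No tokenizer: keep the raw corpus and skip the transformation.
        if self.tokenizer is not None:
            self._build_text()
        return self

    def _build_text(self):
        """Build a vocabulary and a dense bag-of-words count matrix."""
        tokenized = [self.tokenizer(doc) for doc in self.corpus]
        self.vocab = sorted({tok for doc in tokenized for tok in doc})
        index = {tok: i for i, tok in enumerate(self.vocab)}
        self.count_matrix = []
        for doc in tokenized:
            row = [0] * len(self.vocab)
            for tok, count in Counter(doc).items():
                row[index[tok]] = count
            self.count_matrix.append(row)
```

With this shape, a model that does its own encoding would construct the modality without a tokenizer and pay nothing for vectorization, while existing bag-of-words models keep passing one.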

@mabeckers
Contributor Author

Yeah, the tokenizer is a good idea. That would make sense. The TransformerModalities work both batch by batch and with pre-encoding. I'm just using pre-encoding in my examples because it makes the runtime a lot faster. I just pushed my final changes and checks. Thanks
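
The trade-off between the two modes can be sketched as follows. This is a hypothetical wrapper rather than the PR's TransformersTextModality: `encode_fn` stands in for a transformer encoder, and the constructor signature is assumed, with the `preencode` and `preencoded` flag names taken from the discussion above.

```python
class SketchEncodedModality:
    """Toy modality that encodes items up front or per batch."""

    def __init__(self, corpus, encode_fn, preencode=False, preencoded=False):
        self.corpus = list(corpus)
        self.encode_fn = encode_fn
        if preencoded:
            # Features were already encoded outside; use them as they are.
            self.features = list(corpus)
        elif preencode:
            # Encode every item once at init time and cache the result.
            self.features = [encode_fn(doc) for doc in self.corpus]
        else:
            # Defer encoding to batch time.
            self.features = None

    def batch_encode(self, item_indices):
        """Return the features for the given internal item indices."""
        if self.features is not None:
            return [self.features[i] for i in item_indices]
        return [self.encode_fn(self.corpus[i]) for i in item_indices]
```

With caching, each item is encoded exactly once no matter how many epochs or batches touch it, which is consistent with the faster runtime reported for the examples. The per-batch mode re-encodes on every lookup but avoids holding all features in memory.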

@tqtg
Member

tqtg commented Mar 20, 2024

Cool! This PR looks good to me. Let's merge it and have another one to update the TextModality. Thanks @mabeckers!

@tqtg merged commit 296d2d9 into PreferredAI:master on Mar 20, 2024
12 of 14 checks passed