Describing the full end-to-end pipeline #9
Comments
Dear @kondilidisn,

Thanks for being interested in this work! I apologize that we did not make the points you mention clear in the paper.

For Q1, Q2, Q7, and Q8: First, although the task is named conversational recommendation (following the REDIAL authors), it really comes down to two separate parts when using existing automatic evaluation metrics. Based on this, we evaluate the two parts separately, as in Table 2 and Table 3 (as indicated in the first sentence of the table captions), and leave devising new evaluation metrics for joint performance to future work. Second, it is important to note that the proposed recommender system does consider the conversation by utilizing entities in the dialog contents, although it ignores the dialog model in this work. It is also worth mentioning that the entity linking module should be viewed as part of the dialog system (as shown in Figure 1), which enables the adoption of many knowledge-aware dialog models. In contrast, the conversational model depends on the representation provided by the recommender, which is why the recommender can and must be trained first.

Now let's address the remaining four questions:

Q3: Sorry for missing this info in the paper... It is calculated as …

Q4: We tried identifying entities on the fly. However, it had high latency (perhaps because the linker is web-based) and became the bottleneck of the training process, which is why we cached and saved the linked entities in advance.

Q5, Q6: We did not perform sentiment analysis. On the one hand, our main objective in this work is to provide a general framework in which recommendation and conversation truly involve and improve each other. Deciding whether to use sentiment analysis, and how to use it properly, should be delegated to the recommender system and the dialog system, based on whether it would improve their performance. On the other hand, since ReDial treats sentiment analysis as an auxiliary task and does not show its contribution to the two main tasks, we believe that whether and how to add sentiment analysis to improve the whole system is still an open question and an interesting topic to pursue.

Best,
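For Q4, the caching strategy described above might look something like this minimal sketch (the `link_entities` stub and the cache path are hypothetical illustrations, not the repository's actual code):

```python
import json
import os

CACHE_PATH = "entity_cache.json"  # hypothetical path, not the repo's layout

def link_entities(utterance):
    # Stand-in for the slow (e.g., web-based) entity linker; it would
    # return the entities mentioned in one utterance.
    return []

def load_or_build_entity_cache(utterances):
    # Link every utterance once, save the results to disk, and reuse them
    # in all later runs so the linker never bottlenecks training.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    cache = {utt: link_entities(utt) for utt in utterances}
    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f)
    return cache
```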
Dear @qibinc, thank you very much for your thorough analysis; it was very helpful. I will close this issue, as all my questions have been answered, and I leave it up to you to decide whether or not to display these questions and answers in any way. Thank you again for your contribution and for the time you took to explain these details to me.

Best,
Dear authors, thank you very much for your contribution. I know you have improved the code structure, but I am afraid it is still quite hard for me to understand some of the method details.
I thought I should ask here, in case anyone else has the same questions.
Q1: Table 2 in the paper presents the recommender system evaluation. If I understand correctly, you ignore the conversational part while performing these experiments, so that you can properly compare only the recommendation methods.
Q2: In Table 3, again you only evaluate the conversational part, ignoring the recommendation task. In this case, you calculate the perplexity of the ground-truth sentences, some of which may include UNK tokens that might be predicted correctly.
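For reference, token-level perplexity under the standard definition can be sketched as follows (a hedged illustration, not the paper's evaluation code; the `ignore_unk` option shows how UNK positions could be excluded if one wanted to avoid rewarding UNK-on-UNK matches):

```python
import math

def perplexity(token_log_probs, tokens, ignore_unk=False, unk_token="__unk__"):
    # token_log_probs[i] is the model's log-probability of ground-truth
    # tokens[i]. With ignore_unk=True, UNK positions are excluded, so a
    # model cannot lower its perplexity simply by predicting UNK where
    # the ground truth contains UNK.
    pairs = [(lp, t) for lp, t in zip(token_log_probs, tokens)
             if not (ignore_unk and t == unk_token)]
    if not pairs:
        return float("inf")
    avg_nll = -sum(lp for lp, _ in pairs) / len(pairs)
    return math.exp(avg_nll)
```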
Q3: I do not understand what the Dist-N metric is. Is it the number of distinct N-grams divided by the total number of words produced by the model? In that case, I would expect it to be greater than one, since there are far more possible distinct N-grams than distinct 1-grams (distinct single words).
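For reference, one common way to compute Dist-n is sketched below (an illustration of the usual diversity-metric convention, not necessarily the exact formula the authors use, which their reply above refers to):

```python
from collections import Counter

def dist_n(sentences, n):
    # One common convention: number of distinct n-grams divided by the
    # total number of n-grams over all generated responses.
    ngram_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            ngram_counts[tuple(tokens[i:i + n])] += 1
    total = sum(ngram_counts.values())
    return len(ngram_counts) / total if total else 0.0
```

Note that under either normalization (total n-grams or total words) the value cannot exceed 1: the number of distinct n-grams is at most the total number of n-gram occurrences, which is at most the total number of words.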
Regarding the big picture of the complete end-to-end model:
Q4: Do you identify named entities in real time from the conversation, or do you have a dictionary with all the named entities mentioned in each utterance (similarly to the ReDial authors)?
Q5: Do you perform sentiment analysis and use it in your recommendation module, or do you ignore the sentiment regarding the entities and only use them as an ordered "bag of words"?
Q6: If you perform sentiment analysis at conversation time, do you only provide the utterances that have been sent up to that point?
Q7: You use the same switching technique as the ReDial authors for joining the conversational output space with the recommendation output space. Do any of your results (maybe Table 3) present a joint evaluation (recommendation and NLG tasks)? If so, when you evaluate the token of some mentioned movie, do you check whether the specific movie was predicted, or simply whether any movie was predicted, counting that as correct for the NLG evaluation?
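For context, the ReDial-style switching mechanism mentioned here can be sketched roughly as follows (a hedged illustration of the general idea, not either paper's exact implementation; `switch_layer` is a hypothetical learned gate):

```python
import torch

def joint_output_distribution(hidden, vocab_logits, movie_logits, switch_layer):
    # switch_layer: a learned torch.nn.Linear(hidden_dim, 1) producing
    # the probability that this step emits a movie instead of a word.
    p_movie = torch.sigmoid(switch_layer(hidden))
    vocab_dist = torch.softmax(vocab_logits, dim=-1)   # over the word vocabulary
    movie_dist = torch.softmax(movie_logits, dim=-1)   # over the movie set
    # Distribution over the joint space [vocabulary ; movies]; sums to 1.
    return torch.cat([(1.0 - p_movie) * vocab_dist, p_movie * movie_dist], dim=-1)
```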
Q8: Does Figure 2 evaluate the recommendation performance of the full end-to-end model, or only the performance of the recommendation method? If it is about the full end-to-end model, does the predicted recommended item need to be at the same token position as the ground-truth one, or just mentioned anywhere in the generated response?
I hope my questions will not be too much trouble, and that they will help more of us better understand your work.
Thank you in advance for your time!
Best Regards,
Nikos.