Describing the full end to end pipeline #9

Closed

kondilidisn opened this issue Nov 18, 2019 · 2 comments

@kondilidisn

Dear authors, thank you very much for your contribution. I know you have improved the code structure, but I am afraid it is still very hard for me to understand some method details.

I thought I should ask here, for the benefit of anyone else with the same questions.

  1. Table 2 in the paper presents the recommender system evaluation. If I understand correctly, you ignore the conversational part while performing these experiments, so that you can properly compare only the recommendation methods.

  2. In Table 3, you evaluate only the conversational part, ignoring the recommendation task. In this case, you calculate the perplexity of the ground-truth sentences, some of which may include UNK tokens that might be predicted correctly.

  3. I do not understand what the Dist-N metric is. Is it the ratio of distinct N-grams divided by the total number of words produced by the model? In that case, I would expect it to be greater than one, since there are far more possible distinct N-grams than distinct 1-grams (distinct single words).

Regarding the big picture of the complete end-to-end model:

  1. Do you identify named entities in real time from the conversation, or do you have a dictionary with all the named entities mentioned in each utterance (similarly to the ReDial authors)?

  2. Do you perform sentiment analysis and use it in your recommendation module, or do you ignore the sentiment regarding the entities and only use them as an ordered "bag of words"?

  3. If you perform sentiment analysis at conversation time, do you only provide the utterances that have been sent up to that point?

  4. You use the same switching technique as the ReDial authors for joining the conversational output space with the recommendation output space (sketched after this list). Do any of your results (maybe Table 3) present a joint evaluation (recommendation and NLG tasks)? If so, when you evaluate the token of a mentioned movie, do you check whether that specific movie was predicted, or simply whether any movie was predicted, counting the latter as a correct NLG prediction?

  5. Does Figure 2 evaluate the recommendation performance of the full end-to-end model or only the performance of the recommendation method? If it is about the full end-to-end model, does the predicted recommended item need to be in the same token position as the ground-truth one, or just mentioned anywhere in the generated response?
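
For reference, here is a minimal sketch of the ReDial-style switching mechanism mentioned in point 4 (PyTorch; the function name, tensor sizes, and exact mixing form are assumptions, not code from this repository):

```python
import torch

def switched_distribution(p_switch, vocab_logits, movie_logits):
    """Sketch of a ReDial-style switch: a learned switch probability
    mixes the dialog vocabulary distribution with the recommender's
    movie distribution into one joint output space."""
    vocab_probs = torch.softmax(vocab_logits, dim=-1)  # over dialog words
    movie_probs = torch.softmax(movie_logits, dim=-1)  # over movie candidates
    # Final distribution over the concatenated [vocabulary ; movies] space.
    return torch.cat([(1 - p_switch) * vocab_probs,
                      p_switch * movie_probs], dim=-1)

# Example: a 10k-word vocabulary, 6924 candidate movies, 30% switch probability.
probs = switched_distribution(torch.tensor(0.3),
                              torch.randn(10_000), torch.randn(6_924))
assert torch.isclose(probs.sum(), torch.tensor(1.0))
```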

I hope my questions will not be too much trouble, and that they will help more of us better understand your work.
Thank you in advance for your time!

Best Regards,
Nikos.

@qibinc
Collaborator

qibinc commented Nov 21, 2019

Dear @kondilidisn,

Thanks for your interest in this work! I apologize that we did not make the points you mentioned clear in the paper.

For Q1, Q2, Q7 and Q8:

First, although the task is named conversational recommendation (following the ReDial authors), it really comes down to two separate parts when using existing automatic evaluation metrics. Based on this, we evaluate the two parts separately, as in Table 2 and Table 3 (as indicated in the first sentence of the table captions), and leave devising new evaluation metrics for joint performance to future work.

Second, it is important to note that the proposed recommender system does consider the conversation by utilizing entities in the dialog contents, although it ignores the dialog model in this work. It is also worth mentioning that the entity linking module should be viewed as part of the dialog system (as shown in Figure 1), which opens up the possibility of adopting many knowledge-aware dialog models.

In contrast, the conversational model depends on the representation provided by the recommender, which is why the recommender can and must be trained first.

Now, to address these four questions:
Q1: That's right. The dialog model is not used during this evaluation. However, the entities linked from previous utterances and the knowledge graph are both used, and both benefit the recommendation performance.
Q2: Yes. We masked the movies to UNK so the results truly reflect the conversation quality (see the sketch after this list). The same holds for the other metrics (BLEU, etc.), and we did the same for the ReDial baseline.
Q7: No. Tables 2 and 3 show separate evaluations of recommendation and conversation, which demonstrate that the two systems can enhance each other.
Q8: As said earlier, Figure 2 shows the recommendation performance, so the position of the mention does not matter.
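
To make the masking in Q2 concrete, here is a minimal sketch; the `@movie_id` mention format (as used in the ReDial transcripts) and the UNK token string are assumptions about the preprocessing, not the repository's exact code:

```python
import re

MOVIE_MENTION = re.compile(r"@\d+")  # ReDial-style "@<movie_id>" mentions

def mask_movies(utterance: str, unk: str = "__unk__") -> str:
    """Replace every movie mention with the UNK token, so perplexity and
    BLEU measure conversation quality rather than recommendation accuracy."""
    return MOVIE_MENTION.sub(unk, utterance)

# Both references and model outputs are masked the same way before scoring.
print(mask_movies("You should watch @111776 , it is great!"))
# -> You should watch __unk__ , it is great!
```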

Q3

Sorry for missing this info in the paper... It is calculated as the number of distinct n-grams produced by the model on the test set, divided by the number of sentences produced on the test set. This roughly captures how many novel n-grams there are per sentence, and it can be smaller than one if the test set is large.
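
In code, the computation described above could look like this minimal sketch (whitespace tokenization is an assumption):

```python
def distinct_n(sentences, n):
    """Dist-N as described above: the number of distinct n-grams across
    all generated test sentences, divided by the number of sentences."""
    ngrams = set()
    for sent in sentences:
        tokens = sent.split()  # assumed whitespace tokenization
        ngrams.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return len(ngrams) / len(sentences)

outputs = ["i love this movie", "i love it", "have you seen it"]
print(distinct_n(outputs, 2))  # 7 distinct bigrams / 3 sentences ≈ 2.33
```

Since the set of distinct n-grams saturates while the sentence count keeps growing, the ratio can indeed drop below one on a large test set.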

Q4

We tried identifying entities on the fly. However, it had high latency (perhaps because the linker is web-based) and became the bottleneck of the training process, which is why we cached and saved {utterance: entities_list}, as you mentioned. This way, identification on the same utterance is not executed over and over again as the training epochs progress. The latency is unnoticeable to humans, however, so it can run in real time in interactive mode.
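
A minimal sketch of that caching scheme (the cache file name and the `link_entities` callable are hypothetical stand-ins for the web-based linker):

```python
import json
import os

CACHE_PATH = "entity_cache.json"  # hypothetical cache location

def load_cache(path: str = CACHE_PATH) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def linked_entities(utterance: str, cache: dict, link_entities) -> list:
    """Call the (slow, web-based) entity linker only on a cache miss, so
    the same utterance is never re-linked as training epochs repeat."""
    if utterance not in cache:
        cache[utterance] = link_entities(utterance)
    return cache[utterance]

def save_cache(cache: dict, path: str = CACHE_PATH) -> None:
    with open(path, "w") as f:
        json.dump(cache, f)
```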

Q5, Q6

We did not perform sentiment analysis. On the one hand, our main objective in this work is to provide a general framework in which recommendation and conversation each truly involve and improve the other. Deciding whether to use sentiment analysis, and how to use it properly, should both be delegated to the recommender system and the dialog system, based on whether sentiment analysis will improve their performance. On the other hand, since ReDial treats sentiment analysis as an auxiliary task and does not show its contribution to the two main tasks, we believe whether and how to add sentiment analysis to improve the whole system is still an open question and an interesting topic to follow.

Best,
Qibin

@kondilidisn
Author

Dear @qibinc,

thank you very much for your thorough analysis; it was very helpful.

I will close this issue, as all my questions have been answered, and I leave it up to you to decide whether or not you want to display these questions and answers in any way.

Thank you again for your contribution and for the time you took to explain some details to me.

Best,
Nikos.
