Online advertising is perhaps the most successful business model known to date for the Internet and a major element of the online ecosystem. Advertising companies help their clients market products and services to the right audiences of online users. In doing so, they collect a lot of user-generated data (e.g. browsing logs and ad clicks), perform sophisticated user profiling, and compute the similarity of ads to user profiles. User identity plays an essential role in the success of an online advertising company or platform.
As the number and variety of devices increase, online user activity becomes highly fragmented. People check their mobile phones on the go, do their main work on laptops, and read documents on tablets. Unless a service supports persistent user identities (e.g. Facebook Login), the same user on different devices is treated as several independent users. Rather than modeling at the user level, online advertising companies have to deal with weak user identities at the level of devices. Moreover, even a single device can be shared by several users, e.g. kids and parents sharing a computer at home. Building an accurate user identity is therefore a difficult and important problem for advertising companies. The crucial task in this process is finding the same user across multiple devices and integrating her/his digital traces to perform more accurate profiling.
The Cross-Device Entity Linking Challenge provides a unique opportunity for academic and industry researchers to work on this challenging task. We encourage both early-career and senior researchers to participate in the challenge by testing new ideas for cross-device matching and by consolidating approaches already described in existing work. Successful participation in the challenge requires solid knowledge of entity resolution, link prediction, and record linkage algorithms, to name just a few.
For model development, we release a new dataset provided by Data-Centric Alliance (DCA). The dataset contains anonymized browse logs for a set of anonymized userIDs that represent the same users across multiple devices. We also provide obfuscated site URLs and HTML titles. Viewed from a graph-theoretical perspective, the release consists of nodes (userIDs at the level of devices, with the corresponding click-stream logs) and a subset of the known existing edges. The participants have to predict new edges, i.e. identify the same user across multiple devices. Submissions are evaluated with the F1 measure, the harmonic mean of precision and recall, computed over the predicted edges.
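To make the evaluation idea concrete, here is a minimal sketch of an F1 score computed over predicted edges. It is an illustration only, not the official scoring script: the function name `edge_f1`, the helper `normalize_edges`, the representation of an edge as an unordered pair of userID strings, and the handling of duplicates and empty inputs are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def normalize_edges(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Turn (userID, userID) pairs into a set of unordered edges.

    Assumption: (a, b) and (b, a) are the same edge, and self-pairs
    such as (a, a) are ignored.
    """
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def edge_f1(predicted: Iterable[Tuple[str, str]],
            actual: Iterable[Tuple[str, str]]) -> float:
    """F1 measure of predicted edges against the true edges."""
    pred = normalize_edges(predicted)
    true = normalize_edges(actual)
    correct = len(pred & true)  # edges that were predicted and are real
    if correct == 0:
        return 0.0  # also covers empty predictions or empty ground truth
    precision = correct / len(pred)  # share of predictions that are right
    recall = correct / len(true)     # share of true edges that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical userIDs, for illustration only.
    predicted = [("u1", "u2"), ("u3", "u4"), ("u2", "u1")]
    actual = [("u1", "u2"), ("u5", "u6")]
    print(edge_f1(predicted, actual))
```

In this toy case one of the two distinct predicted edges is correct and one of the two true edges is recovered, so precision and recall are both one half. Because F1 penalizes both missed and spurious edges, predicting every possible pair of userIDs would not score well.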
The Challenge is part of CIKM 2016 and continues the CIKM Cup series organized as part of the ACM CIKM conference. The reports of the winning teams will be publicly released online. We also invite all participants to present their approaches at the CIKM Cup Workshop on October 28th in Indianapolis, USA.
Since online advertising is an industry that deals with sensitive, large-scale datasets, it is hard for academic researchers to get access to and work with such data. This challenge might therefore be especially interesting for researchers from academia who want to work with a real large-scale advertising dataset and experiment with known and new graph mining algorithms applied to the cross-device matching problem.
We also warmly welcome and encourage the participation of: industry researchers from companies working on online advertising, including major RTB/DMP/DSP vendors such as BlueKai, Turn, Lotame, eXelate, OpenX, etc.; industry researchers and engineers who have accumulated a lot of expertise relevant to this problem (we encourage teams from top research labs such as Microsoft Research, Google Research, Yahoo Labs, Yandex, and Baidu Labs to join in); and early-career data scientists and professors teaching data mining and (social/information) network mining, who could use the challenge to teach or learn by doing, with unique access to a large-scale real-world dataset.
We hope that you will enjoy participating in the Cross-Device Entity Linking Challenge and that it will push your creativity and data mining talent to the limit. Good luck!