
Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis, KDD 2025

Shanghai Institute of AI Education, School of Computer Science and Technology, and Faculty of Education, East China Normal University


Framework Image

😸 Welcome to LRCD, the repository for Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis, published at KDD 2025.

Here, we propose LRCD, a new paradigm: a cognitive diagnosis framework that relies solely on language representations and is both model-agnostic and scenario-agnostic.

Introduction to LRCD: Challenge, Solution and Insights

Challenge

Cognitive diagnosis aims to infer students' mastery levels based on their historical response logs. However, existing cognitive diagnosis models (CDMs), which rely on ID embeddings, typically have to be trained separately for each specific domain. This limitation hinders their direct practical application across various target domains, such as different subjects (e.g., Math, English and Physics) or different education platforms (e.g., ASSISTments, Junyi Academy and Khan Academy).

Framework Image

Solution

To address this issue, this paper proposes language representation favored zero-shot cross-domain cognitive diagnosis (LRCD). Specifically, LRCD first analyzes the behavior patterns of students, exercises and concepts in different domains, and then describes the profiles of students, exercises and concepts using textual descriptions (TCP). Via recent advanced text-embedding modules, these profiles can be transformed into vectors in a unified language space. Moreover, to address the discrepancy between the language space and the cognitive diagnosis space, LRCD employs language-cognitive mappers (LCM) to learn the mapping from the former to the latter. The mapped profiles can then be simply and efficiently integrated into and trained with existing CDMs.

Figures: TCP and LCM
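As a rough illustration of the language-cognitive mapping idea, the sketch below projects pre-computed text embeddings of profiles into the latent space a downstream CDM consumes. The class name, dimensions and the two-layer MLP design are assumptions for illustration, not the exact implementation in this repository.

```python
import torch
import torch.nn as nn

class LanguageCognitiveMapper(nn.Module):
    """Toy sketch: map frozen text embeddings (language space)
    into the latent space expected by a downstream CDM."""
    def __init__(self, text_dim: int = 1024, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.net(text_emb)

# Example: text embeddings (dim 1024, placeholder values) of 4 student
# profiles mapped to a 64-dim space that a CDM could consume.
mapper = LanguageCognitiveMapper(text_dim=1024, latent_dim=64)
student_profile_emb = torch.randn(4, 1024)
student_latent = mapper(student_profile_emb)  # shape: (4, 64)
```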

Insights

LRCD is not only model-agnostic but also scenario-agnostic, meaning it can be applied across various contexts, including Transductive CD, Inductive CD, Zero-Shot CD and Computerized Adaptive Testing.

Zero-Shot CD (Same Platform, Different Subjects)

Zero-Shot CD (Different Platform, Same Subjects)

Overlap CD (Same Platform, Different Subjects, Overlap Students)

Transductive CD (Standard Scenarios)

📰 News

  • [2024.12.16] Uploaded the introduction.
  • [2024.12.6] Released LRCD v1.0.

Requirements

joblib==1.3.2
numpy==1.24.3
pandas==2.0.3
scikit-learn==1.3.2
scipy==1.10.1
torch==2.1.1
wandb==0.16.2

Please install all the dependencies listed in the requirements.txt file by running the following command:

pip install -r requirements.txt

Data Preprocess

You need to preprocess the datasets yourself. First, change into the data directory:

cd data

Note: Because some embedding text files are too large, we have zipped them. Before running, you need to unzip all files. The zip file is available on Google Drive: https://drive.google.com/file/d/10A8fdmqLXlMyw824_1zj0btaiRQ07mhf/view?usp=drive_link
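If you prefer to unzip everything in one go, a minimal sketch is shown below; it assumes the downloaded archives are placed under data/ and should be extracted next to where they sit. Adjust the path to match your layout.

```python
import zipfile
from pathlib import Path

# Extract every zipped embedding file found under data/ next to its archive.
for archive in Path("data").rglob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
        print(f"Extracted {archive}")
```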

Experiments

Then, you can choose among the supported diagnostic methods and the provided datasets to run the code. Here is an example:

python main.py --method=orcdf --train_file=data/SLP-BIO,data/SLP-PHY --test_file=data/SLP-MAT --seed=0 --batch_size=256 --device=cuda:0 --epoch_num=20  --lr=2.5e-4 --latent_dim=64
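The train and test domains can be swapped to evaluate other zero-shot directions with the same flags; for instance, holding out SLP-PHY instead of SLP-MAT (assuming the corresponding splits exist under data/):

python main.py --method=orcdf --train_file=data/SLP-BIO,data/SLP-MAT --test_file=data/SLP-PHY --seed=0 --batch_size=256 --device=cuda:0 --epoch_num=20 --lr=2.5e-4 --latent_dim=64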

Reference 💭

Shuo Liu, Zihan Zhou, Yuanhao Liu, Jing Zhang, and Hong Qian. "Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis." In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025.

Bibtex

@inproceedings{Liu2025LRCD,
  author    = {Shuo Liu and Zihan Zhou and Yuanhao Liu and Jing Zhang and Hong Qian},
  title     = {Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis},
  booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year      = {2025},
  address   = {Toronto, Canada}
}