📄 Paper (Coming Soon) | 🚀 Project Page | 🤗 Dataset

We evaluate the alignment of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) with human perception, focusing on the Japanese concept of shitsukan.
Shitsukan refers to the sensory experience evoked when perceiving an object, an inherently vague and highly subjective concept.
We created a new dataset of shitsukan terms recalled by individuals in response to images of specified objects. We also designed benchmark tasks to evaluate the shitsukan recognition capabilities of LLMs and LVLMs.
This library is experimental and under active development. Breaking changes may be introduced in the future to improve its usability and performance.
The currently supported API LLMs/LVLMs are as follows:
The currently supported Huggingface LLMs are as follows:
The currently supported Huggingface LVLMs are as follows:
The currently supported vLLM LLMs are as follows:
The currently supported vLLM LVLMs are as follows:
```bash
cd $HOME
git clone git@github.com:<ANONYMOUS>/shitsukan-eval
cd $HOME/shitsukan-eval

uv python install 3.11
uv python pin 3.11
uv sync --no-dev
uv sync --dev --no-build-isolation

# for developers
# uv run pre-commit install
```
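As a quick sanity check (optional, not part of the official setup), you can confirm that `uv` picked up the pinned interpreter:

```bash
# Optional sanity check: the project pins Python 3.11, so this should print 3.11.x
uv run python --version
```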
```bash
# Prepare COCO 2017 images
mkdir -p $HOME/data/images
cd $HOME/data/images
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
unzip train2017.zip
unzip val2017.zip

# Prepare our Shitsukan datasets
mkdir -p $HOME/shitsukan-eval/data
cd $HOME/shitsukan-eval/data
git lfs install
git clone https://huggingface.co/datasets/<ANONYMOUS>/Shitsukan
```
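The `--image-dir` option described below expects the COCO images under an `images/coco2017/` subdirectory. If your unzipped folders sit directly under `$HOME/data/images/`, a rearrangement along these lines may be needed (a sketch only; the exact layout depends on the `--image-dir` value you pass):

```bash
# Sketch (assumption): nest the unzipped COCO folders under images/coco2017/
# so they match the paths referenced by --image-dir (see the argument list below).
mkdir -p $HOME/data/images/coco2017
mv $HOME/data/images/train2017 $HOME/data/images/coco2017/
mv $HOME/data/images/val2017 $HOME/data/images/coco2017/
```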
The following command evaluates the specified model on the specified tasks in shitsukan-eval.
```bash
export CUDA_VISIBLE_DEVICES=0

uv run python -m shitsukan_eval \
    --model "<model_name_or_path>" \
    --model-type "<model_type>" \
    --tasks "<task_name>" \
    --sub-tasks "<sub-task_name>" \
    --lang "<lang>" \
    --image-dir "<base-image_path>" \
    --save-dir outputs \
    --verbose
```
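For example, a filled-in invocation might look like the following. This is only a sketch: it assumes the `Qwen/Qwen2-VL-7B-Instruct` model loaded as a Hugging Face model (i.e., `--model-type "hf"`) and the Japanese perception `selection` sub-task described below; adjust the values to your setup.

```bash
export CUDA_VISIBLE_DEVICES=0

# Example only: evaluate a Hugging Face LVLM on the Japanese "perception" / "selection" sub-task
uv run python -m shitsukan_eval \
    --model "Qwen/Qwen2-VL-7B-Instruct" \
    --model-type "hf" \
    --tasks "perception" \
    --sub-tasks "selection" \
    --lang "ja" \
    --image-dir "data" \
    --save-dir outputs \
    --verbose
```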
Explanation of the available arguments:

- `--model` (`str`): The name or path of the model to evaluate (e.g., `"Qwen/Qwen2-VL-7B-Instruct"`).
- `--model-type` (`str`): The model type of the specified model.
  - Model types that can be specified: `"api"`, `"hf"`, `"vllm"`
- `--tasks` (`str`): The task name to evaluate.
  - Tasks that can be specified: `"perception"`, `"commonsense"`, `"taxonomic"`
- `--sub-tasks` (`List[str]`): List of sub-tasks within the task.
  - With `--tasks "perception" --lang "ja"`, sub-tasks that can be specified: `"generation"`, `"selection"`
  - With `--tasks "perception" --lang "en"`, sub-tasks that can be specified: `"selection"`
  - With `--tasks "commonsense" --lang "ja"`, sub-tasks that can be specified: `"generation"`, `"classification"`
  - With `--tasks "commonsense" --lang "en"`, no sub-tasks can be specified
  - With `--tasks "taxonomic" --lang "ja"`, sub-tasks that can be specified: `"a_b_classification"`, `"yes_no_classification"`, `"multiple_choice_classification"`
  - With `--tasks "taxonomic" --lang "en"`, no sub-tasks can be specified
- `--lang` (`str`): Language to use for the evaluation (default: `"ja"`).
  - Languages that can be specified: `"ja"`, `"en"`
- `--image-dir` (`Optional[str]`): Directory where input images are stored (optional).
  - If you specify `--image-dir "data"`, the evaluation script will reference the COCO 2017 images located at `data/images/coco2017/train2017/*.png` and `data/images/coco2017/val2017/*.png` during execution. If you have not prepared the COCO 2017 images, please download them in advance (see the data preparation commands above).
- `--save-dir` (`str`): Directory where evaluation results will be saved.
- `-v`, `--verbose` (`bool`): If set, print detailed information during processing.
> [!NOTE]
> The configuration files for each task are located at `shitsukan_eval/tasks/{task}/{sub_task}/{task}_{sub_task}_{lang}.yaml`.
> If you want to modify the settings, please edit the corresponding file there.
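For instance, with the (assumed) combination used in the example invocation above, the corresponding config would resolve to:

```bash
# Example path (assumed task/sub-task/lang combination): Japanese perception "selection"
cat shitsukan_eval/tasks/perception/selection/perception_selection_ja.yaml
```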
```bibtex
@inproceedings{shiono-etal-2025-evaluating,
    title = "Evaluating Model Alignment with Human Perception: A Study on Shitsukan in {LLM}s and {LVLM}s",
    author = "Shiono, Daiki and
      Brassard, Ana and
      Ishizuki, Yukiko and
      Suzuki, Jun",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.757/",
    pages = "11428--11444",
}
```
(🚧 Here: Add description for this repo 🚧)