This is the code repository for the research project "Convolutional Neural Networks Trained to Identify Words Provide a Good Account of Visual Form Priming Effects".
Project objective: to compare human orthographic perception with that of visual DNN models (CNNs and ViTs).
Project outcome: CNNs did a good job of predicting the pattern of human priming scores across conditions, with correlations ranging from τ = .49 (AlexNet) to τ = .71 (ResNet101), all p < .01. The CNNs performed similarly to, and often better than, word recognition models based on various orthographic coding schemes. This contrasts with the relatively poor performance of the Transformer networks (ViTs), with τ ranging from .25 to .38.
- Prime conditions: The Form Priming Project includes 28 prime conditions that specify how a letter string can be amended to form a new string. For example, the word `design` in the "final transposition" condition is presented as `deigns`.
- Measuring humans' perceptual similarity of words or letter strings: For human participants, the similarity $sim(s_1, s_2)$ between a target string $s_1$ and a prime string $s_2$ is measured using a lexical decision task (LDT): the prime $s_2$ and the target $s_1$ are presented one at a time, with a fixation cross in between, and the participant has to decide as quickly as possible whether $s_1$ is a word. The reaction time (RT) is compared with the RT obtained when the same target is preceded by an arbitrary random string $s_3$ as prime. The similarity is the resulting priming effect, $sim(s_1, s_2) = RT_{s_1|s_3} - RT_{s_1|s_2}$, so that faster responses after $s_2$ mean higher similarity. For each condition $C$, the mean similarity $\bar{sim}(s_1, C)$ is obtained by averaging $sim(s_1, s_2)$ over the 420 target-prime pairs $(s_1, s_2)$ in $C$.
- Measuring models' perceptual similarity of words or letter strings: For the models, the similarity $sim(s_1, s_2)$ is the cosine similarity $sim(s_1, s_2) = \cos(\mathbf{h}_{s_1}, \mathbf{h}_{s_2}) = \frac{\mathbf{h}_{s_1} \cdot \mathbf{h}_{s_2}}{\lVert \mathbf{h}_{s_1} \rVert \, \lVert \mathbf{h}_{s_2} \rVert}$, where $\mathbf{h}_{s_1}$ and $\mathbf{h}_{s_2}$ are the flattened penultimate-layer outputs of the model when it is fed images of the strings $s_1$ and $s_2$. For each condition $C$, the mean similarity $\bar{sim}(s_1, C)$ is obtained by averaging $sim(s_1, s_2)$ over the 420 target-prime pairs $(s_1, s_2)$ in $C$.
- Comparing the perceptual patterns of humans and models: Kendall's rank correlation coefficient $\tau$ is used to measure the correlation between the human and model priming scores across conditions. The human priming scores are taken from the Form Priming Project, and the model priming scores are calculated by the code in this repository. For a given model $M$, let $x_C = \bar{sim}(s_1, C)_M$ and $y_C = \bar{sim}(s_1, C)_{human}$ be the mean similarity scores of model $M$ and of the human participants, respectively, for condition $C$. Over the $n = 28$ conditions $C_1, \dots, C_n$, and ignoring ties, $\tau(M) = \frac{2}{n(n-1)} \sum_{i<j} \text{sign}(x_{C_i} - x_{C_j})\,\text{sign}(y_{C_i} - y_{C_j})$, i.e. the proportion of concordant condition pairs minus the proportion of discordant ones.
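The three measurement steps above can be sketched end to end as follows. This is a minimal illustration, not the repository's code: the tuple layouts for trials and pairs, the function names, and the `embed` callable (a stand-in for "render the string as an image and take the flattened penultimate-layer output") are all hypothetical. The human score is computed as unrelated-prime RT minus related-prime RT, so facilitation comes out positive, and Kendall's τ is computed as the tau-a variant; which variant the project used is an assumption here.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean


def human_condition_scores(trials, baseline_rt):
    """Mean priming effect (ms) per condition.

    trials: iterable of (target, condition, rt_ms) for related-prime trials.
    baseline_rt: dict mapping target -> mean RT (ms) with an unrelated prime.
    Scores are unrelated RT minus related RT, so facilitation is positive.
    """
    per_condition = defaultdict(list)
    for target, condition, rt in trials:
        per_condition[condition].append(baseline_rt[target] - rt)
    return {c: mean(v) for c, v in per_condition.items()}


def cosine(u, v):
    """Cosine similarity between two flat, equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def model_condition_scores(pairs, embed):
    """Mean cosine similarity per condition.

    pairs: iterable of (target, prime, condition).
    embed: hypothetical callable mapping a string to the flattened
    penultimate-layer activations for an image of that string.
    """
    per_condition = defaultdict(list)
    for target, prime, condition in pairs:
        per_condition[condition].append(cosine(embed(target), embed(prime)))
    return {c: mean(v) for c, v in per_condition.items()}


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count 0."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    def sign(d):
        return (d > 0) - (d < 0)

    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return 2 * s / (n * (n - 1))


def tau_model_vs_human(model_scores, human_scores):
    """Align two condition -> score dicts by condition name and correlate them."""
    conditions = sorted(human_scores)
    return kendall_tau_a([model_scores[c] for c in conditions],
                         [human_scores[c] for c in conditions])
```

For real activations held as PyTorch tensors, `torch.nn.functional.cosine_similarity` does the same job as `cosine`, and `scipy.stats.kendalltau` returns a p-value alongside τ (its default variant is tau-b, which equals tau-a when there are no ties).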
- The fonts used to generate the data are stored as `.ttf` files in `assets/fonts`.
- The human priming data is sourced from the Form Priming Project (FPP), available at this link or here or here.
- You can either download the training data and the prime data as zip files or run the `generate_data.py` script to generate as many images as you like. The configurations of letter translation, rotation, and font and size variations are here. The training-data zip file contains 800,000 images, which should be enough for all models used in the current research.
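Generating "as many images as you like" amounts to drawing a fresh combination of font, size, rotation and translation for every rendered string. The sketch below illustrates only that sampling step; the numeric ranges and the names `RenderParams`, `list_fonts` and `sample_render_params` are hypothetical placeholders, not the configuration actually used by `generate_data.py`, and the rendering itself is omitted.

```python
import random
from dataclasses import dataclass
from pathlib import Path

# Hypothetical placeholder ranges; the real values live in the repository's config.
FONT_SIZE_RANGE = (20, 40)        # points
ROTATION_RANGE = (-15.0, 15.0)    # degrees
MAX_SHIFT = (20, 10)              # pixels, (horizontal, vertical)


@dataclass(frozen=True)
class RenderParams:
    font_path: str
    size: int
    angle: float
    dx: int
    dy: int


def list_fonts(font_dir="assets/fonts"):
    """Return the .ttf files in font_dir, sorted so runs are reproducible."""
    return sorted(str(p) for p in Path(font_dir).glob("*.ttf"))


def sample_render_params(fonts, rng):
    """Draw one random combination of font, size, rotation and translation."""
    if not fonts:
        raise ValueError("no fonts available")
    return RenderParams(
        font_path=rng.choice(fonts),
        size=rng.randint(*FONT_SIZE_RANGE),
        angle=rng.uniform(*ROTATION_RANGE),
        dx=rng.randint(-MAX_SHIFT[0], MAX_SHIFT[0]),
        dy=rng.randint(-MAX_SHIFT[1], MAX_SHIFT[1]),
    )
```

Passing an explicitly seeded `random.Random` (for example `rng = random.Random(0)`) rather than using the global generator makes a generated dataset reproducible.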
- Install Python 3.10.4 (`python==3.10.4`).
- Install the CUDA driver.
- Install PyTorch (torch >= 1.11.0) following the instructions on pytorch.org.
- Install the remaining dependencies with `pip install -r requirements.txt`.
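To confirm that an environment matches the versions listed above, a small check script can help. This is a hypothetical helper, not part of the repository: `version_tuple` is a deliberately simple parser, and `packaging.version.Version` is the more robust choice if the `packaging` library is installed.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 10, 4)
MIN_TORCH = (1, 11, 0)


def version_tuple(version):
    """Parse '1.11.0+cu113' -> (1, 11, 0); local tags and pre-release suffixes are dropped."""
    release = version.split("+")[0]
    parts = []
    for piece in release.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Report the Python version, the installed torch version, and CUDA availability."""
    report = {"python": sys.version.split()[0],
              "python_ok": sys.version_info[:3] == REQUIRED_PYTHON}
    try:
        report["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        report["torch"] = None
    report["torch_ok"] = (report["torch"] is not None
                          and version_tuple(report["torch"]) >= MIN_TORCH)
    try:
        import torch  # imported lazily so the check still runs without PyTorch
        report["cuda"] = torch.cuda.is_available()
    except (ImportError, OSError):
        report["cuda"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```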
- The LTRS model simulator is available at AdelmanLab.
- The Interactive Activation Model and the Spatial Coding Model are implemented using this calculator developed by Prof. Colin Davis.
The tested models are AlexNet, DenseNet169, EfficientNet-B1, ResNet50, ResNet101, VGG16, VGG19, ViT-B/16, ViT-B/32, ViT-L/16 and ViT-L/32. The models are initialized with ImageNet pre-trained weights from Torchvision; the code for loading the weights is in `tune.py`. The trained parameters are available here.
- Layer-wise correlation coefficients: to be added.
The project was conducted under the auspices of the University of Bristol Mind & Machine Lab and supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 741134).
For further instructions and enquiries, please contact Don Yin.
MIT License (see LICENSE file)