Human attention distribution and computational models.
(A) Examples of human attention distribution, quantified by word reading time. The histograms on the right show the mean reading time on each line, for both human data and model predictions. trans_pre: pre-trained transformer-based models; trans_fine: transformer-based models fine-tuned on the goal-directed reading task. (B) General architecture of the 12-layer transformer-based models. The model input consists of all words in the passage and an integrated option. The model output relies on the node CLS12, which is used to calculate a score reflecting how likely the option is to be the correct answer. The CLS node is a weighted sum of the vectorial representations of all words and tokens, and the attention weight for each word in the passage, i.e., α, is the DNN attention analyzed in this study.
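As a rough illustration of how the CLS-based answer score and the attention weights α described in panel (B) could be read out from a 12-layer transformer, the sketch below assumes a HuggingFace-style BERT backbone; the model name, the linear scoring head, and the averaging over attention heads are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): extract a CLS12-based option score
# and CLS-to-word attention weights (alpha) from a 12-layer transformer.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
score_head = torch.nn.Linear(model.config.hidden_size, 1)       # hypothetical scoring layer

passage = "The quick brown fox jumps over the lazy dog."
option = "A fox jumps over a dog."                               # candidate answer option

# Input = all words in the passage plus the integrated option, as in the figure.
inputs = tokenizer(passage, option, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# CLS representation from the final (12th) layer; a linear readout yields a
# score reflecting how likely the option is to be the correct answer.
cls12 = outputs.last_hidden_state[:, 0, :]
score = score_head(cls12)

# DNN attention alpha: attention of the CLS token over all other tokens in the
# last layer, averaged across heads (one possible definition of alpha).
last_layer_attn = outputs.attentions[-1]           # (batch, heads, seq, seq)
alpha = last_layer_attn[0, :, 0, :].mean(dim=0)    # CLS row, mean over heads

print(score.item(), alpha.shape)
```

In this sketch, α is taken from the last layer only and averaged over heads; other layer- or head-weighting schemes are equally possible and the study's exact definition may differ.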