Introduction

Learning to read is a major milestone during childhood, and the early factors that determine reading acquisition remain an area of active research. There is converging evidence that some simple cognitive and linguistic factors are associated with early reading across orthographies. These include Rapid Automatized Naming (RAN), the ability to rapidly name letters or digits shown simultaneously (Denckla & Rudel, 1974, 1976; Georgiou et al., 2013; Georgiou & Parrila, 2020; Kail & Hall, 1994; Landerl & Wimmer, 2008; McWeeny et al., 2022; Norton & Wolf, 2012; Wijaythilake et al., 2019) and Phonological Awareness (PA), the ability to manipulate sound units in a word (Landerl & Wimmer, 2008; Mattingly, 1972; Melby-Lervåg et al., 2012; Parrila et al., 2004; Stephenson et al., 2008). Letter recognition, the ability to name visually presented letters, is also a strong predictor of reading skills in children (Clayton et al., 2020; Muter et al., 2004; Nag, 2007; Nag & Snowling, 2012; Noel Foulin, 2005; Shapiro et al., 2013; Sigmundsson et al., 2020; Stephenson et al., 2008). In addition to these cognitive measures, differences in brain activity also reflect the nature of early print knowledge. For example, kindergarten children show distinct neural responses to print compared to other visually familiar symbols (Lochy et al., 2016). Children with high vs low letter knowledge show differences in the N1 ERP component (Maurer et al., 2005), and this N1 component predicts eventual reading skills (Bach et al., 2013; Brem et al., 2013).

Most studies tacitly assume that letter knowledge is the ability to explicitly name or recognize letters. However, there could be two types of letter knowledge: knowledge of letter shapes and knowledge of letter-sound associations, which could be driven by different underlying brain mechanisms. While one might assume that children only become familiar with letter shapes once they start learning letter names, this assumption was indirectly challenged by the observation that children in rich home literacy environments are able to distinguish words and sentences from scribbles and even from inverted versions (Levy et al., 2006). However, this discrimination could be driven by familiarity with global print features rather than familiarity with letter shapes, which are arguably more relevant for subsequent reading skills.

Thus, the exact nature of early print knowledge remains to be determined, particularly whether children have letter-level knowledge that might drive subsequent reading acquisition. This forms the first question we investigated in this study.

Understanding the role of visual familiarity with letter shapes in reading development is important for another reason: several studies on typical and atypical development have shown that reading skills are also driven by or associated with basic visual functions, in addition to language-related functions. For example, children with low RAN scores (an early cognitive skill related to reading, as discussed earlier) showed slower performance in basic visual tasks, such as simple reaction time and same/different decision tasks (Stainthorp et al., 2010). Typically developing readers showed less serial search than disabled readers in a cancellation task using letter or letter-like stimuli (Casco & Prunetti, 1996), and were faster than disabled readers in both target-present and target-absent trials in a complex visual search combining shape and color (Vidyasagar & Pammer, 1999). Normal readers of left-to-right scripts, but not disabled readers, showed higher change detection sensitivity in the right hemifield (Rima et al., 2020). Importantly, the error rate of kindergarten children on a serial search task using non-letter symbols predicted reading fluency in grades 1 and 2 (Franceschini et al., 2012). Consistent with these findings, a review article has highlighted that deficits in visual and auditory processing are associated with reading disabilities (Goswami, 2015).

Turning specifically to letter familiarity, both older children and adult readers are faster on visual search involving upright compared to inverted letters and bigrams (Agrawal et al., 2022; Reicher, 1976; Richards & Reicher, 1978). Specifically for children in grades 3-5, visual search for upright (but not inverted) bigrams was found to predict reading fluency (Agrawal et al., 2022). These findings imply an association between reading experience and a visual processing advantage for upright letters. However, the minimum amount of such experience required to trigger this advantage is unclear. It could be that this advantage requires only familiarity with letters, or it might require letter name instruction.

How can children be familiar with letter shapes even without formal instruction? We propose that familiarity could come through passive exposure to print. At first glance, it might seem surprising that mere print exposure or minimal formal instruction should have any effect at all. However, there is considerable evidence that humans can learn visual shapes and statistical regularities through passive exposure. Humans are highly sensitive to image repetition and remember images repeated after surprisingly long intervals (Brady et al., 2008). They are also sensitive to higher-order statistical properties of passively viewed displays containing multiple objects (Baker et al., 2004; Brady & Oliva, 2008; Bulf et al., 2011; Fiser & Aslin, 2001, 2002a, 2002b; Kirkham et al., 2002; Turk-Browne et al., 2009, 2010), and tend to classify new object groups with similar statistical properties as familiar (Vidal et al., 2021). More specific to the implicit processing of print, children learn the visual properties of made-up spellings without depending on letter sounds, and apply this pattern knowledge to novel spellings with similar statistical properties (Samara et al., 2019; Singh et al., 2021). Finally, neural responses in the visual cortex are rapidly modified by image repetition and repeated image sequences, all during passive viewing (Esmailpour et al., 2023; McMahon & Olson, 2007; Meyer et al., 2014). While these studies show that humans become familiar with images and image statistics through implicit learning, such visual familiarity may also result in a visual processing or task advantage for frequently encountered stimuli. For instance, searches are faster and/or more accurate for targets embedded among familiar compared to unfamiliar objects and configurations (Kaiser et al., 2014; Reicher, 1976; Richards & Reicher, 1978; Võ & Wolfe, 2015; Wang et al., 1994; Wolfe, 2001; Wolfe & Horowitz, 2017).
However, these studies have used visual stimuli (objects/scenes) with high familiarity and with clearly associated verbal labels, and thus do not distinguish between visual familiarity and recognition. Nonetheless, there is recent evidence that even co-occurrence statistics can produce similar effects (Thorat et al., 2022).

Thus, it is unclear whether visual familiarity with letter shapes in children at the emergent literacy stage, distinct from letter name recognition, can independently provide a visual processing advantage in an unrelated task. This forms the second question we investigate in this study.

Overview of this study

To summarize, despite the many insights from previous studies, two fundamental questions remain unanswered. First, are children familiar with letter shapes even before they know letter names? Second, does this early familiarity with letter shape confer them any visual processing advantage?

To investigate these questions, we recruited a cohort of 300 children at the start of formal reading instruction (in lower and upper kindergarten and Grade 1) across 14 schools in varying urban and semi-urban settings, with varying language of instruction. Kannada is taught in these schools either as the main language of instruction (with a dedicated English lesson six times a week) or as a dedicated lesson six times a week in schools where English is the main language of instruction. Kannada is an alpha-syllabary with distinct letters called akshara whose shape is modified by vowel and consonant diacritic markers (Nag, 2017). Thus, unlike in English, each Kannada letter represents a syllable. For the study, we selected a mix of early/late taught and frequent/infrequent Kannada akshara/letters. We designed the study to test whether the upright search advantage observed in fluent readers (Agrawal et al., 2020, 2022; Malinowski & Hübner, 2001; Reicher, 1976; Richards & Reicher, 1978; Shen & Reingold, 2001; Wang et al., 1994; Wolfe, 2001) depends on explicit letter recognition, or whether visual familiarity alone is sufficient. By visual familiarity, we mean the ability to identify the correct orientation of a letter, without any requirement to pronounce or name the letter. We recruited children at the onset of formal reading instruction, when letter familiarity and recognition are rapidly developing and will have a large dynamic range. We also examined age, grade, and medium of instruction to account for individual differences and potential influences on early reading-related skills.

Each child performed four experiments, as summarized in Figure 1. Experiment 1 was a letter familiarity task, designed to evaluate how familiar children were with individual Kannada letters. To this end, participants were shown each letter in upright and inverted orientation and were asked to choose the orientation that they had seen more often (Figure 1A). Experiment 2 was a letter/akshara recognition task, designed to measure explicit letter knowledge. Such knowledge is a robust predictor of reading attainments in Kannada (Nag & Snowling, 2012). Participants were shown letters in random sequence and asked to name them (Figure 1B). Experiment 3 was designed to measure the visual processing of letters using visual search. Participants had to find an oddball target among identical distractors, where both target and distractors were either all upright or all inverted letters (Figure 1C). We selected visual search because it is a natural, intuitive task for children. Unlike other tasks such as subjective dissimilarity ratings, visual search yields an objective measure of performance, since participants have to locate an oddball target.

Summary of experiments

(A) Experiment 1: Letter familiarity task. Participants were shown upright and inverted versions of each letter and had to identify which one they had seen more often. (B) Experiment 2: Letter recognition task. Participants were shown upright letters and had to name the letter. (C) Experiment 3: Letter search task. Participants saw two boxes containing four letters each and had to identify the box containing the one odd or different item. Letters were either upright or inverted in interleaved trials. (D) Experiment 4: RAN task. Participants were shown a card printed with a grid of digits and were asked to read them aloud from left to right as quickly as possible.

Since our interest is in emergent literacy skills, we examined another early skill established to be related to reading. Accordingly, Experiment 4 was a Rapid Automatized Naming (RAN) task in which participants were shown a sheet of paper containing rows of digits and had to rapidly name as many digits as possible (Figure 1D). We selected RAN over Phonological Awareness (PA) measures because the RAN task is comparable across scripts, whereas a sensitive PA measure requires targeting different levels of phonology (e.g., syllable, phoneme) in different orthographies (Landerl et al., 2019; Sideridis et al., 2016). RAN also predicts early reading fluency better than PA across orthographies (de Jong & van der Leij, 1999; Lervåg & Hulme, 2009; Schatschneider et al., 2004; Torppa et al., 2010), including Kannada, the orthography of interest to this study (Nag & Snowling, 2012). In addition, many PA tasks at the phoneme level are prone to floor effects in studies of Indic orthographies (Nag, 2017).

Results

We report four experiments on a cohort of 300 children recruited across 14 schools in varying urban/rural settings. We selected for testing the Indian language Kannada, which is taught in these schools with varying hours of instruction. Our goal was to characterize letter knowledge using letter familiarity and recognition tasks. We examined visual processing of letters using visual search. We also assessed children’s performance on rapid automatized naming (RAN), an early cognitive skill related to reading development.

Experiments 1 & 2: Letter familiarity and recognition

In Experiment 1, we tested children for their familiarity with correct letter orientation across 18 Kannada letters (see Methods; Figure 1A). On each trial, a letter was displayed in two orientations (upright and inverted), and participants had to choose which of the two orientations they had seen more often. Children were highly accurate on this task (accuracy, mean ± sd: 88.6% ± 15.3% across all 285 children who participated in this task). A large majority of children (94.4%) had above-chance performance on this task (see Methods).
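One common way to decide whether an individual child performs above chance on a two-alternative task is an exact one-sided binomial test against a chance level of 50%. The sketch below is illustrative only: the criterion actually used is the one described in the Methods, and the trial count (18, one per letter), the 0.05 threshold, and the function names are assumptions made for this example.

```python
from math import comb


def binomial_tail(n_correct, n_trials, chance=0.5):
    """Probability of at least n_correct successes in n_trials by guessing."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def is_above_chance(n_correct, n_trials, alpha=0.05):
    """True if guessing alone would rarely produce a score this high."""
    return binomial_tail(n_correct, n_trials) < alpha


# Hypothetical child: 13 of 18 orientation choices correct.
print(is_above_chance(13, 18))
```

Under these assumptions, a child would need at least 13 of 18 correct choices to be classified as above chance.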

To investigate whether some letters were consistently identified by children to be more familiar than other letters, we performed a split-half correlation analysis. We divided the participants into two equal halves, and asked whether the letter-wise accuracy on the familiarity task was correlated between the two groups. This revealed a significant correlation (r = 0.89, p < 0.00005 between 18 letter-wise accuracies of odd- and even-numbered participants).
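The split-half analysis described above can be sketched as follows. This is a minimal illustration, assuming each child's data are stored as a list of 0/1 correctness values with one entry per letter; the data layout and the function names are hypothetical and not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def letterwise_accuracy(responses):
    """Fraction of children answering correctly, computed per letter."""
    n_children = len(responses)
    n_letters = len(responses[0])
    return [
        sum(child[j] for child in responses) / n_children
        for j in range(n_letters)
    ]


def split_half_correlation(responses):
    """Correlate letter-wise accuracy of odd- vs even-numbered participants."""
    odd_group = responses[0::2]   # participants 1, 3, 5, ...
    even_group = responses[1::2]  # participants 2, 4, 6, ...
    return pearson_r(
        letterwise_accuracy(odd_group), letterwise_accuracy(even_group)
    )
```

A high value indicates that the ordering of letters from easy to hard is consistent across independent halves of the cohort, rather than being noise.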

The high performance of children on the familiarity task could have been driven by the top line (tale kattu, which depicts the vowel /a/ for many letters) present in many (ದ, ವ, ನ, ಈ, ಗ, ರ, ಮ) but not all of the tested set (ಒ, ಜ, ಣ, ಅ, ಆ, ಇ, ಖ, ೪). To investigate this possibility, we calculated the familiarity accuracy separately for letters with and without this top line. This revealed a high correlation, suggesting that children did not rely merely on this feature to identify the correct letter orientation at high accuracy (Figure S1).

In Experiment 2, children were tested for their knowledge of letter/akshara names. On each trial, a letter was displayed on the screen and participants had to name the letter. Children were accurate in general but showed larger variability on this task compared to the familiarity task (accuracy, mean ± sd: 53.8% ± 30.5%). To ascertain whether some letters were recognized consistently more often than others by all participants, we performed a split-half correlation as before. This revealed a significant correlation (r = 0.98, p < .00005 between average accuracy for 18 letters for odd- and even-numbered participants). Here too, letter recognition accuracy was highly correlated between letters with and without the top line, suggesting that this feature did not influence recognition accuracy (Figure S1).

Next, we asked whether there is any systematic covariation between letter familiarity and recognition. To this end, we plotted the letter familiarity accuracy for each participant against their letter recognition accuracy. This revealed an overall positive correlation (r = 0.39, p < .00005; Figure 2A). This correlation is to be expected since correct recognition of a letter is only possible if it is familiar. What was surprising, however, was that there were many children with high familiarity but low recognition accuracy. For example, children who could name fewer than 50% of the letters were highly accurate on the letter familiarity task (accuracy, mean ± sd: 79.5% ± 18.5%; cyan and orange points of Figure 2A). Thus, children knew letter shapes even without knowing letter sounds.

Letter shape familiarity and letter name knowledge in early readers

(A) Accuracy on the familiarity task (Experiment 1) plotted against the accuracy on the letter recognition task across 208 children who participated in both tasks. Each dot represents one child. Participants are shown as divided into four distinct groups based on familiarity and recognition accuracy for subsequent analyses in panels D-G: high/low familiarity (cyan/blue) × high/low recognition (orange/red). The overall correlation across all participants is depicted at the bottom left with asterisks representing statistical significance (**** is p < .00005). (B) Same as (A) but using familiarity accuracy calculated only on the letters that were not recognized by each child. Recognition accuracy is calculated as before. Note that this means that familiarity accuracy is calculated over many more letters for children with low compared to high recognition accuracy. Nonetheless it can be seen that children show high levels of letter shape familiarity even on letters that they did not recognize at all. This challenges the assumption that children become familiar with letter shape only when they undergo formal reading instruction. (C) Same as (A) but using familiarity accuracy calculated only on recognized letters, shown for the sake of completeness. Note that familiarity accuracy is now calculated across many more letters for children with high compared to low recognition accuracy. It can be seen that children showed high levels of letter familiarity on letters that they did recognize, which is not surprising. (D) Average search time for upright and inverted letter searches for participants in the high familiarity, low recognition group (n = 39). Error bars represent the standard error of the mean for the average search time across participants. Asterisks above the bars represent statistical significance, calculated using a sign-rank test on average search times for upright and inverted searches (** is p < .005). 
(E) Same as (D) but for the high familiarity, high recognition group (n = 121, *** is p < .0005). (F) Same as (D) but for the low familiarity, low recognition group (n = 28). (G) Same as (D) but for the low familiarity, high recognition group (n = 20).

The high familiarity accuracy but low recognition accuracy in Figure 2A could be driven by performance on different subsets of letters or by individual variability across children. To rule out this possibility, we recalculated the familiarity accuracy for each child only for the letters that they did not recognize and plotted it against their overall recognition accuracy across all letters. This too revealed a consistently high familiarity score (accuracy, mean ± sd: 91.8% ± 12.3%) and a similar correlation as before with overall recognition (r = 0.31, p < .00005; Figure 2B). One might think that children have learned some common letter shape statistics that they use to perform the familiarity task. However, as we observed, children who recognized only a few letters also performed very well on the familiarity task (Figure 2B, cyan dots). It is unlikely that they are generalizing shape statistics from those few letters to all other letters in the set. We obtained a similar correlation between familiarity accuracy for recognized letters and the overall recognition accuracy (r = 0.33, p < .00005; Figure 2C).
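The restricted familiarity score used in this control analysis can be sketched as below. It is a minimal illustration, assuming that each child's familiarity and recognition outcomes are stored as parallel lists of 0/1 values with one entry per letter; the function name and data layout are hypothetical.

```python
def familiarity_on_unrecognized(familiar, recognized):
    """Familiarity accuracy over only the letters the child failed to name.

    Returns None when the child named every letter, because the score is
    undefined in that case.
    """
    scores = [f for f, r in zip(familiar, recognized) if not r]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical child: names only the first of four letters, but picks the
# correct orientation for two of the three letters they could not name.
print(familiarity_on_unrecognized([1, 1, 0, 1], [1, 0, 0, 0]))
```

Because the score is computed over a different number of letters for each child, it is noisier for children who recognized most letters, as the caption of Figure 2B also notes.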

We conclude that children are highly familiar with letter shapes even in the absence of explicit letter recognition.

Since there were consistent variations in letter-wise accuracy in both familiarity and recognition tasks, we wondered whether the two accuracies were correlated. We found no correlation across all letters, but on closer inspection we found this was due to one letter (ಣ) which had poor familiarity accuracy, presumably because it is highly similar to its inverted orientation and possibly to another letter (ಳ). Upon removing this letter, we observed a significant correlation (Figure S2). Thus, easily recognizable letters are also highly familiar.

Experiment 3: Visual processing of upright and inverted letters

In Experiment 3, we tested visual processing for upright and inverted letters using a visual search task. On each trial, participants saw a display containing one oddball among multiple identical distractors and had to indicate the box containing the oddball item using a keypress (Figure 1C). Children were highly accurate on this task (accuracy, mean ± sd: 79.8% ± 19.3% across all searches across 297 children). A large majority of the children (73%) had above-chance performance on this task (see Methods). We selected these 217 children for further analysis, but obtained qualitatively similar results on repeating our analyses across all children.

Interestingly, these children were more accurate on upright compared to inverted letter searches (accuracy, mean ± sem: 92.2% ± 0.5% for upright; 90.9% ± 0.6% for inverted; p = .003, using a sign-rank test across 217 children). At the group level, they were also faster to respond for upright compared to inverted letter searches (search times, mean ± sem: 2.87 ± 0.036 s for upright searches; 3.16 ± 0.042 s for inverted searches; p < .00005 on a sign-rank test across average search times of upright and inverted letters across 217 children). Thus, as a group, children showed a clear visual processing advantage for upright compared to inverted letters.
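For readers unfamiliar with the sign-rank test, the paired comparison of upright and inverted search times can be sketched as follows. This is a simplified version that uses the large-sample normal approximation without tie or continuity corrections, so its p-values may differ slightly from those of standard statistical packages; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def sign_rank_p(upright, inverted):
    """Two-sided p-value for paired per-child search times."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [b - a for a, b in zip(upright, inverted) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))
```

The test asks whether the per-child differences (inverted minus upright) are symmetrically distributed around zero, which makes it robust to the skew typical of search times.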

Next, we investigated how the visual processing advantage for upright letters for each child was related to their performance on the letter familiarity and recognition tasks. To this end, we divided children into four groups based on their performance on the letter familiarity and recognition tasks: (1) the low familiarity, low recognition group, with 28 children (23 from kindergarten, 5 from grade 1) having less than 90% accuracy on the familiarity task and less than 50% accuracy on the recognition task; (2) the low familiarity, high recognition group, with 20 children (12 from kindergarten, 8 from grade 1) having less than 90% familiarity but above 50% recognition accuracy; (3) the high familiarity, low recognition group, with 39 children (22 from kindergarten, 17 from grade 1) having more than 90% familiarity accuracy and less than 50% recognition accuracy; and (4) the high familiarity, high recognition group, with 121 children (45 from kindergarten, 76 from grade 1) having more than 90% familiarity accuracy and more than 50% recognition accuracy. We then asked whether the visual search advantage for upright letters was present in each group. The results of this analysis are summarized in Figures 2D-G. We observed significantly faster search times for upright than for inverted letters in the high familiarity groups (with low or high recognition accuracy; Figure 2D: p = .001 and 2E: p < .0005, respectively). By contrast, this trend was present but not statistically significant in the low familiarity groups (with low or high recognition accuracy; Figure 2F-G, p > 0.1). Thus, the visual processing advantage for upright letters is associated more strongly with visual familiarity for letter shape than with letter recognition.
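The four-way grouping can be expressed compactly as below. The 90% and 50% cutoffs come from the text, but the text does not say how a score exactly at a cutoff is handled; in this sketch such scores fall into the low group, which is an assumption, as is the function name.

```python
def assign_group(familiarity_pct, recognition_pct,
                 familiarity_cutoff=90.0, recognition_cutoff=50.0):
    """Label a child by familiarity and recognition accuracy (in percent).

    Assumption: a score exactly equal to a cutoff counts as 'low'.
    """
    fam = "high" if familiarity_pct > familiarity_cutoff else "low"
    rec = "high" if recognition_pct > recognition_cutoff else "low"
    return f"{fam} familiarity, {rec} recognition"


# Hypothetical child with 94.4% familiarity and 33.3% recognition accuracy.
print(assign_group(94.4, 33.3))
```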

Which factors are associated with the visual processing advantage for upright letters?

The above results are based on dividing children into four somewhat ad-hoc groups based on their familiarity & recognition accuracy. While we obtained qualitatively similar results even upon varying these criteria, we nonetheless sought alternate analyses that do not involve subdividing into groups, to understand which factors are associated with the search advantage for upright letters across children. To this end, we asked whether the search advantage, measured as the average difference in search time (inverted minus upright), was correlated with a number of factors: (1) motor speed, measured as the response time to make a keypress in a baseline motor task; (2) motor accuracy, measured as the accuracy on the baseline motor task; (3) Familiarity accuracy; (4) Recognition accuracy; (5) Search accuracy difference (upright minus inverted); (6) the RAN score; (7) Age of the child; (8) Grade of the child (taken as 1/2/3, corresponding to lower kindergarten, upper kindergarten, and Grade 1); (9) language of instruction (1/2 corresponding to English & Kannada). For this analysis, we included 181 children who participated in all Experiments 1-4 and performed the letter search experiment with significantly above-chance accuracy.

Among all these factors, only two, familiarity accuracy and recognition accuracy, had a significant correlation with the upright search advantage (r = 0.27, p < .0005 for familiarity accuracy; r = 0.18, p = .017 for recognition accuracy; Figure 3). However, a correlation between the upright search advantage and a given factor could arise spuriously because that factor is correlated with another factor; for instance, recognition accuracy might be correlated with the search advantage due to its correlation with familiarity accuracy. Therefore, to assess the unique contribution of each factor, we performed a partial correlation analysis: for each factor X, we calculated the partial correlation between the upright letter search advantage and the factor X that remained after removing the joint contribution of all other factors. Upon doing so, we found that the only factor that had a significant partial correlation with the upright letter search advantage was familiarity accuracy (r = 0.23, p = .0016; Figure 3B; p-value survived Bonferroni correction for nine comparisons, with a corrected threshold of p = 0.05/9 = .0056).
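The partial correlation described above can be sketched as follows. This is one standard formulation, not necessarily the authors' implementation: it inverts the correlation matrix of the outcome and all factors and reads the partial correlation off the resulting precision matrix, which is equivalent to correlating the residuals left after regressing out the remaining factors. The function names and the dictionary layout are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(outcome, factors, name):
    """Correlation of outcome with factors[name], controlling for the rest.

    factors maps a factor name to its list of per-child values.
    """
    others = [k for k in factors if k != name]
    columns = [outcome, factors[name]] + [factors[k] for k in others]
    corr = [[pearson_r(a, b) for b in columns] for a in columns]
    precision = invert(corr)
    return -precision[0][1] / math.sqrt(precision[0][0] * precision[1][1])
```

With nine factors, each partial correlation is then compared against the Bonferroni-corrected threshold of 0.05/9 stated in the text.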

Factors that determine upright letter search advantage

(A) Correlation between the upright letter search advantage and each factor across children. Error bars represent standard deviation estimated using a bootstrap analysis: participants were selected randomly with replacement 1,000 times and the correlation was calculated each time, and the error bar is taken as the standard deviation across these bootstrapped correlations. Statistically significant correlations are indicated using green bars with asterisks, and others using blue bars. Asterisks represent statistical significance (* is p < .05; ** is p < .005; *** is p < .0005, etc). (B) Partial correlation between the upright search advantage and each factor across children. All other conventions are as before.
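The bootstrap error bars described in this caption can be sketched as below. The resampling scheme (participants drawn with replacement 1,000 times) follows the caption; the random seed, the handling of degenerate resamples, and the function names are assumptions of this illustration.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation, or None if either variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation_sd(x, y, n_boot=1000, seed=0):
    """Standard deviation of the correlation across bootstrap resamples.

    Assumes x and y each take at least two distinct values; resamples with
    zero variance are redrawn because their correlation is undefined.
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:
            estimates.append(r)
    return statistics.stdev(estimates)
```

Resampling whole participants, rather than individual measurements, keeps each child's search advantage paired with that same child's factor value.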

To further confirm the robustness of this finding, we performed a multiple linear regression using the same nine predictors. This approach models the relationship between the dependent variable (upright search advantage) and all nine predictors simultaneously, while accounting for shared variance among them. By estimating the contribution of each predictor after controlling for the others, we could identify which factors were independently associated with the effect. Here also, we found that only familiarity accuracy showed a significant effect (weight = 0.95 ± 0.31, p = .003), while the others did not reach significance (p > .10 for all other factors).
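As a rough illustration of how such regression weights are obtained, the sketch below fits an ordinary least-squares model by solving the normal equations. It estimates the weights only; the standard errors and p-values reported in the text would require additional steps that are omitted here. The function names are hypothetical, and in practice a statistics package would normally be used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear_regression(predictors, outcome):
    """Least-squares weights, returned as [intercept, w1, w2, ...].

    predictors is a list of columns, one list of per-child values each.
    """
    n = len(outcome)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(design[0])
    xtx = [
        [sum(row[p] * row[q] for row in design) for q in range(k)]
        for p in range(k)
    ]
    xty = [
        sum(row[p] * outcome[i] for i, row in enumerate(design))
        for p in range(k)
    ]
    return solve_linear_system(xtx, xty)
```

Each fitted weight reflects the change in the outcome per unit change in that predictor with all other predictors held fixed, which is why this analysis complements the partial correlations.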

We conclude that visual familiarity is uniquely associated with the upright letter search advantage compared to all the other factors.

Effect of grade on performance across experiments

The above results show that visual familiarity is uniquely associated with the upright search advantage in children, even after controlling for grade, medium of instruction and other factors. However, as task performance may vary with grade due to increased exposure or formal instruction, we examined performance differences across grades using only those children who completed all four experiments (92 kindergarteners, 89 Grade 1 children). As expected, Grade 1 children showed higher accuracy in both the familiarity task (mean ± sd: 90.61% ± 11.61% in kindergarten vs. 95.37% ± 7.2% in grade 1; p < 0.0005, unpaired signed-rank test) and the recognition task (mean ± sd: 55.61% ± 26.88% in kindergarten vs. 70.29% ± 24.23% in grade 1; p < 0.0005, unpaired signed-rank test), and slightly higher but not significantly different RAN scores (mean ± sd: 1.02 ± 0.29 in kindergarten vs. 1.11 ± 0.35 in grade 1; p = 0.066, unpaired signed-rank test). However, the upright search advantage (inverted minus upright search time) did not differ between groups (mean ± sd: 0.31 ± 0.4 s in kindergarten vs. 0.27 ± 0.35 s in Grade 1; p = 0.46, unpaired signed-rank test).

To examine the association between familiarity accuracy and upright search advantage while ruling out the influence of explicit letter knowledge, we repeated the correlation and partial correlation analyses in kindergarteners with no formal reading instruction. Among all factors (eight factors: motor speed, motor accuracy, familiarity accuracy, recognition accuracy, upright minus inverted search accuracy difference, RAN score, age, and medium of instruction), only familiarity accuracy (r = 0.4, p < .0005) and recognition accuracy (r = 0.28, p = .007) were significantly correlated with the search advantage. A partial correlation analysis (as described before) revealed that only familiarity accuracy was uniquely associated with the upright search advantage, after removing the joint contribution of all other factors (r = 0.33, p = .002). A multiple linear regression including the same eight factors also revealed that only familiarity accuracy significantly predicted the upright search advantage (weight (β): 1.21 ± 0.38, p = .002), while all other predictors were non-significant (p > 0.2).

On repeating the same analysis for Grade 1 children (n = 89) who had some formal instruction, we found that no factor had a significant correlation (i.e. p < 0.05) with the upright letter search advantage (r = 0.1, p = .34 for familiarity accuracy, r = 0.09, p = .38 for recognition accuracy, all other factors p > .15). The partial correlation analyses also revealed a similar null result (partial correlations: r = 0.06, p = .59 for familiarity accuracy, r = 0.03, p = .79 for recognition accuracy, all other factors p > .15). On closer inspection, we found that Grade 1 familiarity accuracy scores were much higher or at ceiling, restricting their range. This likely reduced the ability of the familiarity accuracy to predict the search advantage. We speculate that harder familiarity tests, such as discriminating between vertical mirror versions, or letters with mispositioned diacritical marks (matraa/gunita), or testing on bigrams may reveal a correlation with the upright letter advantage over Grade 1. These are interesting possibilities for future work.

In sum, we conclude that familiarity accuracy is uniquely associated with the upright letter search advantage in early readers, with the effect most pronounced in kindergarteners.

Effect of language of instruction on performance across experiments

Since children in our study attended schools with either English or Kannada as the medium of instruction, we examined its effect on performance across all experiments, as well as on the key finding of the unique association between familiarity accuracy and upright search advantage. These analyses included 95 English-medium and 86 Kannada-medium children who completed all four experiments. Both groups were equally accurate on the familiarity task (accuracy, mean ± sd: 93.15 ± 10.1% for English medium; 92.7 ± 9.9% for Kannada medium; p=0.58, unpaired signed-rank test), and showed comparable upright search advantage (inverted-upright search RT difference, mean ± sd: 297 ± 374 ms for English medium; 279 ± 374 ms for Kannada medium, p = 0.86, unpaired signed-rank test).

However, Kannada-medium children performed better in letter recognition (accuracy, mean ± sd: 57.6 ± 25.26% for English medium; 68.6% ± 26.94% for Kannada medium, p = 0.001, unpaired signed rank test). This is expected since the Kannada medium children would have greater exposure to Kannada letters. Conversely, children in English-medium schools performed better on the RAN task (RAN score, mean ± sd: 1.13 ± 0.33 across English medium; 0.98 ± 0.3 across Kannada medium; p = 0.002, unpaired signed-rank test). We surmise that this is because children in English medium schools were probably exposed far more to Arabic digits compared to their Kannada medium counterparts due to a relatively more accelerated mathematics curriculum.

To assess whether medium of instruction influences the unique association between familiarity accuracy and upright search advantage, we conducted partial correlation analyses separately for English- and Kannada-medium children, controlling for eight factors (motor speed, motor accuracy, recognition accuracy, upright-inverted search accuracy difference, RAN score, age, and grade). In both groups, familiarity accuracy was significantly correlated with the upright search advantage (correlation for English medium: r = 0.29, p = .004 for familiarity accuracy, r = 0.22, p = .03 for recognition accuracy, all other factors p > .2; Kannada medium: r = 0.25, p = .02 for familiarity accuracy, r = 0.16, p = .15 for recognition accuracy, all other factors p > .05), and uniquely so after taking into account all other factors through partial correlation (partial correlations, English medium: r = 0.21, p = .047 for familiarity accuracy, r = 0.15, p = .16 for recognition accuracy, all other factors p > .1; Kannada medium: r = 0.24, p = .03 for familiarity accuracy, r = .04, p = .72 for recognition accuracy, all other factors p > .3). A multiple linear regression with the same eight factors also confirmed these results.

Together, these findings suggest that familiarity accuracy uniquely accounts for variance in upright search advantage irrespective of the medium of instruction.

Which factors are associated with rapid automatized naming (RAN) scores?

Next, we asked whether performance on the RAN task (Experiment 4) would be strongly correlated with any of the factors evaluated previously. To this end, we calculated the correlation between the RAN score of each child and each factor as before (Figure 4A). This revealed a positive correlation between the RAN score and familiarity accuracy (r = 0.26, p < 0.0005), recognition accuracy (r = 0.45, p < 0.00005), and grade (r = 0.19, p < 0.05), which means that higher values of these factors predict higher RAN scores. We also observed a negative correlation with the language of instruction (r = −0.23, p = .002), which is consistent with the observation in the previous section that children in Kannada-medium schools had lower RAN scores.

Factors that determine RAN score & correlations between all factors

(A) Correlation between RAN score and each factor across children. Error bars represent the standard deviation estimated using a bootstrap analysis: participants were selected randomly with replacement 1,000 times, the correlation was calculated each time, and the error bar was taken as the standard deviation across these bootstrapped correlations. Statistically significant correlations are indicated using green bars with asterisks, and others using blue bars. Asterisks represent statistical significance (* is p < 0.05; ** is p < 0.005; *** is p < 0.0005, etc). (B) Partial correlation between the RAN score and each factor across children. All other conventions are as before. (C) Colormap of pairwise correlations between measures across all tasks. The color in each box represents the correlation coefficient between the corresponding factors. Asterisks inside the box represent statistical significance (* is p < 0.05, ** is p < 0.005, etc).
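The bootstrap error bars described in this caption can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the fixed random seed, and the choice to skip degenerate resamples (where one variable is constant and the correlation is undefined) are assumptions made for the example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def bootstrap_sd_of_r(x, y, n_boot=1000, seed=0):
    """Standard deviation of the correlation across bootstrap resamples.

    Participants (paired x/y values) are drawn with replacement n_boot
    times. Skipping undefined correlations is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(x)
    boot_rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):
            boot_rs.append(r)
    mean_r = sum(boot_rs) / len(boot_rs)
    variance = sum((r - mean_r) ** 2 for r in boot_rs) / (len(boot_rs) - 1)
    return math.sqrt(variance)
```

Calling `bootstrap_sd_of_r(ran_scores, factor_values)` with one value per child (both argument names are placeholders) would give the half-length of the corresponding error bar.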

In the above analysis, a correlation between RAN score and a given factor could arise spuriously because it is correlated with a third factor. To assess the unique contribution of each factor, we therefore performed a partial correlation analysis, wherein we calculated the partial correlation between RAN score and each factor X after removing the contribution of all other factors (Figure 4B). This revealed only two factors that had a significant contribution to the RAN score: recognition accuracy (r = 0.46, p < 0.00005) and language of instruction (r = −0.38, p < 0.00005).
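One common way to compute such partial correlations is through the inverse of the correlation matrix, which is equivalent to correlating the residuals left after regressing out all other factors. The sketch below assumes this approach; the paper does not state which implementation was used, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (factors are collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(target, factors):
    """Partial correlation of `target` with each factor, given all others.

    `factors` maps a factor name to a list with one value per child.
    Uses r_ij|rest = -P_ij / sqrt(P_ii * P_jj), where P is the inverse
    of the correlation matrix of [target, factor_1, ..., factor_k].
    """
    names = list(factors)
    columns = [target] + [factors[name] for name in names]
    k = len(columns)
    corr = [[pearson_r(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    prec = invert(corr)
    return {name: -prec[0][j] / math.sqrt(prec[0][0] * prec[j][j])
            for j, name in enumerate(names, start=1)}
```

With RAN scores as `target` and the remaining measures as `factors`, a factor whose raw correlation arises only through a third variable would show a partial correlation close to zero.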

We conclude that the RAN score is uniquely associated with letter recognition accuracy.

Correlation between all factors across experiments

Finally, we scrutinized the factors for other interesting correlations. To this end, we calculated all pairwise correlations between these factors. This is depicted as a pairwise correlation matrix in Figure 4C. Some of these correlations are trivial, such as the correlation between age and grade. Others are unsurprising, such as the positive correlations of age and grade with familiarity accuracy and recognition accuracy. There was also a significant correlation between recognition accuracy and medium of instruction (r = 0.21, p < 0.05), which is consistent with the earlier finding that children receiving instruction in Kannada were better at recognizing Kannada letters, presumably due to more exposure or instruction.

General discussion

We tested a large cohort of children with little or no formal reading instruction for their familiarity with letter shapes, knowledge of letter names, and their visual processing of upright and inverted letters. Our main and novel finding is that children can accurately identify the correct orientation of letters even though they could not name them, presumably due to exposure to print in the environment. Moreover, they could find upright letters faster than inverted letters, and this upright letter advantage was uniquely associated with their letter familiarity accuracy rather than letter recognition accuracy. These findings are important for two reasons: First, they identify letter shape familiarity as a novel component of early print knowledge, and challenge the tacit assumption that children learn letter shapes only during formal instruction. Second, they show that some children at the start of formal literacy instruction have an early visual processing advantage for upright letters, which is uniquely associated with letter shape familiarity. Below we discuss our findings in relation to the existing literature.

Letter knowledge as a predictor of early reading skills

Our results are broadly consistent with previous work that has highlighted print letter knowledge as a strong predictor of future reading skills (Clayton et al., 2020; Muter et al., 2004; Nag, 2007; Nag & Snowling, 2012; Noel Foulin, 2005; Shapiro et al., 2013; Sigmundsson et al., 2020; Stephenson et al., 2008). However, these studies do not distinguish between knowledge of letter shape and knowledge of letter names or sounds, as we have done here in a language where letter name and sound are the same. Our results are consistent with the finding that early readers progress from having a coarse discrimination of print (e.g. real vs false fonts) to finer discriminations such as knowing typical letter spacing and eventually the letter strings for word spelling (Levy et al., 2006). However, our results go further to show that children are aware of the shapes of letters before they are able to recognize these letters, indicative of a far deeper knowledge of print than previously observed. We speculate that this familiarity could facilitate reading acquisition and the pace of reading development: children with a large or early advantage for upright letters could become fluent readers. It would also be interesting to characterize the rate at which letter shape familiarity develops and when it gives rise to a visual processing advantage. These are interesting questions for future research.

We have found that the search advantage for upright letters is not associated with children’s RAN (rapid automatized naming) scores, a measure that is a robust predictor of future reading (de Jong & van der Leij, 1999; Denckla & Rudel, 1974, 1976; Landerl et al., 2019; Lervåg & Hulme, 2009; McWeeny et al., 2022; Norton & Wolf, 2012; Schatschneider et al., 2004; Sideridis et al., 2016). The lack of association between an established precursor skill (RAN) and our novel letter shape familiarity measure appears, at first glance, to contradict our speculation that the search advantage might jumpstart reading skills. However, it is possible that the search advantage captures unique variance in reading fluency that is not predicted by RAN scores. Indeed, we have previously reported that upright bigram search predicts a unique component of reading fluency beyond that predicted by RAN scores (Agrawal et al., 2022).

It is also important to note that the components of emergent literacy evolve over time. For example, emergent literacy begins with basic book concepts (Chaney, 1992) and general familiarity with written scripts (Levy et al., 2006; Smith & Dixon, 1995). Next, as shown in this study, letter shape familiarity, which is uniquely associated with the visual processing advantage for single letters, is quite variable in kindergarten children but stabilizes by Grade 1. In a study on older children (Grades 3-5), we found that upright bigram searches explained reading fluency better than single letter searches (Agrawal et al., 2022). This highlights a gap in understanding the transition from letter-level to bigram-level processing during Grades 1–2. We speculate that reading fluency is initially driven by a visual processing advantage for single letters, and later by bigrams and longer strings, a developmental shift that could be captured during Grades 1–2.

Visual processing advantage for familiar objects

Our results are also consistent with previous reports that readers are faster on searches involving upright letters/bigrams (Agrawal et al., 2020, 2022), and more broadly with the faster search observed for familiar targets (Agrawal et al., 2019; Reicher, 1976; Richards & Reicher, 1978; Wang et al., 1994; Wolfe, 2001). However, these observations were made on adult readers who have extensive training with letters in visual-auditory associations (through reading) and visual-motor routines (through writing). Therefore, the search advantage for upright letters might have come through visual familiarity or through extensive explicit training. By studying children who are just learning to read, we were able to dissociate these two possibilities. Our results show that the visual processing advantage for upright letters is strongly and most consistently associated with letter familiarity rather than with formal letter instruction.

Understanding emergent literacy

There has been a long tradition of considering concepts about print as a key aspect of Emergent Literacy (e.g., Clay, 2015). Concepts about print typically cover constructs such as knowledge of the direction of print, an awareness of the right side up of a book, where the cover page is, what a word in print might look like, and recognizing that sentences end with punctuation markers like a period or question mark. Knowledge of letter name and sound is a second much-studied component. Other component skills of emergent literacy are phonological awareness, vocabulary, morphological awareness and orthographic knowledge (Lin et al., 2019; Nag et al., 2014). Our study provides evidence for an additional, previously underexplored component of emergent literacy: letter shape familiarity, reflecting visual familiarity with letter forms even without explicit recognition, and giving rise to a visual processing advantage during letter search.

Role of additional factors in reading acquisition

A range of contextual and within-child factors have been shown to explain individual differences in the component skills of emergent literacy. Environmental factors such as books at home, parental involvement, directed reading, and exposure to reading-related activities all predict future reading skills in children (Connor et al., 2006; Dilnot et al., 2017; Dong et al., 2020; Georgiou et al., 2021; Hortaçsu et al., 1990; Levy et al., 2006; Nag et al., 2024, 2024; Sénéchal et al., 2001; Sénéchal & LeFevre, 2002; Yeo et al., 2014). The home literacy environment plays a critical role in reading development, particularly in high and middle income countries (Nag et al., 2019; Vagh & Nag, 2025). Within-child factors such as children’s literacy interest, fine motor skills, sentence memory and oral language skill also predict early reading development (Cabell et al., 2011; Carroll et al., 2019; Nag et al., 2014; Share et al., 1984). In addition, multiple emergent literacy skills have been shown to be unique predictors of later language and literacy development, explaining individual differences at the starting point and in the pace of growth (Duncan et al., 2007; Hamilton et al., 2016; Lin et al., 2019). We propose that all of these are important areas for future research on what supports the learning of letter shape familiarity and how this might support later literacy outcomes.

Limitations and Future Directions

While our findings reveal a robust link between letter shape familiarity and visual processing, several limitations remain. We did not directly measure children’s exposure to print at home or in early childhood environments. Future research could examine how much and what kinds of informal print exposure are sufficient to develop letter shape familiarity in the absence of formal instruction. Children’s heterogeneous linguistic backgrounds may also influence their early print experience. For example, some Indian scripts (like Telugu) share visual similarities with Kannada, but we did not assess this in the current study. Lastly, we did not test whether this familiarity predicts future literacy outcomes. Longitudinal studies are needed to determine whether familiarity-based advantages persist and contribute to fluent reading.

Conclusions

Our results add to the growing evidence that reading must be understood as a process that involves, at its core, both visual and language processing (Casco & Prunetti, 1996; Dehaene et al., 2015; Goswami, 2015; Vidyasagar & Pammer, 1999). Reading difficulties are also increasingly being understood as arising from language processing deficits and/or visual processing deficits (Bertoni et al., 2019; Goswami, 2015; Vidyasagar & Pammer, 1999). Our results raise the intriguing possibility that there could be early changes in visual processing that precede the acquisition of reading skills. We therefore propose that both visual and language processing measures are essential to characterize and understand more comprehensively what contributes to reading acquisition.

Methods

Participants

All children gave written assent before the start of the first session and verbal assent before each experiment, and their parents/guardians gave informed assent/consent to an experimental protocol and learning levels survey approved by the <information redacted for double blind review>.

Participating children were seen over 12 twenty-minute sessions, spread over four weeks on average, as part of a larger learning levels survey. Experiments 1-3 were conducted in sessions 9-10 and Experiment 4 in either session 3 or 4. Experiments were presented as games, with Experiments 1 to 3 using the kapperaaya (King Frog) mascot. A picture card was used with each child to tick off together the completion of each game. All participants had normal or corrected-to-normal vision, and were multilingual, with home and community languages that included Kannada, Tulu, Konkani, Tamil and Telugu. Research associates with postgraduate degrees in speech and language or early childhood education conducted the experiments in each school.

We identified 14 schools at 7 different locations spread across 3 districts (Bengaluru Rural, Mysore and Udupi) of Karnataka, India for the study. Of the 14 schools, 8 schools had Kannada as the medium of instruction. In the remaining 6 schools, the medium of instruction was English, but Kannada was taught as a second language.

A total of 300 children (156 female; aged 4.3-7.6 years) participated in this study. These children included 32 from lower kindergarten or LKG (4.7-6.5 years; 12 female), 143 from upper kindergarten or UKG (4.3-6.8 years; 79 female), and 125 children in Grade 1 (5.2-7.6 years; 65 female).

Procedure

Experiments 1-3 were designed using PsychoPy and exported into the online platform Pavlovia for ease of administration and seamless data transfer. Experiments were conducted on laptops placed on a desk with the child seated comfortably on a chair roughly 45 cm away from the screen. To standardize the size at which images were displayed on the screen across monitors, the experimenter adjusted the size of a square displayed on the screen to measure 8 cm (code from https://gitlab.pavlovia.org/Wake/screenscale). The pixel size of the square was then internally used to scale all task images to appear at a fixed physical size regardless of the screen used for the experiment.
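The calibration step above can be illustrated with the standard visual-angle conversion. This is a sketch under stated assumptions, not the code from the linked repository: it assumes the 8 cm reference square and the approximate 45 cm viewing distance reported here, and the function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 45.0   # approximate viewing distance reported above
REFERENCE_SQUARE_CM = 8.0    # physical side of the calibration square


def pixels_per_cm(square_side_px):
    """Screen resolution implied by the adjusted calibration square."""
    return square_side_px / REFERENCE_SQUARE_CM


def degrees_to_pixels(angle_deg, square_side_px,
                      distance_cm=VIEWING_DISTANCE_CM):
    """Convert a stimulus size in degrees of visual angle to pixels.

    Uses size = 2 * d * tan(angle / 2) for an object centred on the
    line of sight at viewing distance d.
    """
    size_cm = 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
    return size_cm * pixels_per_cm(square_side_px)
```

For instance, if the experimenter had to set the square to 320 pixels for it to measure 8 cm, a 5.1° stimulus would span about 4 cm, or roughly 160 pixels, on that screen.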

Each child performed Experiment 3 (akshara search) first in which they saw 6 unique letters equally often in upright and inverted orientations. After this they performed Experiment 1 (akshara familiarity) where they saw 18 unique letters equally often in upright and inverted orientations. Finally, they participated in Experiment 2 (akshara recognition) in which they saw the same 18 unique letters only in the upright orientation. Experiment 2 was done last since it contained only upright letters. Experiment 4 was conducted following experiment 3, either prior to both Experiments 1 and 2, or exclusively prior to Experiment 2.

Experiment 1: Letter familiarity task

Participants. A total of 285 of the 300 children participated in the task. Not all children participated due to variation in school attendance.

Stimuli: We selected 18 Kannada akshara or letters with a range of letter frequency, and from those taught at both early and late stages in the school curriculum. These were the akshara ದ, ಬ, ಜ, ವ, ನ, ಣ, ಅ, ಆ, ಇ, ಈ, ಖ, ಗ, ರ, ಮ, ೪, , , . These items were displayed in Nirmala UI font (chosen for its uniform stroke width), presented in white against a black background. Letters were scaled such that the main body of each akshara had the same height, with ascending or descending diacritic marks protruding as required. The overall height of the akshara varied from 2.8°-5.1° of visual angle, and width from 2.5°-5.1° visual angle.

Task. The experiment began with a practice block of 10 trials to acquaint the child with the idea of familiarity. On each trial, two objects were shown on screen separated by a brick wall. One of the objects was always a common object such as a steel bowl, and the other was always an uncommon object such as a microscope. The child was asked to point to whichever object they had seen more often. The experimenter would then press a key (Left-Control or Right-Control) depending on the object the child pointed to. If the child took longer than 8 seconds to point, the trial was aborted and repeated after a random number of trials, up to a maximum of 3 such attempts, after which the response was recorded as incorrect. Trials with incorrect responses were not repeated. To make the task more engaging for the child, at the start of each trial a frog cartoon character (kapperaayaa, meaning Frog King) was introduced sitting at the bottom of the screen with the choice objects. Correct responses were followed by a pleasant tone while the frog moved across the screen to settle above the chosen object, followed by a short inter-trial interval. Incorrect responses were followed by a flat tone with the frog turning upside down, after which the next trial started after a slightly longer interval.

In the main task, each trial consisted of a single Kannada letter presented as upright and inverted, and the child was asked to point at the shape they had seen more often. Children got positive feedback regardless of their choice. There were 18 unique conditions corresponding to 18 letters, each of which was repeated 4 times (with upright/inverted versions appearing equally often on the left or right), resulting in a total of 72 trials. We gave a short break after 36 trials. Children completed the task within 4-9 minutes (duration, mean ± sd: 6.3 ± 1.2 mins).

Experiment 2: Letter recognition task

Participants. A total of 285 of 300 children participated in this task.

Stimuli. We used the same 18 Kannada letters or akshara as in Experiment 1 and all letters were shown only in the upright orientation.

Task. On each trial, a single akshara was displayed on the screen, and the child was asked to name it. The experimenter then pressed one of 4 keys: ‘c’ for correct identification, ‘e’ for incorrect identification, ‘d’ if the child said he/she did not know the akshara, and ‘n’ if the child gave no response. There was no time out, and the next trial started 500 ms after the response. In case of no response, the experimenter could decide when to proceed to the next trial based on the state of the child (wait time, mean ± sd: 13.7 ± 8.1 s).

The task started with 3 practice trials in which 3 other Kannada letters were shown (, , ). After the child was made comfortable with the task routine, the main task began. There were 18 trials in which each Kannada letter was shown once. Children completed this task fairly quickly (mean ± sd: 2.0 ± 0.9 mins).

Experiment 3: Letter search task

Participants. A total of 297 children participated in this task.

Stimuli. A total of 6 Kannada letters were used in this experiment (first 6 of the 18 in Experiment 1), in upright and inverted orientations.

Task. Children were introduced to an oddball search game. Each trial began with a fixation cross (0.13°) displayed for 500 ms to orient the participants to the center of the screen, after which two boxes (13.9° × 8.9° each) were shown side by side. Each box contained 4 images arranged in a 2 × 2 grid. To prevent alignment cues from guiding search, each image was jittered randomly by ±0.5° according to a uniform distribution. In one box, all images were identical, whereas the other box contained 3 identical items and one oddball item. Children had to identify whether the left or right box had the “different” item and press the “red” key for left or the “green” key for right (corresponding to the Left-Control and Right-Control keys, which we colored accordingly). In both the practice and main blocks, children received positive feedback (a pleasant tone and the frog appeared above the target image) or negative feedback (a flat tone, a red box appeared around the target image, and an animation showing the frog turning upside down). If no response was made within 8 seconds, the trial was aborted and repeated after a random number of other trials, up to 3 times, after which it was recorded as an incorrect response. Trials with incorrect responses were not repeated.

The task started with a practice block using four colored butterfly images (1.9° × 1.9° each), resulting in 6 possible image pairs (4C2). For each pair, two search tasks were created, one with the first image as the target and the second as the distractor, and vice versa, leading to 12 practice searches. During each trial, children saw two boxes on the left and right sides of the screen, each containing 4 images (in a 2 × 2 grid). One of these 8 images was the target image, while the others were copies of the distractor. The target image appeared randomly in either the left or right box, with equal chance of being in either position. Children had 8 seconds to press the left or right control key, indicating the target’s location.

In the main block, we presented all possible pairs of letters. Since there were 6 upright letters chosen, this yielded 6C2 = 15 possible letter pairs. We created two searches for each pair of letters, with either letter as target, and with the target chosen randomly to be present in either the left or right box. This resulted in 30 search trials, with the left or right box containing a target equally often. Likewise, there were 30 search trials using inverted versions of these letters. Children completed the entire experiment in 5-9 mins (mean ± sd: 7.2 ± 1.6 mins).
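The main-block design described above (15 letter pairs, each searched with either letter as target, in upright and inverted versions) can be enumerated as in the sketch below. The letter labels are placeholders, and both the per-orientation balancing of target side and the interleaving of upright and inverted trials in one shuffled list are assumptions of this example, since the text does not specify the trial order.

```python
import random
from itertools import permutations


def build_search_trials(letters, seed=0):
    """Return a shuffled list of oddball search trials.

    Each unordered letter pair contributes two searches (either letter as
    target), separately for upright and inverted orientations.
    """
    rng = random.Random(seed)
    trials = []
    for orientation in ("upright", "inverted"):
        # Ordered pairs cover both target/distractor assignments:
        # 6 letters -> 6 * 5 = 30 searches (always an even number).
        ordered_pairs = list(permutations(letters, 2))
        # Target appears equally often in the left and right box.
        sides = ["left", "right"] * (len(ordered_pairs) // 2)
        rng.shuffle(sides)
        for (target, distractor), side in zip(ordered_pairs, sides):
            trials.append({
                "orientation": orientation,
                "target": target,
                "distractor": distractor,
                "target_side": side,
            })
    rng.shuffle(trials)  # assumption: orientations are interleaved
    return trials


# Placeholder labels standing in for the six akshara used in Experiment 3.
LETTERS = ["L1", "L2", "L3", "L4", "L5", "L6"]
trials = build_search_trials(LETTERS)
```

This yields 60 trials in total, 30 per orientation, matching the counts given in the text.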

Experiment 4: Rapid Automatized Naming (RAN)

Participants. All 300 children were recruited for this task. However, only 241 children could perform the task. The remaining 59 children failed to recognize any digits, and we therefore did not proceed with the task.

Task. We used the Arabic digits 1, 2, 3, and 4 for the task because this set is formally taught across all participating schools. On a printed card, 40 digits (10 repeats of the 4 unique digits) were arranged randomly in a 5 × 8 grid (Figure 1D). Children were instructed to read the digits aloud from left to right along each row, as quickly as they could, and were encouraged to use their finger to track their reading down each row. A practice trial preceded the main trial. We measured the reading time and recorded how many digits were read correctly. We observed that even children with zero letter recognition accuracy could perform the RAN task, probably because they had more exposure to basic digits than to letters. The RAN score was calculated as the number of correctly read digits divided by the total time the child took to read (digits/second).
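As a small worked illustration of the card layout and scoring rule described above, the sketch below builds a random 40-digit card and computes a score in digits per second. The reading of "5 × 8" as 5 rows of 8 digits, the function names, and the example numbers (38 digits in 35 seconds) are assumptions made purely for illustration and are not data from the study.

```python
import random

DIGITS = [1, 2, 3, 4]
REPEATS_PER_DIGIT = 10
N_ROWS, N_COLS = 5, 8  # assumption: the 5 x 8 grid is 5 rows of 8 digits


def make_ran_grid(seed=0):
    """Arrange 10 repeats of each digit randomly in a 5 x 8 grid."""
    rng = random.Random(seed)
    items = DIGITS * REPEATS_PER_DIGIT
    rng.shuffle(items)
    return [items[r * N_COLS:(r + 1) * N_COLS] for r in range(N_ROWS)]


def ran_score(n_correct, total_time_s):
    """RAN score: correctly read digits per second of reading time."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return n_correct / total_time_s


grid = make_ran_grid()
# Hypothetical child: 38 of 40 digits read correctly in 35 seconds.
example_score = ran_score(n_correct=38, total_time_s=35.0)
```

Under these made-up numbers the score is 38 / 35, or about 1.09 digits per second, which is on the same scale as the group means reported in the Results.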

Data availability

All data and code required to reproduce the results are publicly available at https://osf.io/v79km/

Acknowledgements

We thank Sanjana Nagendra, Deekshitha Kotian, Pooja Pandith, Rinkle Crasta, Adhvika Shetty and Kala B. for help with data collection and data preparation, and India co-investigator, Prof. Gideon Arulmani. We thank all participating schools and teachers for their support, and the wonderful children for their enthusiastic participation.

This research was supported by the DBT/Wellcome Trust India Alliance Senior Fellowship awarded to SPA (Grant# IA/S/17/1/503081), a UKRI Collective Fund award (to SN) and project partners (including UMN) of the UKRI GCRF Supporting Oral Language Development Project (ES/T004118/1), and through UGC Senior Research Fellowship (to JD).

Additional files

Supplementary figures