The effect of multilingual pedagogies on language aptitude: A study on lower secondary school students based on the LLAMA test

EN The paper presents an exploratory study on the relation between multilingual pedagogies and language aptitude, a complex construct associated with metalinguistic awareness and cognitive processes, which predict the capacity to learn a foreign language. We administered the LLAMA test to 67 lower secondary school students in Italy, divided into two groups: the target group had been regularly exposed to éveil aux langues (awakening to languages) and translanguaging practices, with an explicit focus on crosslinguistic reflection, for the prior four years; the control group had been exposed to regular monolingual instruction. Results show that the target group achieved a better score in two LLAMA sub-tests (sound-symbol correspondence task and grammatical inferencing task). A composite pattern of correlations and predictors regarding the influence of Italian competence and English competence is also analysed and disentangled. Finally, the paper discusses the implications of these results in terms of research and educational perspectives.


Introduction and theoretical framework: Language aptitude and multilingual pedagogies
This paper introduces a preliminary study regarding the effect of multilingual pedagogies on language aptitude. The primary aim of this research is to provide quantitative evidence on the linguistic impact of multilingual pedagogies (García & Flores, 2012; Marshall, 2022), which can be defined as instructional practices leveraging students' entire language repertoires, even in mainstream monolingual schools. Éveil aux langues ("awakening to languages," Armand et al., 2004; Candelier, 2003; Hélot, 2012), the identity text approach (Cummins & Early, 2010), and translanguaging pedagogy (CUNY-NYSIEB, 2020), although conceptually and sometimes also practically different, can all be associated with multilingual pedagogies. The benefits of these multilingual approaches have already been validated in terms of positive outcomes in students' empowerment and attitude, and teachers' beliefs and practices (Cummins, 2019; García & Sánchez, 2022; Lory & Armand, 2016). This study applies an experimental quantitative approach in order to investigate whether lower secondary school students engaged in multilingual learning strategies involving their entire language repertoire develop a higher language aptitude. Carroll and Sapon (2002, p. 23) described language aptitude as a set of cognitive abilities that are "predictive of how well, relative to other individuals, an individual can learn a foreign language in a given amount of time and under given conditions." According to Carroll (1981), language aptitude is based on four sub-skills: phonetic coding ability (the ability to recognize sounds and learn sound-symbol associations), grammatical sensitivity (the ability to identify syntactic functions), rote learning ability (the ability to recall newly learned words), and inductive language learning ability (the ability to infer rules). These components have been integrated in the MLAT, the Modern Language Aptitude Test, which was initially designed by Carroll and Sapon in 1959.
Even if the spread of communicative approaches to language education has raised criticism towards the appropriateness of MLAT, the test has been extensively used in several studies and it continues to shed light on language aptitude and its interaction with different variables (Stansfield & Reed, 2004).
In a recent article, Hyltenstam (2021) provides a review of the research regarding the relations between language aptitude and metalinguistic awareness. The two constructs seem to partially overlap in some components, in particular in language analytic ability, and have a bidirectional influence. Jessner (2006) underlines that "in an increasing number of investigations in multilingualism research, though, the language learning ability or aptitude of bilinguals learning an L3 has been compared with monolinguals learning an L2. And the cognitive advantages which have been shown to develop in multilinguals have been related to an enhanced level of metalinguistic awareness" (p. 65). Thus, conditions like bilingualism or trilingualism can influence the patterns of association between language aptitude and metalinguistic awareness in different ways (El Euch, 2018). While the nature of the relation between language aptitude and metalinguistic awareness is still debatable, other cognitive processes have been clearly associated with language aptitude (Skehan, 2019), in particular individual factors. Working memory, for instance, and its subcomponents, such as the verbal storage system (or "phonological loop") and central executive processes, play an important role in learning new words and retaining verbal information: the phonological loop, in particular, is considered a significant predictor of second language learning (Baddeley, 2003;Yalçın, Çeçen, & Erçetin, 2016).
Language aptitude has been traditionally conceptualized as a stable skill that is neither susceptible to training nor environmentally influenced, but rather incorporated in genetic cognitive abilities (Skehan, 1998). However, the stability assumption is contested: in the last twenty years, several studies have investigated the relation between aptitude and second language acquisition across a variety of instructional settings. Sáfár and Kormos (2008) detected a major effect of intensive language learning in the sub-test measuring phonological sensitivity, which also resulted in a significant increase in the overall language aptitude test score. Li (2015) identifies two broad lines within aptitude research: predictive and interactional. The latter is based on experimental studies which compared different educational treatments. For example, it has been demonstrated that the effectiveness of explicit instruction is more related to aptitude than that of implicit instruction. Singleton (2017), in his review regarding the mutability of language aptitude, reports a series of studies which show the impact of experience and training on language aptitude, but he underlines that some pedagogical interventions might improve language aptitude via the improvement of working memory capacity. He concludes that we need more research to clearly understand the construct of language aptitude and its relations with other factors.
Very few studies have analysed language aptitude in multilingual pedagogies contexts. Candelier (2003, 2017), within the Evlang project based on the éveil aux langues approach, administered a language aptitude test based on discrimination and memorization by listening and on syntax skills. Primary school students were exposed to language awareness activities in languages not officially taught by the school and generally unfamiliar to the children for at least 35 hours. The impact of éveil aux langues activities was confirmed in a large majority of the sample (around 2000 students) for listening skills. The tasks consisted of: 1) listening to a target word, then a series of three words/sentences which could or could not contain the target word; 2) identifying speakers of the same unfamiliar language in oral conversations. However, the deconstruction-reconstruction task concerning syntax (i.e., after observing three utterances written in an unknown language whose translation is provided in the school language, the student had to deduce the syntactic form of a fourth statement) obtained lower results. The study also demonstrated that the number of teaching hours and the intensity of the application of the éveil aux langues approach contribute to a more evident positive effect on language aptitude. Other studies investigated multilingual pedagogies, and especially translanguaging (García, Johnson, & Seltzer, 2017), in relation to metalinguistic awareness: some of them, assuming a qualitative perspective, underlined that students attending programs in which emergent bilingual students have been allowed to employ their entire language repertoires in cognitively engaging school activities (like reading, analysing, elaborating, and producing content-related texts) enhance interlinguistic comparisons and metalinguistic understandings of different writing systems (Velasco & Fialais, 2018).
Other studies, following a more quantitative orientation, showed that both majority and minority students exposed to multilingual instruction leveraging the students' entire linguistic repertoires have positive outcomes on the development of the ability to reflect on language functions and forms (Hopp, Kieseier, Jakisch, Sturm, & Thoma, 2021). However, most studies related to multilingual pedagogies and translanguaging have focused on beneficial effects in terms of beliefs, attitude, and social inclusion (Carbonara & Scibetta, 2020; Duarte, 2020; Juvonen & Källkvist, 2021). Few studies until now have shown that a multiliteracy approach to education in superdiverse contexts can also lead to positive scholastic achievements. Little and Kirwan (2019), in the course of a long-term project based on multilingual pedagogies in a primary school in Dublin, reported that, despite the large immigrant-background school population coming from primarily lower socioeconomic backgrounds, the school's standardized test scores in English and mathematics have consistently been at or above the national average. The need to provide more quantitative results in order to legitimize multilingual pedagogies within monoglossic instruction is slowly spreading, leading to new fields of investigation like literacy and metacognition (Carbonara, Scibetta, & Torregrossa, in press).
In this paper we decided to focus on language aptitude because it allows linguistic reflection to be investigated while avoiding the separate testing of languages by using monolingual standard tests for individual languages. The administration of a language-neutral aptitude test will target the core components of students' ability to reflect on formal aspects of linguistic systems, regardless of their plural or monolingual repertoires, and this could also contribute to obtaining some valuable insight into which components of language aptitude are more sensitive to multilingual instruction.

The study: methods
The aim of the study is to conduct a preliminary investigation to determine whether bilingual and monolingual lower secondary school students who have been continuously exposed to instructional practices based on multilingual pedagogies developed a better language aptitude compared to students attending the regular monolingual program. The context of this research is a lower secondary school in Italy in which several classes implemented translanguaging pedagogy (see Section 2.1). We investigated language aptitude employing the LLAMA test (Meara, 2005; see Section 2.2). The participants (see Section 2.3) are 67 lower secondary school students, divided into two groups according to their exposure to multilingual or monolingual practices in class. Our research question is twofold: 1) Have the students who were exposed to multilingual pedagogies developed a better general language aptitude? If so, in which LLAMA sub-test did they perform better than students attending traditional monolingual instructional settings? 2) Are there other conditions across the two groups that could have an impact on language aptitude (Italian language competence, English language competence, SES)?

The context: the school and the multilingual pedagogies implemented
The study was conducted at a lower secondary school in the town of Serravalle Scrivia, in the province of Alessandria, which is in the Piedmont region. Alessandria has the second highest percentage of students from an immigrant background (17.5%) in the Piedmont region, which is slightly above the national rate (10.3%, according to the Italian Ministry of Education, 2021). Most of the minority immigrant students in Alessandria were born in Italy (69.4%), a condition which can be associated with better competence in the Italian language but also with the potential endangerment of the home language. In Serravalle Scrivia, 21.4% of the residents are foreigners, mostly coming from Morocco, Romania, Albania, Nigeria, Sri Lanka, Ecuador, and India. This demographic situation is reflected in the school population: in the only school in the town, "Martiri della Benedicta," more than 40% of the students come from an immigrant background and are distributed differently across the educational levels and school complexes. A particularly high percentage is attending its kindergarten classes.
Since 2016, the school has belonged to the network of the "L'AltRoparlante" project carried out by the University for Foreigners of Siena (Carbonara & Scibetta, 2020) and ongoing in six schools in different Italian regions. The project promotes learning strategies that are integrated into regular curricular activities based on multilingual pedagogies addressing both immigrant and native Italian students. The "L'AltRoparlante" project combines the pluralistic approaches of the Council of Europe, and more specifically the "éveil aux langues" (Candelier, 2012), with translanguaging pedagogy (García et al., 2017). Several activities have been conducted within the project: annual teacher training; monthly meetings with teachers and educators to design and schedule multilingual activities; meetings with families about bilingualism and multilingual pedagogies; ethnographic and sociolinguistic fieldwork to investigate students' language repertoires, uses, and attitudes; schoolscape reconstruction in order to make home languages, including dialects, more visible in symbolic, functional, and educational terms (Carbonara, 2021). Students' multilingual repertoires are engaged in a variety of activities, according to their age: in lower secondary schools, which was the educational level analysed in this paper, students elaborate subjects' contents through both Italian and their home languages, targeting both lower and higher-order skills like recalling, defining, summarising, classifying, comparing, inferring, making judgments, planning, and producing (Anderson et al., 2001). Students are required to create multilingual products, which are usually multimodal combinations of texts, visual elements, audio-video materials, oral presentations, etc. (Cummins & Early, 2011). 
Metalinguistic reflection is constantly stimulated drawing from FREPA/CARAP descriptors (Candelier, 2012), which underline the importance of the observation and the analysis of formal characteristics of languages. Students are regularly engaged in explicit reflections regarding different properties of the languages in class and in activities based on making comparisons, identifying regularities and patterns, as well as thinking inductively and deductively in order to infer rules. These skills are usually specifically targeted by aptitude tests, including the LLAMA test, which was adopted for this study.
Several studies throughout the "L'AltRoparlante" project have shown the positive effects of multilingual pedagogies in terms of students' empowerment and motivation (Carbonara & Scibetta, 2020), and attitudes towards multilingualism and intercultural encounters (Bellinzona & Carbonara, 2021). More recently, the first investigation related to linguistic and cognitive aspects provided evidence of the advantages of this approach in narrative skills (Carbonara et al., in press).
Most of the classes of the "Martiri della Benedicta" school of the Serravalle Scrivia school complex, ranging from kindergarten to lower secondary school, have been gradually included in the "L'AltRoparlante" project, while the classes belonging to two other school complexes, located in other towns but still formally belonging to the same school, are not participating in the project.

Materials and procedure: the LLAMA test
The study belongs to a more extensive investigation, which included several measures:
- a language aptitude test, specifically the LLAMA test (Meara, 2005);
- an Italian metalinguistic test, specifically the TAM-2 test (Pinto, Candilera, & Iliceto, 2003);
- a vocabulary test, specifically the Italian version of the Primary Mental Abilities test (PMA) (Rubini & Rossi, 1982);
- a sociolinguistic questionnaire (Carbonara et al., in press).
All the tests were administered to students and families in April-May 2021. In this preliminary study we will only focus on the data collected with the first instrument, the LLAMA test, and draw on the individual data provided by the National Institute for the Evaluation of the Education System (INVALSI) for the linguistic and socioeconomic background as well as for each student's competences and grades.
LLAMA is a computer-based test developed by the University of Wales Swansea, articulated in 4 subtests: vocabulary learning (Test B), sound recognition (Test D), sound-symbol correspondence (Test E) and grammatical inferencing (Test F). LLAMA is based on the MLAT test (see Section 1), but over the years the design of the LLAMA test has significantly diverged from MLAT, both in terms of user interface and in the structure of the tests, including the design of a test which was not present in MLAT (Test B) (Rogers et al., 2017).
We were not able to find any publication related to the administration of the LLAMA test in Italy before this research. We selected the LLAMA test as an effective instrument to investigate language aptitude in a multilingual context because this test is based on an unfamiliar language stemming from Central American languages (Meara, 2005), so neither Italian native students nor bilingual students could have an advantage due to their linguistic repertoires. The test has been examined for validity in different studies: Rogers and colleagues (2017) demonstrated that the LLAMA test is language-neutral, which means that the L1 of the test-takers does not have any effect on test performance. A more recent investigation (Mikawa & DeJong, 2021) claimed that agglutinative language speakers, and Japanese test-takers in particular, are at an advantage in Test D and Test F; however, the present study does not include participants speaking any agglutinative languages. Even if the bilingual participants could achieve better results, there should not be statistically significant differences between the bilingual and monolingual students, and, finally, the test is not influenced by gender, formal education or logic training (Rogers et al., 2016). Bokander and Bylund (2020), however, identified some test-internal deficiencies: Test D items, for instance, can be perceived by test-takers as very difficult, leading to guessing. The authors argued that the LLAMA test would benefit from a process of revision to improve its internal validity, which Meara and colleagues actually started to conduct a couple of months after the present study.
In the following paragraphs we briefly describe the different LLAMA sub-tests. In Test B (Figure 1), there is an initial learning phase, in which the participant has a short time to examine 20 pictures representing different abstract objects and learn their names, which are displayed by clicking on the different pictures. At the end of the learning phase, the computer shows the name of each object one by one, and the participant has to click on the corresponding picture.
In Test D (Figure 2), the test-taker has to listen to a series of short words in an unfamiliar language during the learning phase. In the test phase the participant has to listen to another series of short words and indicate, for each of them, whether the word was already included in the first series or not. Test E (Figure 3) presents 24 syllables, which are transliterated into an unfamiliar alphabet. During the learning phase, the participant can click on the different syllables and listen to the corresponding sound. It is also possible to take written notes. Later, the program plays a series of two-syllable words and displays two possible spellings. The test-taker has to click on the correct one. Finally, in Test F (Figure 4) the participant first uses the time available to click on different buttons showing 20 abstract pictures of different combinations of geometric figures, which are matched with corresponding short sentences that describe the picture. The test-taker can take notes and reflect on how the pictures and sentences are related in terms of syntactic and morphological aspects, like word order, singular and plural forms, prepositions, etc. During the testing phase, the computer shows a new set of pictures, with a combination of the previous elements, and two possible sentences describing the scene: the correct one has to be indicated by clicking on the right sentence. For each test the computer offers a result based on a percentage of accuracy (Table 1). For Tests D, E, and F, every mistake is penalised. In order to conduct statistical tests and to analyse the most frequent mistakes, in this study we also adopted a scoring system attributing 1 point to each correct answer. The maximum scores achievable by the participants for each test are 20 for Test B, 30 for Test D, 20 for Test E, and 20 for Test F.
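The one-point-per-correct-answer scoring described above can be sketched as follows. This is a minimal illustration, not the LLAMA software's own routine: the function names and the example participant are hypothetical, and summing the four raw scores into a total is an assumption about how the total LLAMA score was obtained. Only the maximum scores come from the text.

```python
# Maximum raw scores per sub-test, as stated in the text.
MAX_SCORES = {"B": 20, "D": 30, "E": 20, "F": 20}


def raw_score(responses):
    """Count correct answers: 1 point each, no penalty for mistakes."""
    return sum(1 for is_correct in responses if is_correct)


def percent_correct(test, score):
    """Express a raw score as a percentage of the sub-test maximum."""
    maximum = MAX_SCORES[test]
    if not 0 <= score <= maximum:
        raise ValueError(f"score for Test {test} must be in 0..{maximum}")
    return 100.0 * score / maximum


def total_score(scores):
    """Sum the four raw sub-test scores (assumed definition of the total)."""
    return sum(scores[test] for test in MAX_SCORES)


# Hypothetical participant (illustrative values, not data from the study).
example = {"B": 12, "D": 21, "E": 18, "F": 9}
for test, score in example.items():
    print(f"Test {test}: {score}/{MAX_SCORES[test]} "
          f"({percent_correct(test, score):.1f}% correct)")
print("Total:", total_score(example), "out of", sum(MAX_SCORES.values()))
```

Because Test D has a different maximum from the other sub-tests, percentages rather than raw scores are the natural basis for comparing performance across sub-tests.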
Rogers et al. (2016) showed that younger participants (10-11 years) performed significantly worse in Test E compared to adult participants and could obtain generally lower results in the other sub-tests. Moreover, Rogers and colleagues (2016) demonstrated that allowing one additional minute in Test E leads to better results, while, on the contrary, one additional minute in Test F produces lower scores. Considering these conditions and our participants, who were between 11 and 12 years old, we decided to allow 3 minutes for the learning phase in Test E and to keep the suggested time span of 5 minutes for Test F. Regarding Test B, Rogers and colleagues (2016) showed that potential additional time does not have any impact on performance, but a younger age can influence the results, so also in this case we decided to allow 3 minutes instead of 2 in the learning phase.
Our hypothesis is that students engaged in multilingual pedagogical activities should display a higher language aptitude compared to their peers who did not have this opportunity, as a result of the language awareness and metalinguistic activities they are used to dealing with. We are also interested in investigating whether other variables, like SES and students' competence in Italian and English, which are obtained through the INVALSI scores, might influence the LLAMA test results.

Participants
Sixty-seven students (41 females) from five different classes attending the last year of lower secondary school (M age: 13.7) participated in the study. All students only attended lower secondary school at the "Martiri della Benedicta" school, and most of them had attended the same institute for their primary studies. The target group (TG) (N = 32, 12 of them with an immigrant background) had been continuously exposed to multilingual pedagogies within the "L'AltRoparlante" project from the last year of primary school (5th grade) until the end of lower secondary school, and therefore continuously for 4 years, also through distance learning periods due to the Covid emergency. The students in the TG attended two classes in the school complex of Serravalle Scrivia. The control group (CG) (N = 35, six of them from an immigrant background) had been exposed to the traditional monoglossic approach mainly involving the use of Italian only. The students in the CG attended two classes of the two school complexes in the other towns that were not included in the "L'AltRoparlante" project. Both groups were taught three weekly hours of English language and three weekly hours of French language during their lower secondary studies. It should be noted that the TG initially included two other students from an immigrant background who were absent at the time of data collection, while the CG initially included three other students with an immigrant background who were excluded from the research by the teachers because they reported special educational needs.
The languages spoken by the students from an immigrant background across the two groups include Romanian, Arabic (Moroccan dialect), Spanish, Albanian, Polish, Punjabi, and Ukrainian. We used the data provided by the National Institute for the Evaluation of the Education System (INVALSI) to investigate potential differences between the two groups in terms of socio-economic status (SES) and competence in Italian language and in English reading and listening. We also considered and compared their average school grades (GPA) on a scale from 1 to 10 at the end of the school year. On a scale from low-SES (1) to high-SES (4), the TG has a mean SES of 2.34 (SD = 1.03) and the CG has a mean SES of 2.60 (SD = 1.21); however, a non-parametric Kruskal-Wallis test shows that this difference is not statistically significant (χ²(1) = .872, p = .350). Table 2 shows the descriptive statistics for the INVALSI test results and GPA across the two groups. Even if the mean scores across the four measures are slightly different, there are no statistically significant differences between the two groups in terms of Italian competence (t(65) = -1.22, p = .227), English reading skills (t(65) = -.328, p = .744), English listening skills (t(65) = -.133, p = .894), and GPA (χ²(1) = .002, p = .969). We can conclude that the two groups are comparable in terms of SES and Italian and English competence. Consent from the students' parents was collected at the beginning of the study, including privacy authorization forms for minors regarding the recording and reproduction of audio-visual materials. Table 3 shows the descriptive statistics regarding the scores students obtained in the four LLAMA tests across the two groups. We also computed the results in order to obtain the total LLAMA score. The table includes the average percentages of correct answers according to the interpretation of the LLAMA test manual.
For Test B (vocabulary learning task) and for Test D (sound recognition task), we can observe that the mean scores of the TG, exposed to multilingual pedagogies, and the CG, attending a regular monolingual program, are very similar. According to the LLAMA interpretation scale, both groups obtained a good score on both tests. For Test E (sound-symbol correspondence task) and Test F (grammatical inferencing task), the differences between the two groups are more evident: the TG performed better in both tests, even if the two groups fall under the same categories based on the percentages of correct answers of the LLAMA interpretation scale ("Outstandingly good score" for Test E and "Average score" for Test F). The overall score achieved by the TG is slightly higher compared to the CG.

Results
We conducted a series of t-tests in order to understand if there are significant differences in LLAMA tests between the two groups. Firstly, analysing the total LLAMA score, we did not find a statistically significant difference between the two groups (t(65) = 1.08, p = .280). Investigating the single tests, we did not find any statistically significant difference between the TG and the CG for Test B (t(65) = -.369, p = .713) or for Test D (t(65) = -.079, p = .937); thus, the two groups performed comparably in the vocabulary learning task and in the sound recognition task. However, the two groups scored significantly differently in Test F (t(65) = 2.19, p = .032, Cohen's d = 0.54). Since for Test E the normality assumption was not met (the Shapiro-Wilk test significance value was lower than 0.05), we conducted a non-parametric test, the independent-samples Mann-Whitney U test, which yielded a marginally significant result (U = 404, p = .046). Thus, students who conducted learning activities engaging their multilingual repertoires over a long period of time performed significantly better in the sound-symbol correspondence task and in the grammatical inferencing task compared to students who were exposed to mainly monolingual instruction.
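As a consistency check on the reported effect size, Cohen's d can be recovered from an independent-samples t statistic and the two group sizes. The sketch below assumes the pooled-standard-deviation definition of d and an equal-variance (Student's) t-test, which is compatible with the reported 65 degrees of freedom (32 + 35 - 2); the function names and the small samples in the usage note are illustrative, not taken from the study.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                     / (n1 + n2 - 2))


def student_t(a, b):
    """Equal-variance independent-samples t statistic (df = n1 + n2 - 2)."""
    n1, n2 = len(a), len(b)
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / n1 + 1 / n2))


def cohens_d(a, b):
    """Cohen's d: mean difference in units of the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def cohens_d_from_t(t, n1, n2):
    """Recover Cohen's d from a reported t statistic and the group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Test F as reported: t(65) = 2.19 with N = 32 (TG) and N = 35 (CG).
d_test_f = cohens_d_from_t(2.19, 32, 35)
print(round(d_test_f, 2))  # rounds to 0.54, matching the reported value
```

With raw scores available, `cohens_d(tg_scores, cg_scores)` gives the same quantity directly; the two routes agree because t and d differ only by the factor sqrt(1/n1 + 1/n2).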
Even if the LLAMA test should not be sensitive to the monolingual or bilingual backgrounds of the test-takers, we decided to examine Italian students and students with an immigrant background in terms of their performance in the two LLAMA sub-tests in which we found the TG to be at an advantage. Table 4 displays the descriptive statistics related to Italian students and students with an immigrant background across the two groups in Test E and in Test F. Considering the small number of students in these sub-groups, we decided not to perform any inferential tests. However, we can still notice that, from a merely numerical perspective, the Italian students in the CG obtained a lower score compared to their peers from an Italian background in the TG in both Test E and Test F.
We conducted nonparametric Spearman correlations to investigate whether SES has an impact on the total LLAMA score and the sub-test scores in the two groups. We used a nonparametric correlation because SES is not normally distributed in either group. However, we did not find any significant correlation between SES and LLAMA test results for either the TG or the CG.
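For readers unfamiliar with the rank-based approach, a Spearman coefficient is simply a Pearson correlation computed on ranks, with tied values sharing the mean of their ranks. The following is a minimal standard-library sketch of that definition (the study's own analyses were run in a statistics package); the function names and the example values are hypothetical. Tie handling matters here because SES is coded on a four-point scale, so many participants share the same value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SES levels (1-4) and total scores, for illustration only.
ses = [1, 2, 2, 3, 4, 4]
total = [48, 55, 51, 60, 58, 62]
print(round(spearman(ses, total), 3))
```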
We also computed a series of parametric Pearson's and nonparametric Spearman correlations within the two groups in order to highlight possible patterns of relations between students' competences in Italian, English Reading and English Listening from the INVALSI results, and LLAMA test scores (Table 5). In the Appendix, we also present a table (Table S1) including inter-correlations between the LLAMA tests and inter-correlations between the INVALSI tests across the two groups.
Note to Table 5: * Correlation is significant at the 0.05 level. ** Correlation is significant at the 0.01 level.
Results show that for both the TG and the CG the scores obtained in the INVALSI tests in Italian language are correlated with the LLAMA F score and with the Total LLAMA score, and the coefficients indicate slightly higher correlations for the CG. Moreover, for the students of the CG a high score in the INVALSI tests in Italian language is also associated with a better score in Test E.
Regarding the INVALSI test in English (Reading), for the CG there are significant correlations with each LLAMA sub-test and the Total LLAMA score, while for the TG there is a significant correlation only with the Total LLAMA score, and it is considerably weaker (r = .397) than the resulting coefficient for the CG (r = .638). Also, for the INVALSI test in English (Listening), for the CG there is a significant correlation with every LLAMA sub-test except LLAMA B and with the Total LLAMA score, while for the TG there is a significant correlation with Test F and with the total score, with lower correlation coefficients.
In order to understand these different patterns of relations between the INVALSI test results and the LLAMA test results within the two groups, we conducted a series of multiple linear regressions employing SPSS 26. The assumption of normally distributed residuals was assessed by inspecting the P-P plot, while the assumption of homoscedasticity was assessed with scatterplots. The absence of multicollinearity among predictors was assessed by means of the variance inflation factor (VIF < 10). The independence of residuals was assessed by means of the Durbin-Watson test (values between 1 and 3) (Field, 2009).
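The two numerical criteria mentioned above (VIF below 10, Durbin-Watson between 1 and 3) can be illustrated with a short standard-library sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, the residuals are invented for illustration, and treating the Durbin-Watson bounds as inclusive is an assumption. The VIF of a predictor is computed from the R² obtained when that predictor is regressed on the remaining predictors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest uncorrelated residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def vif_from_r_squared(r_squared):
    """VIF of a predictor, given R^2 from regressing it on the other predictors."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def meets_thresholds(dw, vifs, dw_range=(1.0, 3.0), vif_limit=10.0):
    """Apply the rule-of-thumb cut-offs quoted in the text."""
    low, high = dw_range
    return low <= dw <= high and all(v < vif_limit for v in vifs)


# Invented residuals and R^2 values, for illustration only.
residuals = [0.4, -0.2, 0.1, -0.5, 0.3, 0.0, -0.1]
vifs = [vif_from_r_squared(r2) for r2 in (0.30, 0.45, 0.20)]
print(round(durbin_watson(residuals), 2), [round(v, 2) for v in vifs])
print(meets_thresholds(durbin_watson(residuals), vifs))
```

The Durbin-Watson statistic ranges from 0 to 4, so the 1 to 3 window is a fairly permissive screen rather than a formal significance test.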

Discussion and conclusion
The study investigated language aptitude in lower secondary school students attending a public school in Italy. The target group (TG) was exposed to a combination of éveil aux langues activities and translanguaging pedagogy implemented at a curricular level for four years, while the control group (CG) attended regular education, which is mainly rooted in a monolingual approach, except for the curricular English and French classes. The research included a larger set of instruments and data which are still being analysed; thus, this study represents a preliminary step in the entire investigation. For instance, we still lack more detailed information regarding immigrant students' linguistic background, which can offer a clearer interpretation of their degrees of bilingualism in relation to their language aptitude. Relying on the data offered by the National Institute for the Evaluation of the Education System (INVALSI) we identified students' origin, and we retrieved information regarding SES, Italian language competence, English language competence, and general GPA. We could not find any significant differences between the two groups in relation to these measures; thus, we can assume that the students are comparable.
In order to analyse students' language aptitude, we adopted the LLAMA test (Meara, 2005). Our study attempted to contribute to the field of language learning aptitude by supporting the view that aptitude may change as a result of training (Singleton, 2017). To the best of our knowledge, this is the first study challenging the stability assumption of language aptitude in a context where multilingual pedagogies have been implemented.
Our first research question concerned whether students exposed to multilingual pedagogies showed a higher language aptitude and in which specific sub-components. In Test B (vocabulary learning task) and in Test D (sound recognition task) we could not find a statistically significant difference between the two groups. For both tests we suppose that other factors could have played a role, in particular individual differences in working memory processes (Yalçın et al., 2016): Test B taps into visual memory while Test D taps into auditory memory, and both require the participant to store information and retrieve it later. Moreover, Test D requires the ability to recognise patterns in spoken language, and this can also be influenced by the amount of exposure to different phonological systems. Even if collective language repertoires are used in multiple ways in classes where translanguaging pedagogy is applied, and therefore all students listen to texts in unfamiliar languages and are involved in activities concerning languages they do not know, students do not formally acquire other languages. It is probably rather a matter of incidental learning, which probably leaves less profound traces from a cognitive point of view than contexts of bilingual learning or traditional teaching of a foreign language. We should also consider that both groups experienced more than six months of distance learning during the year before the data collection was carried out because of the Covid-19 emergency, and this implied a reduction of multilingual activities, in particular those related to listening, for the TG.
However, in Test E (sound-symbol correspondence task) and in Test F (grammatical inferencing task) we could observe that the TG performed statistically better than the CG. These results meet our expectations: both tests rely less on memory and are grounded on explicit reasoning (Bokander & Bylund, 2020). Test E requires students to understand sound-symbol correspondence and decode the words heard in the testing phase by relying on what was understood regarding the phonological system of the target language. Test F, which was the most difficult in the battery, requires students to compare language patterns in order to deduce, internalise, and finally apply the rules in different combinations to "translate" the sentences. These kinds of strategies are similar to the ones the students in the TG are usually encouraged to employ during metalinguistic focuses that are proposed by teachers to analyse their peers' language systems.
Our second research question was about possible variables influencing language aptitude across the two groups. Socioeconomic background (SES) influences neither the TG's nor the CG's language aptitude. Analysing the relations between language aptitude and Italian language competence, English reading competence, and English listening competence, as assessed through the INVALSI tests, we found a variegated pattern of correlations and predictions. Generally, it seems that the CG's English competence is strongly associated with language aptitude; thus, we can deduce that students in the CG rely on their English knowledge more than students in the TG. This is particularly evident for Test E, which does not show any relation with Italian language competence or English language competence for the TG, while for the CG it is correlated with each INVALSI result. We can explain this by hypothesizing that, for the CG, the primary source of multilingualism and metalinguistic reflection is represented by the foreign languages studied at school, like English, while for the TG this source is also represented by all the other linguistic inputs provided in a context of pluralistic instruction, and for this reason the English language alone does not play a prominent role. In fact, in conducting a series of stepwise/forward regressions across the two groups, we noticed that English competence (Listening or Reading) is generally a significant predictor of each LLAMA test for the CG, while for the TG English competence is a predictor only for Test F. Italian language competence is not a significant predictor of any test for either group.
To conclude, we are aware that this study is still exploratory: more data are needed to gain a more insightful view of language aptitude and multilingual pedagogies. We also know that this is not always possible, because of the nature of this educational context: not many schools are able to continuously implement translanguaging pedagogy to an extent which can allow researchers to identify linguistic phenomena. However, future research within the domain of multilingual pedagogies should focus more on quantitative results, taking into account different components like literacy development, as well as cognitive and metalinguistic skills, in order to establish the role that these kinds of approaches can play in reversing minority students' underachievement (Cummins, 2021).

Stranieri di Siena. She taught Italian as a second language in different schools in Italy and abroad. As a postdoctoral scholar, she collaborates with Bilingualism Matters (Siena branch), and with her colleague, Andrea Scibetta, she coordinates the project "L'AltRoparlante: translanguaging educational practices for the management of the superdiversity in class." Her research interests are teaching Italian as a second language, early language education, CLIL, bilingualism, and translanguaging.