Cristia, Lavechin, Scaff, Soderstrom, Rowland, Räsänen, Bunce, Bergelson
Over the past decade, dozens of studies involving thousands of children across
several research disciplines have used the LENAⓇ system, which combines a
daylong audio recorder with automated algorithmic analysis and aims to assess
children’s language environments. Although the system’s use in language
acquisition research is growing steadily, validation efforts have been
scattered and have covered only some of its key characteristics.
Here, we assess the LENAⓇ system’s accuracy across all of its key measures:
speaker classification, Child Vocalization Counts (CVC), Conversational Turn
Counts (CTC), and Adult Word Counts (AWC). Our assessment is based on manual
annotation of clips sampled randomly or periodically from daylong
recordings collected from (a) populations similar to the system’s
original training data (North American English-learning children aged 3-36
months), (b) children learning another dialect of English (UK), and (c)
slightly older children growing up in a different linguistic and
socio-cultural setting (Tsimane’ learners in rural Bolivia). We find
reasonably high accuracy in some measures (AWC, CVC), with more problematic
levels of performance in others (CTC, precision of male adults and other
children). Statistical analyses do not support the view that performance is
worse for children who differ from those in the LENAⓇ system’s original
training set.
Whether LENAⓇ results are accurate enough for a given research, educational,
or clinical application depends largely on the specifics at hand. We
therefore conclude with a set of recommendations to help researchers make
this determination for their goals.