Taal - Weblog Alex Reuneker

Added two new reference corpora to keyword tool

12 June 2024 — Posted in Taal by Alex

Good news! I have added two new reference corpora to the online keyword analysis tool (https://www.reuneker.nl/files/keyword): one filled with Dutch pop lyrics, and one with Dutch rap/hiphop lyrics. For linguists (and others interested) who want to research Dutch lyrics, this might come in handy.

Photo by Calum MacAulay on Unsplash.

For more information on these reference corpora, see Waszink, Reuneker & Van der Wouden (2018). Als ik praat, dan praat ik money: de hiphopste woorden. Neerlandistiek, July 28 2018.
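To give an idea of what comparing a text against a reference corpus involves, here is a minimal Python sketch of keyword extraction with the log-likelihood statistic, a measure commonly used for keyness. It is an assumption that this statistic is a fair stand-in; the post does not say which measure the online tool uses, and the function names below are made up for the example.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word that occurs a times in a target
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the target corpus.
    Assumes both token lists are non-empty (hypothetical helper)."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scores = {
        word: log_likelihood(freq, reference[word], c, d)
        for word, freq in target.items()
        # Keep only words overrepresented in the target corpus.
        if freq / c > reference[word] / d
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

With a reference corpus of pop or rap lyrics, `reference_tokens` would be the tokenised lyrics and `target_tokens` the text under study; words with the highest scores are the candidates for keywords.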

Education Fund (Onderwijsfonds) of the Maatschappij der Nederlandse Letterkunde 2024

18 May 2024 — Posted in Taal by Alex

Recently, my application to the Education Fund (Onderwijsfonds) of the Maatschappij der Nederlandse Letterkunde 2024 was granted! With the awarded amount, students will create short animations that explain specific categories of verb spelling in clear and accessible language, as described in the excerpt from the application below.

The exercises [on Gespeld] are completed daily by hundreds of pupils and students, but the aforementioned feedback is read only to a limited extent. This application therefore aims to increase the learning effect for this target group by developing fifteen short knowledge clips (one per exercise category, such as ‘imperative’ or ‘loan verb, past tense’) that present the feedback in an attractive and accessible way.

Many thanks to the Maatschappij der Nederlandse Letterkunde for this generous grant. If you are a student in animation, design or a related field and you are interested, do get in touch!

Bug fixes and a new feature for the n-gram generator

05 April 2024 — Posted in Taal by Alex

Unfortunately, during work on large-file loading, some bugs slipped in, causing the n-gram generator to present incorrect results. Luckily, one of the users alerted me to this problem, and over the last few days I have fixed a number of related bugs. On top of that, I have implemented a number of checks to prevent incorrect results in the future.

Finally, I have added an option to remove possessive 's, so now you can choose whether you’d like ‘Harry’s’ to be counted as ‘Harrys’ or ‘Harry’. Some general statistics (word totals, TTR) were added too.
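As a rough illustration of the possessive option and the type-token ratio (TTR), here is a minimal Python sketch. It is not the code behind the tool: the function names, the `strip_possessive` parameter and the regular expression are assumptions made for this example.

```python
import re

def tokenize(text, strip_possessive=True):
    """Lowercase word tokens. A trailing 's is either removed ("harry")
    or merged into the word ("harrys"). Note that this simple pattern
    also affects contractions such as "it's"."""
    # Normalise typographic apostrophes so ’s and 's are treated alike.
    text = text.replace("\u2019", "'").lower()
    if strip_possessive:
        text = re.sub(r"(\w)'s\b", r"\1", text)   # harry's -> harry
    else:
        text = re.sub(r"(\w)'s\b", r"\1s", text)  # harry's -> harrys
    return re.findall(r"\w+", text)

def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

For example, `tokenize("Harry’s owl")` gives `['harry', 'owl']`, while `tokenize("Harry’s owl", strip_possessive=False)` gives `['harrys', 'owl']`.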

To try the new version, head over to https://www.reuneker.nl/files/ngram.

Digital Humanities Small Grant 2023-2024

31 January 2024 — Posted in Taal by Alex

Recently, I was awarded the Digital Humanities Small Grant 2023-2024 by the Leiden University Centre for Digital Humanities. This grant enables me to appoint two student assistants to participate in the project, as described in the excerpt from the grant proposal below.

In this interdisciplinary project, combining the disciplines of Dutch Linguistics and Digital Humanities, two student assistants will search, index and read available literature on Dutch verb spelling, and they will use and evaluate methodologies from the domain of Digital Humanities to explore a dataset of 6 million verb-spelling answers collected by the first supervisor through the non-profit website Gespeld.nl since 2013.

I’m grateful to Digital Humanities and look forward to working together with two students on the project. By doing so, I hope to improve our knowledge of difficulties in Dutch verb spelling using data-driven techniques and large-scale statistics on data from Gespeld.

Updates for the N-gram generator

25 January 2024 — Posted in Taal by Alex

Once in a while I receive emails from researchers all over the world with thanks and/or suggestions for the scripts I provide online, such as the frequency list and n-gram generators. About the latter tool, I had a nice email conversation with a researcher from overseas, which led to the following enhancements and updates. I really enjoy these kinds of things, so if you have any suggestions or feedback, you know where to find me.

Slight efficiency rewrite of output rendering. (2024-01-26)
Added feature for respecting or ignoring sentence boundaries. (2024-01-25)
Added feature for including or excluding numbers. (2024-01-25)
Added top limits above 1,000 (2,000, 3,000, 4,000, 5,000, 10,000) to respect or ignore sentence boundaries. (2024-01-25)
Added feature for (virtually) unlimited results. (2024-01-22)
Added feature for unigrams. (2024-01-22)
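
The options in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's own code: the function name `ngrams`, its parameters and the naive sentence splitter are all hypothetical.

```python
import re
from collections import Counter

def ngrams(text, n=2, respect_sentences=True, include_numbers=True):
    """Count n-grams in text; n=1 yields unigrams."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    if respect_sentences:
        units = re.split(r"(?<=[.!?])\s+", text)
    else:
        units = [text]
    counts = Counter()
    for unit in units:
        tokens = re.findall(r"\w+", unit.lower())
        if not include_numbers:
            # Drop tokens that consist of digits only.
            tokens = [t for t in tokens if not t.isdigit()]
        # Slide a window of size n over the tokens of this unit only,
        # so no n-gram crosses a boundary between units.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

A top limit then corresponds to something like `ngrams(text, n=2).most_common(1000)`, and calling `most_common()` without an argument returns all results.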