Weblog Alex Reuneker

Linguists running the Singelloop 2024

— Posted in Hardlopen by

On Friday 12 April 2024, 15 linguists from LUCL – the LUCL Runners – ran the Leiden Singelloop. All running linguists completed the course of approximately 6 km along the ‘singels of Leiden’. Afterwards, we enjoyed each other's company during the annual post-run party at Olga van Marion and Ton van der Wouden’s house along the route.

The LUCL team of 2024

Thanks to LUCL for sponsoring and all colleagues who supported us during the run!

Bug fixes and a new feature for the n-gram generator

— Posted in Taal by

Unfortunately, while I was working on large-file loading, some bugs slipped in that caused the n-gram generator to present incorrect results. Luckily, one of the users alerted me to this problem, and over the last few days I have fixed a number of related bugs. On top of that, I have implemented a number of checks to prevent clearly incorrect results in the future.

Finally, I have added an option to remove possessive 's, so now you can choose whether you’d like ‘Harry’s’ to be counted as ‘Harrys’ or ‘Harry’. Some general statistics (word totals, TTR) were added too.
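To illustrate what such an option and the TTR statistic amount to, here is a minimal Python sketch. It is not the code behind the online tool; the function names normalise_possessive and type_token_ratio are hypothetical and chosen only for this example, and the pattern is an assumption about what counts as a possessive.

```python
import re


def normalise_possessive(token: str, strip: bool) -> str:
    """Handle a trailing possessive 's (straight or curly apostrophe).

    strip=True  -> "Harry's" becomes "Harry"
    strip=False -> "Harry's" becomes "Harrys" (only the apostrophe is dropped)

    Assumption: any token ending in 's is treated as a possessive, so
    contractions such as "it's" are affected as well.
    """
    match = re.fullmatch(r"(.+?)['’]s", token)
    if match is None:
        return token  # no trailing 's: leave the token unchanged
    stem = match.group(1)
    return stem if strip else stem + "s"


def type_token_ratio(tokens: list[str]) -> float:
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

With this sketch, normalise_possessive("Harry’s", strip=True) gives "Harry", and the same call with strip=False gives "Harrys".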

To try the new version, head over to https://www.reuneker.nl/files/ngram.

Digital Humanities Small Grant 2023-2024

— Posted in Taal by

Recently, I was awarded the Digital Humanities Small Grant 2023-2024 by the Leiden University Centre for Digital Humanities. This grant enables me to appoint two student-assistants to participate in the project, as described in the excerpt from the grant proposal below.

In this interdisciplinary project, combining the disciplines of Dutch Linguistics and Digital Humanities, two student assistants will search, index and read available literature on Dutch verb spelling, and they will use and evaluate methodologies from the domain of Digital Humanities to explore a dataset of 6 million verb-spelling answers collected by the first supervisor through the non-profit website Gespeld.nl since 2013.

I’m grateful to the Centre for Digital Humanities and look forward to working with two students on the project. In doing so, I hope to enhance our knowledge of the difficulties in Dutch verb spelling, using data-driven techniques and big-data statistics on data from Gespeld.

Updates for the N-gram generator

— Posted in Taal by

Once in a while, I receive emails from researchers all over the world with thanks and/or suggestions for the scripts I provide online, such as the frequency-list and n-gram generators. About the latter tool, I had a nice email conversation with a researcher from overseas, which led to the following enhancements and updates. I really enjoy these kinds of exchanges, so if you have any suggestions or feedback, you know where to find me.

  • Slight efficiency rewrite of output rendering. (2024-01-26)
  • Added feature for respecting or ignoring sentence boundaries. (2024-01-25)
  • Added feature for including or excluding numbers. (2024-01-25)
  • Added top limits above 1,000 (2,000, 3,000, 4,000, 5,000, 10,000) to respect or ignore sentence boundaries. (2024-01-25)
  • Added feature for (virtually) unlimited results. (2024-01-22)
  • Added feature for unigrams. (2024-01-22)
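
For readers curious how options like the ones listed above interact, the following Python sketch counts n-grams with switches for sentence boundaries, numbers, and a top limit. It is an illustration under simple assumptions, not the implementation of the online generator: the function name count_ngrams and its parameters are hypothetical, sentences are split naively on '.', '!' and '?', and tokens are runs of word characters.

```python
import re
from collections import Counter


def count_ngrams(text, n=2, respect_sentences=True, include_numbers=True, top=None):
    """Return (n-gram, frequency) pairs, most frequent first.

    n=1 yields unigrams; top=None returns all results (no limit).
    """
    # When sentence boundaries are respected, n-grams never span two sentences.
    segments = re.split(r"[.!?]+", text) if respect_sentences else [text]
    counts = Counter()
    for segment in segments:
        tokens = re.findall(r"\w+", segment.lower())
        if not include_numbers:
            # Drop tokens that consist of digits only.
            tokens = [t for t in tokens if not t.isdigit()]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran 2 miles."
    for gram, freq in count_ngrams(sample, n=2, top=3):
        print(" ".join(gram), freq)
```

In this sketch, ignoring sentence boundaries on the sample text adds the bigram "sat the", which spans two sentences, and excluding numbers turns "ran 2 miles" into the single bigram "ran miles".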

Talk on English loan verbs at VIOT 2024

— Posted in Taal by

At the end of January 2024, I will present at VIOT 2024 (Vereniging Interuniversitair Overleg Taalbeheersing) at the University of Twente. The talk is about the spelling of (mainly English) loan verbs in Dutch, such as updaten and netflixen. In the talk, I present quantitative analyses showing that loan verbs are conjugated incorrectly significantly more often than non-loan verbs. In addition, the results show that a limited number of verb types, defined by the ending of the stem, account for the majority of the errors.

For more information, see the abstract and VIOT 2024.
