Website of Alex Reuneker on language, running, cycling, and travel

MATTR added to the Lexical Diversity Calculator

— Posted in Taal & Literatuur by

Last week, I added the calculation of MATTR (Moving Average TTR) to the Lexical Diversity Calculator. MATTR computes the type-token ratio (TTR) for successive, overlapping windows of fixed length that move through a text one token at a time, and then takes the mean of those TTRs (Covington & McFall, 2010). The idea is that this yields a more stable, less length-dependent indication of lexical diversity than a single TTR for the whole text. While that is not entirely the case (see Bestgen, 2025), you can still test it at https://www.reuneker.nl/ld.
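A minimal Python sketch of the moving-window idea follows. It is not the calculator's own code. It assumes the text has already been split into tokens, uses an illustrative default window of 50 tokens (the window size is a parameter you choose), and returns the plain TTR when a text is shorter than one window; that fallback is an assumption made for this sketch. Instead of recounting every window from scratch, it updates a frequency table as the window slides.

```python
from collections import Counter


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-Average Type-Token Ratio: mean TTR over all windows of `window` tokens."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not tokens:
        return 0.0
    if len(tokens) < window:
        # Assumption of this sketch: texts shorter than one window get their plain TTR
        return len(set(tokens)) / len(tokens)

    counts = Counter(tokens[:window])  # type frequencies in the first window
    ttr_sum = len(counts) / window
    for i in range(window, len(tokens)):
        # Move the window one token to the right: add the new token ...
        counts[tokens[i]] += 1
        # ... and drop the token that falls out on the left
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]  # keep len(counts) equal to the number of types
        ttr_sum += len(counts) / window

    n_windows = len(tokens) - window + 1
    return ttr_sum / n_windows


tokens = "the cat sat on the mat and the dog sat on the cat".split()
print(round(mattr(tokens, window=5), 3))
```

Each window always contains exactly `window` tokens, so every window's TTR is the current number of types divided by the same constant.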


Photo by Sean Nufer on Unsplash

Next: implementing a compression-rate measure to operationalize text repetitiveness for what will hopefully become a project together with Vivien Waszink!
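One simple way to turn a compression rate into a repetitiveness score is to compare a text's size before and after compression: the more a text repeats itself, the smaller it compresses. The sketch below uses Python's standard `zlib` (DEFLATE) compressor purely as an illustrative assumption; the measure planned for the project may use a different compressor or normalization.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Compressed size divided by original size, both in UTF-8 bytes.

    Lower values mean the compressor found more redundancy (repetition).
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, level)) / len(raw)


repetitive = "the cat sat " * 100
varied = " ".join(str(i) for i in range(300))
print(compression_ratio(repetitive), compression_ratio(varied))
```

Very short texts can yield a ratio above 1, because the compressed stream carries a fixed header and checksum, so the ratio is mostly informative for longer texts.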

Improvements to the Lexical Diversity Calculator

— Posted in Taal & Literatuur by

In the last couple of days, I've been implementing various improvements to the Lexical Diversity Calculator. Not only did I fix a problem in the calculation of MTLD, which produced numbers that were slightly off, but I've also streamlined the calculations and added Moving Average TTR (MATTR).


Photo by Siora Photography on Unsplash.

Updates

  • 2025-05-29: Added the choice between the natural logarithm and base 10 in the calculation of Maas's a², Dugast's U², and Herdan's C.
  • 2025-05-29: Various improvements to calculations and algorithms; added MATTR.
  • 2025-05-26: Important change to the calculation of MTLD, which was slightly off before because the results of the forward and backward passes were not averaged.
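The logarithm-based measures in the 2025-05-29 update are all built from the number of tokens N and the number of types V. The Python sketch below uses the textbook definitions, Herdan's C = log V / log N, Maas's a² = (log N - log V) / (log N)², and Dugast's U = (log N)² / (log N - log V), with a hypothetical `base` parameter. These formulas are an assumption about what the calculator computes, not taken from its source.

```python
import math


def log_measures(n_tokens: int, n_types: int, base: float = math.e) -> dict[str, float]:
    """Herdan's C, Maas's a^2 and Dugast's U for a chosen logarithm base."""
    if n_tokens < 2 or not 1 <= n_types <= n_tokens:
        raise ValueError("need n_tokens >= 2 and 1 <= n_types <= n_tokens")
    log_n = math.log(n_tokens, base)
    log_v = math.log(n_types, base)
    diff = log_n - log_v
    return {
        "herdan_c": log_v / log_n,
        "maas_a2": diff / log_n ** 2,
        # When every token is a different type, diff is 0 and U is unbounded
        "dugast_u": log_n ** 2 / diff if diff > 0 else math.inf,
    }


print(log_measures(100, 10, base=10))     # log N = 2, log V = 1
print(log_measures(100, 10, base=math.e))
```

With these definitions the base cancels out of Herdan's C, whereas a² and U scale with the choice of base, so values are only comparable when they were computed with the same base.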

In addition, I'm working on an R package to easily calculate several measures of lexical diversity, primarily for a research project I'm planning for the near future. Stay tuned! For now, please use the online calculator at https://www.reuneker.nl/ld for the newest version.

Hapax Legomena added to Lexical Diversity tool

— Posted in Taal & Literatuur by

While I was emailing back and forth with one of the researchers at the Max Planck Institute, some confusion arose over the term unique words in the Lexical Diversity tool. Unique words are not hapax legomena, the corpus-linguistics term for words that occur only once. Unique words are simply types: their count equals the number of different words in a text. A word might occur once, twice or twenty times, but in all three cases it counts as one unique word. This count is also the one used to calculate the type-token ratio. Because the researcher was interested in how many words occur only once in a text, I've added this count as well. You can try the new feature in the tool right away!
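To make the distinction concrete, here is a small Python sketch that counts tokens, types (unique words) and hapax legomena in an already tokenized text and derives the TTR from the type count. The tokenization in the example (lowercasing and splitting on whitespace) is an assumption of this sketch, not necessarily how the tool tokenizes.

```python
from collections import Counter


def type_stats(tokens: list[str]) -> dict[str, float]:
    """Token, type and hapax legomena counts, plus the type-token ratio."""
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)  # unique words: each distinct word counted once
    n_hapax = sum(1 for c in freq.values() if c == 1)  # words occurring exactly once
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax_legomena": n_hapax,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
    }


tokens = "The cat saw the other cat".lower().split()
print(type_stats(tokens))  # tokens: 6, types: 4, hapax_legomena: 2
```

In the example, "the" and "cat" each occur twice, so they count as types but not as hapax legomena; only "saw" and "other" are hapaxes.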


Hapax legomena in the Lexical Diversity tool