Website of Alex Reuneker about language, running, cycling and travel

Flesch-Kincaid Reading Ease Score (FRES) added to Lexical Diversity Tool

I added the Flesch-Kincaid Reading Ease Score (FRES) to the Lexical Diversity Tool. This metric estimates the difficulty of a text from the number of syllables, words and sentences it contains. The lower the score, the more difficult the text is to read. The screenshot below shows the Flesch-Kincaid Reading Ease Score for Minder eten weggegooid dan eerdere jaren ('Less food thrown away than in previous years'), a Dutch news text for children (Jeugdjournaal). Its score is much higher than that of a news text on the same topic written for Dutch adults (73.41 vs 46.65 respectively).

Flesch-Kincaid Reading Ease Score
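
As a rough illustration of how a score like this combines the three counts, the sketch below uses the coefficients of the original (English) Flesch Reading Ease formula. Whether the Lexical Diversity Tool uses these coefficients or a Dutch adaptation is not stated here, so the function name, the constants and the example counts are assumptions, not the tool's actual implementation.

```python
def flesch_reading_ease(n_syllables, n_words, n_sentences):
    """Flesch Reading Ease with the standard English coefficients (an assumption)."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    # Longer sentences and longer words both lower the score.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts, not taken from the two news texts discussed above.
easy = flesch_reading_ease(n_syllables=130, n_words=100, n_sentences=10)
hard = flesch_reading_ease(n_syllables=180, n_words=100, n_sentences=4)
print(round(easy, 2), round(hard, 2))
```

With the same number of words, the text with shorter sentences and fewer syllables per word gets the higher (easier) score.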

The tool now also lists the number of sentences in a text, the average word length in syllables, and (optionally) a list of words with their syllable counts. For now, the measure only works for Dutch texts, because the (still imperfect, but good enough) splitting of words into syllables is based on the Dutch language. You can check it out now at https://www.reuneker.nl/files/ld.
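
To give an idea of what such counts involve, here is a minimal sketch that counts sentences, words and syllables. It assumes a deliberately naive rule (one syllable per run of consecutive vowel letters) and a simple split on sentence-final punctuation; the tool's own syllable splitting is not described here and is probably more refined. All names and the second sample sentence are made up for illustration.

```python
import re

# Naive assumption: every run of consecutive vowel letters is one syllable.
VOWEL_GROUP = re.compile(r"[aeiouyáàäâéèëêíìïîóòöôúùüû]+")
# A word is a run of letters, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")
SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word):
    """Approximate syllable count; every word counts as at least one syllable."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def text_counts(text):
    """Return sentence, word and syllable counts plus a per-word syllable list."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    words = WORD.findall(text)
    per_word = [(w, count_syllables(w)) for w in words]
    total = sum(n for _, n in per_word)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": total,
        "syllables_per_word": total / len(words) if words else 0.0,
        "per_word": per_word,
    }


sample = "Minder eten weggegooid dan eerdere jaren. Dat is goed nieuws!"
counts = text_counts(sample)
print(counts["sentences"], counts["words"], counts["syllables"])
```

A rule this simple will miscount some words, for example those in which a diaeresis marks the start of a new syllable (as in beëindigen), and the sentence split is thrown off by abbreviations. That is one reason to treat the resulting numbers as approximations.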

A final note: I am contemplating moving readability measures like this one out of the Lexical Diversity Tool, because lexical diversity and readability are related, but certainly not the same. The tool is also becoming a bit chaotic and cluttered, which is all the more reason to give readability its own calculator page some day.

Lexical coverage added to Lexical Diversity Tool

I added a measure (somewhat) known as 'lexical coverage' to the Lexical Diversity Tool. This measure represents the percentage of words in a text that occur in a reference list: the words from all Dutch newspaper texts in the SoNaR-500 corpus that, together, account for 77 percent of all tokens in that corpus (other corpora are used elsewhere; see Staphorsius, 1994; Kraf, Lentz & Pander Maat, 2011). The higher this percentage, the easier the text, because more of its words can be assumed to have been read before and thus to be 'known'. Although this definitely says something, perhaps indirectly, about the lexical diversity of a text, it is used primarily to assess the reading difficulty of a text (see also Adolphs & Schmitt, 2003; Van Zeeland & Schmitt, 2013).
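
The idea can be sketched in a few lines. The example below assumes that coverage is computed over tokens (every occurrence of a word counts) and that matching is case-insensitive; both are assumptions, as is the tiny reference set, which merely stands in for the real SoNaR-500 word list.

```python
import re


def lexical_coverage(text, known_words):
    """Percentage of word tokens in `text` that occur in `known_words`."""
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    if not tokens:
        raise ValueError("text contains no words")
    covered = sum(1 for t in tokens if t in known_words)
    return 100.0 * covered / len(tokens)


# Hypothetical stand-in for the frequency-based reference list.
REFERENCE = {"de", "het", "een", "is", "dan", "minder", "eten", "jaren"}

score = lexical_coverage("Minder eten weggegooid dan eerdere jaren", REFERENCE)
print(round(score, 2))
```

In this toy example four of the six tokens are in the reference set, so two thirds of the text counts as 'known'.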

Lexical coverage added to Lexical Diversity Tool

Because I have used the (Dutch newspaper subcorpus of the) SoNaR-500 as a reference corpus, the measure only works for Dutch texts, for now at least. The implementation is workable and correct, but still a bit rough, so be aware that it is still in development.