Website of Alex Reuneker about language, running, cycling and travel

Dealing with 'zero counts' in keyword analysis

One problem with keyword analysis is that the target corpus will likely include words that do not occur in the reference corpus. For various measures of keyness, this results in a division by zero, which is mathematically undefined. The default way of dealing with this is to assign words that do not occur in the reference corpus a frequency of 0.5, but this introduces the risk that such keywords dominate the top positions, because their keyness is inflated.

To remedy this problem, I have added an option to the Keyword Analysis Tool which lets you choose either to go with the default of assigning a frequency of 0.5 to 'zero counts', or to simply discard them from all calculations, resulting in keywords that have a minimum frequency of 1 in the reference corpus.
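To make the two options concrete, here is a minimal sketch. The post doesn't say which keyness measures the Keyword Analysis Tool uses, so as an assumption this uses a simple ratio of relative frequencies (target vs. reference), which is one of the measures that divides by the reference frequency. The function name `keyness` and the `zero_counts` parameter are hypothetical.

```python
from collections import Counter


def keyness(target: Counter, reference: Counter, zero_counts: str = "smooth") -> list[tuple[str, float]]:
    """Rank target words by the ratio of their relative frequencies
    (hypothetical example measure, not necessarily the tool's).

    zero_counts="smooth":  words absent from the reference get frequency 0.5
    zero_counts="discard": words absent from the reference are left out
    """
    if zero_counts not in ("smooth", "discard"):
        raise ValueError("zero_counts must be 'smooth' or 'discard'")
    target_size = sum(target.values())
    reference_size = sum(reference.values())
    scores = {}
    for word, freq in target.items():
        ref_freq = reference.get(word, 0)
        if ref_freq == 0:
            if zero_counts == "discard":
                continue          # keep only words seen at least once in the reference
            ref_freq = 0.5        # the default smoothing value
        scores[word] = (freq / target_size) / (ref_freq / reference_size)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


target = Counter({"frits": 40, "evening": 20, "the": 500})
reference = Counter({"evening": 5, "the": 1000})
print([w for w, _ in keyness(target, reference)])             # → ['frits', 'evening', 'the']
print([w for w, _ in keyness(target, reference, "discard")])  # → ['evening', 'the']
```

In the toy data, 'frits' never occurs in the reference, and with the 0.5 smoothing it shoots to the top with a score roughly twenty times that of 'evening'. This is the inflation effect described above.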

Dealing with 'zero counts' in the Keyword Analysis Tool


There is no real wrong or right way to do this, but at least now you have a choice. Have fun!

Oliebollencross 2025, Koplopers Delft

  in Sport
 

This year, too, I took part in the Oliebollencross in Delft. Because the weather was considerably drier this year, I expected a bit less slipping and sliding than in previous years, but especially after a few laps, certain stretches of the course always turn out nicely muddy anyway.

The Oliebollencross in Delft is a fun, well-attended race which, just like in previous years (no 'graaiflatie', or greedflation, here), proves that sport doesn't have to be expensive: 3 euros entry fee, and that includes an oliebol afterwards. Coffee with it for one euro. Nice that that is still simply possible in 2025!

For the long cross, about 180 men and women lined up at the start and, as at every cross, it was a matter of jostling for position.

Oliebollencross in Delft, 2025


Because I'm training for the Zestig van Texel, I ran a 32-kilometre long run the day before the cross. Partly because of our planning at home, there was simply no other way. I had therefore decided to run this cross nice and relaxed. A cross is a good strength workout anyway, but above all it's fun to do, especially at the end of the year with an oliebol in prospect. I could clearly feel my legs from the day before, and right from the start I thought 'this is a fine pace, no need to go any faster.' I managed to stick with that thought, and at no point did I feel I was running in the red, or even in the orange. I simply enjoyed a beautiful cross with nice people.

The oliebol afterwards with Marlies, whom I had arranged to meet at the cross, tasted good. In previous years I always skipped it, and that will partly have been the eating disorder, but I also found it a bit of a hassle to walk another kilometre from the finish back to the clubhouse. I'm glad we did it anyway yesterday, because it was very enjoyable, and the first oliebol of the year, with raisins and powdered sugar of course, was delicious.

In the end I ran the long cross (9.2 kilometres) in 38:51; last year, on a much heavier (wetter) course, it was 38:03. Just like at the Kustmarathon, I had no trouble not setting time or position goals during the race, but afterwards thoughts like 'it could have been faster' and 'why didn't you overtake so-and-so' still came up. Not helpful thoughts, but pushing them away doesn't work either, or even backfires. So I'll just let them be, and look back happily on the beautiful, sporty end to the year that the Oliebollencross is every time.

The Evenings, day 8: 29 December 1946/2025

Today is day eight, on which Frits suffers from a hangover from the night before, during which 'ad fundum' proved to be a favourite exclamation. Frits wakes up 'with a mouth dry as cork' and takes it upon himself to get out of bed, wash up and be done with it, but he falls asleep again. Given Frits' hangover, let's take it easy today and look briefly at a small comparison between Reve's The Evenings and the American coming-of-age novel The Catcher in the Rye by J.D. Salinger, to which Reve's novel is sometimes compared. It won't be a detailed comparison, just some quick linguistic and lexical facts.


The Catcher in the Rye (image by Britannica.com)

The Evenings consists of 92,680 words, of which 7,017 are unique, resulting in a type-token ratio (unique words divided by total words) of 0.08. (Again, you can use the Lexical Diversity Calculator to calculate these and more figures yourself.) The Catcher in the Rye is a bit shorter, with 73,629 words of which 4,511 are unique, giving a type-token ratio of 0.06. The mean word length in the former novel is 4.11 letters, in the latter 3.90. The average sentence length in The Evenings (10.38 words), however, is lower than that of The Catcher in the Rye (11.03 words). Interestingly, using the Lempel-Ziv-Welch algorithm (Welch, 1984), we see identical compression rates of 0.81 on a scale from 0 (no repetition) to 1 (maximum repetition), meaning that, by this measure, both novels contain an equal amount of repetition.
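For readers who want to see roughly what goes into such figures, here is a minimal sketch. It is not the Lexical Diversity Calculator's code: the regex tokeniser, the naive sentence split on `.`, `!` and `?`, and especially the 0 to 1 repetition scale (assumed here to be 1 minus the number of LZW output codes divided by the number of characters) are all assumptions. The function names are hypothetical, and the numbers will not exactly match those above.

```python
import re

# Hypothetical tokeniser: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)?")


def lexical_stats(text: str) -> dict[str, float]:
    """Type-token ratio, mean word length (letters) and mean sentence length (words)."""
    words = WORD.findall(text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w.replace("'", "").replace("’", "")) for w in words)
    return {
        "tokens": len(words),
        "types": len(set(words)),
        "ttr": len(set(words)) / len(words),
        "mean_word_length": letters / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }


def lzw_code_count(text: str) -> int:
    """Number of codes LZW emits for `text`, using an unbounded dictionary
    initialised with the characters that occur in the text."""
    if not text:
        return 0
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    current, codes = "", 0
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # extend the longest known string
        else:
            codes += 1                               # emit the code for `current`
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = ch
    return codes + 1                                 # emit the code for the final string


def repetition_score(text: str) -> float:
    """Assumed scale: 0 when every character needs its own code, close to 1 for very repetitive text."""
    return 1 - lzw_code_count(text) / len(text)


sample = "He came to a canal where a sand barge lay at anchor. He came back."
print(lexical_stats(sample))
print(round(repetition_score(sample), 2))
```

The LZW part illustrates the intuition behind the comparison: the more often a text repeats character sequences, the longer the strings in the dictionary become and the fewer codes are needed. For a text in which no character repeats, one code per character is needed, and the score is 0.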

Have fun reading today (and go a bit easy on Frits...).

The Evenings, day 7: 28 December 1946/2025

Day seven of The Evenings. Seven comes after six, and that brings to mind 'six-seven', the children's word of the year 2025 in the Netherlands! You can read about that at NOS Jeugdjournaal. Think of it what you will, but let's use this interesting word as a gateway into today's chapter.

Photograph by Haberdoedas on Unsplash


The 67th word – which you can easily find using the Lexical Diversity Calculator – is mother, with which Frits' mother ends the following note she left for him.

Dear Frits. I don’t know where Father is. I have gone to Annetje’s. I will be home around eleven. There is pea soup, and you can take a piece of meat if you like. Just fry some potatoes along with the onions. Until then. Mother.

It really is quite a touching note. Frits' parents clearly have relationship issues, and while Frits doesn't exactly speak kindly of his parents, he does seem affected by their quarrels. This is not the first time his mother has left for the day, and I think it makes Frits' isolation all the more apparent.

So what about the 67th sentence? You can find it using the Sentence Length Tool. Here it is: 'He came to a canal where a sand barge lay at anchor.' Frits has just listened to the radio for a bit and, home alone, doesn't know what to do next. He decides to rest for a while, so as not to be 'drowsy this evening.' He hears children playing, dozes off and dreams of a ship with a funeral cross. When he finally wakes up at half past five, the chapter reads, his pillow is wet with tears.
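The tools do the counting for you, but it may help to see how much 'the 67th word' or 'the 67th sentence' depends on how a text is split up. Below is a minimal sketch that assumes a simple regex tokeniser and a naive sentence splitter that breaks after `.`, `!` or `?` followed by whitespace. Both functions are hypothetical and not necessarily what the Lexical Diversity Calculator or the Sentence Length Tool do.

```python
import re

# Hypothetical tokeniser: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)?")


def nth_word(text: str, n: int) -> str:
    """Return the n-th word (1-based) according to the tokeniser above."""
    return WORD.findall(text)[n - 1]


def nth_sentence(text: str, n: int) -> str:
    """Return the n-th sentence (1-based), splitting after . ! or ? plus whitespace."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sentences[n - 1]


note = "Dear Frits. I don’t know where Father is. I have gone to Annetje’s."
print(nth_word(note, 4))      # → don’t
print(nth_sentence(note, 2))  # → I don’t know where Father is.
```

A splitter this naive would also break after abbreviations such as 'J.D.', and counting "don’t" as one word or two shifts every later position. This is why the same novel can yield slightly different 'n-th' words and sentences in different tools.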

The 67th sentence in chapter 7 of The Evenings


While it is easy to criticise Frits for being distant, sarcastic and judgemental all the time, this chapter always reminds me to feel sorry for him too. The situation with his parents at home is less than stellar, and after not finishing school, Frits seems to be one of the few who haven't moved on in life, while his former school friends are studying, in relationships, et cetera. I think Frits' sarcastic comments and his reliance on silly anecdotes in social interactions reflect his loneliness and his sense that it is all meaningless.

Returning to six-seven, Kristel Doreleijers, linguist at the Meertens Instituut, explains to NOS Jeugdjournaal that 'it's really a word of this year, but it doesn't have much meaning [...]'. Maybe that is also what is lacking in Frits' life at this point: some meaning, something to counter the hopelessness of feeling he has no purpose in life.

The Evenings, day 6: 27 December 1946/2025

Christmas has passed and it's day six of The Evenings. Today's chapter begins not with Frits waking up, but right in the middle of the afternoon, with Frits at the office, where the lights have to be turned on at a quarter past three because it is already getting dark outside.


Early darkness on day six of The Evenings. Photograph by Fons Heijnsbroek on Unsplash.

Yesterday, we looked at hapax legomena, or words that appear only once in a text. So why not look at dis legomena today? Dis legomena are words that occur exactly twice in a text, and there are usually far fewer of them than there are hapaxes. For instance, today's chapter contains 925 hapaxes and 'only' 255 dis legomena. One that caught my eye was oem, which I thought might be part of those weird little songs Frits sometimes hums or sings. We'll look into those later, as it may be interesting to see how they are translated, but we'll stick to oem for now. The word appears twice, and it is actually the name of a lady in the following little story Frits finds in Viktor's book.
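Counting hapax and dis legomena comes down to counting word frequencies and keeping the words that occur exactly once or exactly twice. Here is a minimal sketch that assumes a simple regex tokeniser and lower-casing. The function name `legomena` is hypothetical, and this is not how the Lexical Diversity Calculator is necessarily implemented.

```python
import re
from collections import Counter

# Hypothetical tokeniser: runs of letters, optionally joined by an apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)?")


def legomena(text: str, k: int) -> list[str]:
    """Words occurring exactly k times: k=1 gives hapax legomena, k=2 dis legomena."""
    counts = Counter(WORD.findall(text.lower()))
    return sorted(word for word, freq in counts.items() if freq == k)


chapter = "oem oem cat dog dog dog lip"
print(legomena(chapter, 1))  # → ['cat', 'lip']
print(legomena(chapter, 2))  # → ['oem']
```

Lower-casing is a design choice that matters here: it treats 'Oem' and 'oem' as the same word. That is what you want for a name, but it also merges words that differ only in capitalisation at the start of a sentence, which slightly changes the hapax and dis legomena counts.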

He leafed on. 'Janet: the gentlemen will surely remember,' he read, 'the case of the lady Oem, whose cat had died. I can, to my great satisfaction, report that her recovery is complete. Thanks to a remarkable course of treatment which I, in this case, applied. My treatment of Miss Oem consisted of giving her a new cat.'

Another word occurring exactly twice is lipreader, which draws attention to quite a funny passage about sitting next to a lipreader in the movie theatre. There are, according to Frits, two types. First, there are 'the extroverts, who laugh and explain things to those sitting beside them. Those are truly terrible.' But there's a worse type of lipreader: 'the ones who read the subtitles out loud. Aye-aye, Jesus Christ, what an abomination.' I find these small observations very funny, because they ring true and are things you might think yourself but would not readily say out loud.


Movie theatre. Photograph by Jake Hills on Unsplash.

Have fun reading again today!
