Weblog Alex Reuneker

Bug fixes and a new feature for the n-gram generator

- Posted in Taal

Unfortunately, while I was working on large-file loading, some bugs slipped in that caused the n-gram generator to present incorrect results. Luckily, one of the users alerted me to the problem, and over the last few days I have fixed a number of related bugs. On top of that, I have implemented a number of checks to prevent incorrect results in the future.

Finally, I have added an option to remove the possessive 's, so now you can choose whether you’d like ‘Harry’s’ to be counted as ‘Harrys’ or ‘Harry’. Some general statistics (word totals, TTR) were added too.
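To make the difference concrete, here is a minimal Python sketch of what such an option and a type-token ratio (TTR) could look like. It is not the code behind the n-gram generator; the names `tokenize` and `type_token_ratio` and the regular expressions are assumptions made up for this illustration.

```python
import re

# Hypothetical pattern: an apostrophe (straight or curly) followed by a word-final "s".
APOSTROPHE_S = re.compile(r"(?<=\w)['’](?=s\b)")
POSSESSIVE = re.compile(r"(?<=\w)['’]s\b")


def tokenize(text, strip_possessive):
    """Lowercase the text and split it into word tokens.

    With strip_possessive=True, "Harry's" is counted as "harry";
    otherwise only the apostrophe is dropped and it becomes "harrys".
    Note that this simple pattern cannot tell a possessive from a
    contraction, so "it's" is treated the same way.
    """
    if strip_possessive:
        text = POSSESSIVE.sub("", text)
    else:
        text = APOSTROPHE_S.sub("", text)
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


print(tokenize("Harry’s owl", strip_possessive=True))   # ['harry', 'owl']
print(tokenize("Harry’s owl", strip_possessive=False))  # ['harrys', 'owl']
print(type_token_ratio(["the", "cat", "the", "dog"]))   # 0.75
```

The choice matters for counting: with the option on, ‘Harry’ and ‘Harry’s’ end up as the same type, which lowers the number of distinct words and therefore the TTR.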

To try the new version, head over to https://www.reuneker.nl/files/ngram.