On this page, you can generate a wordlist from any text. The result is displayed on the webpage and can be downloaded as a tab-delimited text file for further processing.
Please note the following pre-processing steps, which are applied before the wordlist is calculated:
- All punctuation (e.g. !?.,-) is removed, as are other non-standard characters (e.g. \/*, non-UTF-8 quotes).
- Text between square or angle brackets (e.g. [some text], <p>) is removed.
- All numerical characters (0-9) are removed.
- All tabs, newlines (line breaks), double spaces, etc. are removed. The remaining spaces serve as word boundaries and are not themselves counted.
- All letters are converted to lower case (so 'Speak', 'SPEAK' and 'speak' are treated as one type and three tokens).
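The pre-processing steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the script this page actually runs: the function name `preprocess`, the regular expressions, and the choice to substitute a space for bracketed text are all illustrative. Bracketed text is handled first because the brackets are themselves punctuation and would otherwise be stripped before their contents could be found.

```python
import re

# Text inside square or angle brackets, e.g. "[some text]" or "<p>".
BRACKETED = re.compile(r"\[[^\]]*\]|<[^>]*>")
# The digits 0-9.
DIGITS = re.compile(r"[0-9]")
# Anything that is neither a word character nor whitespace, plus "_"
# (which \w would otherwise keep). Python's \w is Unicode-aware, so
# accented letters such as "é" survive.
NON_LETTERS = re.compile(r"[^\w\s]|_")


def preprocess(text: str) -> list[str]:
    """Return the word tokens left after the pre-processing steps."""
    # 1. Drop bracketed text. A space is substituted so that neighbouring
    #    words are not glued together (an assumption of this sketch).
    text = BRACKETED.sub(" ", text)
    # 2. Drop the digits 0-9.
    text = DIGITS.sub("", text)
    # 3. Delete punctuation and other symbols outright, so "well-known"
    #    becomes "wellknown".
    text = NON_LETTERS.sub("", text)
    # 4. Lower-case, then split on runs of whitespace: tabs, newlines and
    #    repeated spaces all act as a single word boundary.
    return text.lower().split()


if __name__ == "__main__":
    tokens = preprocess("<p>Speak, SPEAK and speak [aside] 42 times!</p>")
    print(tokens)            # ['speak', 'speak', 'and', 'speak', 'times']
    print(len(tokens))       # 5 tokens
    print(len(set(tokens)))  # 3 types
```

As in the 'Speak' example, the three spellings collapse into one type while still contributing three tokens.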
| Output | Value | Description |
|---|---|---|
| Total words | ... | Total number of words |
| Processing time | ... | Yes, a script like this takes only milliseconds. |
| Download wordlist | ... | Tab-delimited text file |
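The outputs listed above (total words, processing time and a tab-delimited wordlist) could be produced along these lines. Again this is a hypothetical sketch: the function `wordlist_tsv`, the `word`/`frequency` header row and the sort order (most frequent first, ties alphabetical) are assumptions, since the page does not specify the layout of the downloaded file.

```python
import csv
import io
import time
from collections import Counter


def wordlist_tsv(tokens: list[str]) -> str:
    """Build a tab-delimited wordlist with one 'word<TAB>frequency' row per type.

    Assumed layout: a header row, then the types sorted by descending
    frequency, with ties broken alphabetically.
    """
    counts = Counter(tokens)
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "frequency"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    tokens = ["speak", "speak", "and", "speak", "times"]
    start = time.perf_counter()
    tsv = wordlist_tsv(tokens)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"Total words: {len(tokens)}")  # counts tokens, not types
    print(f"Processing time: {elapsed_ms:.3f} ms")
    print(tsv, end="")
```

To offer the result as a download, the returned string can simply be written to a text file opened with `encoding="utf-8"`; any spreadsheet or script that reads tab-separated values can then process it further.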