Keyword analysis

You can use this page to extract keywords from a Dutch or English text. This site was made by Alex Reuneker. For questions, see the contact details at http://www.reuneker.nl. If you use this site for your research, please cite it as follows:

Reuneker, A. (2020). Keyword analysis. Retrieved ..., from https://www.reuneker.nl/files/wordlist.


Steps

  1. Copy a text (from a website, a book, a larger corpus, et cetera).
  2. Paste the text into the input area below. You don't have to remove odd characters, tags, extra whitespace, or line breaks; the script does that for you.
  3. Set the number of results you want (or leave it at 50) and choose whether numbers should be removed (the default). Most importantly, choose your reference corpus: either CONDIV for Dutch or BNC for English.
  4. Click 'Extract keywords' and wait a bit.
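
The clean-up mentioned in step 2 can be pictured roughly as follows. This is only a minimal sketch, not the page's actual code: the function name `preprocess`, its `removeNumbers` parameter, and the exact regular expressions are assumptions made for illustration.

```javascript
// Hypothetical helper, NOT the page's real implementation.
// Turns raw pasted text into a list of lowercase word tokens.
function preprocess(text, removeNumbers = true) {
  const tokens = text
    .replace(/<[^>]*>/g, " ")             // replace HTML/XML-style tags with a space
    .toLowerCase()
    .replace(/[^\p{L}\p{N}'-]+/gu, " ")   // anything that is not a letter, digit, ' or - becomes one space
    .trim()
    .split(/\s+/)
    .filter((token) => /[\p{L}\p{N}]/u.test(token)); // drop empty strings and stray "-" or "'"

  // Optionally drop tokens that contain no letters at all, such as "2020".
  return removeNumbers
    ? tokens.filter((token) => !/^[\p{N}'-]+$/u.test(token))
    : tokens;
}
```

Calling `preprocess(pastedText)` drops purely numeric tokens, which mirrors the default setting in step 3; passing `false` as the second argument keeps them.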

Input and settings

Choose preferred settings, or leave at default.

Paste a text to analyze below.


Results

Results will be presented here after you click 'Extract keywords'.

About

The keyword analysis functions were written in vanilla JavaScript, and your text is not uploaded to any server: your own computer (or rather, your browser) does all the work. Small texts are processed very quickly; longer texts take a bit longer. Extracting the keywords from one of the Dutch Bible translations (730,738 words) took my laptop five seconds, with multiple tabs and other applications open. Keyword analysis requires a reference corpus, and the choice of reference corpus can have a severe impact on the results.
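
To make the role of the reference corpus concrete, here is a minimal sketch of a keyness calculation. This page does not state which statistic it uses; the sketch assumes the log-likelihood measure that is common in corpus linguistics. The names `logLikelihood` and `extractKeywords`, and the choice to keep only words that are relatively more frequent in the text than in the reference corpus, are illustrative assumptions.

```javascript
// Assumed statistic: log-likelihood keyness (not confirmed to be what this page uses).
// a = frequency in the text, b = frequency in the reference corpus,
// c = number of tokens in the text, d = number of tokens in the reference corpus.
function logLikelihood(a, b, c, d) {
  const expectedText = (c * (a + b)) / (c + d);
  const expectedRef = (d * (a + b)) / (c + d);
  // By convention, a zero observed frequency contributes nothing to the sum.
  const term = (observed, expected) =>
    observed > 0 ? observed * Math.log(observed / expected) : 0;
  return 2 * (term(a, expectedText) + term(b, expectedRef));
}

// Hypothetical driver: tokens is an array of words, referenceCounts a Map of word -> frequency.
function extractKeywords(tokens, referenceCounts, referenceSize, limit = 50) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }

  const results = [];
  for (const [word, freq] of counts) {
    const refFreq = referenceCounts.get(word) || 0; // unseen words count as 0
    // Keep only words that are relatively more frequent in the text.
    if (freq / tokens.length <= refFreq / referenceSize) continue;
    const keyness = logLikelihood(freq, refFreq, tokens.length, referenceSize);
    results.push({ word, freq, refFreq, keyness });
  }

  // Highest keyness first, cut off at the requested number of results.
  return results.sort((x, y) => y.keyness - x.keyness).slice(0, limit);
}
```

Because every score compares a word's frequency in your text with its frequency in the reference corpus, swapping the reference corpus changes both the scores and the ranking.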

The Dutch reference corpus used on this page consists of the NRC and Telegraaf newspaper sections of the CONDIV corpus (see Grondelaers et al. 2000). To keep the file size manageable, only words with a frequency higher than 1 were included, which can affect the results. The English reference corpus used on this page consists of the written part of the British National Corpus (BNC) (see Adam Kilgarriff's page). To keep the file size manageable, only words with a frequency of 25 or higher were included, which can likewise affect the results.
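
The frequency cut-offs described above can be illustrated with a small sketch. The function name `pruneFrequencyList` and the toy counts are invented for this example and are not taken from the CONDIV or BNC lists; only the two thresholds come from the text above.

```javascript
// Hypothetical helper: keep only words that occur at least minFrequency times.
function pruneFrequencyList(counts, minFrequency) {
  const kept = new Map();
  for (const [word, freq] of counts) {
    if (freq >= minFrequency) kept.set(word, freq);
  }
  return kept;
}

// Toy frequency list (invented numbers, for illustration only).
const fullList = new Map([
  ["the", 120],
  ["corpus", 25],
  ["keyness", 24],
  ["hapax", 1],
]);

// "Frequency higher than 1" corresponds to a minimum of 2 (Dutch list).
const dutchStyleList = pruneFrequencyList(fullList, 2);
// "Frequency of 25 or higher" corresponds to a minimum of 25 (English list).
const englishStyleList = pruneFrequencyList(fullList, 25);
```

A word that falls below the cut-off is simply missing from the pruned list. If a lookup then treats it as having a reference frequency of 0, the word can look more distinctive for your text than it would against the full corpus, which is one way the cut-off can affect results.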

If you use this page for your research, I advise you to refer to both this page and the CONDIV corpus.

Pre-processing

Please take note of the pre-processing (i.e. before calculation) done here:

Updates