Automatic Word of the Day

Written by Ben Wendt

Since 2007 I’ve been keeping up my own personal word of the day blog. I started it back in the heyday of Google Reader. In those days I was subscribed to several word of the day blogs and wanted to do one of my own. I’ve gone through phases of posting more and less. Some years I posted every day, and in the late 2010s I managed only single-digit post counts for several years running.

It’s fun. My favorite words to add are ones that I come across while reading something, and I either really like the word or have to look it up. Words that arise organically have a personal touch that is appropriate for blogging.

But in the years when I posted every day, I mixed organic words with ones found by crawling Wikipedia. I forget the exact rules, but I had a userscript that I would run in a browser to look for interesting words, meaning Wikipedia articles that matched several criteria:

  • Single word title, with optional parentheses.
  • No proper nouns.
  • Probably some kind of length limit.
  • There were probably other criteria.
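The original script is gone, so what follows is only a hypothetical Python reconstruction of the criteria above, not the real thing. The name `candidate_word`, the 15-character cap, and the `common_words` lookup used as a proper-noun check are all assumptions made for illustration.

```python
import re

# A single alphabetic word, optionally followed by a parenthetical
# disambiguator such as "Mercury (element)".
TITLE_RE = re.compile(r"^([A-Za-z]+)(?: \([^()]+\))?$")

def candidate_word(title, common_words, max_len=15):
    """Return the lowercased word if the title passes every filter, else None."""
    match = TITLE_RE.match(title)
    if match is None:
        return None  # multi-word title, digits, punctuation, ...
    word = match.group(1).lower()
    if len(word) > max_len:
        return None  # assumed length limit
    # Wikipedia capitalizes the first letter of article titles, so case alone
    # cannot flag proper nouns. As a stand-in, require the lowercase form to
    # appear in a set of known common words.
    if word not in common_words:
        return None
    return word

known = {"mercury", "okra", "hyperbola"}
titles = ["Okra", "Mercury (element)", "Toronto", "Rare earth", "Hyperbola"]
print([candidate_word(t, known) for t in titles])
```

The proper-noun check is the weakest part: it is only as good as the word set passed in, which is probably why the real script needed "other criteria" too.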

It would be interesting to share what I was doing with JavaScript in those days, but alas, whatever that userscript did, it’s lost to the sands of time. To be totally honest, I think it was on a work computer, and this was back in the days when a work computer was a big box that sat in an office, not a laptop in your own home.

My enthusiasm for the project waned over the years, but never left. So last year I was looking to revive the blog. I thought an interesting way to automate it would be to pick random words from the dictionary within a given word frequency range.

from wordfreq import word_frequency
from PyDictionary import PyDictionary
from english_words import english_words_set
from random import choice
from wiktionaryparser import WiktionaryParser
import wikipedia
from datetime import datetime

dictionary = PyDictionary()
english_words = list(english_words_set)  # choice() needs a sequence, not a set
parser = WiktionaryParser()

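def find_interesting_word(max_freq=5e-07, min_freq=7e-10):
    # Start outside the band so the loop body runs at least once.
    freq = 2 * max_freq
    # Rejection sampling: redraw until the frequency lands inside the band.
    while freq < min_freq or freq > max_freq:
        word = choice(english_words)
        freq = word_frequency(word, 'en')
    return word, freq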

Here you see that an interesting word is one whose frequency lies in the range 7e-10 to 5e-07. I found these bounds by trial and error, and even so most of the output words aren’t great, so I pick a bunch of them at a time:

def find_interesting_words(num=7):
    return [find_interesting_word() for _ in range(num)]

Example:

[('incubate', 2.88e-07),
 ('rattail', 3.24e-08),
 ('recondite', 5.37e-08),
 ('okra', 4.27e-07),
 ('headdress', 4.57e-07),
 ('Grosset', 8.13e-08),
 ('hyperbola', 1.07e-07)]
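The capitalized “Grosset” in that sample hints at one cheap extra filter. Here is a minimal sketch of a variation, assuming that capitalized entries in the word list are proper nouns, and building the in-range pool once instead of redrawing until a hit. The names `interesting_pool`, `pick_words`, and `freq_of` are hypothetical, and the toy frequency table stands in for real lookups. In the script above, `freq_of` would be something like `lambda w: word_frequency(w, 'en')`.

```python
from random import Random

def interesting_pool(words, freq_of, max_freq=5e-07, min_freq=7e-10):
    """Filter the word list once: lowercase entries inside the frequency band."""
    pool = []
    for word in words:
        if not word.islower():
            continue  # assumption: capitalized entries are proper nouns
        if min_freq <= freq_of(word) <= max_freq:
            pool.append(word)
    return sorted(pool)  # sorted only to make the result reproducible

def pick_words(pool, num=7, rng=None):
    """Sample without replacement, so one batch never repeats a word."""
    rng = rng or Random()
    return rng.sample(pool, min(num, len(pool)))

# Toy frequencies standing in for wordfreq lookups: three values copied from
# the sample output, plus a made-up high value for "the".
toy = {"okra": 4.27e-07, "Grosset": 8.13e-08, "recondite": 5.37e-08, "the": 5e-02}
pool = interesting_pool(toy, toy.get)
print(pool)
print(pick_words(pool, num=7))
```

Filtering up front also avoids the one failure mode of the loop version, which never returns if no word in the list falls inside the band.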

Aside from the proper noun “Grosset,” the words in that sample are basically all suitable as “words of the day.” But a word of the day isn’t just a word. You also need a definition, and I liked to include a picture. I found a function that grabs images for a given query from Wikipedia, and used PyDictionary and WiktionaryParser to give definitions:

def li(x):
    # One Wiktionary definition: part of speech, then its text lines joined.
    return f"<li><i>{x['partOfSpeech']}</i>: {' '.join(x['text'])}</li>"

def html_of(word):
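    # The f prefix must sit on the literal that contains {word}; on any other
    # piece of the concatenation the placeholder is emitted verbatim.
    out_code = '<p><b>' \
      f'<a href="https://en.wiktionary.org/wiki/{word}">{word}</a>' \
      '</b></p>'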
    meaning = dictionary.meaning(word)
    if meaning:
        for (typ, defs) in meaning.items():
            # A <ul> cannot nest inside a <p>, so close the paragraph first.
            out_code += f'<p><i>{typ}</i></p>'
            lis = ''.join(['<li>' + x + '</li>' for x in defs])
            out_code += '<ul>' + lis + '</ul>'
    else:
        meaning = parser.fetch(word)
        out_code += '<ul>'
        for defs in meaning:
            lis = [li(x) for x in defs['definitions']]
            out_code += ''.join(lis)
        # Close the list once, after every entry has been appended.
        out_code += '</ul>'
    wikiimage = get_wiki_image(word)
    if wikiimage:
        out_code += f'<p><img width=720 alt="{word}" src="{wikiimage}" /></p>'
    return out_code
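`html_of` calls `get_wiki_image`, which this post never shows. A hypothetical stand-in (not the original function) might look like the sketch below. It assumes the `wikipedia` package's `page(...).images` list of URLs. The extension filter and the injectable `fetch_images` parameter are additions for illustration, so the picking logic can be tried offline.

```python
def _wikipedia_images(word):
    """Image URLs from the Wikipedia page titled `word` (needs network access)."""
    import wikipedia  # imported lazily so the rest of the sketch runs offline
    try:
        return wikipedia.page(word, auto_suggest=False).images
    except wikipedia.exceptions.WikipediaException:
        return []  # missing page, disambiguation page, timeout

def get_wiki_image(word, fetch_images=_wikipedia_images):
    """Return the first photo-like image URL for `word`, or None."""
    for url in fetch_images(word):
        # Assumption: SVGs on a page are mostly icons and logos, so skip them.
        if url.lower().endswith(('.jpg', '.jpeg', '.png')):
            return url
    return None

# Offline try-out with a fake fetcher instead of a real Wikipedia lookup.
fake = lambda word: ["https://example.org/icon.svg", "https://example.org/okra.jpg"]
print(get_wiki_image("okra", fetch_images=fake))
```

Taking the first matching image is a crude choice: it is not guaranteed to be the most relevant picture on the page.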

Looking back, I see a few issues in html_of, but this was meant as a proof of concept, so that’s fine. After running a few tests, I found I was getting results like this:

europium

Noun a bivalent and trivalent metallic element of the rare earth group

This is fine, but I feel it’s a bit soulless. Had this worked out, I would have liked to add Blogger API access, including scheduling and tagging, but I wasn’t happy enough with the results. After reflecting on what I wanted the blog to be, I decided to just make a better effort at posting manually, and I’ve been doing a much better job since last year.