Idris Raja

NPR Will Shortz Puzzle: 12/26/2010

In hacking, NPR, word_puzzle on January 5, 2011 at 11:26 pm

This week’s puzzle was a fun one.

Name a famous American from the past who has seven letters in his or her last name. Take the last two letters, plus the first four letters, in that order, and you’ll name that person’s profession. Who is it?

There are several parts to this puzzle and it’s easiest to tackle each one individually.

1) Think of a famous American whose last name is seven letters long.     This part is very broad, and there are possibly thousands or tens of thousands of names that can fit this description. My brain certainly can’t easily categorize people into groups based on length of last name.

2) Take his / her last name, and move the last two letters in front of the first four letters and you have the person’s profession.

Again, moving the letters is difficult for my brain, and I couldn’t think of an easy way of doing this part in reverse: thinking of a profession, and then forming the last name to see if it matches a known famous American.

So, of course and as usual, I’ll use Python, command-line fu, Wikipedia, and an English word list to solve this puzzle.

Step 1 – We’re looking for a famous American, and therefore someone whose name is in Wikipedia. I assume that if you don’t have an English Wikipedia entry, you aren’t a famous American. In our Wikipedia list, underscores replaced spaces, into the form firstname_lastname. I ignore all entries that don’t have at least one underscore, as the entry is unlikely to be a name.

def has_underscore_middle(s):
    """returns true if string 's' has underscore not including end or beginning"""
    s = s[1:-1]
    return s.find('_') > -1

Next, I test the last word in the entry to see if it has seven letters. It’s possible that a name like John F. Kennedy, Jr. would be incorrectly eliminated because of the suffix Jr., but I choose to ignore such edge cases for now. If our method doesn’t yield a solution, we can go back and possibly make this refinement.

def has_right_length(s, l = 7):
    """return True if string 's' is correct length 'l'=7"""
    return len(s) == l

Next, I test to see that all characters in the last name are ASCII characters. Wikipedia entries can have grammar marks, numbers, foreign characters, Unicode characters, etc., all of which are unlikely to be contained in the last name of a famous American. Again, this assumption can be modified if this approach doesn’t yield a solution.

def has_ascii_letters_only(s):
    """return True if string 's' is made up only of ascii letters"""
    for char in s:
        if char not in string.ascii_letters:
            return False
    return True

Step 2 – Once we have a list of Wikipedia entries that look like names and have the correct last name length, we are ready to see if moving the letters around in the last name return a profession. I don’t have a list of professions, but I do have a list of all words in the English dictionary. This list from WordNet is specifically called the crossword dictionary as it is all valid entries in English crosswords. There are about 110,000 words in this list and I assume that the list of professions is a subset of this dictionary.

With that assumption, I can move the letters around in the last name and see if it is in the dictionary. If yes, I output the word from the dictionary, the last name, and the original Wikipedia entry onto one line in the output file.

def get_from_underscore(s):
    """accepts string 's' that is expected to have "_"
    return from "_" on, exclusive"""
    reverse = s[::-1]
    loc_ = reverse.find('_')
    return s[-loc_:]

def mix_last_name(s):
    """Takes a seven letter string 's' and creates new 6 letter string
    with last 2 letters of 's' + first four letters of 's'
    Ex: "abcdefg" --> "fgabcd" """
    return s[-2:] + s[:4]

Running Step 1 and Step 2 takes approximately 40 seconds. Now I need to manually scan the list of 1,507 entries and see if I can find a profession.

The file is unsorted, which I could fix in Python but instead I’ll quickly run the command line function sort and do

sort -o output_sort.txt output.txt

which sorts output.txt by the first letter in the line and puts the result in output_sort.txt.

Scanning through the list, we quickly get lucky and see several lines which found ‘author’ as the word in the dictionary. There are several Wikipedia entries that are variations on the name Henry David Thoreau, certainly a famous American and a likely answer for an NPR contest.

This solution required a few assumptions, and some a priori knowledge such as a general knowledge of professions and famous Americans, but otherwise everything else was outsourced to the computer and Wikipedia.

You can see the full code here at github.

Looking forward to next week’s puzzle which looks like a lot of fun, as we’ll have to find a way to find a two-word synonym for a single word, which will require a thesaurus and possibly more.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: