Idris Raja

Hacking another NPR word problem

In hacking, word_puzzle on December 28, 2010 at 1:44 am

Last week’s puzzle: Name a city in the United States that ends in the letter S. The city is one of the largest cities in its state. Change the S to a different letter and rearrange the result to get the state the city is in. What are the city and state?

Now this one is admittedly pretty easy. We know that we need a state with 7 letters, and after finding this very convenient list on Wikipedia, I was able to scan the table and see that the answer is Yonkers, New York in less than a minute. But what if we had a list of hundreds of states? How could we ‘hack’ this puzzle?

The first step is to grab that Wikipedia page and save it as a file locally. There are many ways to do this; my favorite is from the command line:

wget “’_largest_cities_by_population”

That will save the file in the local directory. I renamed the file to city.html.

The next step is to extract the information in the table. Load the page in Google Chrome and then right click anywhere on the table and choose ‘Inspect Element’. We can see that the state and city information is contained in a table with class = wikitable, and the individual rows are in tr tags.

After opening the page in a Python file and parsing it with etree.parse and etree.HTMLParser from the lxml module, we can use two XPath expressions to get at this information.

xps1 = '//table[@class="wikitable"]//tr' and

xps2 = './/td//a/text()'
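
Here is a rough sketch of how those two expressions could be applied with lxml. The SAMPLE_HTML string is a made-up stand-in for city.html (the real page has more rows and columns), the names ROW_XPATH, CELL_XPATH and extract_rows are just for illustration, and I'm assuming the first link in each row is the state and the rest are its cities.

```python
from io import StringIO
from lxml import etree

# Hypothetical stand-in for the saved city.html, trimmed to two rows.
SAMPLE_HTML = """
<html><body>
<table class="wikitable">
<tr><th>State</th><th>Largest</th><th>2nd</th><th>Other</th></tr>
<tr><td><a href="#">New York</a></td><td><a href="#">New York City</a></td>
    <td><a href="#">Buffalo</a></td><td><a href="#">Yonkers</a></td></tr>
<tr><td><a href="#">Florida</a></td><td><a href="#">Jacksonville</a></td>
    <td><a href="#">Miami</a></td><td><a href="#">Tampa</a></td></tr>
</table>
</body></html>
"""

# The two XPath expressions from the post.
ROW_XPATH = '//table[@class="wikitable"]//tr'
CELL_XPATH = './/td//a/text()'

def extract_rows(source):
    """Return a list of (state, [cities]) tuples from a file name or file-like object."""
    tree = etree.parse(source, etree.HTMLParser())
    rows = []
    for tr in tree.xpath(ROW_XPATH):
        # Link texts inside this row's td cells; header rows (th only) yield nothing.
        cells = [str(text) for text in tr.xpath(CELL_XPATH)]
        if cells:
            # Assumption: first link is the state, the remaining links are cities.
            rows.append((cells[0], cells[1:]))
    return rows

rows = extract_rows(StringIO(SAMPLE_HTML))
for state, cities in rows:
    print(state, cities)
```

With the real file, extract_rows("city.html") should work the same way, as long as the table layout matches the assumption above.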

The first captures all the tr tags in the table, and the second allows us to capture all the state and city information in the rows. From here it’s trivial, as we can simply see which states are seven letters long and have cities that are seven letters long – 13 total. If that were too many, we would need to test which of those pairs share 6 of 7 letters.
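
That letter test could look something like the sketch below. It is only one way to do it: the function name is_match and the sample_rows data are made up for illustration, and I'm ignoring spaces and case when comparing letters.

```python
from collections import Counter

def is_match(city, state):
    """True if the city ends in 's' and swapping that 's' for a different
    letter gives an anagram of the state (spaces and case ignored)."""
    c = city.lower().replace(" ", "")
    s = state.lower().replace(" ", "")
    if not c.endswith("s") or len(c) != len(s):
        return False
    rest = Counter(c[:-1])          # the city's letters without the final 's'
    leftover = Counter(s) - rest    # state letters the city does not cover
    unused = rest - Counter(s)      # city letters the state does not use
    # Exactly one state letter must be left over, and it must not be 's'.
    return not unused and sum(leftover.values()) == 1 and "s" not in leftover

# Hypothetical sample in the same (state, [cities]) shape as the table rows.
sample_rows = [
    ("New York", ["New York City", "Buffalo", "Yonkers"]),
    ("Florida", ["Jacksonville", "Miami", "Tampa"]),
]

matches = [(city, state)
           for state, cities in sample_rows
           for city in cities
           if is_match(city, state)]
print(matches)
```

Using Counter subtraction checks all the shared letters at once, so there is no need to try each replacement letter separately.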

I’ll leave that for now as next week’s puzzle seems a bit more challenging.


