Idris Raja

Archive for January, 2011

NBA Champs Team Age

In github, hacking, python, R on January 31, 2011 at 7:15 am

For a while now I’ve wanted to start mucking around with R, the free and open source software (FOSS) statistics and graphing program. I needed a dataset to play with, so I scraped the season box scores for every NBA regular season from 1950 to 2010 from

I used Python to calculate an average age for each team, weighting each player’s age by his share of the team’s total regular-season minutes. For the calculation I use each player’s integer age on February 1st of the season.
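A minimal Python 3 sketch of that weighting follows. It is not the code behind this post: the (birth_date, minutes_played) record layout, the function names, and the convention that a season is labeled by the calendar year it ends in (1988 for 1987-88) are all assumptions for illustration.

```python
from datetime import date


def age_on_feb_1(birth_date, season_end_year):
    """Integer age on February 1st of the season.

    Assumes a season is labeled by the year it ends in (1988 = 1987-88),
    so February 1st of that season falls in season_end_year.
    """
    ref = date(season_end_year, 2, 1)
    # Subtract a year if the birthday hasn't come around yet by February 1st
    birthday_pending = (ref.month, ref.day) < (birth_date.month, birth_date.day)
    return ref.year - birth_date.year - int(birthday_pending)


def minutes_weighted_age(players, season_end_year):
    """Average team age, weighting each player by his share of team minutes.

    'players' is a hypothetical list of (birth_date, minutes_played) pairs.
    """
    total_minutes = sum(minutes for _, minutes in players)
    weighted_sum = sum(age_on_feb_1(born, season_end_year) * minutes
                       for born, minutes in players)
    return weighted_sum / total_minutes


# Kareem Abdul-Jabbar (born April 16, 1947) in the 1987-88 season
print(age_on_feb_1(date(1947, 4, 16), 1988))  # 40

# Hypothetical two-man team: a 30-year-old with 2,000 minutes and a
# 24-year-old with 1,000 minutes
print(minutes_weighted_age([(date(1958, 1, 1), 2000),
                            (date(1964, 1, 1), 1000)], 1988))  # 28.0
```

Weighting by minutes rather than taking a plain average keeps seldom-used bench players from pulling the team age around.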

The interesting thing to do with this data is to isolate the championship teams and look at the age trends of the different dynasties. From the 1950s to the 1990s, each dynasty gets older as it continues to win championships. This is not surprising, as each dynasty has a core of star players who get a year older each season. These core players play the majority of minutes, so they dominate the weighted average age. Role players and bench players, on the other hand, tend to be turned over, with older ones replaced by younger ones. This is why the average age rises by less than a full year from one season to the next.

The Celtics dynasty of the 1950s and 1960s runs from 1957 to 1969, a stretch of 13 seasons in which they won 11 championships, beating the Lakers in the Finals seven times. The Celtics’ average age was 27.0 in 1957 and 30.4 in 1969. Their 1969 team was the oldest ever to win a championship until the last two Michael Jordan Bulls teams of 1997 and 1998, which were older still.

The Larry Bird Celtics won the championship three times between 1981 and 1986, a period in which they aged from 27.0 to 29.3. The Showtime Lakers won five times between 1980 and 1988, aging from 26.2 to 28.9. The Lakers’ increase wasn’t steady, most likely because of the decreasing minutes of Kareem Abdul-Jabbar, who was 40 in 1988 and one of the oldest players ever to make a meaningful contribution to his team at that age (18.2 ppg, 10.9 rpg in 1988).

The Bulls teams of the 1990s won three-peats in 1991-1993 and again in 1996-1998, after Jordan’s first retirement and comeback. The first Bulls championship team, in 1991, was young, just 26.9. In 1992 and 1993 they were 27.6 and 28.0, respectively. By the start of their second three-peat in 1996, they were already one of the oldest championship teams ever. I don’t think the Bulls could have won that second three-peat without the year and a half Jordan took off from basketball to ‘rest’ his legs playing baseball. It is inconceivable that Jordan could have played nearly a decade of 100-plus-game seasons (regular season plus playoffs) and still managed to win six championships. In retrospect, given his age and the Bulls’ age during that second three-peat, his first retirement was an ingenious move. The only other player on all six championship teams was Scottie Pippen, who took no time off in the 1990s and who nearly carried the Bulls, sans Jordan, to the Eastern Conference Finals in 1994. During the championship run of the 1990s, Pippen averaged 79 games started per season until the 1998 championship season, when age and fatigue finally caught up with him and he started only 44 games.

The Bulls broke up after the 1998 season, and the entire city of Chicago vilified general manager Jerry Krause for not bringing back the team’s nucleus of coach Phil Jackson, Michael Jordan and Scottie Pippen. The Bulls were probably too old and tired to win again in 1999, but no one knew then that 1999 would be a lockout-shortened, 50-game season. The shortened season and the extra three months of rest would have been exactly what a very old, hypothetical 1999 Bulls team needed for a shot at the first four-peat since the 1960s Celtics.

The San Antonio Spurs won four times between 1999 and 2007, and if we exclude the anomalous 1999 lockout season, they show the same pattern of getting older.

The only exception so far is the Shaq and Kobe Lakers, whose early-2000s three-peat teams actually got younger. The current Lakers, two-time defending champions, aged from 27.4 in 2009 to 28.4 in 2010, and through 57 games of the 2011 season they are 30.3, almost two years older than in 2010. Of the top eight Lakers as measured by minutes played, seven are 30 or over. This year looks like the last gasp for the current Lakers team and Kobe Bryant, with Phil Jackson set to retire and the nucleus (Bryant, Gasol, Odom, Fisher, and Artest) all 30 or over.

For the R code used for this graph and the full final season box scores for all teams, click here.

NPR Will Shortz word puzzle 1/16/2011, Solution on GitHub!

In github, hacking, NPR, python, word_puzzle on January 22, 2011 at 12:12 am

This week’s puzzle:

From listener Mike Shteyman of Reisterstown, Md.: Take the first seven letters of the alphabet, A through G, change one of these letters to another letter that is also either A, B, C, D, E, F or G. Rearrange the result to spell a familiar seven-letter word. What word is it?

This puzzle was easy to solve with Python, and I don’t think I could have solved it the old-fashioned way, ‘only’ using my brain.

I’ve been attempting to solve these puzzles for a few months now, and I’ve had to write certain functions over and over, so this week I took the time to consolidate some of the more common ones into a utility file,

I also went through the awesome Git tutorial Git Immersion, with much thanks to EdgeCase Software Artisans and Jim Weirich. With a decent grasp of the basics of Git, I decided to put all the code up on GitHub here. From now on I’ll use git for any coding I do, and I’ll probably end up hosting a bunch of it as publicly available code on GitHub.

Back to this week’s puzzle. The first step is to create every string that starts as ‘abcdefg’ and has one of its letters replaced by another letter from the same string. Each of the 7 characters in ‘abcdefg’ can be replaced by 6 possible letters (any letter except the original one), giving a total of 42 (6*7) strings to test for anagrams.
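A short Python 3 sketch of that generation step (the function name is hypothetical; the post doesn’t show its own version):

```python
def one_letter_swaps(base="abcdefg"):
    """Return every string made by replacing one letter of 'base'
    with a different letter that also occurs in 'base'."""
    candidates = []
    for i, original in enumerate(base):
        for replacement in base:
            if replacement != original:  # 6 choices per position
                candidates.append(base[:i] + replacement + base[i + 1:])
    return candidates


swaps = one_letter_swaps()
print(len(swaps))   # 42
print(swaps[:3])    # ['bbcdefg', 'cbcdefg', 'dbcdefg']
```

Because each candidate differs from ‘abcdefg’ in exactly one position, no two (position, replacement) pairs produce the same string, so all 42 are distinct.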

Each of the 42 strings contains one letter twice. The number of distinct arrangements (anagrams) of a 7-character string with one repeated letter is 7!/2! = 2,520.
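The 7!/2! count can be sanity-checked by brute force: generate every ordering of one of the 42 strings with itertools.permutations and let a set discard the duplicates that differ only by swapping the two identical letters.

```python
from itertools import permutations
from math import factorial

word = "abedefg"  # one of the 42 strings; 'e' appears twice
# permutations() treats the two e's as distinct, giving 7! = 5,040 tuples;
# the set keeps only the distinct strings
distinct = {"".join(p) for p in permutations(word)}
print(len(distinct))                  # 2520
print(factorial(7) // factorial(2))   # 2520
```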

idris@idris-laptop:~/work/npr_puzzles$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.factorial(7) / math.factorial(2)
2520

So there are 2,520 * 42 = 105,840 arrangements to look up in the dictionary, which takes less than a second to run.
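A minimal Python 3 sketch of that brute-force search follows. It is not the author’s code (that lives in the GitHub repository); the three-word set is a stand-in for a real dictionary, which in practice would be read from a word-list file into a set so that each membership test is fast.

```python
from itertools import permutations


def one_letter_swaps(base="abcdefg"):
    """All 42 strings made by replacing one letter of 'base' with another letter of 'base'."""
    return [base[:i] + r + base[i + 1:]
            for i, original in enumerate(base)
            for r in base if r != original]


def find_words(words, base="abcdefg"):
    """Brute force: test every distinct arrangement of every candidate
    string for membership in the set 'words'."""
    found = set()
    for candidate in one_letter_swaps(base):
        # set() collapses the 5,040 orderings to the 2,520 distinct ones
        for p in set(permutations(candidate)):
            arrangement = "".join(p)
            if arrangement in words:
                found.add(arrangement)
    return sorted(found)


# Stand-in dictionary (hypothetical); a real run would load a full word list
words = {"feedbag", "cabbage", "defaced"}
print(find_words(words))  # ['feedbag']
```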

The only answer that turned up was “feedbag”, which you get by replacing the “c” in “abcdefg” with an “e” to produce “abedefg”, which can be rearranged into “feedbag”.
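For comparison, here is a different technique from the permutation search used above: two strings are anagrams exactly when their sorted letters match, so each dictionary word can be checked against the 42 candidates by a sorted-letter “signature” without generating any arrangements. A sketch, again with a hypothetical stand-in word set:

```python
def signature(s):
    """Letters of 's' in sorted order; anagrams share a signature."""
    return "".join(sorted(s))


def find_by_signature(words, base="abcdefg"):
    # Signatures of the 42 one-letter-swap candidates
    targets = {signature(base[:i] + r + base[i + 1:])
               for i, original in enumerate(base)
               for r in base if r != original}
    return sorted(w for w in words
                  if len(w) == len(base) and signature(w) in targets)


print(signature("feedbag"))  # abdeefg
print(signature("abedefg"))  # abdeefg
print(find_by_signature({"feedbag", "cabbage", "defaced"}))  # ['feedbag']
```

This scans the dictionary once instead of generating 105,840 arrangements, which matters more for longer words, where the number of arrangements grows factorially.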

The puzzle said the answer should be “a familiar seven-letter word.” I suppose “feedbag” qualifies if you, like puzzle submitter Mike Shteyman, hail from the unincorporated Maryland town of Reisterstown.

Looking forward to seeing whether the answer is correct, to next week’s puzzle, and to sharing more code on GitHub!

NPR Will Shortz Puzzle: 12/26/2010

In hacking, NPR, word_puzzle on January 5, 2011 at 11:26 pm

This week’s puzzle was a fun one.

Name a famous American from the past who has seven letters in his or her last name. Take the last two letters, plus the first four letters, in that order, and you’ll name that person’s profession. Who is it?

There are several parts to this puzzle and it’s easiest to tackle each one individually.

1) Think of a famous American whose last name is seven letters long.

This part is very broad: there are possibly thousands or tens of thousands of names that fit this description, and my brain certainly can’t easily sort people into groups by the length of their last names.

2) Take his or her last name and move the last two letters in front of the first four letters (dropping the fifth letter); the result is the person’s profession.

Again, moving the letters is difficult for my brain, and I couldn’t think of an easy way of doing this part in reverse: thinking of a profession, and then forming the last name to see if it matches a known famous American.

So, of course and as usual, I’ll use Python, command-line fu, Wikipedia, and an English word list to solve this puzzle.

Step 1 – We’re looking for a famous American, and therefore someone who has a Wikipedia entry. I assume that if you don’t have an English Wikipedia entry, you aren’t a famous American. In the list of Wikipedia entry titles, spaces are replaced by underscores, giving entries of the form firstname_lastname. I ignore all entries that don’t have at least one underscore away from the ends of the string, since such entries are unlikely to be names.

def has_underscore_middle(s):
    """Return True if string 's' has an underscore that is not its first or last character"""
    # Drop the first and last characters, then look for '_' in what remains
    return '_' in s[1:-1]

Next, I test whether the last word in the entry has seven letters. A name like John F. Kennedy, Jr. could be incorrectly eliminated because of the suffix Jr., but I choose to ignore such edge cases for now. If this method doesn’t yield a solution, we can go back and make this refinement.

def has_right_length(s, length=7):
    """Return True if string 's' has exactly 'length' characters (default 7)"""
    return len(s) == length

Next, I test that every character in the last name is an ASCII letter. Wikipedia entries can contain punctuation, digits, accented letters, and other non-ASCII characters, all of which are unlikely to appear in the last name of a famous American. Again, this assumption can be relaxed if this approach doesn’t yield a solution.

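import string

def has_ascii_letters_only(s):
    """Return True if string 's' is made up only of ASCII letters"""
    # string.ascii_letters is 'a'-'z' plus 'A'-'Z'; digits, punctuation,
    # underscores and non-ASCII characters all fail this test
    return all(char in string.ascii_letters for char in s)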

Step 2 – Once we have a list of Wikipedia entries that look like names and have the correct last-name length, we’re ready to see whether rearranging the letters of the last name gives a profession. I don’t have a list of professions, but I do have a list of English words. This list from WordNet is called the crossword dictionary, as it contains all valid entries for English crosswords. There are about 110,000 words in this list, and I assume that the list of professions is a subset of it.

With that assumption, I can rearrange the letters of the last name and check whether the result is in the dictionary. If it is, I write the dictionary word, the last name, and the original Wikipedia entry on one line of the output file.

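def get_from_underscore(s):
    """Accept string 's' that is expected to contain "_" and return
    everything after the last "_" (exclusive).
    Ex: "Henry_David_Thoreau" --> "Thoreau" """
    # rfind returns -1 when there is no "_", so the whole string comes back
    return s[s.rfind('_') + 1:]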

def mix_last_name(s):
    """Takes a seven letter string 's' and creates new 6 letter string
    with last 2 letters of 's' + first four letters of 's'
    Ex: "abcdefg" --> "fgabcd" """
    return s[-2:] + s[:4]
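The post doesn’t show the loop that ties these helpers together, so here is a hypothetical Python 3 sketch of Steps 1 and 2 with the checks from has_underscore_middle, get_from_underscore, has_right_length, has_ascii_letters_only and mix_last_name inlined. The in-memory titles and word set stand in for the Wikipedia title list and the crossword dictionary, whose file names and formats aren’t given here.

```python
import string


def find_professions(titles, words):
    """Yield (word, last_name, title) for titles that pass the Step 1 filters
    and whose rearranged last name is in 'words' (a set of lowercase words)."""
    for title in titles:
        if "_" not in title[1:-1]:              # Step 1: must look like first_last
            continue
        last_name = title[title.rfind("_") + 1:]
        if len(last_name) != 7:
            continue
        if not all(c in string.ascii_letters for c in last_name):
            continue
        # Step 2: last two letters + first four letters, lowercased for lookup
        candidate = (last_name[-2:] + last_name[:4]).lower()
        if candidate in words:
            yield candidate, last_name, title


titles = ["Henry_David_Thoreau", "Thoreau", "Abraham_Lincoln", "John_F._Kennedy_Jr."]
words = {"author", "lawyer"}
for word, last_name, title in find_professions(titles, words):
    print(word, last_name, title)  # author Thoreau Henry_David_Thoreau
```

Lowercasing matters: mix_last_name("Thoreau") returns "auThor", which won’t match a lowercase word list unless the case is normalized.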

Running Step 1 and Step 2 takes approximately 40 seconds. Now I need to manually scan the list of 1,507 entries and see if I can find a profession.

The file is unsorted. I could fix that in Python, but instead I’ll quickly run the command-line utility sort:

sort -o output_sort.txt output.txt

which sorts the lines of output.txt (comparing whole lines, not just the first letter) and writes the result to output_sort.txt.
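For comparison, the in-Python route mentioned above could look like this sketch (the function name is hypothetical; the file names match the sort command). One caveat: Python’s sorted() compares strings by code point, while GNU sort follows the current locale’s collation by default, so the two orders can differ for mixed-case or punctuated lines; running sort with LC_ALL=C gives code-point (byte) order for ASCII text.

```python
def sort_file(src="output.txt", dst="output_sort.txt"):
    """Rough Python counterpart of `sort -o output_sort.txt output.txt`."""
    with open(src) as f:
        lines = f.readlines()
    # Make sure every line ends in a newline, including the last one
    lines = [line if line.endswith("\n") else line + "\n" for line in lines]
    with open(dst, "w") as f:
        f.writelines(sorted(lines))
```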

Scanning through the list, we quickly get lucky and see several lines where the dictionary word is ‘author’. Several Wikipedia entries are variations on the name Henry David Thoreau (‘au’ + ‘thor’ = ‘author’), certainly a famous American and a likely answer for an NPR contest.

This solution required a few assumptions and some a priori knowledge, such as a general familiarity with professions and famous Americans, but everything else was outsourced to the computer and Wikipedia.

You can see the full code here on GitHub.

Looking forward to next week’s puzzle, which looks like a lot of fun: we’ll have to find a two-word synonym for a single word, which will require a thesaurus and possibly more.