My other life as Programmer: July 2011

Wednesday, July 27, 2011

Python: converting names to surname, firstname format.

we want to convert names like

Jose P. Rizal to Rizal, Jose P.
Jose P. Rizal Jr. to Rizal Jr., Jose P.
Ma. Jose P. Rizal III to Rizal III, Ma. Jose P.

Here is Python code to perform the conversion. I am wondering if one can write a more elegant version, where by elegant, it does the conversion in a faster, shorter way!

"""
file     lastfirstname.py
author   Ernesto P. Adorio, PhD.
desc     Converts name to last name, first name format.
version  0.0.1 july 27, 2011
"""

def makestr(sarray, lastindex):
    """
   
    """
    output = ""
    for s in sarray[:lastindex]:
       output += (" " + s )
    return output

def surnamefirstname(name):
    """
    Translates a name in firstname surname to
          surname, firstname format.
    Complications arise because surname can include "Jr." or romanized numbers "III"
    """
    parts = name.split()
    n = len(parts)
    if n == 2:
       # easiest case
       return parts[1] + ", " + parts[0]
    elif n > 2:
       if parts[-1].lower() in ["jr.", "jr"]:
          return parts[-2] +" Jr.," + makestr(parts, -2)
       elif parts[-1].lower() in ["ii", "iii", "iv", "v"]:
          return parts[-2] + " " +parts[-1].upper()+ ","  + makestr(parts, -2)
       else:
          return parts[-1] + "," + makestr(parts, -1)
 
print surnamefirstname("Ma. Jose P. Rizal III")

Post you own better version in the comments. We shall learn together to improve our Python skills.

Friday, July 8, 2011

A beginning Python programming challenge: replace slow array.index() calls.

For simplicity, consider the following arrays:




i      0  1   2  3  4  5  6  7  8  9  10

order1 6  9   7 10  5  8 3  2  4   0   1

order2 7  6  10  8  5  9 1  4  2   0   3

Now consider the following Python snippet of code:

myindex = range(11)

for i in range(11):

    myindex[i] = order2[order1.index[i]]

As an example consider a current i value of 0. The value of index1[0] is 6. The position of 6 in order2 is of 6 in index2 array is 1(first element has index0). Therefore myindex[0] = 1.

On the other hand, the value of index1[1] is 9. The ordinal position of 9 in index2 is 5. Hence myindex[1] = 5.

Problem 1. Can you replace the code without using dictionaries and the index function?
You may not modify inplace the order1 and order2 arrays. You may create an additional extra array.

Problem 2. Rewrite the code to use a dictionary. This is much easier.

Imagine your are doing the for loop of 1,000,000 array items! Have fun!

Sunday, July 3, 2011

Python: Generating all subsets of size k for n objects.

In our main blog, we presented a routine for generating all subsets of set, expressed as an array or vector of indices to the n objects (0 to n-1).See Generating all subsets of a set with n objects In this article we present an "odometer" method of generating all subsets of specific size k where k < n. For example, for n = 5 and k = 3, we expect the function to return the set containing the following vectors: [0,1,2], [0, 1,3], [0,1,4], [0,2,3] and so on up to [2,3,4]. Here is our Python code.



# -*- coding: utf-8 -*-



"""

File      ksubsets.py

Author    Ernesto P. Adorio, Ph.D.

UPDEPP, UP Clarkfield  

Universty of the Philippines Extension Program in Pampanga

Desc      generates subsets (as indices) with k elements from a set of size n.

Version   0.0.1  2011.07.03

"""





def  ksubsets (n,  k):

"""

generates the list of subsets (as index array) of n elements

chosen k at a time. Combinations actually!

"""

#sanity check

if n <0 or k <0 or n - k < 0:
        return None
    N = range(k)
    retval = [N[:]]
    print "initial N=", N
    
    #update current N vector.
    level = k-1
    while level >=0:

d = N[level]

#test if element is already  at maximum value.

if d < n - k + level:  
            # No, increment the value.  
            N[level]+= 1
            #repair trailing elements.
            for i in range(level+1,  k):
                N[i] = N[i-1] + 1
            #append newly generated subset.
            retval.append(N[:])
            level = k-1
        else:
            # position N[level] already at maximum value!
            level -= 1
    return retval
    
for i,  subset in enumerate(ksubsets(5,  3)):
    print i,  subset

When the above program is run, it outputs the expected results:

0 [0, 1, 2]
1 [0, 1, 3]
2 [0, 1, 4]
3 [0, 2, 3]
4 [0, 2, 4]
5 [0, 3, 4]
6 [1, 2, 3]
7 [1, 2, 4]
8 [1, 3, 4]
9 [2, 3, 4]

How does the code work? First it initializes the first member of the set of subsets as
[0, 1, ..., k-1]. Next, starting from the rightmost (level = k-1)position it checks whether the current element is at maximum. If it is not it is incremented and any trailing positions adjusted to be 1 greater than at the preceding position. Otherwise the level is decremented and the process repeats.

Exercise: write a generator version of the above code. We will present the solution on the next revision.

The itertools module has a combinations function. After importing, one simply writes:
list(combinations (range(5), 3)).. Python just makes it TOO easy!

Pages