A complication in computing ranks is that of processing ties. An approach is to replace any tied scores with the mean of all ranks. For example if 4 values v1,v2,v3,v4 have ranks 5,6,7,8, then each value should have an associated rank of (5+6+7+8)/4.
Here is an outline for converting raw scores to ranks:
1. Create a list L containing tuples of the form (v, r,c) for value, row, column for each element in the input raw scores matrix.
2. Sort the temporaty list L in ascending order.
3. Process the list L to fill in the output Ranks matrix,
Here is our python code for outputting ranks values, complete with an example from BerensonÅ› Basic Business Statistics, page 493.(We do have a local copy of the book) for testing the routine.
""" File matrix_scores_to_ranks.py Author Dr. Ernesto P. Adorio U.P. Clarkfield ernesto.adorio@gmail.com """ def matrix_scores_to_ranks(M, break_ties = 1, start_rank = 1, ZTOL = 1.0e-5): """ Args M - a list of lists of values. break_ties - 1 if ties need special processing, otherwise zero start_rank - 0 or 1 ZTOL - equality comparison of two real numbers. Return value A list of lists of ranks of the values. """ R=[] L=[] for i, row in enumerate(M): R.append([0] * len(row)) for j, v in enumerate(row): L.append((v, i, j)) L.sort() if break_ties != 1: for k, tup in enumerate(L): v, i, j = tup if break_ties != 1: R[i][j] = k + start_rank else: prevv = L[0][0] starti = 0 endi = 0 for k, tup in enumerate(L): v, i, j = tup if abs(v- prevv) < ZTOL: # a tie? endi = k else: # print previous tied scores! rank = sum(range(starti, endi+1))/(endi-starti + 1.0) + start_rank for i in range(starti, endi+1): ii = L[i][1] jj = L[i][2] R[ii][jj] = rank prevv = v starti = endi = k # Final processing of any remaining ties. rank = sum(range(starti, endi+1))/(endi-starti + 1.0) + start_rank for i in range(starti, endi+1): ii = L[i][1] jj = L[i][2] R[ii][jj] = rank return R #Berenson p. 493. if __name__ == "__main__": M = [[ 18.5, 24.0, 17.2, 19.9, 18.0], [ 26.3, 25.3, 24.0, 21.2, 24.5], [20.6, 25.2, 20.8, 24.7, 22.9], [25.4, 19.9, 22.6, 17.5, 20.4]] R = scores_to_ranks(M, break_ties = 1, start_rank = 1) for row in R: print rowWhen the above program runs, it outputs and matches with the results in the textbook.
[4.0, 13.5, 1.0, 5.5, 3.0] [20.0, 18.0, 13.5, 10.0, 15.0] [8.0, 17.0, 9.0, 16.0, 12.0] [19.0, 5.5, 11.0, 2.0, 7.0]The above is for a matrix. We will post array_scores_to_ranks() with the next article posting. Of course, the output must be transposed to read properly. start_rank may be set to any starting value, usually a 1, but there may be occasion when a starting zero rank may be appropriate. The above code is written for clarity. We may speed it up later, but there is no incentive to do so. We think most of the running time is spent on the sort, and Python sort routine is already fast enough! Feb. 18, 2011: I need to install the Syntax Highlighter addin immediately! Done. I refer a blogger user to Coding Freak There are other strategies in breaking ties. See Python, statistics: Ranking a single array of raw scores and we will update the above code someday.
No comments:
Post a Comment