Saturday, February 26, 2011

Python: a high level word count program

Introduction


Python allows the beginning programmer to use powerful built-in powerful routines, freeing the programmer from imlementing on his own (usually error prone versions)! Here, we use Python A dictionarie, convenient string splitting methods to do the non-trivial word counting problem. Non-trivial in the sense it does more than printing a message "Hello World" to the screen.

the Python code


"""
"""
file     wordcount.py
author   Ernesto P. Adorio
         U.P.Clarkfield
version  0.0.1
desc     counts words, lines
         lines are setoff by new line character, "\n"
         quoted strings are stripped off of quotes.   
"""

def wc(textfilename):
    D = {}
    nc = 0 # number of characters.
    nw = 0 # number of words.
    nl = 0 # number of lines.
    for line in open(textfilename, "r").readlines():
        # print line #decomment for debugging.
        nc += len(line)
        words = line.split()
        for word in words:
            if word in D:
               D[word] += 1
            else:
               D[word] = 1
            nw += 1
        nl += 1
    return nc, nw, nl


if __name__ == "__main__":
   print wc("wordcount.py")

Notice the code's compactness. In a lower level language such as C, the code will be far longer and complicated. The essence of the code is:



read in all lines and store in an array.

process each line.
add length of line to number of characters.
split the line into words, which are tokens separated by whitespaces.
add number of words in line the number of total words.
update the line number.



When the program is fed the code for itself, it printed the number of c
haracters, words and lines as


$ python wordcount.py
(762, 104, 36)


As a check the wc utility produces ( wc wordcount.py) in reverse order


36 104 762



Issues?


There may be instances when the number of lines is off by one. This happens if the last line does not contains a terminating line feed character.
Another thing is that the code may be feed an extremely large file. If the working memory is small, it may run out of space! This problem is easy to fix and we will revise the above initial working code later.

Tuesday, February 22, 2011

A return to C programming: Hello world example

I took up C programming at Camp Aguinaldo in the mid 80's and got a certificate for it. but I never earned a centavo for my C programming skills in my later work since the oppurtunity to program was non-existent, instead I opted to enrol in an MS Applied Math, Major in Computer Science after work.

C may be considered a higher level assembly language as the power constructs in assembly are also available in C; concepts such as pointers, unrestricted gotos, and the like. In fact a C compiler worth its salt should be able to generate assembly output!

For the hello world example in C, we try to write in an undiciplined manner and see if we can get away it. As usual we write a function to do the actual writing of a message "hello world!" to be closer in spirit to the previous Javascript example. Save the following to a file "hello.c" without the quotes.

void showMessage(char *str)
{
   puts(str);
}

main()
{
    showMessage("Hello World");
}


Every C program has a main function. The difference between a function and procedure is that the former exlicitly returns a value while a procedure does not and should be declared of type void. Now lets try to run the hello.c code in the console: type

gcc -Wall hello.c

Notice the -Wall flag to output any warnings encountered in processing the source file.
Here is the output from my Ubuntu 10.04 terminal (console):


gcc -Wall hello.c
hello.c: In function ‘showMessage’:
hello.c:4: warning: implicit declaration of function ‘puts’
hello.c: At top level:
hello.c:8: warning: return type defaults to ‘int’
hello.c: In function ‘main’:
hello.c:10: warning: control reaches end of non-void function
toto@toto-laptop:~/Blogs/my-other-life-as-programmer/C$


In good C programs, all functions must be declared and return types should be explicit. Here the compiler complains that puts was implicitly declared and that the main function did not return any value! and yet in spite of these warnings, you will find that the gcc outputte an executible file hello. To run this ececutible, issue ./hello. The program outputs


toto@toto-laptop:~/Blogs/my-other-life-as-programmer/C$ ./a.out
Hello World
toto@toto-laptop:~/Blogs/my-other-life-as-programmer/C$


Now a good C programmer tries to remove all warnings. So lets do some modifications to be in good terms with the gcc compiler.


#include 

void showMessage(char *str)
{
   puts(str);
}

int main() # explicit declaration.
{
    showMessage("Hello World");
    return 0;
}


We have included the standard header "stdio.h", gcc knows where to find it (in "/usr/include") and this erases the warning on puts. We also explicitly declare the main function as type int and specifically return a zero (the usual value for successful execution.) The compilation is now successful without any complaints from the compiler.


toto@toto-laptop:~/Blogs/my-other-life-as-programmer/C$ gcc -Wall hello.c
toto@toto-laptop:~/Blogs/my-other-life-as-programmer/C$


So the minor corrections removed all warnings. Now lets do some extra activities for fun.



  1. Create assembler listing
  2. Type gcc -S hello.c this outputs hello.s
            .file   "hello.c"
    
            .text
    .globl showMessage
            .type   showMessage, @function
    showMessage:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            movq    %rsp, %rbp
            .cfi_offset 6, -16
            .cfi_def_cfa_register 6
            subq    $16, %rsp
            movq    %rdi, -8(%rbp)
            movq    -8(%rbp), %rax
            movq    %rax, %rdi
            call    puts
            leave
            ret
            .cfi_endproc
    .LFE0:
            .size   showMessage, .-showMessage
            .section        .rodata
    .LC0:
            .string "Hello World"
            .text
    .globl main
            .type   main, @function
    main:
    .LFB1:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            movq    %rsp, %rbp
            .cfi_offset 6, -16
            .cfi_def_cfa_register 6
            movl    $.LC0, %edi
            call    showMessage
            movl    $0, %eax
            leave
            ret
            .cfi_endproc
    .LFE1:
            .size   main, .-main
            .ident  "GCC: (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5"
            .section        .note.GNU-stack,"",@progbits
    
    
    The assembler file is one reason most programmers (run away) do not like assembly language! ha ha. Unless of course if the paying job calls for using assmbly (or assembler) langauge. ! But this feature of the gcc to ouput assembler will come in handy when we program a Forth language translator.
  3. Seek help in using gcc
  4. Try man gcc. You will get an eyeful to the useful flags. Better yet, try the free forums or buy a book specifically on the gcc.

Monday, February 21, 2011

Hello World in Javascript

I am a python fan but I have to come to grips with javascript since I am also programming an online solver site. Javascript can be processed by any browser unless the javascript engine has been turned off by the user,perhaps due to security concerns.

Here we show a simple html file with embedded javascript code to write a message "Hello World!".This example is only useful in the sense that the aspiring javascript coder will be familiar with the standard practices of writing Java script. We will later introduce more complex and useful examples.


Now open your editor, copy manually or just cut and paste the following contents.

<html>
<head>
<title> Hello World Example</title>

<script language="javascript"> 
function showMessage(message)
{
   document.getElementById('location').innerHTML= message;
}
</script>
</head>

<body onload="showMessage('Hello World!')">
<h1>A first Javascript</h1>

A message to one and all:

<span id="location"></span>

</body>
</html>

Lets get this to work. Fire up your browser and enter the url, (uniform resource locator), of the file
In my case, it was "file:///home/toto/Blogs/my-other-life-as-programmer/javascript/hello-world.html"

The output window should look like this:

Note that you do not really need Javascript just to print a message which can be read by all browsers!
A pure html will do, but in the above code, (based on Holzer's XML bible), the use of function is illustrated. So it is a little complicated.

The function is declared inside the html header <head> block by the xml delimiters <script language="javascript">.... function here!.... </script>.When your browser encounters this script block, it does not right away execute it but parses the contents, translates it into a form for faster execution and saves it for recall.

Notice that the body of the function will find a named div block called "location" and it will insert
any message specifid by function's input argument. The function is triggered when the browser loads the html page (the body block). When it encounters the inline <span id="location" > block, the function replaces this block by message "Hello World".

The main behavioral difference of a span and div block is that a line break is always inserted by the div block. whereas the span is inline.

For more information, consult the introductory javascript chapter in


Steven Holzner: "AJAX Bible", John Wiley, 2007.

The book book has a fast pace introduction to Javascript, for a more relaxing pace, perhaps the following is more suitable:

Tom Negrino, Dori Smith: "JavaScript for the World Wide Web", Peachpit Press,2004.

I have both books, bought at Booksale. I consider them bargains at this time, costing me 350 and 200 pesos respectively

.






javascript:void(0)

Friday, February 18, 2011

Python: Converting a matrix of raw scores to ranks.

There are instances where we have to convert a matrix of raw scores to ranking scores, for example in Statistics where we have to compute the Kruskal-Wallis statistics of a block of values. Although we have written a Kruskal Wallis routine previously, we shall write an independent function to return ranks based on raw scores.

A complication in computing ranks is that of processing ties. An approach is to replace any tied scores with the mean of all ranks. For example if 4 values v1,v2,v3,v4 have ranks 5,6,7,8, then each value should have an associated rank of (5+6+7+8)/4.

Here is an outline for converting raw scores to ranks:
1. Create a list L containing tuples of the form (v, r,c) for value, row, column for each element in the input raw scores matrix.
2. Sort the temporaty list L in ascending order.
3. Process the list L to fill in the output Ranks matrix,

Here is our python code for outputting ranks values, complete with an example from BerensonÅ› Basic Business Statistics, page 493.(We do have a local copy of the book) for testing the routine.

"""
File    matrix_scores_to_ranks.py
Author  Dr. Ernesto P. Adorio
        U.P. Clarkfield
        ernesto.adorio@gmail.com
"""


def matrix_scores_to_ranks(M, break_ties = 1, start_rank = 1, ZTOL = 1.0e-5):
    """
    Args
       M          - a list of lists of values.
       break_ties - 1 if ties need special processing, otherwise zero
       start_rank - 0 or 1
       ZTOL       - equality comparison of two real numbers.
    Return value
       A list of lists of ranks of the values. 
    """
    R=[]
    L=[]
    for i, row in enumerate(M):
        R.append([0] * len(row))
        for j, v in enumerate(row):
    L.append((v, i, j))
    L.sort()

    if break_ties != 1:
       for k, tup in enumerate(L):
          v, i, j = tup
   if break_ties != 1:   
     R[i][j] = k + start_rank
    else:
        prevv = L[0][0]    
        starti = 0 
        endi   = 0 
        for k, tup in enumerate(L):
            v, i, j = tup
            if abs(v- prevv) < ZTOL: # a tie?
               endi = k
            else:
               # print previous tied scores!
               rank = sum(range(starti, endi+1))/(endi-starti + 1.0) + start_rank

               for i in range(starti, endi+1):
                   ii = L[i][1]
                   jj = L[i][2]
                   R[ii][jj] = rank
               prevv = v
               starti = endi = k   
        # Final processing of any remaining ties.
        rank = sum(range(starti, endi+1))/(endi-starti + 1.0) + start_rank
        for i in range(starti, endi+1):
            ii = L[i][1]
            jj = L[i][2]
            R[ii][jj] = rank
    return R

#Berenson p. 493.

if __name__ == "__main__":
  M = [[ 18.5, 24.0, 17.2, 19.9, 18.0],
     [ 26.3, 25.3, 24.0, 21.2, 24.5],
     [20.6,  25.2, 20.8, 24.7, 22.9],
     [25.4,  19.9, 22.6, 17.5, 20.4]]

  R = scores_to_ranks(M, break_ties = 1, start_rank = 1)

  for row in R:
     print row


When the above program runs, it outputs and matches with the results in the textbook.
[4.0, 13.5, 1.0, 5.5, 3.0] [20.0, 18.0, 13.5, 10.0, 15.0] [8.0, 17.0, 9.0, 16.0, 12.0] [19.0, 5.5, 11.0, 2.0, 7.0]
The above is for a matrix. We will post array_scores_to_ranks() with the next article posting. Of course, the output must be transposed to read properly. start_rank may be set to any starting value, usually a 1, but there may be occasion when a starting zero rank may be appropriate. The above code is written for clarity. We may speed it up later, but there is no incentive to do so. We think most of the running time is spent on the sort, and Python sort routine is already fast enough! Feb. 18, 2011: I need to install the Syntax Highlighter addin immediately! Done. I refer a blogger user to Coding Freak There are other strategies in breaking ties. See Python, statistics: Ranking a single array of raw scores and we will update the above code someday.

Saturday, February 12, 2011

Hello World in assembler (Ubuntu 64 bit)

We copy the following code, dated sep. 17, 2007, from an old daniweb discussion about using nasm.

global _start

section .data
 hello db "Hello, World!", 10
 length equ $-hello

section .text

_start:
 mov eax, 4  ; write to file
 mov ebx, 1  ; STDOUT handle
 mov ecx, hello ; our message
 mov edx, length ; size of message
 int 80h   ; execute the syscall

 xor ebx, ebx  ; send 0 as 'exit code'
 mov eax, 1  ; terminate process
 int 80h   ; execute the syscall


The only familiarity I had with assembler was way way back in 1992 when I was taking up my MS Applied Math, Major in Computer Science. We learned and have already forgotten how to do it directly in assembly, but we were also smart to use the ability of the Turbo C compiler to output assembler(!), using flag --S. Those were the days of mental exhiliration of seing what you typed is transformed into binary code and run as a tiny program.

section .data
 hello db "Hello, World!", 10
 length equ $-hello

Here we allocate a sequence of bytes, with a label hello to hold the string Hello World! followed by a line feed.(Carriage Return is coded as 13). Next we store the length as the byte difference of the current address and the starting address of the string.

mov eax, 4  ; write to file
 mov ebx, 1  ; STDOUT handle
 mov ecx, hello ; our message
 mov edx, length ; size of message
 int 80h   ; execute the syscall

The code above is just an assembler sequence of making system calls to the OS. EAX, EBX, ECX and EDX are extended registers of the CPU. The 4 stored in EAX means to write to a file, and the 1 in EBX indicates that it is the stdout device (the console screen). In Ubuntu/Linux, all input ouput devices are abstracted as files! The code int 80h is an interrupt system call. The routine for handling the 80h service will find the parameters stored the used registers.


xor ebx, ebx  ; send 0 as 'exit code'
 mov eax, 1  ; terminate process
 int 80h   ; execute the syscall

The last three assembler lines makes a clean exit to the console or whatever caller.

Now let us "compile" the human readable to a more machine readable object code. We call the nasm assembler with the incantation

nasm -f elf hello.asm


The output is hello.o. If you are curious, we can use hexdump to view the hello.o file


hexdump -C hello.o
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 40 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |@.......4.....(.|
00000030 07 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 |................|
00000070 03 00 00 00 00 00 00 00 60 01 00 00 0e 00 00 00 |........`.......|
00000080 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00000090 07 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |................|
000000a0 70 01 00 00 1f 00 00 00 00 00 00 00 00 00 00 00 |p...............|
000000b0 10 00 00 00 00 00 00 00 0d 00 00 00 03 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 90 01 00 00 31 00 00 00 |............1...|
000000d0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000000e0 17 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 d0 01 00 00 70 00 00 00 05 00 00 00 06 00 00 00 |....p...........|
00000100 04 00 00 00 10 00 00 00 1f 00 00 00 03 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 40 02 00 00 1f 00 00 00 |........@.......|
00000120 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000130 27 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00 |'...............|
00000140 60 02 00 00 08 00 00 00 04 00 00 00 02 00 00 00 |`...............|
00000150 04 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a 00 00 |Hello, World!...|
00000170 b8 04 00 00 00 bb 01 00 00 00 b9 00 00 00 00 ba |................|
00000180 0e 00 00 00 cd 80 31 db b8 01 00 00 00 cd 80 00 |......1.........|
00000190 00 2e 64 61 74 61 00 2e 74 65 78 74 00 2e 73 68 |..data..text..sh|
000001a0 73 74 72 74 61 62 00 2e 73 79 6d 74 61 62 00 2e |strtab..symtab..|
000001b0 73 74 72 74 61 62 00 2e 72 65 6c 2e 74 65 78 74 |strtab..rel.text|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001e0 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
00000210 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 |................|
00000220 11 00 00 00 0e 00 00 00 00 00 00 00 00 00 f1 ff |................|
00000230 18 00 00 00 00 00 00 00 00 00 00 00 10 00 02 00 |................|
00000240 00 68 65 6c 6c 6f 2e 61 73 6d 00 68 65 6c 6c 6f |.hello.asm.hello|
00000250 00 6c 65 6e 67 74 68 00 5f 73 74 61 72 74 00 00 |.length._start..|
00000260 0b 00 00 00 01 02 00 00 00 00 00 00 00 00 00 00 |................|
00000270


The hello.o file, at 624 bytes is actually larger than the original 366 bytes asm file.

But hello.o is not executible! To make it one, we have to invoke the ld (or linker loader)
as follows: ld -o hello hello.o

This creates the hello ELF excutible hello. But unfortunately, what we get instead is the error


$ ld -o hello hello.o
ld: i386 architecture of input file `hello.o' is incompatible with i386:x86-64 output


Of course, the linker loader which was installed by Ubuntu was for out 64 bit AMD Turion powered laptop and it complained that our hello.asm was actually compiled for a 32 bit system!

Just one of those complications in software development. :(

The available output formats, obtained by typing nasm -hf for object files are shown below.


valid output formats for -f are (`*' denotes default):
* bin flat-form binary files (e.g. DOS .COM, .SYS)
ith Intel hex
srec Motorola S-records
aout Linux a.out object files
aoutb NetBSD/FreeBSD a.out object files
coff COFF (i386) object files (e.g. DJGPP for DOS)
elf32 ELF32 (i386) object files (e.g. Linux)
elf ELF (short name for ELF32)
elf64 ELF64 (x86_64) object files (e.g. Linux)
as86 Linux as86 (bin86 version 0.3) object files
obj MS-DOS 16-bit/32-bit OMF object files
win32 Microsoft Win32 (i386) object files
win64 Microsoft Win64 (x86-64) object files
rdf Relocatable Dynamic Object File Format v2.0
ieee IEEE-695 (LADsoft variant) object file format
macho32 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
macho MACHO (short name for MACHO32)
macho64 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
dbg Trace of all info passed to output stage
$


We will return to this after a short rest.

We have returned!


nasm -f elf64 hello.asm
This command results in a larger 864 bytes object file. Now the linker-loader does not complain anymore and it outputs a green colored executible file hello at 943 bytes!
If you are interested in making the output file smaller, try the strip command. This results in a smaller 504 bytes file.

Typing ./hello in the terminal resulted in

$ ./hello
Hello, World!


More details for the ELF format: The ELF Object File Format by Dissection

Use synaptic to install nasm and other compiler for other languages or
sudo apt-get install nasm.
The era of downloading source codes and making configuration files has been eased out a bit by the package managers, but you can do it if you need to.

Friday, February 11, 2011

Welcome to my new programming blog

I learned Cobol and C at the National Computer Center in Camp Aguinaldo, but let me tell you, that you can learn programming on your own, only it may take you longer. A former colleague who was into Physics learned Java on his own and now occupies a top position at a banking institution.

I programmed for my MS thesis and PhD dissertation during my graduate days, both times on Crystallographic patterns. I have taught an introductory C programming course for BS Mathematics students.

I know a smattering of languages but my current favorite is Python. Am familiar with Forth,Cobol, C++/C, Pascal and Fortran. If necessary, I can program in Perl, Lisp and other inscrutable languages.

Programming involves an inquisitive, ordered and trained mind. Hard to believe but programing is applied Mathematics! Though janitors in India are supposed to know Java Enterprise programming, let us be productive in the days ahead in this blog. The blog will be a repository of ideas and daily experience in computer programming for the PC and other devices and for the web.

And we may go into independent software development when the time arises.