Saturday, February 12, 2011

Hello World in assembler (Ubuntu 64 bit)

The following code, dated Sep. 17, 2007, is copied from an old DaniWeb discussion about using NASM.

global _start

section .data
 hello db "Hello, World!", 10
 length equ $-hello

section .text

_start:
 mov eax, 4  ; write to file
 mov ebx, 1  ; STDOUT handle
 mov ecx, hello ; our message
 mov edx, length ; size of message
 int 80h   ; execute the syscall

 xor ebx, ebx  ; send 0 as 'exit code'
 mov eax, 1  ; terminate process
 int 80h   ; execute the syscall


The only familiarity I had with assembler was way, way back in 1992, when I was taking up my MS Applied Math, Major in Computer Science. We learned (and have already forgotten) how to write assembly directly, but we were also smart enough to use the ability of the Turbo C compiler to output assembler(!), using the flag -S. Those were the days of mental exhilaration, of seeing what you typed transformed into binary code and run as a tiny program.

section .data
 hello db "Hello, World!", 10
 length equ $-hello

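Here we allocate a sequence of bytes, labeled hello, to hold the string Hello, World! followed by a line feed (code 10; a carriage return would be coded as 13). Next we define length as the byte difference between the current address ($) and the starting address of the string. Because it is defined with equ, length is a constant computed by the assembler and takes up no memory in the program.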
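As a quick check of the arithmetic, here is a small Python sketch (Python is used only for illustration; it is not part of the original listing) that builds the same bytes and counts them. The variable names simply mirror the labels in the assembly.

```python
# The same bytes that `db "Hello, World!", 10` places in the .data section.
hello = b"Hello, World!" + bytes([10])  # 10 is the line feed character

# `length equ $-hello` is the number of bytes from the label to the current position.
length = len(hello)

print(length, hex(length))
```

The result, 14 (0e in hex), is the same value that shows up later in the object file as the operand of mov edx.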

mov eax, 4  ; write to file
 mov ebx, 1  ; STDOUT handle
 mov ecx, hello ; our message
 mov edx, length ; size of message
 int 80h   ; execute the syscall

The code above is simply the sequence for making a system call to the OS. EAX, EBX, ECX and EDX are 32-bit general-purpose registers of the CPU (the "E" stands for extended, since they widen the older 16-bit AX, BX, CX and DX). The 4 stored in EAX selects the write system call, and the 1 in EBX indicates the stdout device (the console screen). In Ubuntu/Linux, all input/output devices are abstracted as files! ECX holds the address of the message and EDX its length. The instruction int 80h raises a software interrupt that hands control to the kernel. The kernel routine that services interrupt 80h finds the call number and its parameters in these registers.
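For comparison, the same request can be sketched from a high-level language. The Python below is only an illustration, and sys_write is a hypothetical helper name; os.write is the standard library wrapper around the kernel's write call, taking the file descriptor and the bytes just as EBX and ECX/EDX do above.

```python
import os

hello = b"Hello, World!\n"

def sys_write(fd, data):
    """Ask the kernel to write `data` to file descriptor `fd`.

    Returns the number of bytes actually written.
    """
    return os.write(fd, data)

if __name__ == "__main__":
    # 1 is the stdout descriptor, the value the assembly places in EBX.
    sys_write(1, hello)
```

The assembly version does the same thing without any library in between: it loads the registers itself and traps into the kernel.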


xor ebx, ebx  ; send 0 as 'exit code'
 mov eax, 1  ; terminate process
 int 80h   ; execute the syscall

The last three assembler lines make a clean exit to the console or whatever the caller was, returning 0 as the exit code.

Now let us "compile" (strictly speaking, assemble) the human-readable source into machine-readable object code. We call the nasm assembler with the incantation

nasm -f elf hello.asm


The output is hello.o. If you are curious, you can use hexdump to view the hello.o file:


hexdump -C hello.o
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 40 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |@.......4.....(.|
00000030 07 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 |................|
00000070 03 00 00 00 00 00 00 00 60 01 00 00 0e 00 00 00 |........`.......|
00000080 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00000090 07 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |................|
000000a0 70 01 00 00 1f 00 00 00 00 00 00 00 00 00 00 00 |p...............|
000000b0 10 00 00 00 00 00 00 00 0d 00 00 00 03 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 90 01 00 00 31 00 00 00 |............1...|
000000d0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
000000e0 17 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 d0 01 00 00 70 00 00 00 05 00 00 00 06 00 00 00 |....p...........|
00000100 04 00 00 00 10 00 00 00 1f 00 00 00 03 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 40 02 00 00 1f 00 00 00 |........@.......|
00000120 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000130 27 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00 |'...............|
00000140 60 02 00 00 08 00 00 00 04 00 00 00 02 00 00 00 |`...............|
00000150 04 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a 00 00 |Hello, World!...|
00000170 b8 04 00 00 00 bb 01 00 00 00 b9 00 00 00 00 ba |................|
00000180 0e 00 00 00 cd 80 31 db b8 01 00 00 00 cd 80 00 |......1.........|
00000190 00 2e 64 61 74 61 00 2e 74 65 78 74 00 2e 73 68 |..data..text..sh|
000001a0 73 74 72 74 61 62 00 2e 73 79 6d 74 61 62 00 2e |strtab..symtab..|
000001b0 73 74 72 74 61 62 00 2e 72 65 6c 2e 74 65 78 74 |strtab..rel.text|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001e0 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
00000210 0b 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 |................|
00000220 11 00 00 00 0e 00 00 00 00 00 00 00 00 00 f1 ff |................|
00000230 18 00 00 00 00 00 00 00 00 00 00 00 10 00 02 00 |................|
00000240 00 68 65 6c 6c 6f 2e 61 73 6d 00 68 65 6c 6c 6f |.hello.asm.hello|
00000250 00 6c 65 6e 67 74 68 00 5f 73 74 61 72 74 00 00 |.length._start..|
00000260 0b 00 00 00 01 02 00 00 00 00 00 00 00 00 00 00 |................|
00000270


The hello.o file, at 624 bytes, is actually larger than the original 366-byte asm file.
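The instructions we typed can be found in the dump at offsets 0x170 to 0x18e. The Python sketch below is a toy decoder written only for this program: it understands just the three instruction forms used here (mov to a 32-bit register from a constant, int, and xor between two registers) and is not a general disassembler. The names TEXT, REG32 and decode are made up for the illustration.

```python
# 32-bit register names in the order the CPU numbers them.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# The 31 bytes of the .text section, copied from the hexdump above.
TEXT = bytes.fromhex(
    "b804000000" "bb01000000" "b900000000" "ba0e000000"
    "cd80" "31db" "b801000000" "cd80"
)

def decode(code):
    """Decode the few instruction forms used by hello.asm."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:
            # B8+r: mov r32, imm32 (constant stored little-endian)
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REG32[op - 0xB8]}, {imm}")
            i += 5
        elif op == 0xCD:
            # CD ib: int imm8
            out.append(f"int {code[i + 1]:#x}")
            i += 2
        elif op == 0x31 and code[i + 1] >> 6 == 3:
            # 31 /r with a register-to-register ModRM byte: xor r32, r32
            modrm = code[i + 1]
            out.append(f"xor {REG32[modrm & 7]}, {REG32[(modrm >> 3) & 7]}")
            i += 2
        else:
            raise ValueError(f"unsupported opcode {op:#x} at offset {i}")
    return out

for line in decode(TEXT):
    print(line)
```

Note that mov ecx decodes with a constant of 0 rather than the address of hello. The object file does not yet know where the string will live; the .rel.text section visible in the dump tells the linker to fill that address in later.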

But hello.o is not executable! To make it one, we have to invoke ld (the linker)
as follows: ld -o hello hello.o

This should create the ELF executable hello. Unfortunately, what we get instead is the error


$ ld -o hello hello.o
ld: i386 architecture of input file `hello.o' is incompatible with i386:x86-64 output


Of course, the linker installed by Ubuntu was the one for our 64-bit AMD Turion powered laptop, and it complained that our hello.asm had actually been assembled for a 32-bit system!

Just one of those complications in software development. :(
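The mismatch the linker reports can be read straight from the first bytes of the hexdump. The following Python sketch is a minimal illustration that decodes only a handful of ELF header fields; the lookup tables are deliberately incomplete and the name describe_elf is made up for this example.

```python
import struct

# First 20 bytes of hello.o, copied from the hexdump above.
HEADER = bytes.fromhex("7f454c4601010100" "0000000000000000" "01000300")

# Partial lookup tables, just enough for this example.
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "little-endian", 2: "big-endian"}
E_TYPE = {1: "relocatable", 2: "executable", 3: "shared object"}
E_MACHINE = {3: "i386", 62: "x86-64"}

def describe_elf(header):
    """Return (class, byte order, file type, machine) from an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = EI_CLASS.get(header[4], "unknown")
    data = EI_DATA.get(header[5], "unknown")
    # e_type and e_machine are two 16-bit fields right after the 16 ident bytes.
    order = "<" if header[5] == 1 else ">"
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    return (elf_class, data,
            E_TYPE.get(e_type, "unknown"),
            E_MACHINE.get(e_machine, "unknown"))

print(describe_elf(HEADER))
```

The fifth byte (01) marks the file as 32-bit and the machine field (03) as i386, which is exactly what ld objects to. Standard tools such as readelf -h report the same fields.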

The available output formats for object files, obtained by typing nasm -hf, are shown below.


valid output formats for -f are (`*' denotes default):
  * bin       flat-form binary files (e.g. DOS .COM, .SYS)
    ith       Intel hex
    srec      Motorola S-records
    aout      Linux a.out object files
    aoutb     NetBSD/FreeBSD a.out object files
    coff      COFF (i386) object files (e.g. DJGPP for DOS)
    elf32     ELF32 (i386) object files (e.g. Linux)
    elf       ELF (short name for ELF32)
    elf64     ELF64 (x86_64) object files (e.g. Linux)
    as86      Linux as86 (bin86 version 0.3) object files
    obj       MS-DOS 16-bit/32-bit OMF object files
    win32     Microsoft Win32 (i386) object files
    win64     Microsoft Win64 (x86-64) object files
    rdf       Relocatable Dynamic Object File Format v2.0
    ieee      IEEE-695 (LADsoft variant) object file format
    macho32   NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
    macho     MACHO (short name for MACHO32)
    macho64   NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
    dbg       Trace of all info passed to output stage
$


We will return to this after a short rest.

We have returned!


nasm -f elf64 hello.asm
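This command results in a larger 864-byte object file. Now the linker no longer complains, and it outputs a green-colored executable file hello at 943 bytes! Note that the code itself is unchanged: it still uses the 32-bit int 80h interface, which a 64-bit Linux kernel normally still accepts for compatibility. Native 64-bit code would use the syscall instruction with different call numbers.
If you are interested in making the output file smaller, try the strip command. This results in a smaller 504-byte file.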

Typing ./hello in the terminal resulted in

$ ./hello
Hello, World!


More details for the ELF format: The ELF Object File Format by Dissection

Use Synaptic to install nasm and compilers for other languages, or run
sudo apt-get install nasm.
The era of downloading source code and fiddling with configuration files has been eased a bit by the package managers, but you can still do it that way if you need to.
