-
Notifications
You must be signed in to change notification settings - Fork 4
dwelch67/lc3
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
 |  | |||
 |  | |||
 |  | |||
 |  | |||
 |  | |||
 |  | |||
 |  | |||
 |  | |||
Repository files navigation
This is a work in progress.... This repository is for learning assembly language programming, and other things using the little computer 3 instruction set. http://en.wikipedia.org/wiki/LC-3 --------------------------------------------------------------------- This document is targetted toward average programmers. C experience is not specifically required, but enough general programming knowledge to the point that you can read C programs is required as C will be used as a bridge between your programming knowledge and assembly language and the other topics discussed. The tools are all written in C so the ability to compile and run them is required. They are written to be very portable and will hopefully compile and run on your system. --------------------------------------------------------------------- What assembly language is not: Assembly language is not, generally, the language you use for device drivers, nor is the primary language for bare metal program or other forms of low level programming. Mostly C is used for such things but other languages work as well. Yes some assembly is required for some things, but it is way less than one percent of the overall code, in general. And yes some folks use it heavily for fun or to prove a point or other reasons. Assembly language is not a unversal language for all processors in the sense that there is one assembly language and one instruction set. This instruction set should really not take more than an evening to learn for an average programmer (any other language C, Python, whatever), the things that might cause a hang up would/should cause a similar hang up in any other language (C, Python, Java, whatever) like bitwise logical operations. Learning this lc-3 instruction set does not mean you can then run off and program in assembly language for your phone or computer. Those devices use a different instruction set and if you want to know the assembly language for those you need to learn that separately. The good side of this is that each new instruction set you learn is much easier than the last, much easier than learning multiple high level languages (C, Python, Java, whatever). Generally assembly languages are not standardized, what does conform to a hard standard is the instruciton set, the machine code, the bits that make the processor do things. If there is more than one assembler (the program, like a compiler, that converts the assembly language source code into something lower level, usually machine code) for a particular instruciton set, each author may choose to implement a completely different "language" as in the keywords and other syntax that make up the programming language. The GNU assembler is well known for messing up instruction sets. In this case (lc-3) I am not going to conform to the assembly language supported by the inventors text book. What assembly language is: Assembly language is a programming language like C or Python or Java. Assembly language instructions generally attempt to have a one to one relationship with machine code instructions. Another way to say this or think about this is that often a single line of assembly language results in a single machine code instruction. There are of course exceptions, directives used to communicate something to the assembler. High level languages C, Python, Java are converted from the text file form into something lower using a compiler. The term compiler is not normally used for assembly language, instead the term assembler is used. Unfortunately, it is not uncommon to see the term assembler used as a reference to the language not the tool used to convert the language into machine code. --------------------------------------------------------------------- The little computer 3, lc-3, is an educational instruction set, it is certainly possible to turn it into hardware and do useful things with it but it is better served as an educational tool. A stepping stone up to larger more complicated instruction sets. This tutorial and repository will attempt to show you not only how to get started in assembly language programming, but it will do this by showing you the machine code, showing you at least one instruction set simulator, an assembler and a disassembler. Most of the assembly language lessons are delivered in a way where you are expected to type them in yourself, assemble and run them with the simulator and get the same results as the tutorial. You of course could cut and paste them instead of typing but then you leave a hole in your education. The typing and debugging side of programming using these tools and this language. You will need appendix A from this book: http://highered.mcgraw-hill.com/sites/0072467509/ The direct link is here: http://highered.mcgraw-hill.com/sites/dl/free/0072467509/104653/PattPatelAppA.pdf Let me know if these links dont work or if this material becomes unavailable. ---------------------------------------- Lesson 0: Building and testing the tools These tools were written to be portable, so they should build and work on most systems. Windows, Linux, Apple, etc. If you dont already have a C compiler either a gcc or clang/llvm based compiler, pre-built binaries, should be easy to come by. The Makefile shows the command lines for each of the tools gcc : gcc -o lc3sim lc3sim.c readhex.c gcc -o lc3asm lc3asm.c gcc -o lc3dis lc3dis.c readhex.c gcc -o vsim vsim.c readhex.c clang: clang -o lc3sim lc3sim.c readhex.c clang -o lc3asm lc3asm.c clang -o lc3dis lc3dis.c readhex.c clang -o vsim vsim.c readhex.c If you have make then the default > make should build using gcc/gnu, if you use clang then > make clang will build using the clang/llvm compiler. To verify the tools work > ./lc3asm test.asm (on windows it might be in the form .\lc3asm.exe instead) > ./lc3asm test.asm label0000: [0x00003003] [top] 0 label0001: [0x00000005] [top] 1 0x3000: 0x9040 0x3001: 0x5060 0x3002: 0x1027 0x3003: 0xF067 0x3004: 0x103F 0x3005: 0x0BFD 0x3006: 0xF025 > ./lc3dis test.asm.hex 0x3000: 0x9040 not r0,r1 0x3001: 0x5060 and r0,r1,#0x0000 0x3002: 0x1027 add r0,r0,#0x0007 0x3003: 0xF067 trap 0x67 0x3004: 0x103F add r0,r0,#0xFFFF 0x3005: 0x0BFD brnp 0x3003 ; -3 0x3006: 0xF025 trap 0x25 > ./lc3sim test.asm.hex show 0x0007 show 0x0006 show 0x0005 show 0x0004 show 0x0003 show 0x0002 show 0x0001 halt Assuming test.asm has not been changed the output from these three tools should match the above. ---------------------------------------- Lesson 1: Getting started Reminder, don't cut and paste, type the following program into a text editor, using the commands from lesson one (lc3asm, lc3sim) assemble and run the program. ;---- lesson 1 ---- add r0,r0,#1 trap 0x67 add r0,r0,#2 trap 0x67 trap 0x25 ;---- lesson 1 ---- ./lc3asm lesson1.asm 0x3000: 0x1021 0x3001: 0xF067 0x3002: 0x1022 0x3003: 0xF067 0x3004: 0xF025 ./lc3sim lesson1.asm.hex show 0x1235 show 0x1237 halt What does it do? Here is where we dive in. First, registers, registers in the context of processors are similar to variables in a programming language, except there are a fixed number of them. Different processor architectures have a different number of registers. In general some registers are considered general purpose and some special purpose, again different architectures have different mixes of special and general and it is not uncommon to have some special registers that can be used or accessed as general purpose registers. The lc3 architecture has 8 registers, named r0,r1,r2,...,r7. They are all considered general purpose, but a couple do have a special purpose as well. R6 is used as the stack pointer (The stack and stack pointer will be covered in a future lesson). R7 is used as a return address for function/subroutine calls (a future lesson). Now in general processors do not actually halt, you turn them on, they start up based on some known rules, and then run forever. When you run a program on your computer, that program may start and finish but the operating system keeps running. Even if it shows 0% cpu usage that means applications, the operating system keeps running, it has an idle loop that measures how much time is the operating system idling vs running other software. It is still running. There are some microcontrollers, that are designed to hibernate or go into some sort of low power mode that has a fast wake up, like pressing the button on your television remote control. For educational purposes this processor does have a halt command so that programs like the one above can run briefly then halt. It is not uncommon for a processor architecture to have a specific instruction for making system calls. System calls meaning a place where the operating system in a loose sense of the term can wait for applications to ask for something from the operating system printing, file I/O, etc. The lc3 architecture uses the trap instruction for system calls. The format of the trap instruction is trap vector where vector is a number from 0 to 255 The vector 0x25 has been defined in the lc3 documentation as halt. My simulator also uses that trap to halt. I have added my own trap, number 0x67 to print the contents of register r0, as a debugging and demonstration tool. For my simulators these are done using a back door a non-realistic trick in the simulator and not actual lc3 code talking to virtual lc3 hardware to perform this action. Both actions (0x25 and 0x67) could be done using lc3 virtual hardware and lc3 software. There are two forms of the add instruction add rd,sr1,sr2 add rd,sr1,#immed5 The first of the two performs the operation rd = sr1 + sr2, where rd, sr1, and sr2 are registers. Any of the 8 registers can be used in any of these three places, you do not have to repeat any registers add r1,r2,r3 ; r1 = r2 + r3 but you could if you wanted to add r1,r1,r1 ; r1 = r1 + r1 The other form has two registers and an immediate add rd,sr1,#immed5 ; rd = sr1 + immed immed5 is a sign extended 5 bit immediate. The term immediate means a constant that is encoded in the instruction itself. We will cover the instruction encoding later, for now understand that this form of the add instruction has 5 bits for an immediate value and the most significant of those bits is sign extended to create a 16 bit number. 0xxxx = 000000000000xxxx 1xxxx = 111111111111xxxx Where x can be any combination of bits from 0000 to 1111. 0000000000000000 0000000000000001 0000000000000010 ... 0000000000001101 0000000000001110 0000000000001111 1111111111110000 1111111111110001 1111111111110010 ... 1111111111111101 1111111111111110 1111111111111111 0x0000 to 0x000F and 0xFFF0 to 0xFFFF. If not already you should strengthen your understanding of twos complement this has nothing really to do with assembly language as it is used in many/all languages. In this case 0x0000 to 0x000F 0 to 15 0xFFF0 to 0xFFFF -16 to -1 Larger constants will be covered in another lesson. With hardware, unless it is well defined, never assume the hardware is initialized to anything. General purpose registers, and memory are two places in particular where you should not make assumptions. My lc3sim.c simulator as of this writing resets all the registers to the value 0x1234, but dont assume that will always be the case. In order to not overly complicate the example, this example does not manage the initialization of r0 properly, it just uses it as is. The purpose of this example is to demonstrate the add immediate so it works despite the initial value of r0. At this point we should be ready to walk through the code. add r0,r0,#1 this means r0 = r0 + 1 trap 0x67 this is a system call specific to this simulator it prints out the contents of r0. The output of lc3sim below, the first show line 0x1235 is the result of this trap 0x67 instruction. add r0,r0,#2 This performs the operation r0 = r0 + 2. if r0 were 0x1235 as we see from the prior trap 0x67, then r0 = 0x1235 + 2 = 0x1237 trap 0x67 outputs the current state of r0, which is now 0x1237 as expected trap 0x25 this halts the simulation shown as a halt below ./lc3sim lesson1.asm.hex show 0x1235 show 0x1237 halt ---------------------------------------- Lesson 2: Unconditional Branch ;---- lesson 2 ---- trap 0x67 add r0,r0,#1 brnzp skip add r0,r0,#2 skip: trap 0x67 trap 0x25 ;---- lesson 2 ---- ./lc3asm lesson.asm label0000: [0x00000002] [skip] 1 label0001: [0x00003004] [skip] 0 0x3000: 0xF067 0x3001: 0x1021 0x3002: 0x0E01 0x3003: 0x1022 0x3004: 0xF067 0x3005: 0xF025 ./lc3sim lesson.asm.hex show 0x1234 show 0x1235 halt Some assembly languages use the word branch, some use jump, some use both. From a programmers perspective processor instructions execute linearly in the order written by the programmer, until you hit a branch or jump which causes execution to conditionally or unconditionally jump to a new location in the code. In C for example a = 7; if(b == 12) { a++; } c = a + 10; there are two possible paths through this code a = 7; c = a + 10; or a = 7; a++; c = a + 10; The a++; is conditionally executed, otherwise it is skipped, the code jumps over the contents in the brackets and executes what is after. Code like this might not have much value on the surface, but it demonstrates an unconditional branch, the a++; is never executed, the execution jumps over and continues after the if statement. a = 7; if(0) { a++; } c = a + 10; trap 0x67 start off by seeing what r0 is right now, in this case 0x1234 (remember never assume its power on state). add r0,r0,#1 r0 = r0 + 1 = 0x1235 brnzp skip For the lc3 processor this is an unconditional branch. Why it is unconditional and what the nzp mean are for a future lesson. What it means in this form is "no matter what jump to the code starting at/after the label skip: add r0,r0,#2 IF EXECUTED this code would perform r0 = r0 + 2 = 0x1235 + 2 = 0x1237 skip: This is a label, for my lc3asm assembly language a label cannot start with a number, cannot have spaces, cannot be too long, and ends with a colon. The unconditional branch above is branching/jumping the code execution to this label. Basically the add r0,r0,#2 is being passed over and not executed. trap 0x67 print out the current state of r0, we see 0x1235 so that confirms that the add r0,r0,#2 was not executed trap 0x25 halt the simulation ---------------------------------------- Lesson 3: Logical operations AND and NOT ;---- lesson 3 ---- trap 0x67 add r1,r0,#0 not r0,r1 trap 0x67 and r0,r0,r1 trap 0x67 add r0,r0,#7 trap 0x67 trap 0x25 ;---- lesson 3 ---- ./lc3asm lesson.asm 0x3000: 0xF067 0x3001: 0x1220 0x3002: 0x9040 0x3003: 0xF067 0x3004: 0x5001 0x3005: 0xF067 0x3006: 0x1027 0x3007: 0xF067 0x3008: 0xF025 ./lc3sim lesson.asm.hex show 0x1234 show 0xEDCB show 0x0000 show 0x0007 halt For whatever reason it seems to me that folks get tripped up in bit manipulation and logic operations more in assembly language than in other languages. Most/all of the languages have logic operations as well AND, OR, NOT, XOR, etc and there is little to no difference at the assembly language level. Being a minimalist instruction, lc3 only has NOT and AND operations, OR, XOR, etc have to be derived from those (future lesson). The NOT operation means if the input bit is a 0 the output bit is a 1, if the input bit is a 1, the output bit is a 0. The operation does this on a bit by bit basis each bit in the input register is processed separately from a functional perspective from the other bits in that register. So 0001001000110100 input 1110110111001011 output With pencil and paper first you need to be able to get from hex to binary. I used to keep a strip of paper taped to my monitor with hex and binary for the numbers 0 - 15, it didnt take long to memorize them, for decimal to/from hex I just use a calculator. Once written as binary (the above input number is 0x1234 BTW) then the not operation is trivial where you see a 0 write a 1 where you see a 1 write a 0. The AND operation is also a logical operation but involves two inputs and one output. The truth table is as follows 00 0 01 0 10 0 11 1 The left two columns are the possible combinations of inputs for two bits. The right column is the result of the and operation of those two input bits on a line. Basically if both input bits are ones then the output is a one otherwise the output is a zero. Or another way to say this is anything anded with 0 is 0 00 0 10 0 and anything anded with 1 is itself 01 0 11 1 it commutes, the first operand could be the anything anded with zero 00 0 01 0 or anything anded with one 10 0 11 1 If I perform the operation 0x1234 AND 0x00FF, we look at these per bit which means per column, if both items in the column are a one then the result is a one, otherwise the result is a zero 0001001000110100 0000000011111111 AND ===================== 0000000000110100 so 0x1234 AND 0x00FF = 0x0034 Walking through the program trap 0x67 this shows us what the current state of r0 is (0x1234) add r1,r0,#0 r1 = r0 + 0, basically copy r0 to r1 not r0,r1 perform a bit by bit not operation on r1 the result goes in r0 0001001000110100 r1 1110110111001011 r0 r0 = ~r1 = 0xEDCB trap 0x67 print out r0, and it confirms r0 = 0xEDCB and r0,r0,r1 r0 = r0 AND r1 1110110111001011 r0 0001001000110100 r1 ===================== 0000000000000000 r0 this was intentional, if you think about it, to get a result of 1 out of an and you need both inputs to be a 1. One register is the inverse of the other, no matter what r0 was before all of this started this is guaranteed to never have a condition where there are two ones. trap 0x67 this confirms r0 = 0x0000 add r0,r0,#7 r0 = r0 + 7 = 7 trap 0x67 print out the 0x0007 trap 0x25 halt There will be faster ways to do this in future examples, but this sequence demonstrated a sequence to zero out r0 by anding the not of r0 to itself. ---------------------------------------- Lesson 4: Flags and conditional branch ;---- lesson 4 ---- trap 0x67 add r1,r0,#0 not r0,r1 trap 0x67 and r0,r0,r1 trap 0x67 add r0,r0,#7 trap 0x67 top: add r0,r0,#0xFFFF trap 0x67 trap 0x68 brnp top trap 0x25 ;---- lesson 4 ---- ./lc3asm lesson.asm label0000: [0x00003008] [top] 0 label0001: [0x0000000B] [top] 1 0x3000: 0xF067 0x3001: 0x1220 0x3002: 0x9040 0x3003: 0xF067 0x3004: 0x5001 0x3005: 0xF067 0x3006: 0x1027 0x3007: 0xF067 0x3008: 0x103F 0x3009: 0xF067 0x300A: 0xF068 0x300B: 0x0BFC 0x300C: 0xF025 ./lc3sim lesson.asm.hex show 0x1234 show 0xEDCB show 0x0000 show 0x0007 show 0x0006 psr 0x0001 show 0x0005 psr 0x0001 show 0x0004 psr 0x0001 show 0x0003 psr 0x0001 show 0x0002 psr 0x0001 show 0x0001 psr 0x0001 show 0x0000 psr 0x0002 halt So lesson 3 got us to the point where register 0 was zeroed out then 7 was added to leave r0 with a 7. This example shows a simple count down loop. The instruction set documentation has a plus sign (+) next to many of the instructions. This plus sign indicates the psr (processor state register) flags are changed as a result of that operation. Like the PC in this architecture, the PSR is not directly visible to the programmer. What are flags? First off not all processors use flags, second processors that use flags dont use the same flags or use them in the same way. This processor has an interesting mix of flags not normally seen. This processor has three flags the n flag, when set indicating the result of the operation was negative (the msbit, bit 15, was set, think about twos complement). The z flag which indicates the result was exactly zero and the p flag which indicates the result was positive. From the lc3 instruction set reference: The condition codes are set, based on whether that result, taken as a 16-bit 2’s complement integer, is negative (N = 1; Z, P = 0), zero (Z = 1; N, P = 0), or positive (P = 1; N, Z = 0). All other LC-3 instructions leave the condition codes unchanged. So, basically you can only have one flag set at a time through normal operations, and also you will always have at least one flag set. There is a way to mess with these rules by messing with the psr, that is a separate discussion. For this example only one of the flags is set at a time and as important at least one flag is set. add r0,r0,#0xFFFF We dont have a subtract, we dont need a subtract, in logic a subtract uses the add logic anyway, remember with twos complement to negate a number you invert and add one. If I want to subtract 1 0x0001 one 0xFFFE invert 0xFFFF and add one so subtracting 1 is the same as adding -1. The add immediate allows us to add small negative numbers so add r0,r0,#0xFFFF means r0 = r0 - 1; With my lc3 assembler a label is of the form: top: some text that doesnt start with a zero, has no spaces and ends with a colon. Not unlike a label in C which you would use with a goto if using labels and goto's were not considered a bad thing in C. Despite being bad perhaps you did at least learn how to use them. Will be used pretty much the same way here: top: r0 = r0 - 1; if(r0 != 0) goto top; We saw the add r0,r0,#0xFFFF implements the r0 = r0 - 1; trap 0x67 we know prints out the value of r0 trap 0x68 prints out the psr note that in the reference manual the trap instruction itself DOES NOT modify the flags. If you look at the source we see that this simulator does not actually call other code for the trap 0x67 (that other code COULD modify the flags). Assume here that the flags are not touched these traps are so you can see what is going on, you normally would not have them there nor take the risk that they could be implemented with code that doesnt preserve the psr. So the flags dont change between the add and the brnp brnp top What is brnp? The instruction is really br for branch. It has three optional modifiers one for checking the n bit, one for checking the z bit and one for checking the p bit. We are looking for the result of the subtract to be zero. When the result of the subtract is a zero the n bit is 0, the z bit is 1, the p bit is 0. If you read the description of the br instruciton in the reference manual you will see if ((n AND N) OR (z AND Z) OR (p AND P)) PC = PC‡ + SEXT(PCoffset9); The logic compares the n, z, and p test bits in the instruction with the N, Z, and P flags in the PSR. If any of those comparisions using an and is true then the branch happens. If all three of those comparisons is false, a zero, then the branch does not happen the code falls through. The syntax brnp means test for the n and p flags. We are looking for the case where the z flag is set and the n and p flags are zero. So brnp means we want to branch whenever the z flag is not set, when the z flag is not set either the n or p flag is set and that would result in a branch. The syntax in the assembly language is to specify the label to branch to, just like in C. The assembler will take care of figuring out the number of instructions to jump over and as a result will figure out the machine code for this instruction. We can see from the trap 0x67 (show r0) and trap 0x68 (psr) outputs show 0x0006 psr 0x0001 show 0x0005 psr 0x0001 show 0x0004 psr 0x0001 show 0x0003 psr 0x0001 show 0x0002 psr 0x0001 show 0x0001 psr 0x0001 show 0x0000 psr 0x0002 halt r0 is having one subracted per loop, 6,5,4,3... being positive numbers the p flag is set for the non-zero numbers. When the result is a zero the psr changes to 0x0002 which is the z flag set, the n and p flags are zero. So the brnp does not happen, the code falls through to the trap 0x25 which is a halt. So in summary a brnp means if the n or p flags is set then branch otherwise dont. Another way to say that is if the z flag is not set then branch. Likewise a brnz would mean if the n or z flags are set then branch otherwise dont. Or if the p flag is not set then branch. ---------------------------------------- Lesson 5: The load instruction and more on branching ;---- lesson 5 ---- trap 0x67 ld r0,abc trap 0x67 top: add r0,r0,#0xFFFF trap 0x67 trap 0x68 brz get_out brnzp top get_out: trap 0x25 abc: .word 0x0007 ;---- lesson 5 ---- ./lc3asm lesson.asm label0000: [0x00000001] [abc] 1 label0001: [0x00003003] [top] 0 label0002: [0x00000006] [get_out] 1 label0003: [0x00000007] [top] 1 label0004: [0x00003008] [get_out] 0 label0005: [0x00003009] [abc] 0 0x3000: 0xF067 0x3001: 0x2007 0x3002: 0xF067 0x3003: 0x103F 0x3004: 0xF067 0x3005: 0xF068 0x3006: 0x0401 0x3007: 0x0FFB 0x3008: 0xF025 0x3009: 0x0007 ./lc3sim lesson.asm.hex show 0x1234 show 0x0007 show 0x0006 psr 0x0001 show 0x0005 psr 0x0001 show 0x0004 psr 0x0001 show 0x0003 psr 0x0001 show 0x0002 psr 0x0001 show 0x0001 psr 0x0001 show 0x0000 psr 0x0002 halt This example is very similar to the previous example. The primary differences are this example uses the load (ld) instruction to read a value from memory and place it in a register. And the branching in the loop is a little different, instead of branching if not zero the loop is an infinite loop with an unconditional branch, inside the loop there is a branch if zero to get out of the loop. In C: r0 = 7; top: r0 = r0 - 1; if(r0==0) goto get_out; goto top; get_out: ld r0,abc The assembler makes life easy by letting you use labels for loading or storing values. Load (ld) is used if you are going from memory to a register, basically it is a read from memory. Store (st) is used for going from a register to memory, basically a write. In this case there is a label abc which is followed by a .word directive. For this architecture/assembler a word is defined as 16 bits the .word directive tells the assembler the value that follows needs to be put in the program at this location. Unless you are putting machine code in your program you want to avoid using a .word in an execution path. You normally dont want to execute .word lines. abc: .word 0x0007 The value at abc at assembly time is 0x0007. So basically ld r0,abc In one instruction replaces these steps in the prior two lessons. add r1,r0,#0 not r0,r1 and r0,r0,r1 add r0,r0,#7 Even better the add immediate is very limited in what immediate values can be used. Where a .word entry can hold any bit pattern, for example ld r0,xyz ... xyz: .word 0x1234 Would put 0x1234 in r0, but this is illegal: add r1,r0,#0 not r0,r1 and r0,r0,r1 add r0,r0,#0x1234 It would take several add instructions to get 0x1234 into r0. brz get_out brnzp top get_out: brz means branch if the z flag is set. When r0 contains a 1 and then we subtract 1 (add 0xFFFF) the result is r0 equals zero. At that time the z flag is set, the n and p flags are not set. So a branch if zero will branch, which hops over the brnzp instruction and gets us out of the loop. If the z flag is not set then we execute the brnzp instruction which is also known as an unconditional branch. We assume that at least one of the flags is set at all times so a brnzp means branch if n or z or p is set, which should always be true. So top: ... brnzp top Is an infinite loop brz get_out ... get_out: is a conditional branch that gets us out of the loop. Another way to visualize this program in C is: r0 = 7; while(1) { r0 = r0 - 1; if(r0==0) break; } ---------------------------------------- Lesson 6: The store instruction ;---- lesson 6 ---- ld r0,abc add r0,r0,#0x9 st r0,xyz ld r0,abc trap 0x67 ld r0,xyz trap 0x67 trap 0x25 abc: .word 0x1234 xyz: .word 0xABCD ;---- lesson 6 ---- ./lc3asm lesson.asm label0000: [0x00000000] [abc] 1 label0001: [0x00000002] [xyz] 1 label0002: [0x00000003] [abc] 1 label0003: [0x00000005] [xyz] 1 label0004: [0x00003008] [abc] 0 label0005: [0x00003009] [xyz] 0 0x3000: 0x2007 0x3001: 0x1029 0x3002: 0x3006 0x3003: 0x2004 0x3004: 0xF067 0x3005: 0x2003 0x3006: 0xF067 0x3007: 0xF025 0x3008: 0x1234 0x3009: 0xABCD ./lc3sim lesson.asm.hex show 0x1234 show 0x123D halt from lesson 5 we know that load means from ram to a register, a read, store means from register to ram, a write. ld r0,abc read from memory at/after label abc which is 0x1234, r0 = 0x1234 add r0,r0,#0x9 r0 = r0 + 9 = 0x123D st r0,xyz store r0 to memory at/after label xyz, so the 0xABCD is overwritten with 0x123D ld r0,abc load r0 with the value 0x1234 trap 0x67 show 0x1234 ld r0,xyz load r0 with the value 0x123D trap 0x67 show 0x123D trap 0x25 halt ---------------------------------------- Lesson 7: Load and store indirect and load effective address ;---- lesson 7 ---- lea r0,here st r0,not_here ld r1,abc sti r1,not_here ldi r0,not_here trap 0x67 ld r0,not_here trap 0x67 ld r0,here trap 0x67 trap 0x25 not_here: .word 0x5678 abc: .word 0x1234 here: .word 0xABCD ;---- lesson 7 ---- ./lc3asm lesson.asm label0000: [0x00000000] [here] 1 label0001: [0x00000001] [not_here] 1 label0002: [0x00000002] [abc] 1 label0003: [0x00000003] [not_here] 1 label0004: [0x00000004] [not_here] 1 label0005: [0x00000006] [not_here] 1 label0006: [0x00000008] [here] 1 label0007: [0x0000300B] [not_here] 0 label0008: [0x0000300C] [abc] 0 label0009: [0x0000300D] [here] 0 0x3000: 0xE00C 0x3001: 0x3009 0x3002: 0x2209 0x3003: 0xB207 0x3004: 0xA006 0x3005: 0xF067 0x3006: 0x2004 0x3007: 0xF067 0x3008: 0x2004 0x3009: 0xF067 0x300A: 0xF025 0x300B: 0x5678 0x300C: 0x1234 0x300D: 0xABCD ./lc3sim lesson.asm.hex show 0x1234 show 0x300D show 0x1234 halt lea means load effective address, or even simpler "load address". lea r0,here means load the address of the label here into the register 0. There are a couple of ways we can confirm what is going on not_here: .word 0x5678 abc: .word 0x1234 here: .word 0xABCD The assembler is dumping the machine code: 0x300B: 0x5678 0x300C: 0x1234 0x300D: 0xABCD here is at address 0x300D so lea r0,here results in r0 = 0x300D st r0,not_here this is a normal store, stores the value 0x300D to the word at the label not_here, the 0x5678 that was there is overwritten. ld r1,abc r1 = 0x1234 sti r1,not_here as with a normal load and store store means write, load means read. indrect is like using a pointer in C. unsigned *not_here; not_here=(unsigned *)0x300D; //lea r0,here ; st r0,not_here r1 = 0x1234; *not_here = r1; //sti r1,not_here From the description in the instruction set reference mem[mem[not_here]] = r1 mem[0x300D] = 0x1234 //write to memory at 0x300D ldi r0,not_here store is write, load is read r0 = mem[mem[not_here]] r0 = mem[0x300D] r0 = 0x1234 trap 0x67 show 0x1234 ld r0,not_here r0 = 0x300D trap 0x67 show 0x300D ld r0,here The lea and st combination up front made not_here hold the address of here. So the sti instruction caused the value to be stored to the memory location with the label here. So here contains 0x1234 r0 = 0x1234 trap 0x67 show 0x1234 trap 0x25 halt ---------------------------------------- Lesson 8: Jump ;---- lesson 8 ---- lea r1,skipper jmp r1 ld r0,abc trap 0x67 skipper: ld r0,xyz trap 0x67 trap 0x25 abc: .word 0x1234 xyz: .word 0x4321 ;---- lesson 8 ---- ./lc3asm lesson.asm label0000: [0x00000000] [skipper] 1 label0001: [0x00000002] [abc] 1 label0002: [0x00003004] [skipper] 0 label0003: [0x00000004] [xyz] 1 label0004: [0x00003007] [abc] 0 label0005: [0x00003008] [xyz] 0 0x3000: 0xE203 0x3001: 0xC040 0x3002: 0x2004 0x3003: 0xF067 0x3004: 0x2003 0x3005: 0xF067 0x3006: 0xF025 0x3007: 0x1234 0x3008: 0x4321 ./lc3sim lesson.asm.hex show 0x4321 halt The jump instruction (jmp) is very similar to the unconditional branch. In this architecture the jmp branches to an address held in a register. lea r1,skipper jmp r1 Lea loads the address of skipper into the r1 register. Then the jmp instruction branches or jumps to the address in r1 which is now the address at the label skipper. So this pair of instructions performs an unconditional branch to the label skipper. It is another way of doing brnzp skipper Now why would you use jmp instead of a branch? Well the branch is relative to the pc and has a limited number of instructions it can branch forward or back. Where a register can contain any address (assuming some other method than lea to put an address in that register, as lea is limited in its range as well). So we have jumped over the load of abc and the trap to the skipper label then it is a simple load of xyz (0x4321) and then show that value, then halt. ---------------------------------------- Lesson 9: Jump to subroutine ;---- lesson 9 ---- jsr myfun jsr more_fun trap 0x25 myfun: add r0,r7,#0 trap 0x67 ret more_fun: add r0,r7,#0 trap 0x67 add r3,r7,#0 lea r1,myfun jsrr r1 add r0,r7,#0 trap 0x67 add r7,r3,#0 add r0,r7,#0 trap 0x67 jmp r7 ;---- lesson 9 ---- ./lc3asm lesson.asm label0000: [0x00000000] [myfun] 1 label0001: [0x00000001] [more_fun] 1 label0002: [0x00003003] [myfun] 0 label0003: [0x00003006] [more_fun] 0 label0004: [0x00000009] [myfun] 1 0x3000: 0x4802 0x3001: 0x4804 0x3002: 0xF025 0x3003: 0x11E0 0x3004: 0xF067 0x3005: 0xC1C0 0x3006: 0x11E0 0x3007: 0xF067 0x3008: 0x17E0 0x3009: 0xE3F9 0x300A: 0x4040 0x300B: 0x11E0 0x300C: 0xF067 0x300D: 0x1EE0 0x300E: 0x11E0 0x300F: 0xF067 0x3010: 0xC1C0 ./lc3sim lesson.asm.hex show 0x3001 show 0x3002 show 0x300B show 0x300B show 0x3002 halt Now it is time to call functions. There are a couple of common methods used by instruction sets for performing a call to a subroutine or a function call. If you think about a C program or any high level language when you make a function call, when the function is finished you expect your program to return to run the line after the function call. So what you want to do in assembly is you basically want to perform an unconditional branch but in such a way that you are able to branch back to the instruction after the branch. branch to function return: ... function: ... branch to return: And using brnzp we know how to do that now brnzp function return: ... function: ... brnzp return: Here is the problem though, what if we wanted to call the function more than once? brnzp function return: ... brnzp function another_return: ... function: ... brnzp return: Since we hardcoded the return the second time we wanted to use the function it would not go back to another_return it would go back to return. What use is a function you cannot use more than once? So the lc3 solution is to use one of the general purpose registers in a special purpose way. If you look at the descriptions for jsr the first thing it says is r7 = pc. Now when an instruction is executed the pc contains the address of the next instruction the instruction after the one we are executing. So r7 = pc means r7 will contain the return address for our function call. jsr myfun jsr more_fun ... myfun: add r0,r7,#0 trap 0x67 ret ... jsr myfun means jump to subroutine at the address at label myfun. but before jumping save the address to the next instruction (jsr more_fun) in the register r7. If jsr myfun is at address 0x3000 then jsr more_fun is at address 0x3001, so when we start running the function at myfun r7 will contain the value 0x3001. Myfun starts by moving the value in r7 to r0 (r0 = r7 + 0) then shows the contents of r0. The first output of the simulator is 0x3001 so that is as expected. Now the ret means jump to the address in r7. For lc3 assembly language the instruction ret is a pseudo instruction, the real instruction is jmp r7. The output of the assembler shows this (well you have to know how to read it, which is a future discussion). 0x3005: 0xC1C0 0x3010: 0xC1C0 At this point r7 contains 0x3001, so when we jump to 0x3001 we are basically returning from the subroutine/function thus the name of the pseudo instruction being ret, return. Now we are at the jsr more_fun instruction, we know how this part works we are going to jump to more_fun, but before doing that we save the address if the instruction after the jsr (0x3002 in this case) in r7. Now what we wanted to do was this main() { myfun(); more_fun(); } myfun() { return; } more_fun() { myfun(); return; } We want to nest function calls, have more_fun() call myfun() and then return to more_fun() then return to main(). Here is our problem. jsr more_fun there: more_fun: ; r7 contains the return address there jsr myfun ; this changes r7 to the address here here: ;r7 was modified by the jsr myfun, it still contains the address here ret ; this returns to here, not there! Later we will find out about a more portable and proper solution but for now what if we save the contents of r7 in some other register. jsr more_fun there: more_fun: ; r7 contains the return address there add r3,r7,#0 ; now r3 contains the return address there jsr myfun ; this changes r7 to the address here here: ; r7 contains here, r3 contains there so add r7,r3,#0 ; now r7 contains there ret ; returns to there In this particular case we could have taken a shortcut and saved an instruction: here: ; r7 contains here, r3 contains there so jmp r3 The full more_fun function has a bunch of show calls in it more_fun: add r0,r7,#0 ; contains 0x3002 return address after jsr more_fun r0 = r7 trap 0x67 ; show 0x3002 add r3,r7,#0 ; r3 = r7 = 0x3002 lea r1,myfun ; r1 = myfun jsrr r1 ; this is another form of jump subroutine, instead of an ; address you specify a register that contains the address ; similar to jmp r1 but r7 is modified to the address after ; jsrr which is 0x300B ; note that myfun shows r7 which in myfun is 0x300B add r0,r7,#0 ; we are back from myfun, so r7 should contain the address ; after the jsrr r1 so r0 = 0x300B trap 0x67 ; show 0x300B add r7,r3,#0 ; r7 = r3 = 0x3002 add r0,r7,#0 ; r0 = r7 = 0x3002 trap 0x67 ; show 0x3002 jmp r7 ; this is the same as a ret, jump to what is in r7, 0x3002 And returning after jsr more_fun we simply halt. jsr myfun jsr more_fun trap 0x25 show 0x3001 first call to myfun show 0x3002 beginning of more_fun show 0x300B when more_fun calls myfun, in myfun show 0x300B when myfun returns to more_fun show 0x3002 when we restore r7 from r3 ---------------------------------------- Lesson 11: ldr, str, and labels ;---- lesson 11 ---- lea r1,base ldr r0,r1,#0 trap 0x67 ldr r0,r1,#1 trap 0x67 add r2,r1,#2 ldr r0,r2,#0 trap 0x67 add r2,r2,#1 ldr r0,r2,#0 trap 0x67 trap 0x25 base: .word 0x1111 .word 0x2222 .word 0x3333 .word 0x4444 ;---- lesson 11 ---- ./lc3asm lesson.asm label0000: [0x00000000] [base] 1 label0001: [0x0000300C] [base] 0 0x3000: 0xE20B 0x3001: 0x6040 0x3002: 0xF067 0x3003: 0x6041 0x3004: 0xF067 0x3005: 0x1462 0x3006: 0x6080 0x3007: 0xF067 0x3008: 0x14A1 0x3009: 0x6080 0x300A: 0xF067 0x300B: 0xF025 0x300C: 0x1111 0x300D: 0x2222 0x300E: 0x3333 0x300F: 0x4444 ./lc3sim lesson.asm.hex show 0x1111 show 0x2222 show 0x3333 show 0x4444 halt So we have seen the ld instruction where you specify the register to read the value into and the name of the label (an address) that you want to read from. Ldi uses a level of indirection instead of operating on the address specified, the address specified contains the address to operate on. The third flavor is is ldr. It is like ld but it reads from the address contained in the second register listed. ldr r3,r4,#0 Take the contents of r4, add the offset, use that as an address to read memory, take what you find in memory and put it in register r3. Before diving into the program, understand that a label in assembly language is just a variable that represents an address. An address that the assembler will figure out later, after you have written the program. It is not a function it is not a variable it is an address. lea r1,base Load the address base into r1 ldr r0,r1,#0 here is an ldr r0 = mem[r1+0] since r1 is base r0 = mem[base+0] = 0x1111 trap 0x67 show 0x1111 ldr r0,r1,#1 r0 = mem[r1+1] = mem[base+1] = 0x2222 trap 0x67 show 0x2222 add r2,r1,#2 Now do some math with that address r2 = r1 + 2 = base + 2 ldr r0,r2,#0 r0 = mem[r2+0] = mem[(base+2)+0] = mem[base+2] = 0x3333 trap 0x67 show 0x3333 add r2,r2,#1 r2 = r2 + 1 = (base+2) + 1 = base + 3 ldr r0,r2,#0 r0 = mem[r2+0] = mem[(base+3)+0] = mem[base+3] = 0x4444 trap 0x67 show 0x4444 trap 0x25 halt. Just like ld and st have a relationship, ld is load from memory into a register (a read), st is store from the register into memory (a write). ldr and str, same deal ldr is read from a register based address, str is write the contents of a register to a register based address. ---------------------------------------- Lesson 11: The stack ;---- lesson 11 ---- ld r6,stack_top ld r0,zero jsr fun trap 0x25 fun: trap 0x67 add r6,r6,#0xFFFE str r7,r6,#1 add r1,r0,#1 str r1,r6,#0 add r2,r1,#0xFFF9 brzp skip add r0,r1,#0 jsr fun skip: ldr r0,r6,#0 trap 0x67 ldr r7,r6,#1 add r6,r6,#0x0002 ret zero: .word 0x0000 stack_top: .word 0x6000 ;---- lesson 11 ---- ./lc3asm lesson.asm label0000: [0x00000000] [stack_top] 1 label0001: [0x00000001] [zero] 1 label0002: [0x00000002] [fun] 1 label0003: [0x00003004] [fun] 0 label0004: [0x0000000A] [skip] 1 label0005: [0x0000000C] [fun] 1 label0006: [0x0000300D] [skip] 0 label0007: [0x00003012] [zero] 0 label0008: [0x00003013] [stack_top] 0 0x3000: 0x2C12 0x3001: 0x2010 0x3002: 0x4801 0x3003: 0xF025 0x3004: 0xF067 0x3005: 0x1DBE 0x3006: 0x7F81 0x3007: 0x1221 0x3008: 0x7380 0x3009: 0x1479 0x300A: 0x0602 0x300B: 0x1060 0x300C: 0x4FF7 0x300D: 0x6180 0x300E: 0xF067 0x300F: 0x6F81 0x3010: 0x1DA2 0x3011: 0xC1C0 0x3012: 0x0000 0x3013: 0x6000 ./lc3sim lesson.asm.hex show 0x0000 show 0x0001 show 0x0002 show 0x0003 show 0x0004 show 0x0005 show 0x0006 show 0x0007 show 0x0006 show 0x0005 show 0x0004 show 0x0003 show 0x0002 show 0x0001 halt Time for a break, take a smoke, go to the bathroom, grab a coke or cup of coffee. Then come back. This one might be obvious or might twist your mind. Maybe you didnt understand recursion when they showed it to you in some other language. Maybe it will or wont make sense here. Likewise what is that thing called a stack, I couldnt see it in C or other languages...You are going to see it in assembly. In C you are taught about two kinds of variables. Global and local. Global is a bit more obvious, there is only one, everyone that touches it changes the only one. (Okay lets not get into a local with the same name and all that you know what I mean). Well the compiler when it does its thing, chooses a memory location, a single memory location, where that global will live. All accesses to that global use that memory location. You increment it or add it or subtract, it all to and from the same place. But what about local variables, how does that work? How does the compiler "allocate" some memory just for that function, does it call malloc()? No, that is not how it works. How could it? Malloc() is yet another function which probably has local variables, or at least could if it wanted to. First and foremost, the stack is just memory, plain old memory. What makes it special is that in some instruction sets there are special instructions with general purpose registers with a special feature, kind of like r7 in lc3. Or sometimes the stack pointer is like the pc in lc3 and not directly visible a special register. And sometimes, there are no special stack instructions at all the stack is implemented using a convention. Meaning every one agrees and/or understands that regiter blah will be reserved for use as the stack pointer. Lets take a break from that and look at some recursion. We know from C or other languages that you can call the function you are in, from the function you are in. Hopefully you do it in such a way as to not get into an infinite loop. Here is a simple example of recursion in C #include <stdio.h> #include <stdlib.h> #include <string.h> void fun ( int x ) { int y; printf("show %d\n",x); y = x+1; if(y<7) fun(y); printf("show %d\n",y); } int main ( void ) { fun(0); return(0); } The first call to fun() is with a parameter of 0. The first call to fun, x is zero. We print x (zero). A local variable y is set to x+1 = 0+1 = 1; And that is fine. So now if x < 7 we call fun again with y as the argument, so the second call to 1 is with an x of 1. Lets chart this out x = 0; print 0; y = 1; fun(1); x = 1; print 1; y = 2; fun(2); x = 2; print 2; y = 3; fun(3); x = 3; print 3; y = 4; fun(4); x = 4; print 4; y = 5; fun(5); x = 5; print 5; y = 6; fun(6); x = 6; print 6; y = 7; Now y is not less than 7 so we do not call fun(7), we are in the middle of fun(6); x = 6, y = 7, so we skip calling fun, we go right to print y, which is 7. And we return from fun(6); Now we are back to fun(5) where x = 5 and y = 6 we come back after the call to fun(y) after the if(6<7) fun(5); so we perform the print of y, which is a 6 and return from that function. This repeats until we hit the return from fun(0) which returns back to main. x = 0; print 0; y = 1; fun(1); x = 1; print 1; y = 2; fun(2); x = 2; print 2; y = 3; fun(3); x = 3; print 3; y = 4; fun(4); x = 4; print 4; y = 5; fun(5); x = 5; print 5; y = 6; fun(6); x = 6; print 6; y = 7; print 7; return; print 6; return; print 5; return; print 4; return; print 3; return; print 2; return; print 1; return; We learned from nested calls to functions before (jump to subroutine) that each time a function is called we need to keep a copy of the return value if we are going to call another function, because calling another funciton modifies r7 which holds the return value. Another thing we notice is that for each call to fun there is a new memory location needed to store y for that one call to the function, we could easily change the if(y<7) some larger number, any number and this code should work (so long as we dont "run out" of stack). The first and oldest call to fun() we remember the y and the return value for that call no matter how many have come in between, same for any of the calls. Well think about some note cards or index cards. Say we have a bunch of note cards. When the first call to fun is made, fun(0), we take two note cards on one card we write the return address, the contents of r7, and we place that card on the table. The second card we write the value for y, which is x+1 = 0+1 = 1; We place that second card on top of the first card, we begin to stack up note cards. Now the second call to fun, fun(1), we take two new note cards, the first one we write the return address the contents of r7 and place it on the stack, then we write the value of y = x+1 = 1+1 = 2 on the next note card and place that on the stack. We have added four cards thus far. Third call to fun, fun(2), we write the address in r7 on a card, put it on the stack, write the value of y (3) and put that on the stack. We have added six cards thus far, two for each call to fun(). Now we hit the special call fun(6). We have put 12 cards on the stack two for each value of x 0,1,2,3,4,5, we do the same again and put two more on the stack, the return value, and the value of y which is 7 in this case. We do not call fun() again this time because the rule doesnt match y is not less than 7. Now we can unwind this recursion using the stack. Take the top card off the stack, this contains the value 7 for this call to fun (fun(6)) and printf that value, discard that card. Take the next card off the stack, which is the return address for the call to fun(6), use that address to return to the calling instance. Now take the top card off the stack and print that value, 6, discard. Take the next card off the stack and use that to return. Adding something to a stack is called a push, removing something from a stack is called a push. push return f0; x = 0; print 0; y = 1; push y1; fun(1); push return f1; x = 1; print 1; y = 2; push y2; fun(2); push return f2; x = 2; print 2; y = 3; push y3; fun(3); push return f3; x = 3; print 3; y = 4; push y4; fun(4); push return f4; x = 4; print 4; y = 5; push y5; fun(5); push return f5; x = 5; print 5; y = 6; push y6; fun(6); push return f6; x = 6; print 6; y = 7; push y7; pop y7; print 7; pop f6; return to f6; pop y6; print 6; pop f5; return to f5; pop y5; print 5; pop f4; return to f4; pop y4; print 4; pop f3; return to f3; pop y3; print 3; pop f2; return to f2; pop y2; print 2; pop f1; return to f1; pop y1; print 1; pop f0; return to f0; So long as you have enough blank note cards and have enough room to stack them up, you could do this all day long. What if the first time when fun(0) pushed f0 on the stack there was already a stack of cards on the table, that had been placed by other functions. Would this still work? Yes, so long as we have room that is not a problem. The function pushes two things and pops two things basically it cleans up as it goes and returns with the stack the way it found it. So long as all functions do this, you dont care what is on the stack when you got there, you pile your things on then clean them off when you are done. So the word pointer had been used earlier, stack pointer to be exact. What if the note cards were memory locations? Locations in memory that the program is not using for anything else. And there is a register that contains an address somewhere in that memory. At address j. What if j were at some big address, far away from our program, and each time we pushed something on the stack what that really did was decrement the address in the stack pointer, and then write the new item at that address? If k is the value of the stack pointer when we made the first call to fun, fun(0) in main. j = k; push f0 (j--; *j=f0; j is k-1); print 0; push y1 (j--; *j=y1; j is k-2) fun(1) push f1 (j--; *j=f1; j is k-3); print 1; push y2 (j--; *j=y2; j is k-4) fun(1) push f2 (j--; *j=f2; j is k-5); print 2; push y3 (j--; *j=y3; j is k-6) fun(1) push f3 (j--; *j=f3; j is k-7); print 3; push y4 (j--; *j=y4; j is k-8) fun(1) push f4 (j--; *j=f4; j is k-9); print 4; push y5 (j--; *j=y5; j is k-10) fun(1) push f5 (j--; *j=f5; j is k-11); print 5; push y6 (j--; *j=y6; j is k-12) fun(1) push f6 (j--; *j=f6; j is k-13); print 6; push y7 (j--; *j=y7; j is k-14) pop (y = *j; which is y7 j++; j is k-13) print 7; pop (ret = *j; (f6) j++; j is k-12) pop (y = *j; which is y6 j++; j is k-11) print 6; pop (ret = *j; (f6) j++; j is k-10) pop (y = *j; which is y5 j++; j is k-9) print 5; pop (ret = *j; (f6) j++; j is k-8) pop (y = *j; which is y4 j++; j is k-7) print 4; pop (ret = *j; (f6) j++; j is k-6) pop (y = *j; which is y3 j++; j is k-5) print 3; pop (ret = *j; (f6) j++; j is k-4) pop (y = *j; which is y2 j++; j is k-3) print 2; pop (ret = *j; (f6) j++; j is k-2) pop (y = *j; which is y1 j++; j is k-1) print 1; pop (ret = *j; (f6) j++; j is k-0) and when finished j is equal to k. No matter what k was we end up where we started. Each function call ends up where it started, so long as the function has a balanced number of pushes and pops. yes, there are some systems where the stack "grows up" meaning the address gets larger when you push things on the stack. And there are systems where a push you write to memory first, then you change the stack pointer. The lc3 has one instruciton in hardware that operates in a certain way, we have not used that instruction yet, but the order of operations for a pop in that instruciton is read from the memory pointed to by the stack pointer (r6) then increment the stack pointer one location. So we need to be consistent. I have let the cat out of the bag, r6 is another general purpose register that also has a second life as a special purpose register, because it is used specifically by hardware as such. So we might as well use it all of the time as the stack pointer. We dont have to but that is another story. About this stack growning down or growing up. If you think about the use of memory. Programs in all processors I know of execute in increasing address order. If there is no branch the next instruction is the current instruction plus the size of the current instruction. In the lc3 architecture the next instruction, if no branch is taken, is at current instruction plus one. So code grows upward. When a program is compiled or assembled your program consumes some amount of memory starting at the entry point and going upward. There is a finite amount of memory, we have consumed some chunk of it for our program and some chunk of it for our variables, what is left is called the heap. The heap of memory left over. The heap is not something that has a direct hardware connection, there is no heap pointer or anything like that. the compiler or in assembly language the programmer must manage memory and knows where the end of the program is and the heap begins. Now C functions like malloc() have the desire to start at the beginning of the heap and grow upward as well. What you want to do is have your stack start at the top of memory, the top of the heap a high address in the heap, and grow downward. This is a quick way to make efficient use of the heap. Now it is possible to get into trouble. If you do too many mallocs, or at least consume too much memory using mallocs, and you have a lot of local variables, lots of stack consumption, then the software that manages the malloc may give you some memory that the stack grows into, causing your malloced data and/or your stack to get corrupt, leading to hopefully a crash or worse some strange behavior. Again this is not a rule, but a convention. Naturally with a malloc there is the free, and the portion of the heap at the bottom which is used by malloc() and free() can get fragmented. Say you malloc 100 bytes, then malloc 200 bytes, then free the first 100 bytes and malloc 300 bytes. Well there was only space for 100 at the bottom. the malloc of 300 bytes cant go there we need to put it after the malloc of 200 bytes. We have this unused hole of 100 bytes now right in our heap. Managing mallocs and allocs is an artform of itself, decades of research and solutions, not unlike file systems. For a function to be really useful it needs to have a "calling convetion" both the caller and the callee need to agree on what rules to use for passing parameters. There are two often used solutions for passing parameters, one is stack based the other is register based. Now in most systems there is often much more stack than the number of registers and if this were a C calling convetion, used by a C compiler we have to be able to support up to some number X (32?) parameters to be compatible with the standard. That is likely more registers than you have spare in your processor. So often in a register based calling convention registers are used for the first N parameters then the stack is used after that. For our very simple fun() function the rule will be the paramter will be passed to the funciton in register r0. The function can use and destroy r0-r2, if you think about it you dont want to say have an important value in a register r4 for example, and when you call some function that function uses r4 to do something and messes up your value. there are a couple of solutions for this, one is to save the value on the stack with a push before calling some other function and restoring it from the stack when you return from that function. Another is to make a rule about what registers have to put back they way they were found when the function was called. So lets try to examine the lesson, this lesson is a direct, human, hand, compiled version of the C program shown above. Before main can start the program starts by initializing the stack pointer. The stack is dynamic in nature but before we can use it the first time we need to point it at some high address. ld r6,stack_top In order to call fun(0) we need to follow our convention and place the parameter being passed in in r0. Then call the function ld r0,zero jsr fun When the call to the function returns, halt. trap 0x25 Now our function. fun: trap 0x67 show r0 whatever it may be add r6,r6,#0xFFFE 0xFFFE is a negative 2. This is called making a stack frame, by analyzing the function before writing it we know there are two things we want on the stack. We could a) push and pop as we go or b) adjust the stack pointer one time for the whole function, allocating all the locations we need, and then referencing into this allocated memory using stack pointer + offset addressing. This program does the latter. Each function call we need to store the return value and we need a location for storing the local variable y. Subracting 2, r6 = r6 - 2 is the same as allocating two locations for this function. str r7,r6,#1 save the return address at mem[r6+1] add r1,r0,#1 compute y = x + 1; x is the passed in parameter which is in register r0. str r1,r6,#0 save the value y on the stack at address mem[r6+0] add r2,r1,#0xFFF9 brzp skip A little math here, we want to compare with 7 usually that is done with a subtract and then look at the flags. We dont have a subtract we have an add instruciton, so we want to add with a negative 7. To negate a number in twos complement, invert and add one so 0x0007 inverted is 0xFFF8 plus one is 0xFFF9. so 0xFFF9 is a negative y As a programmer we know this function will have values of y that start at 1 and increase by 1 for every call so compute y - 7 and see what happens when we get to 7. 0x0001+0xFFF9 = 0xFFFA 0x0002+0xFFF9 = 0xFFFB 0x0003+0xFFF9 = 0xFFFC 0x0004+0xFFF9 = 0xFFFD 0x0005+0xFFF9 = 0xFFFE 0x0006+0xFFF9 = 0xFFFF 0x0007+0xFFF9 = 0x0000 0x0008+0xFFF9 = 0x0001 0x0009+0xFFF9 = 0x0002 0x000A+0xFFF9 = 0x0003 So when y = 1 through 6 the result is a negative number, for those cases 1 to 6, y is less than 7 so we want to call the function again recursively. When y = 7 we do not want to call the function. As written this program does not have this issue but what if we call the function with the number 7 or a number larger than 7. When called with a 7 then y = 8, and 8+0xFFF9 is a positive number. So for the case x = 6 y = 7 which is not less than 7 the result of this add is zero and the zero flag will be set. For values of y that are greater than 7 the result of this add is a positive number and the p flag will be set. Think of this is less than 7 call fun() is the same as if greater than or equal to 7 then dont call fun(). equal to 7 the subtract of 7 is a zero and numbers greater than 7 the subtract of 7 is a positive number so this implementation uses a brzp, branch if zero or positive and skip over the call to fun add r0,r1,#0 jsr fun If we execute fun, then we execute fun by passing the parameter y, we computed y by taking r0 and adding 1 and putting the result in r1 this is not the only way you could do this you could optimize this a little and save an instruction, but that is how it was done here. so per the calling convention y needs to be in r0, so the add r0,r1,#0 copies r1 to r0. Then we call the function fun. For either the case of returning from a call to fun or the case where we skipped over the call to fun, we land at the address skip skip: ldr r0,r6,#0 trap 0x67 The next section of code after skip is to load into r0 the contents of the stack at mem[r6+0], if you look above we are using r6+0 to keep the value y. So r0 gets the value of y computed at the beginning of this function call, and the trap 0x67 displays that value. ldr r7,r6,#1 The return address was stored in memory at mem[r6+1], lets get that saved value off of the stack. add r6,r6,#0x0002 And just before we return lets put the stack back the way we found it by adding the 2 that we subtracted. Remove the "stack frame" that we created or allocated. ret return to the caller So in general the stack is used when you need some temporary (relative term) storage. Unless there is a specific reason (rare), you should use it in a balanced manner, just like matching up your malloc()s and free()s you should have the same number of pushes and pops in the same context of code. Some programming languages compilers had the convention that the caller would set up the stack and the callee would clean it up. C compilers and others use an easier to maintain and understand convention of the caller cleans up after itself and the callee cleans up after itself. ---------------------------------------- Lesson 12: A simpler function ;---- lesson 12 ---- ld r6,stack_top ld r0,zero add r1,r0,#3 add r6,r6,#0xFFFF ;push r1 str r1,r6,#0 ;push r1 add r1,r0,#2 add r6,r6,#0xFFFF ;push r1 str r1,r6,#0 ;push r1 add r1,r0,#1 add r6,r6,#0xFFFF ;push r1 str r1,r6,#0 ;push r1 jsr sum trap 0x67 add r0,r1,#0 trap 0x67 ldr r0,r6,#0 add r6,r6,#3 trap 0x67 trap 0x25 sum: add r6,r6,#0xFFFE str r0,r6,#0 str r1,r6,#1 ldr r0,r6,#2 ldr r1,r6,#3 add r0,r0,r1 ldr r1,r6,#4 add r0,r0,r1 str r0,r6,#2 ldr r1,r6,#1 ldr r0,r6,#0 add r6,r6,#0x0002 ret stack_top: .word 0x6000 zero: .word 0x0000 ;---- lesson 12 ---- ./lc3asm lesson.asm label0000: [0x00000000] [stack_top] 1 label0001: [0x00000001] [zero] 1 label0002: [0x0000000B] [sum] 1 label0003: [0x00003013] [sum] 0 label0004: [0x00003020] [stack_top] 0 label0005: [0x00003021] [zero] 0 0x3000: 0x2C1F 0x3001: 0x201F 0x3002: 0x1223 0x3003: 0x1DBF 0x3004: 0x7380 0x3005: 0x1222 0x3006: 0x1DBF 0x3007: 0x7380 0x3008: 0x1221 0x3009: 0x1DBF 0x300A: 0x7380 0x300B: 0x4807 0x300C: 0xF067 0x300D: 0x1060 0x300E: 0xF067 0x300F: 0x6180 0x3010: 0x1DA3 0x3011: 0xF067 0x3012: 0xF025 0x3013: 0x1DBE 0x3014: 0x7180 0x3015: 0x7381 0x3016: 0x6182 0x3017: 0x6383 0x3018: 0x1001 0x3019: 0x6384 0x301A: 0x1001 0x301B: 0x7182 0x301C: 0x6381 0x301D: 0x6180 0x301E: 0x1DA2 0x301F: 0xC1C0 0x3020: 0x6000 0x3021: 0x0000 ./lc3sim lesson.asm.hex show 0x0000 show 0x0001 show 0x0006 halt This is a simpler task, no recursion. This time we are going to implement this code: signed sum ( signed a, signed b, signed c ) { return(a+b+c); } void main ( void ) { printf("0x%04X",sum(1,2,3)); } So we are going to put our human compiler hats on and think this through. This time we are going to use a different calling convention. This one is going to be purely stack based. - caller cleans up its use of the stack - callee cleans up its use of the stack - parameters are passed on the stack in reverse order such that when entering the function the top of the stack contains the first parameter - return values if any are returned on the top of the stack - no registers other than r7 are to be modified by the function. With stacks that grow down, you often need to stand on your head to look at them. What is called the "top of the stack" is the first item on the stack when visualized as a growing up type stack, a stack of note cards on a table. You can either stand on your head or you can write the addresses in ascending order rather than descending. That is what we will do [ a ] <-- top of stack after we prepare the stack to call sum [ b ] [ c ] [ ] <-- top of stack before we prepared the stack to call sum This picture shows the memory locations on the stack. Where the stack is before we start preparing the stack for a call to sum and after we are ready to jsr to sum. This is also the picture we see right after we enter the call to sum. Sum of course as needed can add more things to the stack. We are defining a push on the stack in C as sp--; *sp=item_pushed; The sp in lc3 is r6 so add r6,r6,#0xFFFF ; subtract one str rx,r6,#0 ; push a register rx on the stack. rx can be r0-r7 Start by initializing the stack pointer, somewhere in high(er) memory than the program. And initialize r0 so it is a known value. ld r6,stack_top ld r0,zero stack looks like this, we dont know what x is but we dont want to mess it up. [ x ] <-- r6 add r1,r0,#3 make r1 hold the value 3 (r1 = r0 + 3 = 0 + 3 = 3) add r6,r6,#0xFFFF stack looks like this [ ] <-- r6 [ x ] str r1,r6,#0 stack now looks like this [ 3 ] <-- r6 [ x ] add r1,r0,#2 add r6,r6,#0xFFFF [ ] <-- r6 [ 3 ] [ x ] str r1,r6,#0 [ 2 ] <-- r6 [ 3 ] [ x ] add r1,r0,#1 add r6,r6,#0xFFFF [ ] <-- r6 [ 2 ] [ 3 ] [ x ] str r1,r6,#0 [ 1 ] <-- r6 [ 2 ] [ 3 ] [ x ] Now the stack is ready, the three parameters for the sum function have been prepared, we can now call the sum function jsr sum So we know what the stack looks like right now and we know that r7 contains our return value. Because sum does not call any other functions basically neither jsr nor jsrr are used, and we dont mess with r7 we can leave r7 alone, we dont have to save it. If we were to call another function within sum we would need to save r7. We are not allowed to modify any registers, since the add instruction requires registers what do we do? Well we use the stack for temporary storage to preserve the state of some registers, use those registers, the restore the state of those registers before we return. It will take a minimum of two registers to perform these add operations so we need to save two registers. This lesson uses r0 and r1, so the first task is to save them. We could individually push each register on the stack which is two instructions per push as we saw above. Or we could make stack frame simply move the stack pointer the amount we need to move it in one step. Basically three instructions to do what would be four instructions if we pushed each item separately. sum: [ 1 ] <-- r6 [ 2 ] [ 3 ] [ x ] add r6,r6,#0xFFFE ; subtract 2 from r6 [ ] <-- r6 + 0 [ ] <-- r6 + 1 [ 1 ] <-- r6 + 2 [ 2 ] <-- r6 + 3 [ 3 ] <-- r6 + 4 [ x ] <-- r6 + 5 str r0,r6,#0 [ save r0 ] <-- r6 + 0 [ ] <-- r6 + 1 [ 1 ] <-- r6 + 2 [ 2 ] <-- r6 + 3 [ 3 ] <-- r6 + 4 [ x ] <-- r6 + 5 str r1,r6,#1 [ save r0 ] <-- r6 + 0 [ save r1 ] <-- r6 + 1 [ 1 ] <-- r6 + 2 [ 2 ] <-- r6 + 3 [ 3 ] <-- r6 + 4 [ x ] <-- r6 + 5 ldr r0,r6,#2 ;read the first parameter off the stack and put in r0 ldr r1,r6,#3 ;read the second parameter off the stack and put in r1 add r0,r0,r1 ;add the first and second parameters and save in r0 ldr r1,r6,#4 ;read the third parameter off the stack and save in r1 add r0,r0,r1 ;add the sum of the first two parameters and the third parameter str r0,r6,#2 ;save the result on the stack where top will be when we return [ save r0 ] <-- r6 + 0 [ save r1 ] <-- r6 + 1 [ result ] <-- r6 + 2 [ 2 ] <-- r6 + 3 [ 3 ] <-- r6 + 4 [ x ] <-- r6 + 5 ldr r1,r6,#1 ; restore r1 with saved value on stack ldr r0,r6,#0 ; restore r0 with saved value on stack add r6,r6,#0x0002 ; adjust stack pointer to remove stack frame [ result ] <-- r6 + 0 [ 2 ] <-- r6 + 1 [ 3 ] <-- r6 + 2 [ x ] <-- r6 + 3 ret Now we return to the first instruction after the jsr sum. To demonstrate that r0 and r1 were not modified by the function. r0 was a zero going into sum and r1 was a 1, lets show them. trap 0x67 ;show r0 add r0,r1,#0 ; copy r1 to r0 trap 0x67 ; show r0 and the output shows we did not modify them show 0x0000 show 0x0001 at this point the stack still looks like this we, the caller have not cleaned up our stack from the call to sum. [ result ] <-- r6 + 0 [ 2 ] <-- r6 + 1 [ 3 ] <-- r6 + 2 [ x ] <-- r6 + 3 ldr r0,r6,#0 ; read the result off the top of the stack add r6,r6,#3 ; instead of three pops, restore the stack pointer in one shot [ x ] <-- r6 trap 0x67 ; show r0 which contains the result of the addition of 1+2+3 show 0x0006 trap 0x25 halt. Different university programs use this instruction set for eduational purposes including making a compiler. I dont remember what calling convention was used, but because there are only 8 registers of which two are somewhat dedicated to special purposes, r7 for return values and r6 as a stack pointer. It may make sense to use the stack for the function calling convention as shown here. I have not made my own compiler for this architecture so dont know what the tradeoff is between register based and stack based. (register based is usually hybrid, part register then using the stack if you run out of registers). 0000 nzpoooooooooo br[n][z][p] label 0001 dddaaa000bbbb + add rd,ra,rb 0001 dddaaa1iiiiii + add rd,ra,#immed 0010 dddoooooooooo + ld rd,label 0011 sssoooooooooo st rs,label 0100 000aaa0000000 jsrr ra 0100 1oooooooooooo jsr label 0101 dddaaa000bbbb + and rd,ra,rb 0101 dddaaa1iiiiii + and rd,ra,#immed 0110 dddaaaooooooo + ldr rd,ra,#offset 0111 sssaaaooooooo str rs,ra,#offset 1000 0000000000000 rti 1001 dddaaa1111111 + not rd,ra 1010 dddoooooooooo + ldi rd,label 1011 dddoooooooooo + sti rd,label 1100 000aaa0000000 jmp ra 1100 0001110000000 ret (jmp r7) 1101 1101xxxxxxxxx undefined 1110 dddoooooooooo + lea rd,label 1111 0000iiiiiiiii trap immed This is a work in progress....
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published