GitHub - dwelch67/lc3

Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Makefile		Makefile
README		README
lc3asm.c		lc3asm.c
lc3dis.c		lc3dis.c
lc3sim.c		lc3sim.c
readhex.c		readhex.c
test.asm		test.asm
vsim.c		vsim.c
Repository files navigation

This is a work in progress....










This repository is for learning assembly language programming, and other
things using the little computer 3 instruction set.

http://en.wikipedia.org/wiki/LC-3

---------------------------------------------------------------------

This document is targetted toward average programmers.  C experience
is not specifically required, but enough general programming knowledge
to the point that you can read C programs is required as C will be used
as a bridge between your programming knowledge and assembly language
and the other topics discussed.

The tools are all written in C so the ability to compile and run them
is required.  They are written to be very portable and will hopefully
compile and run on your system.

---------------------------------------------------------------------

What assembly language is not:

Assembly language is not, generally, the language you use for device
drivers, nor is the primary language for bare metal program or other
forms of low level programming.  Mostly C is used for such things but
other languages work as well.  Yes some assembly is required for some
things, but it is way less than one percent of the overall code, in
general.   And yes some folks use it heavily for fun or to prove a point
or other reasons.

Assembly language is not a unversal language for all processors in the
sense that there is one assembly language and one instruction set.  This
instruction set should really not take more than an evening to learn for
an average programmer (any other language C, Python, whatever), the
things that might cause a hang up would/should cause a similar hang up
in any other language (C, Python, Java, whatever) like bitwise logical
operations.  Learning this lc-3 instruction set does not mean you can
then run off and program in assembly language for your phone or computer.
Those devices use a different instruction set and if you want to know
the assembly language for those you need to learn that separately.  The
good side of this is that each new instruction set you learn is much
easier than the last, much easier than learning multiple high level
languages (C, Python, Java, whatever).

Generally assembly languages are not standardized, what does conform
to a hard standard is the instruciton set, the machine code, the bits
that make the processor do things.  If there is more than one assembler
(the program, like a compiler, that converts the assembly language
source code into something lower level, usually machine code) for a
particular instruciton set, each author may choose to implement a
completely different "language" as in the keywords and other syntax
that make up the programming language.  The GNU assembler is well known
for messing up instruction sets.  In this case (lc-3) I am not going
to conform to the assembly language supported by the inventors text book.

What assembly language is:

Assembly language is a programming language like C or Python or Java.

Assembly language instructions generally attempt to have a one to one
relationship with machine code instructions.  Another way to say this
or think about this is that often a single line of assembly language
results in a single machine code instruction.  There are of course
exceptions, directives used to communicate something to the assembler.

High level languages C, Python, Java are converted from the text file
form into something lower using a compiler.  The term compiler is not
normally used for assembly language, instead the term assembler is used.
Unfortunately, it is not uncommon to see the term assembler used as a
reference to the language not the tool used to convert the language into
machine code.

---------------------------------------------------------------------

The little computer 3, lc-3, is an educational instruction set, it is
certainly possible to turn it into hardware and do useful things with
it but it is better served as an educational tool.  A stepping stone
up to larger more complicated instruction sets.

This tutorial and repository will attempt to show you not only how to
get started in assembly language programming, but it will do this
by showing you the machine code, showing you at least one instruction
set simulator, an assembler and a disassembler.

Most of the assembly language lessons are delivered in a way where you
are expected to type them in yourself, assemble and run them with the
simulator and get the same results as the tutorial.  You of course could
cut and paste them instead of typing but then you leave a hole in your
education.  The typing and debugging side of programming using these
tools and this language.

You will need appendix A from this book:

http://highered.mcgraw-hill.com/sites/0072467509/

The direct link is here:

http://highered.mcgraw-hill.com/sites/dl/free/0072467509/104653/PattPatelAppA.pdf

Let me know if these links dont work or if this material becomes
unavailable.

----------------------------------------
Lesson 0: Building and testing the tools

These tools were written to be portable, so they should build and work
on most systems.  Windows, Linux, Apple, etc.

If you dont already have a C compiler either a gcc or clang/llvm based
compiler, pre-built binaries, should be easy to come by.  The Makefile
shows the command lines for each of the tools

gcc :
    gcc -o lc3sim lc3sim.c readhex.c
    gcc -o lc3asm lc3asm.c
    gcc -o lc3dis lc3dis.c readhex.c
    gcc -o vsim vsim.c readhex.c


clang:
    clang -o lc3sim lc3sim.c readhex.c
    clang -o lc3asm lc3asm.c
    clang -o lc3dis lc3dis.c readhex.c
    clang -o vsim vsim.c readhex.c

If you have make then the default

> make

should build using gcc/gnu, if you use clang then

> make clang

will build using the clang/llvm compiler.

To verify the tools work

> ./lc3asm test.asm
(on windows it might be in the form .\lc3asm.exe instead)

> ./lc3asm test.asm
label0000: [0x00003003] [top] 0
label0001: [0x00000005] [top] 1
0x3000: 0x9040
0x3001: 0x5060
0x3002: 0x1027
0x3003: 0xF067
0x3004: 0x103F
0x3005: 0x0BFD
0x3006: 0xF025

> ./lc3dis test.asm.hex
0x3000: 0x9040 not r0,r1
0x3001: 0x5060 and r0,r1,#0x0000
0x3002: 0x1027 add r0,r0,#0x0007
0x3003: 0xF067 trap 0x67
0x3004: 0x103F add r0,r0,#0xFFFF
0x3005: 0x0BFD brnp 0x3003 ; -3
0x3006: 0xF025 trap 0x25

> ./lc3sim test.asm.hex
show 0x0007
show 0x0006
show 0x0005
show 0x0004
show 0x0003
show 0x0002
show 0x0001
halt

Assuming test.asm has not been changed the output from these three tools
should match the above.

----------------------------------------
Lesson 1: Getting started

Reminder, don't cut and paste, type the following program into a text
editor, using the commands from lesson one (lc3asm, lc3sim) assemble
and run the program.

;---- lesson 1 ----
add r0,r0,#1
trap 0x67
add r0,r0,#2
trap 0x67
trap 0x25
;---- lesson 1 ----

./lc3asm lesson1.asm
0x3000: 0x1021
0x3001: 0xF067
0x3002: 0x1022
0x3003: 0xF067
0x3004: 0xF025

./lc3sim lesson1.asm.hex
show 0x1235
show 0x1237
halt

What does it do?

Here is where we dive in.  First, registers, registers in the context of
processors are similar to variables in a programming language, except
there are a fixed number of them.  Different processor architectures
have a different number of registers.  In general some registers are
considered general purpose and some special purpose, again different
architectures have different mixes of special and general and it is not
uncommon to have some special registers that can be used or accessed
as general purpose registers.

The lc3 architecture has 8 registers, named r0,r1,r2,...,r7.  They are
all considered general purpose, but a couple do have a special purpose
as well.  R6 is used as the stack pointer (The stack and stack pointer
will be covered in a future lesson).  R7 is used as a return address
for function/subroutine calls (a future lesson).

Now in general processors do not actually halt, you turn them on,
they start up based on some known rules, and then run forever.  When
you run a program on your computer, that program may start and finish
but the operating system keeps running.  Even if it shows 0% cpu usage
that means applications, the operating system keeps running, it has an
idle loop that measures how much time is the operating system idling
vs running other software.  It is still running.  There are some
microcontrollers, that are designed to hibernate or go into some sort
of low power mode that has a fast wake up, like pressing the button on
your television remote control.   For educational purposes this processor
does have a halt command so that programs like the one above can run
briefly then halt.  It is not uncommon for a processor architecture to
have a specific instruction for making system calls.  System calls
meaning a place where the operating system in a loose sense of the term
can wait for applications to ask for something from the operating system
printing, file I/O, etc.  The lc3 architecture uses the trap instruction
for system calls.  The format of the trap instruction is

trap vector

where vector is a number from 0 to 255

The vector 0x25 has been defined in the lc3 documentation as halt.  My
simulator also uses that trap to halt.  I have added my own trap, number
0x67 to print the contents of register r0, as a debugging and
demonstration tool.  For my simulators these are done using a back door
a non-realistic trick in the simulator and not actual lc3 code talking
to virtual lc3 hardware to perform this action.  Both actions (0x25 and
0x67) could be done using lc3 virtual hardware and lc3 software.

There are two forms of the add instruction

add rd,sr1,sr2
add rd,sr1,#immed5

The first of the two performs the operation rd = sr1 + sr2, where rd,
sr1, and sr2 are registers.  Any of the 8 registers can be used in any
of these three places, you do not have to repeat any registers

add r1,r2,r3 ; r1 = r2 + r3

but you could if you wanted to

add r1,r1,r1 ; r1 = r1 + r1

The other form has two registers and an immediate

add rd,sr1,#immed5 ; rd = sr1 + immed

immed5 is a sign extended 5 bit immediate.  The term immediate means a
constant that is encoded in the instruction itself.  We will cover the
instruction encoding later, for now understand that this form of the
add instruction has 5 bits for an immediate value and the most
significant of those bits is sign extended to create a 16 bit number.

0xxxx = 000000000000xxxx
1xxxx = 111111111111xxxx

Where x can be any combination of bits from 0000 to 1111.

0000000000000000
0000000000000001
0000000000000010
...
0000000000001101
0000000000001110
0000000000001111

1111111111110000
1111111111110001
1111111111110010
...
1111111111111101
1111111111111110
1111111111111111

0x0000 to 0x000F and 0xFFF0 to 0xFFFF.

If not already you should strengthen your understanding of twos complement
this has nothing really to do with assembly language as it is used in
many/all languages.  In this case

0x0000 to 0x000F 0 to 15
0xFFF0 to 0xFFFF -16 to -1

Larger constants will be covered in another lesson.

With hardware, unless it is well defined, never assume the hardware
is initialized to anything.  General purpose registers, and memory are
two places in particular where you should not make assumptions.  My
lc3sim.c simulator as of this writing resets all the registers to the
value 0x1234, but dont assume that will always be the case.  In order
to not overly complicate the example, this example does not manage the
initialization of r0 properly, it just uses it as is.  The purpose of
this example is to demonstrate the add immediate so it works despite
the initial value of r0.

At this point we should be ready to walk through the code.

add r0,r0,#1

this means r0 = r0 + 1

trap 0x67

this is a system call specific to this simulator it prints out the
contents of r0.  The output of lc3sim below, the first show line 0x1235
is the result of this trap 0x67 instruction.

add r0,r0,#2

This performs the operation r0 = r0 + 2.  if r0 were 0x1235 as we see
from the prior trap 0x67, then r0 = 0x1235 + 2 = 0x1237

trap 0x67

outputs the current state of r0, which is now 0x1237 as expected

trap 0x25

this halts the simulation shown as a halt below

./lc3sim lesson1.asm.hex
show 0x1235
show 0x1237
halt

----------------------------------------
Lesson 2: Unconditional Branch

;---- lesson 2 ----
trap 0x67
add r0,r0,#1
brnzp skip
add r0,r0,#2
skip:
trap 0x67
trap 0x25
;---- lesson 2 ----

./lc3asm lesson.asm
label0000: [0x00000002] [skip] 1
label0001: [0x00003004] [skip] 0
0x3000: 0xF067
0x3001: 0x1021
0x3002: 0x0E01
0x3003: 0x1022
0x3004: 0xF067
0x3005: 0xF025

./lc3sim lesson.asm.hex
show 0x1234
show 0x1235
halt

Some assembly languages use the word branch, some use jump, some use
both.  From a programmers perspective processor instructions execute
linearly in the order written by the programmer, until you hit a branch
or jump which causes execution to conditionally or unconditionally
jump to a new location in the code.  In C for example


a = 7;
if(b == 12)
{
  a++;
}
c = a + 10;

there are two possible paths through this code

a = 7;
c = a + 10;

or

a = 7;
a++;
c = a + 10;

The a++; is conditionally executed, otherwise it is skipped, the code
jumps over the contents in the brackets and executes what is after. Code
like this might not have much value on the surface, but it demonstrates
an unconditional branch, the a++; is never executed, the execution jumps
over and continues after the if statement.

a = 7;
if(0)
{
  a++;
}
c = a + 10;


trap 0x67

start off by seeing what r0 is right now, in this case 0x1234 (remember
never assume its power on state).

add r0,r0,#1

r0 = r0 + 1 = 0x1235

brnzp skip

For the lc3 processor this is an unconditional branch.  Why it is
unconditional and what the nzp mean are for a future lesson.  What it
means in this form is "no matter what jump to the code starting at/after
the label skip:

add r0,r0,#2

IF EXECUTED this code would perform r0 = r0 + 2 = 0x1235 + 2 = 0x1237

skip:

This is a label, for my lc3asm assembly language a label cannot start
with a number, cannot have spaces, cannot be too long, and ends with
a colon.  The unconditional branch above is branching/jumping the code
execution to this label.  Basically the add r0,r0,#2 is being passed
over and not executed.

trap 0x67

print out the current state of r0, we see 0x1235 so that confirms that
the add r0,r0,#2 was not executed

trap 0x25

halt the simulation

----------------------------------------
Lesson 3: Logical operations AND and NOT

;---- lesson 3 ----
trap 0x67
add r1,r0,#0
not r0,r1
trap 0x67
and r0,r0,r1
trap 0x67
add r0,r0,#7
trap 0x67
trap 0x25
;---- lesson 3 ----

./lc3asm lesson.asm
0x3000: 0xF067
0x3001: 0x1220
0x3002: 0x9040
0x3003: 0xF067
0x3004: 0x5001
0x3005: 0xF067
0x3006: 0x1027
0x3007: 0xF067
0x3008: 0xF025

./lc3sim lesson.asm.hex
show 0x1234
show 0xEDCB
show 0x0000
show 0x0007
halt

For whatever reason it seems to me that folks get tripped up in bit
manipulation and logic operations more in assembly language than in
other languages.  Most/all of the languages have logic operations as well
AND, OR, NOT, XOR, etc and there is little to no difference at the
assembly language level.

Being a minimalist instruction, lc3 only has NOT and AND operations,
OR, XOR, etc have to be derived from those (future lesson).  The NOT
operation means if the input bit is a 0 the output bit is a 1, if the
input bit is a 1, the output bit is a 0.  The operation does this
on a bit by bit basis each bit in the input register is processed
separately from a functional perspective from the other bits in that
register.  So

0001001000110100 input
1110110111001011 output

With pencil and paper first you need to be able to get from hex to
binary.  I used to keep a strip of paper taped to my monitor with hex and
binary for the numbers 0 - 15, it didnt take long to memorize them, for
decimal to/from hex I just use a calculator.  Once written as binary
(the above input number is 0x1234 BTW) then the not operation is trivial
where you see a 0 write a 1 where you see a 1 write a 0.

The AND operation is also a logical operation but involves two inputs
and one output.  The truth table is as follows

00 0
01 0
10 0
11 1

The left two columns are the possible combinations of inputs for two
bits.  The right column is the result of the and operation of those
two input bits on a line.  Basically if both input bits are ones then
the output is a one otherwise the output is a zero.  Or another
way to say this is

anything anded with 0 is 0

00 0
10 0

and anything anded with 1 is itself

01 0
11 1

it commutes, the first operand could be the anything anded with zero

00 0
01 0

or anything anded with one

10 0
11 1

If I perform the operation 0x1234 AND 0x00FF, we look at these per
bit which means per column, if both items in the column are a one then
the result is a one, otherwise the result is a zero

0001001000110100
0000000011111111 AND
=====================
0000000000110100

so 0x1234 AND 0x00FF = 0x0034

Walking through the program

trap 0x67

this shows us what the current state of r0 is (0x1234)

add r1,r0,#0

r1 = r0 + 0, basically copy r0 to r1

not r0,r1

perform a bit by bit not operation on r1 the result goes in r0

0001001000110100  r1
1110110111001011  r0

r0 = ~r1 = 0xEDCB

trap 0x67

print out r0, and it confirms r0 = 0xEDCB

and r0,r0,r1

r0 = r0 AND r1

1110110111001011  r0
0001001000110100  r1
=====================
0000000000000000  r0

this was intentional, if you think about it, to get a result of 1 out of
an and you need both inputs to be a 1.  One register is the inverse of
the other, no matter what r0 was before all of this started this is
guaranteed to never have a condition where there are two ones.

trap 0x67

this confirms r0 = 0x0000

add r0,r0,#7

r0 = r0 + 7 = 7

trap 0x67

print out the 0x0007

trap 0x25

halt

There will be faster ways to do this in future examples, but this
sequence demonstrated a sequence to zero out r0 by anding the not of
r0 to itself.


----------------------------------------
Lesson 4: Flags and conditional branch

;---- lesson 4 ----
trap 0x67
add r1,r0,#0
not r0,r1
trap 0x67
and r0,r0,r1
trap 0x67
add r0,r0,#7
trap 0x67
top:
  add r0,r0,#0xFFFF
  trap 0x67
  trap 0x68
  brnp top
trap 0x25
;---- lesson 4 ----

./lc3asm lesson.asm
label0000: [0x00003008] [top] 0
label0001: [0x0000000B] [top] 1
0x3000: 0xF067
0x3001: 0x1220
0x3002: 0x9040
0x3003: 0xF067
0x3004: 0x5001
0x3005: 0xF067
0x3006: 0x1027
0x3007: 0xF067
0x3008: 0x103F
0x3009: 0xF067
0x300A: 0xF068
0x300B: 0x0BFC
0x300C: 0xF025
./lc3sim lesson.asm.hex
show 0x1234
show 0xEDCB
show 0x0000
show 0x0007
show 0x0006
psr 0x0001
show 0x0005
psr 0x0001
show 0x0004
psr 0x0001
show 0x0003
psr 0x0001
show 0x0002
psr 0x0001
show 0x0001
psr 0x0001
show 0x0000
psr 0x0002
halt

So lesson 3 got us to the point where register 0 was zeroed out then
7 was added to leave r0 with a 7.  This example shows a simple count
down loop.  The instruction set documentation has a plus sign (+)
next to many of the instructions.  This plus sign indicates the psr
(processor state register) flags are changed as a result of that operation.
Like the PC in this architecture, the PSR is not directly visible to
the programmer.

What are flags?  First off not all processors use flags, second processors
that use flags dont use the same flags or use them in the same way.
This processor has an interesting mix of flags not normally seen.  This
processor has three flags the n flag, when set indicating the result
of the operation was negative (the msbit, bit 15, was set, think about
twos complement).  The z flag which indicates the result was exactly
zero and the p flag which indicates the result was positive.

From the lc3 instruction set reference:

The condition codes are set, based on whether that result, taken as a
16-bit 2’s complement integer, is negative (N = 1; Z, P = 0),
zero (Z = 1; N, P = 0), or positive (P = 1; N, Z = 0).
All other LC-3 instructions leave the condition codes unchanged.

So, basically you can only have one flag set at a time through normal
operations, and also you will always have at least one flag set.  There
is a way to mess with these rules by messing with the psr, that is a
separate discussion.  For this example only one of the flags is set
at a time and as important at least one flag is set.

  add r0,r0,#0xFFFF

We dont have a subtract, we dont need a subtract, in logic a subtract
uses the add logic anyway, remember with twos complement to negate
a number you invert and add one.  If I want to subtract 1

0x0001  one
0xFFFE  invert
0xFFFF  and add one

so subtracting 1 is the same as adding -1.  The add immediate allows
us to add small negative numbers so

  add r0,r0,#0xFFFF

means r0 = r0 - 1;

With my lc3 assembler a label is of the form:

top:

some text that doesnt start with a zero, has no spaces and ends with
a colon.  Not unlike a label in C which you would use with a goto if
using labels and goto's were not considered a bad thing in C.  Despite
being bad perhaps you did at least learn how to use them.  Will be used
pretty much the same way here:

top:
r0 = r0 - 1;
if(r0 != 0) goto top;

We saw the add r0,r0,#0xFFFF implements the r0 = r0 - 1;

trap 0x67 we know prints out the value of r0
trap 0x68 prints out the psr

note that in the reference manual the trap instruction itself DOES NOT
modify the flags.  If you look at the source we see that this simulator
does not actually call other code for the trap 0x67 (that other code
COULD modify the flags).  Assume here that the flags are not touched
these traps are so you can see what is going on, you normally would not
have them there nor take the risk that they could be implemented with
code that doesnt preserve the psr.

So the flags dont change between the add and the brnp

  brnp top

What is brnp?  The instruction is really br for branch.  It has three
optional modifiers one for checking the n bit, one for checking the z
bit and one for checking the p bit.  We are looking for the result of
the subtract to be zero.  When the result of the subtract is a zero the
n bit is 0, the z bit is 1, the p bit is 0.

If you read the description of the br instruciton in the reference manual
you will see

if ((n AND N) OR (z AND Z) OR (p AND P))
PC = PC‡ + SEXT(PCoffset9);

The logic compares the n, z, and p test bits in the instruction with the
N, Z, and P flags in the PSR.  If any of those comparisions using an
and is true then the branch happens.  If all three of those comparisons
is false, a zero, then the branch does not happen the code falls through.

The syntax brnp means test for the n and p flags.  We are looking for
the case where the z flag is set and the n and p flags are zero.  So
brnp means we want to branch whenever the z flag is not set, when the
z flag is not set either the n or p flag is set and that would result
in a branch.

The syntax in the assembly language is to specify the label to branch
to, just like in C.  The assembler will take care of figuring out the
number of instructions to jump over and as a result will figure out the
machine code for this instruction.

We can see from the trap 0x67 (show r0) and trap 0x68 (psr) outputs

show 0x0006
psr 0x0001
show 0x0005
psr 0x0001
show 0x0004
psr 0x0001
show 0x0003
psr 0x0001
show 0x0002
psr 0x0001
show 0x0001
psr 0x0001
show 0x0000
psr 0x0002
halt

r0 is having one subracted per loop, 6,5,4,3... being positive numbers
the p flag is set for the non-zero numbers.  When the result is a zero
the psr changes to 0x0002 which is the z flag set, the n and p flags
are zero.  So the brnp does not happen, the code falls through to the
trap 0x25 which is a halt.

So in summary a brnp means if the n or p flags is set then branch
otherwise dont.  Another way to say that is if the z flag is not set
then branch.   Likewise a brnz would mean if the n or z flags are set
then branch otherwise dont.  Or if the p flag is not set then branch.

----------------------------------------
Lesson 5: The load instruction and more on branching

;---- lesson 5 ----
trap 0x67
ld r0,abc
trap 0x67
top:
  add r0,r0,#0xFFFF
  trap 0x67
  trap 0x68
  brz get_out
  brnzp top
get_out:
trap 0x25

abc: .word 0x0007
;---- lesson 5 ----


./lc3asm lesson.asm
label0000: [0x00000001] [abc] 1
label0001: [0x00003003] [top] 0
label0002: [0x00000006] [get_out] 1
label0003: [0x00000007] [top] 1
label0004: [0x00003008] [get_out] 0
label0005: [0x00003009] [abc] 0
0x3000: 0xF067
0x3001: 0x2007
0x3002: 0xF067
0x3003: 0x103F
0x3004: 0xF067
0x3005: 0xF068
0x3006: 0x0401
0x3007: 0x0FFB
0x3008: 0xF025
0x3009: 0x0007

./lc3sim lesson.asm.hex
show 0x1234
show 0x0007
show 0x0006
psr 0x0001
show 0x0005
psr 0x0001
show 0x0004
psr 0x0001
show 0x0003
psr 0x0001
show 0x0002
psr 0x0001
show 0x0001
psr 0x0001
show 0x0000
psr 0x0002
halt

This example is very similar to the previous example.  The primary
differences are this example uses the load (ld) instruction to read
a value from memory and place it in a register.  And the branching
in the loop is a little different, instead of branching if not zero
the loop is an infinite loop with an unconditional branch, inside
the loop there is a branch if zero to get out of the loop.  In C:

r0 = 7;
top:
  r0 = r0 - 1;
  if(r0==0) goto get_out;
  goto top;
get_out:


ld r0,abc

The assembler makes life easy by letting you use labels for loading or
storing values.  Load (ld) is used if you are going from memory to a
register, basically it is a read from memory.  Store (st) is used for
going from a register to memory, basically a write.  In this case there
is a label abc which is followed by a .word directive.  For this
architecture/assembler a word is defined as 16 bits the .word directive
tells the assembler the value that follows needs to be put in the program
at this location.  Unless you are putting machine code in your program
you want to avoid using a .word in an execution path.  You normally
dont want to execute .word lines.

abc: .word 0x0007

The value at abc at assembly time is 0x0007.  So basically

ld r0,abc

In one instruction replaces these steps in the prior two lessons.

add r1,r0,#0
not r0,r1
and r0,r0,r1
add r0,r0,#7

Even better the add immediate is very limited in what immediate values
can be used.  Where a .word entry can hold any bit pattern, for example

ld r0,xyz
...
xyz:
   .word 0x1234

Would put 0x1234 in r0, but this is illegal:

add r1,r0,#0
not r0,r1
and r0,r0,r1
add r0,r0,#0x1234

It would take several add instructions to get 0x1234 into r0.

  brz get_out
  brnzp top
get_out:

brz means branch if the z flag is set.  When r0 contains a 1 and then
we subtract 1 (add 0xFFFF) the result is r0 equals zero.  At that time
the z flag is set, the n and p flags are not set.  So a branch if zero
will branch, which hops over the brnzp instruction and gets us out of
the loop.  If the z flag is not set then we execute the brnzp instruction
which is also known as an unconditional branch.  We assume that at least
one of the flags is set at all times so a brnzp means branch if n or
z or p is set, which should always be true.  So

top:
  ...
  brnzp top

Is an infinite loop

  brz get_out
  ...
get_out:

is a conditional branch that gets us out of the loop.

Another way to visualize this program in C is:

r0 = 7;
while(1)
{
   r0 = r0 - 1;
   if(r0==0) break;
}


----------------------------------------
Lesson 6: The store instruction

;---- lesson 6 ----
ld r0,abc
add r0,r0,#0x9
st r0,xyz
ld r0,abc
trap 0x67
ld r0,xyz
trap 0x67
trap 0x25

abc: .word 0x1234
xyz: .word 0xABCD
;---- lesson 6 ----

./lc3asm lesson.asm
label0000: [0x00000000] [abc] 1
label0001: [0x00000002] [xyz] 1
label0002: [0x00000003] [abc] 1
label0003: [0x00000005] [xyz] 1
label0004: [0x00003008] [abc] 0
label0005: [0x00003009] [xyz] 0
0x3000: 0x2007
0x3001: 0x1029
0x3002: 0x3006
0x3003: 0x2004
0x3004: 0xF067
0x3005: 0x2003
0x3006: 0xF067
0x3007: 0xF025
0x3008: 0x1234
0x3009: 0xABCD

./lc3sim lesson.asm.hex
show 0x1234
show 0x123D
halt

from lesson 5 we know that load means from ram to a register, a read,
store means from register to ram, a write.

ld r0,abc

read from memory at/after label abc which is 0x1234, r0 = 0x1234

add r0,r0,#0x9

r0 = r0 + 9 = 0x123D

st r0,xyz

store r0 to memory at/after label xyz, so the 0xABCD is overwritten
with 0x123D

ld r0,abc

load r0 with the value 0x1234

trap 0x67

show 0x1234

ld r0,xyz

load r0 with the value 0x123D

trap 0x67

show 0x123D

trap 0x25

halt

----------------------------------------
Lesson 7: Load and store indirect and load effective address

;---- lesson 7 ----
lea r0,here
st r0,not_here
ld r1,abc
sti r1,not_here
ldi r0,not_here
trap 0x67
ld r0,not_here
trap 0x67
ld r0,here
trap 0x67
trap 0x25

not_here: .word 0x5678
abc: .word 0x1234
here: .word 0xABCD
;---- lesson 7 ----

./lc3asm lesson.asm
label0000: [0x00000000] [here] 1
label0001: [0x00000001] [not_here] 1
label0002: [0x00000002] [abc] 1
label0003: [0x00000003] [not_here] 1
label0004: [0x00000004] [not_here] 1
label0005: [0x00000006] [not_here] 1
label0006: [0x00000008] [here] 1
label0007: [0x0000300B] [not_here] 0
label0008: [0x0000300C] [abc] 0
label0009: [0x0000300D] [here] 0
0x3000: 0xE00C
0x3001: 0x3009
0x3002: 0x2209
0x3003: 0xB207
0x3004: 0xA006
0x3005: 0xF067
0x3006: 0x2004
0x3007: 0xF067
0x3008: 0x2004
0x3009: 0xF067
0x300A: 0xF025
0x300B: 0x5678
0x300C: 0x1234
0x300D: 0xABCD

./lc3sim lesson.asm.hex
show 0x1234
show 0x300D
show 0x1234
halt

lea means load effective address, or even simpler "load address".

lea r0,here

means load the address of the label here into the register 0.  There
are a couple of ways we can confirm what is going on

not_here: .word 0x5678
abc: .word 0x1234
here: .word 0xABCD

The assembler is dumping the machine code:

0x300B: 0x5678
0x300C: 0x1234
0x300D: 0xABCD

here is at address 0x300D so lea r0,here results in r0 = 0x300D

st r0,not_here

this is a normal store, stores the value 0x300D to the word at the label
not_here, the 0x5678 that was there is overwritten.

ld r1,abc

r1 = 0x1234

sti r1,not_here

as with a normal load and store store means write, load means read.
indrect is like using a pointer in C.

unsigned *not_here;
not_here=(unsigned *)0x300D; //lea r0,here ; st r0,not_here
r1 = 0x1234;
*not_here = r1; //sti r1,not_here

From the description in the instruction set reference

mem[mem[not_here]] = r1
mem[0x300D] = 0x1234 //write to memory at 0x300D

ldi r0,not_here

store is write, load is read

r0 = mem[mem[not_here]]
r0 = mem[0x300D]
r0 = 0x1234

trap 0x67

show 0x1234

ld r0,not_here

r0 = 0x300D

trap 0x67

show 0x300D

ld r0,here

The lea and st combination up front made not_here hold the address of
here.  So the sti instruction caused the value to be stored to the
memory location with the label here.  So here contains 0x1234

r0 = 0x1234

trap 0x67

show 0x1234

trap 0x25

halt

----------------------------------------
Lesson 8: Jump

;---- lesson 8 ----
lea r1,skipper
jmp r1

ld r0,abc
trap 0x67

skipper:
ld r0,xyz
trap 0x67
trap 0x25

abc: .word 0x1234
xyz: .word 0x4321
;---- lesson 8 ----

./lc3asm lesson.asm
label0000: [0x00000000] [skipper] 1
label0001: [0x00000002] [abc] 1
label0002: [0x00003004] [skipper] 0
label0003: [0x00000004] [xyz] 1
label0004: [0x00003007] [abc] 0
label0005: [0x00003008] [xyz] 0
0x3000: 0xE203
0x3001: 0xC040
0x3002: 0x2004
0x3003: 0xF067
0x3004: 0x2003
0x3005: 0xF067
0x3006: 0xF025
0x3007: 0x1234
0x3008: 0x4321

./lc3sim lesson.asm.hex
show 0x4321
halt

The jump instruction (jmp) is very similar to the unconditional branch.
In this architecture the jmp branches to an address held in a register.

lea r1,skipper
jmp r1

Lea loads the address of skipper into the r1 register.  Then the jmp
instruction branches or jumps to the address in r1 which is now the
address at the label skipper.  So this pair of instructions performs
an unconditional branch to the label skipper.  It is another way of doing

brnzp skipper

Now why would you use jmp instead of a branch?  Well the branch is relative
to the pc and has a limited number of instructions it can branch forward
or back.  Where a register can contain any address (assuming some other
method than lea to put an address in that register, as lea is limited
in its range as well).

So we have jumped over the load of abc and the trap to the skipper label
then it is a simple load of xyz (0x4321) and then show that value, then
halt.

----------------------------------------
Lesson 9: Jump to subroutine

;---- lesson 9 ----
jsr myfun
jsr more_fun
trap 0x25

myfun:
add r0,r7,#0
trap 0x67
ret

more_fun:
add r0,r7,#0
trap 0x67
add r3,r7,#0
lea r1,myfun
jsrr r1
add r0,r7,#0
trap 0x67
add r7,r3,#0
add r0,r7,#0
trap 0x67
jmp r7
;---- lesson 9 ----


./lc3asm lesson.asm
label0000: [0x00000000] [myfun] 1
label0001: [0x00000001] [more_fun] 1
label0002: [0x00003003] [myfun] 0
label0003: [0x00003006] [more_fun] 0
label0004: [0x00000009] [myfun] 1
0x3000: 0x4802
0x3001: 0x4804
0x3002: 0xF025
0x3003: 0x11E0
0x3004: 0xF067
0x3005: 0xC1C0
0x3006: 0x11E0
0x3007: 0xF067
0x3008: 0x17E0
0x3009: 0xE3F9
0x300A: 0x4040
0x300B: 0x11E0
0x300C: 0xF067
0x300D: 0x1EE0
0x300E: 0x11E0
0x300F: 0xF067
0x3010: 0xC1C0

./lc3sim lesson.asm.hex
show 0x3001
show 0x3002
show 0x300B
show 0x300B
show 0x3002
halt

Now it is time to call functions.  There are a couple of common methods
used by instruction sets for performing a call to a subroutine or a
function call.  If you think about a C program or any high level language
when you make a function call, when the function is finished you expect
your program to return to run the line after the function call.

So what you want to do in assembly is you basically want to perform
an unconditional branch but in such a way that you are able to branch
back to the instruction after the branch.

branch to function
return:
...
function:
...
branch to return:

And using brnzp we know how to do that now

brnzp function
return:
...
function:
...
brnzp return:

Here is the problem though, what if we wanted to call the function
more than once?

brnzp function
return:
...
brnzp function
another_return:
...
function:
...
brnzp return:

Since we hardcoded the return the second time we wanted to use the
function it would not go back to another_return it would go back
to return.  What use is a function you cannot use more than once?

So the lc3 solution is to use one of the general purpose registers in
a special purpose way.  If you look at the descriptions for jsr the
first thing it says is r7 = pc.  Now when an instruction is executed
the pc contains the address of the next instruction the instruction
after the one we are executing.  So r7 = pc means r7 will contain
the return address for our function call.

jsr myfun
jsr more_fun
...
myfun:
add r0,r7,#0
trap 0x67
ret
...

jsr myfun means jump to subroutine at the address at label myfun.  but
before jumping save the address to the next instruction (jsr more_fun)
in the register r7.  If jsr myfun is at address 0x3000 then jsr more_fun
is at address 0x3001, so when we start running the function at myfun
r7 will contain the value 0x3001.  Myfun starts by moving the value in
r7 to r0 (r0 = r7 + 0) then shows the contents of r0.  The first output
of the simulator is 0x3001 so that is as expected.  Now the ret means
jump to the address in r7.  For lc3 assembly language the instruction
ret is a pseudo instruction, the real instruction is jmp r7.  The
output of the assembler shows this (well you have to know how to read
it, which is a future discussion).

0x3005: 0xC1C0
0x3010: 0xC1C0

At this point r7 contains 0x3001, so when we jump to 0x3001 we are
basically returning from the subroutine/function thus the name of the
pseudo instruction being ret, return.

Now we are at the jsr more_fun instruction, we know how this part
works we are going to jump to more_fun, but before doing that we save
the address if the instruction after the jsr (0x3002 in this case)
in r7.

Now what we wanted to do was this

main()
{
  myfun();
  more_fun();
}
myfun()
{
  return;
}
more_fun()
{
  myfun();
  return;
}

We want to nest function calls, have more_fun() call myfun() and then
return to more_fun() then return to main().

Here is our problem.

jsr more_fun
there:

more_fun:
; r7 contains the return address there
jsr myfun ; this changes r7 to the address here
here:
 ;r7 was modified by the jsr myfun, it still contains the address here
ret ; this returns to here, not there!

Later we will find out about a more portable and proper solution but
for now what if we save the contents of r7 in some other register.

jsr more_fun
there:

more_fun:
 ; r7 contains the return address there
 add r3,r7,#0
 ; now r3 contains the return address there
 jsr myfun ; this changes r7 to the address here
here:
 ; r7 contains here, r3 contains there so
 add r7,r3,#0 ; now r7 contains there
 ret ; returns to there

In this particular case we could have taken a shortcut and saved an
instruction:

here:
 ; r7 contains here, r3 contains there so
 jmp r3

The full more_fun function has a bunch of show calls in it


more_fun:
add r0,r7,#0 ; contains 0x3002 return address after jsr more_fun r0 = r7
trap 0x67    ; show 0x3002
add r3,r7,#0 ; r3 = r7 = 0x3002
lea r1,myfun ; r1 = myfun
jsrr r1      ; this is another form of jump subroutine, instead of an
             ; address you specify a register that contains the address
             ; similar to jmp r1 but r7 is modified to the address after
             ; jsrr which is 0x300B
             ; note that myfun shows r7 which in myfun is 0x300B
add r0,r7,#0 ; we are back from myfun, so r7 should contain the address
             ; after the jsrr r1 so r0 = 0x300B
trap 0x67    ; show 0x300B
add r7,r3,#0 ; r7 = r3 = 0x3002
add r0,r7,#0 ; r0 = r7 = 0x3002
trap 0x67    ; show 0x3002
jmp r7       ; this is the same as a ret, jump to what is in r7, 0x3002

And returning after jsr more_fun we simply halt.

jsr myfun
jsr more_fun
trap 0x25

show 0x3001  first call to myfun
show 0x3002  beginning of more_fun
show 0x300B  when more_fun calls myfun, in myfun
show 0x300B  when myfun returns to more_fun
show 0x3002  when we restore r7 from r3

----------------------------------------
Lesson 11: ldr, str, and labels

;---- lesson 11 ----
lea r1,base
ldr r0,r1,#0
trap 0x67
ldr r0,r1,#1
trap 0x67
add r2,r1,#2
ldr r0,r2,#0
trap 0x67
add r2,r2,#1
ldr r0,r2,#0
trap 0x67
trap 0x25

base:
.word 0x1111
.word 0x2222
.word 0x3333
.word 0x4444
;---- lesson 11 ----

./lc3asm lesson.asm
label0000: [0x00000000] [base] 1
label0001: [0x0000300C] [base] 0
0x3000: 0xE20B
0x3001: 0x6040
0x3002: 0xF067
0x3003: 0x6041
0x3004: 0xF067
0x3005: 0x1462
0x3006: 0x6080
0x3007: 0xF067
0x3008: 0x14A1
0x3009: 0x6080
0x300A: 0xF067
0x300B: 0xF025
0x300C: 0x1111
0x300D: 0x2222
0x300E: 0x3333
0x300F: 0x4444

./lc3sim lesson.asm.hex
show 0x1111
show 0x2222
show 0x3333
show 0x4444
halt

So we have seen the ld instruction where you specify the register
to read the value into and the name of the label (an address) that
you want to read from.  Ldi uses a level of indirection instead of
operating on the address specified, the address specified contains
the address to operate on.  The third flavor is is ldr.  It is like
ld but it reads from the address contained in the second register listed.

ldr r3,r4,#0

Take the contents of r4, add the offset, use that as an address to read
memory, take what you find in memory and put it in register r3.

Before diving into the program, understand that a label in assembly
language is just a variable that represents an address.  An address
that the assembler will figure out later, after you have written
the program.  It is not a function it is not a variable it is an
address.

lea r1,base

Load the address base into r1

ldr r0,r1,#0

here is an ldr r0 = mem[r1+0] since r1 is base r0 = mem[base+0] = 0x1111

trap 0x67

show 0x1111

ldr r0,r1,#1

r0 = mem[r1+1] = mem[base+1] = 0x2222

trap 0x67

show 0x2222

add r2,r1,#2

Now do some math with that address r2 = r1 + 2 = base + 2

ldr r0,r2,#0

r0 = mem[r2+0] = mem[(base+2)+0] = mem[base+2] = 0x3333

trap 0x67

show 0x3333

add r2,r2,#1

r2 = r2 + 1 = (base+2) + 1 = base + 3

ldr r0,r2,#0

r0 = mem[r2+0] = mem[(base+3)+0] = mem[base+3] = 0x4444

trap 0x67

show 0x4444

trap 0x25

halt.


Just like ld and st have a relationship, ld is load from memory into
a register (a read), st is store from the register into memory (a write).
ldr and str, same deal ldr is read from a register based address, str
is write the contents of a register to a register based address.

----------------------------------------
Lesson 11: The stack

;---- lesson 11 ----
ld r6,stack_top
ld r0,zero

jsr fun
trap 0x25

fun:
trap 0x67
add r6,r6,#0xFFFE
str r7,r6,#1
add r1,r0,#1
str r1,r6,#0
add r2,r1,#0xFFF9
brzp skip
add r0,r1,#0
jsr fun
skip:
ldr r0,r6,#0
trap 0x67
ldr r7,r6,#1
add r6,r6,#0x0002
ret

zero: .word 0x0000
stack_top: .word 0x6000
;---- lesson 11 ----

./lc3asm lesson.asm
label0000: [0x00000000] [stack_top] 1
label0001: [0x00000001] [zero] 1
label0002: [0x00000002] [fun] 1
label0003: [0x00003004] [fun] 0
label0004: [0x0000000A] [skip] 1
label0005: [0x0000000C] [fun] 1
label0006: [0x0000300D] [skip] 0
label0007: [0x00003012] [zero] 0
label0008: [0x00003013] [stack_top] 0
0x3000: 0x2C12
0x3001: 0x2010
0x3002: 0x4801
0x3003: 0xF025
0x3004: 0xF067
0x3005: 0x1DBE
0x3006: 0x7F81
0x3007: 0x1221
0x3008: 0x7380
0x3009: 0x1479
0x300A: 0x0602
0x300B: 0x1060
0x300C: 0x4FF7
0x300D: 0x6180
0x300E: 0xF067
0x300F: 0x6F81
0x3010: 0x1DA2
0x3011: 0xC1C0
0x3012: 0x0000
0x3013: 0x6000

./lc3sim lesson.asm.hex
show 0x0000
show 0x0001
show 0x0002
show 0x0003
show 0x0004
show 0x0005
show 0x0006
show 0x0007
show 0x0006
show 0x0005
show 0x0004
show 0x0003
show 0x0002
show 0x0001
halt

Time for a break, take a smoke, go to the bathroom, grab a coke or
cup of coffee.  Then come back.  This one might be obvious or might
twist your mind.  Maybe you didnt understand recursion when they showed
it to you in some other language.  Maybe it will or wont make sense here.
Likewise what is that thing called a stack, I couldnt see it in C or
other languages...You are going to see it in assembly.

In C you are taught about two kinds of variables.  Global and local.
Global is a bit more obvious, there is only one, everyone that touches
it changes the only one.  (Okay lets not get into a local with the
same name and all that you know what I mean).  Well the compiler when
it does its thing, chooses a memory location, a single memory location,
where that global will live.  All accesses to that global use that
memory location.  You increment it or add it or subtract, it all to
and from the same place.   But what about local variables, how does
that work?  How does the compiler "allocate" some memory just for that
function, does it call malloc()?  No, that is not how it works.  How
could it?  Malloc() is yet another function which probably has local
variables, or at least could if it wanted to.

First and foremost, the stack is just memory, plain old memory.  What
makes it special is that in some instruction sets there are special
instructions with general purpose registers with a special feature,
kind of like r7 in lc3.  Or sometimes the stack pointer is like the pc
in lc3 and not directly visible a special register.  And sometimes, there
are no special stack instructions at all the stack is implemented using
a convention.  Meaning every one agrees and/or understands that regiter
blah will be reserved for use as the stack pointer.

Lets take a break from that and look at some recursion.  We know from
C or other languages that you can call the function you are in, from
the function you are in.  Hopefully you do it in such a way as to not
get into an infinite loop.  Here is a simple example of recursion in C


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void fun ( int x )
{
    int y;

    printf("show %d\n",x);
    y = x+1;
    if(y<7) fun(y);
    printf("show %d\n",y);
}

int main ( void )
{
    fun(0);
    return(0);
}


The first call to fun() is with a parameter of 0.  The first call to
fun, x is zero.  We print x (zero).  A local variable y is set to
x+1 = 0+1 = 1;  And that is fine.  So now if x < 7 we call fun again
with y as the argument, so the second call to 1 is with an x of 1.
Lets chart this out

x = 0; print 0; y = 1; fun(1);
x = 1; print 1; y = 2; fun(2);
x = 2; print 2; y = 3; fun(3);
x = 3; print 3; y = 4; fun(4);
x = 4; print 4; y = 5; fun(5);
x = 5; print 5; y = 6; fun(6);
x = 6; print 6; y = 7;
Now y is not less than 7 so we do not call fun(7), we are in the middle
of fun(6);  x = 6, y = 7, so we skip calling fun, we go right to print
y, which is 7.  And we return from fun(6);  Now we are back to fun(5)
where x = 5 and y = 6 we come back after the call to fun(y) after the
if(6<7) fun(5);  so we perform the print of y, which is a 6 and return
from that function.  This repeats until we hit the return from fun(0)
which returns back to main.

x = 0; print 0; y = 1; fun(1);
x = 1; print 1; y = 2; fun(2);
x = 2; print 2; y = 3; fun(3);
x = 3; print 3; y = 4; fun(4);
x = 4; print 4; y = 5; fun(5);
x = 5; print 5; y = 6; fun(6);
x = 6; print 6; y = 7;
print 7; return;
print 6; return;
print 5; return;
print 4; return;
print 3; return;
print 2; return;
print 1; return;

We learned from nested calls to functions before (jump to subroutine)
that each time a function is called we need to keep a copy of the
return value if we are going to call another function, because calling
another funciton modifies r7 which holds the return value.  Another
thing we notice is that for each call to fun there is a new memory location
needed to store y for that one call to the function, we could easily change
the if(y<7) some larger number, any number and this code should work
(so long as we dont "run out" of stack).  The first and oldest call to
fun() we remember the y and the return value for that call no matter how
many have come in between, same for any of the calls.

Well think about some note cards or index cards.  Say we have a bunch
of note cards.  When the first call to fun is made, fun(0), we take two
note cards on one card we write the return address, the contents of r7,
and we place that card on the table.  The second card we write the value
for y, which is x+1 = 0+1 = 1;  We place that second card on top of the
first card, we begin to stack up note cards.  Now the second call to
fun, fun(1), we take two new note cards, the first one we write the
return address the contents of r7 and place it on the stack, then we write
the value of y = x+1 = 1+1 = 2 on the next note card and place that on
the stack.  We have added four cards thus far.  Third call to fun,
fun(2), we write the address in r7 on a card, put it on the stack,
write the value of y (3) and put that on the stack.  We have added six
cards thus far, two for each call to fun().  Now we hit the special
call fun(6).  We have put 12 cards on the stack two for each value of
x 0,1,2,3,4,5, we do the same again and put two more on the stack, the
return value, and the value of y which is 7 in this case.  We do not
call fun() again this time because the rule doesnt match y is not less
than 7.

Now we can unwind this recursion using the stack.  Take the top card off
the stack, this contains the value 7 for this call to fun (fun(6)) and
printf that value, discard that card.  Take the next card off the stack,
which is the return address for the call to fun(6), use that address
to return to the calling instance.  Now take the top card off the stack
and print that value, 6, discard.  Take the next card off the stack and
use that to return.  Adding something to a stack is called a push,
removing something from a stack is called a push.


push return f0; x = 0; print 0; y = 1; push y1; fun(1);
push return f1; x = 1; print 1; y = 2; push y2; fun(2);
push return f2; x = 2; print 2; y = 3; push y3; fun(3);
push return f3; x = 3; print 3; y = 4; push y4; fun(4);
push return f4; x = 4; print 4; y = 5; push y5; fun(5);
push return f5; x = 5; print 5; y = 6; push y6; fun(6);
push return f6; x = 6; print 6; y = 7; push y7;
pop y7; print 7; pop f6; return to f6;
pop y6; print 6; pop f5; return to f5;
pop y5; print 5; pop f4; return to f4;
pop y4; print 4; pop f3; return to f3;
pop y3; print 3; pop f2; return to f2;
pop y2; print 2; pop f1; return to f1;
pop y1; print 1; pop f0; return to f0;

So long as you have enough blank note cards and have enough room to stack
them up, you could do this all day long.

What if the first time when fun(0) pushed f0 on the stack there was
already a stack of cards on the table, that had been placed by other
functions.  Would this still work?  Yes, so long as we have room that
is not a problem.  The function pushes two things and pops two things
basically it cleans up as it goes and returns with the stack the way
it found it.  So long as all functions do this, you dont care what
is on the stack when you got there, you pile your things on then clean
them off when you are done.

So the word pointer had been used earlier, stack pointer to be exact.
What if the note cards were memory locations?  Locations in memory
that the program is not using for anything else.  And there is a register
that contains an address somewhere in that memory.  At address j.  What
if j were at some big address, far away from our program, and each time
we pushed something on the stack what that really did was decrement the
address in the stack pointer, and then write the new item at that address?
If k is the value of the stack pointer when we made the first call to
fun, fun(0) in main.

j = k;
push f0 (j--; *j=f0; j is k-1);   print 0; push y1 (j--; *j=y1; j is k-2) fun(1)
push f1 (j--; *j=f1; j is k-3);   print 1; push y2 (j--; *j=y2; j is k-4) fun(1)
push f2 (j--; *j=f2; j is k-5);   print 2; push y3 (j--; *j=y3; j is k-6) fun(1)
push f3 (j--; *j=f3; j is k-7);   print 3; push y4 (j--; *j=y4; j is k-8) fun(1)
push f4 (j--; *j=f4; j is k-9);   print 4; push y5 (j--; *j=y5; j is k-10) fun(1)
push f5 (j--; *j=f5; j is k-11);  print 5; push y6 (j--; *j=y6; j is k-12) fun(1)
push f6 (j--; *j=f6; j is k-13);  print 6; push y7 (j--; *j=y7; j is k-14)
pop (y = *j; which is y7 j++; j is k-13) print 7; pop (ret = *j; (f6) j++; j is k-12)
pop (y = *j; which is y6 j++; j is k-11) print 6; pop (ret = *j; (f6) j++; j is k-10)
pop (y = *j; which is y5 j++; j is k-9)  print 5; pop (ret = *j; (f6) j++; j is k-8)
pop (y = *j; which is y4 j++; j is k-7)  print 4; pop (ret = *j; (f6) j++; j is k-6)
pop (y = *j; which is y3 j++; j is k-5)  print 3; pop (ret = *j; (f6) j++; j is k-4)
pop (y = *j; which is y2 j++; j is k-3)  print 2; pop (ret = *j; (f6) j++; j is k-2)
pop (y = *j; which is y1 j++; j is k-1)  print 1; pop (ret = *j; (f6) j++; j is k-0)

and when finished j is equal to k.  No matter what k was we end up where
we started.  Each function call ends up where it started, so long
as the function has a balanced number of pushes and pops.

yes, there are some systems where the stack "grows up" meaning the address
gets larger when you push things on the stack.  And there are systems where
a push you write to memory first, then you change the stack pointer.
The lc3 has one instruciton in hardware that operates in a certain way,
we have not used that instruction yet, but the order of operations for
a pop in that instruciton is read from the memory pointed to by the
stack pointer (r6) then increment the stack pointer one location.  So
we need to be consistent.

I have let the cat out of the bag, r6 is another general purpose register
that also has a second life as a special purpose register, because it
is used specifically by hardware as such.  So we might as well use it
all of the time as the stack pointer.  We dont have to but that is
another story.

About this stack growning down or growing up.  If you think about the
use of memory.  Programs in all processors I know of execute in
increasing address order.  If there is no branch the next instruction
is the current instruction plus the size of the current instruction.
In the lc3 architecture the next instruction, if no branch is taken,
is at current instruction plus one.  So code grows upward.  When a
program is compiled or assembled your program consumes some amount of
memory starting at the entry point and going upward.  There is a finite
amount of memory, we have consumed some chunk of it for our program
and some chunk of it for our variables, what is left is called the
heap.  The heap of memory left over.  The heap is not something that
has a direct hardware connection, there is no heap pointer or anything
like that.  the compiler or in assembly language the programmer must
manage memory and knows where the end of the program is and the heap
begins.  Now C functions like malloc() have the desire to start
at the beginning of the heap and grow upward as well.  What you want
to do is have your stack start at the top of memory, the top of the heap
a high address in the heap, and grow downward.  This is a quick way
to make efficient use of the heap.  Now it is possible to get into
trouble.  If you do too many mallocs, or at least consume too much
memory using mallocs, and you have a lot of local variables, lots of
stack consumption, then the software that manages the malloc may give
you some memory that the stack grows into, causing your malloced data
and/or your stack to get corrupt, leading to hopefully a crash or worse
some strange behavior.  Again this is not a rule, but a convention.
Naturally with a malloc there is the free, and the portion of the heap
at the bottom which is used by malloc() and free() can get fragmented.
Say you malloc 100 bytes, then malloc 200 bytes, then free the first
100 bytes and malloc 300 bytes.  Well there was only space for 100 at
the bottom.  the malloc of 300 bytes cant go there we need to put it
after the malloc of 200 bytes.  We have this unused hole of 100 bytes
now right in our heap.  Managing mallocs and allocs is an artform
of itself, decades of research and solutions, not unlike file systems.

For a function to be really useful it needs to have a "calling convetion"
both the caller and the callee need to agree on what rules to use for
passing parameters.  There are two often used solutions for passing
parameters, one is stack based the other is register based.  Now in
most systems there is often much more stack than the number of registers
and if this were a C calling convetion, used by a C compiler we have
to be able to support up to some number X (32?) parameters to be compatible
with the standard.  That is likely more registers than you have spare
in your processor.  So often in a register based calling convention
registers are used for the first N parameters then the stack is used
after that.  For our very simple fun() function the rule will be the
paramter will be passed to the funciton in register r0.  The function
can use and destroy r0-r2, if you think about it you dont want to say
have an important value in a register r4 for example, and when you call
some function that function uses r4 to do something and messes up your
value.  there are a couple of solutions for this, one is to save the
value on the stack with a push before calling some other function
and restoring it from the stack when you return from that function.
Another is to make a rule about what registers have to put back they
way they were found when the function was called.

So lets try to examine the lesson, this lesson is a direct, human, hand,
compiled version of the C program shown above.  Before main can start
the program starts by initializing the stack pointer.  The stack is
dynamic in nature but before we can use it the first time we need to
point it at some high address.

ld r6,stack_top

In order to call fun(0) we need to follow our convention and place
the parameter being passed in in r0.  Then call the function

ld r0,zero
jsr fun

When the call to the function returns, halt.

trap 0x25

Now our function.

fun:

trap 0x67

show r0 whatever it may be

add r6,r6,#0xFFFE

0xFFFE is a negative 2.  This is called making a stack frame, by analyzing
the function before writing it we know there are two things we want
on the stack.  We could a) push and pop as we go or b) adjust the stack
pointer one time for the whole function, allocating all the locations
we need, and then referencing into this allocated memory using
stack pointer + offset addressing.  This program does the latter.  Each
function call we need to store the return value and we need a location
for storing the local variable y.  Subracting 2, r6 = r6 - 2 is the same
as allocating two locations for this function.

str r7,r6,#1

save the return address at mem[r6+1]

add r1,r0,#1

compute y = x + 1;  x is the passed in parameter which is in register
r0.

str r1,r6,#0

save the value y on the stack at address mem[r6+0]

add r2,r1,#0xFFF9
brzp skip

A little math here, we want to compare with 7 usually that is done
with a subtract and then look at the flags.  We dont have a subtract
we have an add instruciton, so we want to add with a negative 7.
To negate a number in twos complement, invert and add one so
0x0007 inverted is 0xFFF8 plus one is 0xFFF9.  so 0xFFF9 is a negative y
As a programmer we know this function will have values of y that start
at 1 and increase by 1 for every call so compute y - 7 and see what
happens when we get to 7.
0x0001+0xFFF9 = 0xFFFA
0x0002+0xFFF9 = 0xFFFB
0x0003+0xFFF9 = 0xFFFC
0x0004+0xFFF9 = 0xFFFD
0x0005+0xFFF9 = 0xFFFE
0x0006+0xFFF9 = 0xFFFF
0x0007+0xFFF9 = 0x0000
0x0008+0xFFF9 = 0x0001
0x0009+0xFFF9 = 0x0002
0x000A+0xFFF9 = 0x0003
So when y = 1 through 6 the result is a negative number, for those
cases 1 to 6, y is less than 7 so we want to call the function again
recursively.  When y = 7 we do not want to call the function.  As written
this program does not have this issue but what if we call the function
with the number 7 or a number larger than 7.  When called with a 7
then y = 8, and 8+0xFFF9 is a positive number.  So for the case x = 6
y = 7 which is not less than 7 the result of this add is zero and the
zero flag will be set.  For values of y that are greater than 7 the
result of this add is a positive number and the p flag will be set.
Think of this is less than 7 call fun() is the same as if greater than
or equal to 7 then dont call fun().  equal to 7 the subtract of 7 is
a zero and numbers greater than 7 the subtract of 7 is a positive number
so this implementation uses a brzp, branch if zero or positive and skip
over the call to fun


add r0,r1,#0
jsr fun

If we execute fun, then we execute fun by passing the parameter y,
we computed y by taking r0 and adding 1 and putting the result in r1
this is not the only way you could do this you could optimize this
a little and save an instruction, but that is how it was done here.
so per the calling convention y needs to be in r0, so the add r0,r1,#0
copies r1 to r0.  Then we call the function fun.

For either the case of returning from a call to fun or the case where
we skipped over the call to fun, we land at the address skip

skip:
ldr r0,r6,#0
trap 0x67

The next section of code after skip is to load into r0 the contents
of the stack at mem[r6+0], if you look above we are using r6+0 to keep
the value y.  So r0 gets the value of y computed at the beginning of
this function call, and the trap 0x67 displays that value.

ldr r7,r6,#1

The return address was stored in memory at mem[r6+1], lets get that
saved value off of the stack.

add r6,r6,#0x0002

And just before we return lets put the stack back the way we found it
by adding the 2 that we subtracted.  Remove the "stack frame" that we
created or allocated.

ret

return to the caller

So in general the stack is used when you need some temporary (relative
term) storage.  Unless there is a specific reason (rare), you should
use it in a balanced manner, just like matching up your malloc()s and
free()s you should have the same number of pushes and pops in the same
context of code.  Some programming languages compilers had the convention
that the caller would set up the stack and the callee would clean it
up.  C compilers and others use an easier to maintain and understand
convention of the caller cleans up after itself and the callee cleans
up after itself.


----------------------------------------
Lesson 12: A simpler function

;---- lesson 12 ----

ld r6,stack_top
ld r0,zero

add r1,r0,#3
add r6,r6,#0xFFFF ;push r1
str r1,r6,#0      ;push r1

add r1,r0,#2
add r6,r6,#0xFFFF ;push r1
str r1,r6,#0      ;push r1

add r1,r0,#1
add r6,r6,#0xFFFF ;push r1
str r1,r6,#0      ;push r1

jsr sum

trap 0x67
add r0,r1,#0
trap 0x67

ldr r0,r6,#0
add r6,r6,#3
trap 0x67
trap 0x25

sum:
add r6,r6,#0xFFFE
str r0,r6,#0
str r1,r6,#1
ldr r0,r6,#2
ldr r1,r6,#3
add r0,r0,r1
ldr r1,r6,#4
add r0,r0,r1
str r0,r6,#2
ldr r1,r6,#1
ldr r0,r6,#0
add r6,r6,#0x0002
ret

stack_top: .word 0x6000
zero: .word 0x0000
;---- lesson 12 ----


./lc3asm lesson.asm
label0000: [0x00000000] [stack_top] 1
label0001: [0x00000001] [zero] 1
label0002: [0x0000000B] [sum] 1
label0003: [0x00003013] [sum] 0
label0004: [0x00003020] [stack_top] 0
label0005: [0x00003021] [zero] 0
0x3000: 0x2C1F
0x3001: 0x201F
0x3002: 0x1223
0x3003: 0x1DBF
0x3004: 0x7380
0x3005: 0x1222
0x3006: 0x1DBF
0x3007: 0x7380
0x3008: 0x1221
0x3009: 0x1DBF
0x300A: 0x7380
0x300B: 0x4807
0x300C: 0xF067
0x300D: 0x1060
0x300E: 0xF067
0x300F: 0x6180
0x3010: 0x1DA3
0x3011: 0xF067
0x3012: 0xF025
0x3013: 0x1DBE
0x3014: 0x7180
0x3015: 0x7381
0x3016: 0x6182
0x3017: 0x6383
0x3018: 0x1001
0x3019: 0x6384
0x301A: 0x1001
0x301B: 0x7182
0x301C: 0x6381
0x301D: 0x6180
0x301E: 0x1DA2
0x301F: 0xC1C0
0x3020: 0x6000
0x3021: 0x0000

./lc3sim lesson.asm.hex
show 0x0000
show 0x0001
show 0x0006
halt


This is a simpler task, no recursion.  This time we are going to implement
this code:

signed sum ( signed a, signed b, signed c )
{
    return(a+b+c);
}
void main ( void )
{
    printf("0x%04X",sum(1,2,3));
}

So we are going to put our human compiler hats on and think this through.
This time we are going to use a different calling convention.  This one
is going to be purely stack based.
- caller cleans up its use of the stack
- callee cleans up its use of the stack
- parameters are passed on the stack in reverse order such that when
entering the function the top of the stack contains the first parameter
- return values if any are returned on the top of the stack
- no registers other than r7 are to be modified by the function.

With stacks that grow down, you often need to stand on your head to
look at them.  What is called the "top of the stack" is the first
item on the stack when visualized as a growing up type stack, a stack
of note cards on a table.  You can either stand on your head or you
can write the addresses in ascending order rather than descending.  That
is what we will do

[ a ] <-- top of stack after we prepare the stack to call sum
[ b ]
[ c ]
[   ] <-- top of stack before we prepared the stack to call sum

This picture shows the memory locations on the stack.  Where the stack
is before we start preparing the stack for a call to sum and after
we are ready to jsr to sum.  This is also the picture we see right
after we enter the call to sum.  Sum of course as needed can add more
things to the stack.

We are defining a push on the stack in C as sp--; *sp=item_pushed;  The
sp in lc3 is r6 so

add r6,r6,#0xFFFF ; subtract one
str rx,r6,#0      ; push a register rx on the stack.  rx can be r0-r7

Start by initializing the stack pointer, somewhere in high(er) memory
than the program.  And initialize r0 so it is a known value.

ld r6,stack_top
ld r0,zero

stack looks like this, we dont know what x is but we dont want to mess
it up.

[ x ] <-- r6

add r1,r0,#3

make r1 hold the value 3 (r1 = r0 + 3 = 0 + 3 = 3)

add r6,r6,#0xFFFF

stack looks like this

[   ] <-- r6
[ x ]

str r1,r6,#0

stack now looks like this

[ 3 ] <-- r6
[ x ]

add r1,r0,#2

add r6,r6,#0xFFFF

[   ] <-- r6
[ 3 ]
[ x ]

str r1,r6,#0

[ 2 ] <-- r6
[ 3 ]
[ x ]

add r1,r0,#1

add r6,r6,#0xFFFF

[   ] <-- r6
[ 2 ]
[ 3 ]
[ x ]


str r1,r6,#0

[ 1 ] <-- r6
[ 2 ]
[ 3 ]
[ x ]

Now the stack is ready, the three parameters for the sum function have
been prepared, we can now call the sum function


jsr sum

So we know what the stack looks like right now and we know that r7
contains our return value.  Because sum does not call any other functions
basically neither jsr nor jsrr are used, and we dont mess with r7 we can
leave r7 alone, we dont have to save it.  If we were to call another function
within sum we would need to save r7.

We are not allowed to modify any registers, since the add instruction
requires registers what do we do?  Well we use the stack for temporary
storage to preserve the state of some registers, use those registers,
the restore the state of those registers before we return.  It will
take a minimum of two registers to perform these add operations so we
need to save two registers.  This lesson uses r0 and r1, so the first
task is to save them.  We could individually push each register on
the stack which is two instructions per push as we saw above.  Or we
could make stack frame simply move the stack pointer the amount we
need to move it in one step.  Basically three instructions to do what
would be four instructions if we pushed each item separately.

sum:

[ 1 ] <-- r6
[ 2 ]
[ 3 ]
[ x ]

add r6,r6,#0xFFFE ; subtract 2 from r6

[         ] <-- r6 + 0
[         ] <-- r6 + 1
[    1    ] <-- r6 + 2
[    2    ] <-- r6 + 3
[    3    ] <-- r6 + 4
[    x    ] <-- r6 + 5

str r0,r6,#0

[ save r0 ] <-- r6 + 0
[         ] <-- r6 + 1
[    1    ] <-- r6 + 2
[    2    ] <-- r6 + 3
[    3    ] <-- r6 + 4
[    x    ] <-- r6 + 5

str r1,r6,#1

[ save r0 ] <-- r6 + 0
[ save r1 ] <-- r6 + 1
[    1    ] <-- r6 + 2
[    2    ] <-- r6 + 3
[    3    ] <-- r6 + 4
[    x    ] <-- r6 + 5

ldr r0,r6,#2 ;read the first parameter off the stack and put in r0
ldr r1,r6,#3 ;read the second parameter off the stack and put in r1
add r0,r0,r1 ;add the first and second parameters and save in r0
ldr r1,r6,#4 ;read the third parameter off the stack and save in r1
add r0,r0,r1 ;add the sum of the first two parameters and the third parameter
str r0,r6,#2 ;save the result on the stack where top will be when we return

[ save r0 ] <-- r6 + 0
[ save r1 ] <-- r6 + 1
[  result ] <-- r6 + 2
[    2    ] <-- r6 + 3
[    3    ] <-- r6 + 4
[    x    ] <-- r6 + 5

ldr r1,r6,#1 ; restore r1 with saved value on stack
ldr r0,r6,#0 ; restore r0 with saved value on stack
add r6,r6,#0x0002 ; adjust stack pointer to remove stack frame

[  result ] <-- r6 + 0
[    2    ] <-- r6 + 1
[    3    ] <-- r6 + 2
[    x    ] <-- r6 + 3

ret

Now we return to the first instruction after the jsr sum.  To demonstrate
that r0 and r1 were not modified by the function.  r0 was a zero going
into sum and r1 was a 1, lets show them.

trap 0x67    ;show r0
add r0,r1,#0 ; copy r1 to r0
trap 0x67    ; show r0

and the output shows we did not modify them

show 0x0000
show 0x0001

at this point the stack still looks like this we, the caller have not
cleaned up our stack from the call to sum.

[  result ] <-- r6 + 0
[    2    ] <-- r6 + 1
[    3    ] <-- r6 + 2
[    x    ] <-- r6 + 3

ldr r0,r6,#0 ; read the result off the top of the stack
add r6,r6,#3 ; instead of three pops, restore the stack pointer in one shot

[    x    ] <-- r6

trap 0x67  ; show r0 which contains the result of the addition of 1+2+3

show 0x0006

trap 0x25

halt.

Different university programs use this instruction set for eduational
purposes including making a compiler.  I dont remember what calling
convention was used, but because there are only 8 registers of which
two are somewhat dedicated to special purposes, r7 for return values and
r6 as a stack pointer.  It may make sense to use the stack for the
function calling convention as shown here.  I have not made my own
compiler for this architecture so dont know what the tradeoff is between
register based and stack based.  (register based is usually hybrid,
part register then using the stack if you run out of registers).
































0000 nzpoooooooooo     br[n][z][p] label
0001 dddaaa000bbbb +   add rd,ra,rb
0001 dddaaa1iiiiii +   add rd,ra,#immed
0010 dddoooooooooo +   ld rd,label
0011 sssoooooooooo     st rs,label
0100 000aaa0000000     jsrr ra
0100 1oooooooooooo     jsr label
0101 dddaaa000bbbb +   and rd,ra,rb
0101 dddaaa1iiiiii +   and rd,ra,#immed
0110 dddaaaooooooo +   ldr rd,ra,#offset
0111 sssaaaooooooo     str rs,ra,#offset
1000 0000000000000     rti
1001 dddaaa1111111 +   not rd,ra
1010 dddoooooooooo +   ldi rd,label
1011 dddoooooooooo +   sti rd,label
1100 000aaa0000000     jmp ra
1100 0001110000000     ret (jmp r7)
1101 1101xxxxxxxxx     undefined
1110 dddoooooooooo +   lea rd,label
1111 0000iiiiiiiii     trap immed








This is a work in progress....