-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy pathguide.tex
11250 lines (8925 loc) · 354 KB
/
guide.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[a4paper]{article}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{color}
\usepackage{microtype}
\usepackage{graphicx}
\lstdefinelanguage{Rust}%
{
morekeywords={abstract,alignof,as,become,box,%
break,const,continue,crate,do,%
else,enum,extern,false,final,%
fn,for,if,impl,in,%
let,loop,macro,match,mod,%
move,mut,offsetof,override,priv,%
proc,pub,pure,ref,return,%
Self,self,sizeof,static,struct,%
super,trait,true,type,typeof,%
unsafe,unsized,use,virtual,where,%
while,yield},%
sensitive,%
morekeywords=[1]{\$},%
morecomment=[s]{/*}{*/},%
morecomment=[l]//,%
morestring=[b]",%
% morestring=[b]', Unfortunately lifetimes also use this and it
% breaks
}[keywords,comments,strings]
\lstdefinelanguage{assembly}%
{
morekeywords={nop,lui,move,%
j,jal,jr,jalr,%
add,addu,addiu,addi,subu,%
sll,sra,srl,slt,sltu,slti,sltiu,%
sllv,srlv,srav,%
sw,sh,sb,or,ori,and,andi,nor,%
mtc0,mtc1,mtc2,mtc3,%
mfc0,mfc1,mfc2,mfc3,%
li,lw,lh,lhu,lb,lbu,lwl,lwr,%
beq,bne,bgtz,blez,%
bltz, bltzal, bgez, bgezal,%
div,divu,mul,multu,mflo,mfhi,mtlo,mthi,%
syscall,break,rfe,
},%
sensitive,%
moredelim=[s]{::}{::},%
morecomment=[s]{/*}{*/},%
morestring=[b]",%
morestring=[b]',%
}[keywords,comments,strings]
\lstdefinelanguage{glsl}%
{
morekeywords={void,main,in,out,float,core,%
vec2,vec3,vec4,%
ivec2,ivec3,ivec4,%
uvec2,uvec3,uvec4,%
xyzw,x,y,z,w,%
rgba,r,g,b,a,%
},%
sensitive,%
morecomment=[s]{/*}{*/},%
morecomment=[l]//,%
moredelim=*[directive]\#,%
moredirectives={version}%
}[keywords,comments,strings,directives]
\definecolor{comment}{rgb}{0,0.6,0}
\definecolor{string}{rgb}{0.58,0,0.82}
\lstset{ %
basicstyle=\footnotesize,
columns=fixed,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
commentstyle=\color{comment},
frame=single
keepspaces=true,
keywordstyle=\color{blue},
language=Rust,
numbers=none,
stringstyle=\color{string},
tabsize=4,
}
\newcommand{\commit}[1] {%
\medskip
\framebox{You can get the current version of the emulator in
\href{https://github.com/simias/psx-rs/commit/#1}{commit
\texttt{#1}}}
\medskip}
\newcommand{\code}[1] {\texttt{#1}}
\title{Playstation Emulation Guide}
\date{\today}
\author{Lionel Flandrin}
\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage
\section{Introduction}
This is my attempt at documenting my implementation of a PlayStation
emulator from scratch. I'll write the document as I go and I'll try to
explain as much as possible along the way. You can find the complete
source of the emulator itself in my
\href{https://github.com/simias/psx-rs}{GitHub repository}.
Since my favourite passtime is to reinvent the wheel and recode things
that already exist I decided that this time I might as well document
it. This way maybe this time something useful will come out of it and
it'll give me a motivation to finish it.
I will be using the Rust programming language but this is not meant as
a Rust tutorial and knowledge of the language shouldn't be necessary
to follow this guide, although it won't hurt.
\subsection{Isn't emulation complicated?}
Emulation requires some low-level knowledge about how computers work
and some basics in electronics might help for certain things. Since
this doc is meant as an introduction to emulation I'll assume that the
reader doesn't bring anything with them beyond some decent programming
skills. So don't worry if you're not familiar with registers, cache,
memory mapped IO, virtual memory, interrupts and other low level fun:
I'll try to explain everything when needed. Emulators are a good
introduction to low level programming without having to bother with
that pesky hardware in person!
Since this is supposed to be a general guide about writing PlayStation
emulators I won't put the entire source code of the emulator here,
only snippets relevant to the matter beind discussed.
Finally, keep in mind that getting a PlayStation emulator even capable
to run \emph{some} games decently will require quite a lot of work. Don't
expect to play Final Fantasy VII on your brand new emulator in two
days. If you want to start with something simpler to see if you have a
taste for it you can search for Chip-8, Game Boy or NES emulation
tutorials (by increasing complexity).
\subsection{Feedback}
If some part of this document is unclear, poorly written or incomplete
please submit an issue so that I can fix or complete it. Corrections
for grammar, syntax and typos are very welcome. Thank you!
Ready? Let's begin!
\section{The CPU: Instructions and the memory}
\subsection{What is a CPU, anyway?}
That might seem like a silly question to some but I'm sure there are
plenty of competent programmers out there who are used to program in
high level managed environements haven't seen a register in their
entire life. Let me make the introductions.
For our first version of the PlayStation CPU I'm going to make some
simplifying assumptions. I'm going to ignore the caches for instance
and assume that it directly accesses the system bus. Basically we're
going to implement a
\href{https://en.wikipedia.org/wiki/Von_Neumann_architecture}{Von
Neumann architecture}.As we make progress we'll have to revisit this
design to add the missing bits when they are needed.
The objective of this section is to implement all the instructions and
try to reach the part of the BIOS where it starts to draw on the
screen. As we'll see there's a bunch of boring initialization code to
run before we get there.
There are 67 opcodes in the Playstation MIPS CPU. Some take one line
to implement, others will give us more trouble. In order to make the
process more interactive and less tedious we'll implement them as
they're encountered while we're running the original BIOS code. This
way we'll immediately be able to see our emulator in action.
But first things first, before we start implementing instructions we
need to explain how a CPU works.
\subsection{Architecture}
A simple Von Neumann architecture looks like this: the CPU only sees a
flat address space: an array of bytes. The PlayStation uses 32bit
addresses so the CPU sees \code{1 << 32} addresses. In other words it
can address 4GB of memory. That's why the PlayStation is said to be a
32bit console (that and the fact that it uses 32bit registers in the
CPU as we'll see in a minute).
This address space contains all the external ressources the CPU can
access: the RAM of course but also the various peripherals (GPU,
controllers, CD drive, BIOS...). That's called
\href{https://en.wikipedia.org/wiki/Memory-mapped_I/O}{memory mapped
IO}. Note that in this context "memory" doesn't mean RAM. Rather it
means that you access peripherals as if they were memory (instead of
using dedicated instructions for instance). From the point of view of
the CPU, everything is just a big array of bytes and it doesn't really
know what's out there.
Of course we'll have to figure out how the devices and RAM are mapped
in this address space to make sure the transactions end up at the
right location when the CPU starts reading and writing to the bus. But
first we need to understand how the code is executed.
\subsection{The code}
In this architecture the instructions live in the global address space
along with everything else. Typically in RAM but again, the CPU
doesn't care. If you want to run code from the controller input port
I'm sure the console will let you. Probably not very useful but it's
all the same as far as the CPU is concerned.
So somewhere in this 4GB address space there's the next instruction
for the CPU to run. How does it know the address of this instruction?
By using a register of course!
\subsection{The Program Counter register}
\href{https://en.wikipedia.org/wiki/Processor_register}{Registers} are
very small and very fast special purpose memories built inside the
CPU. Most CPU instructions manipulate those registers by adding them,
multiplying them, masking them, storing their content to memory or
fetching it back\dots{}
\href{https://en.wikipedia.org/wiki/Program_counter}{The Program
Counter} (henceforth refered to as PC) is one of the most
elementary registers, it exists in one form or an other on basically
all computer architectures (although it goes by various names, on x86
for instance it's called the Instruction Pointer, IP). Its
job is simply to hold the address of the next instruction to be run.
As we've seen, the PlayStation uses 32bit addresses, so the PC
register is 32bit wide (as are all other CPU registers for that
matter).
A typical CPU execution cycle goes roughly like this:
\begin{enumerate}
\item Fetch the instruction located at address PC,
\item Increment the PC to point to the next instruction,
\item Execute the instruction,
\item Repeat
\end{enumerate}
We need to know how big an instruction is in order to know how many
bytes to fetch and how much we need to increment the PC to
point at the next instruction. Some architectures have variable length
instructions (x86 and derivatives are a common example) which means
we'd have to decode the instruction to know how many bytes it
takes. Fortunately for us, the PlayStation uses a fixed length
instruction set
(\href{https://en.wikipedia.org/wiki/MIPS_instruction_set}{The MIPS
instruction set}) and all instructions are 32bit long.
With all that in mind we can finally start writing some code!
Here's what the CPU state looks like at that point:
\begin{lstlisting}
/// CPU state
pub struct Cpu {
/// The program counter register
pc: u32,
}
\end{lstlisting}
And here's the implementation of our CPU cycle described above:
\begin{lstlisting}
impl Cpu {
pub fn run_next_instruction(&mut self) {
let pc = self.pc;
// Fetch instruction at PC
let instruction = self.load32(pc);
// Increment PC to point to the next instruction.
self.pc = pc.wrapping_add(4);
self.decode_and_execute(instruction);
}
}
\end{lstlisting}
In Rust \code{wrapping\_add} means that we want the PC to
wrap back to 0 in case of an overflow (i.e. \code{0xfffffffc + 4 =>
0x00000000}). We'll see that most CPU operations wrap on overflow
(although some instructions catch those overflows and generate an
exception, we'll see that later).
If you're coding in C you don't need to worry about that if you use
\code{uint32\_t} since the C standard mandates that unsigned overflow wraps
around in this fashion. Rust however says that overflows are undefined
and will generate an error in debug builds if an unchecked overflow is
detected, that's why I need to write \code{pc.wrapping\_add(4)} instead of
\code{pc + 4}.
We now finally have some code but it doesn't build yet.
We're still missing 3 pieces of the puzzle before we can run this
piece of code:
\begin{itemize}
\item What's the initial value of PC when starting up?
\item How do we implement the \code{fetch32} function?
\item How do we implement the \code{decode\_and\_execute} function?
\end{itemize}
\subsubsection{Reset value of the PC}
In integrated circuits
\href{https://en.wikipedia.org/wiki/Reset_%28computing%29}{reset} is a
state where the chip generally does nothing and its internal state
is set to some known default ``factory'' value. What exactly the
reset does varies from chip to chip (it's just a convention) but
it's assumed that a chip will restart in a clean and deterministic
state after a reset cycle.
Generally the reset is a dedicated pin on the chip that's connected to
a button or some other control logic. Sometimes you can also request a
"soft" reset through software using a specific command or sequence of
instructions. Reseting a chip does necessitate cutting off the power
(nor is power cycling an integrated circuit a good way to reset a
chip: if the reset signal is not asserted it might not load the
default values correctly).
When you power up the console or hit the reset button the hardware
forces the CPU (and other peripherals) into a reset state to
initialize the logic.
Knowing this it's pretty obvious that the reset value of the
PC is very important since it's going to tell the CPU where
it should start running the code. It basically defines the location of
the "main" function of the console's kernel.
The docs say that the reset value of PC is
\code{0xbfc00000}. In the playstation memory map that's the
beginning of the BIOS (we'll look at the memory map in greater details
in the next section).
Now that we know where our story starts we can write our CPU
initializer:
\begin{lstlisting}
impl Cpu {
pub fn new() -> Cpu {
Cpu {
// PC reset value at the beginning of the BIOS
pc: 0xbfc00000,
}
}
//...
}
\end{lstlisting}
\subsection{The Playstation memory map}
Our CPU treats all addresses the same way but at some point we'll have
to dispatch the load/store requests to the correct peripheral. If we
read the BIOS and we get GPU data instead we're going to run into
troubles very quickly\dots{}
So how do we know what is mapped at some arbitrary address? By using
the \href{https://en.wikipedia.org/wiki/Memory_map}{memory map} of
course!
Here's an overview of the PlayStation memory map, courtesy of
\href{http://problemkaputt.de/psx-spx.htm#cpuspecifications}{the Nocash
specs}:
\begin{table}[ht]
\centering
\begin{tabular}{ l | l | l | r | l }
KUSEG & KSEG0 & KSEG1 & Length & Description \\
\hline
\code{0x00000000} & \code{0x80000000} & \code{0xa0000000} & 2048K
& Main RAM \\
\code{0x1f000000} & \code{0x9f000000} & \code{0xbf000000} & 8192K
& Expansion Region 1 \\
\code{0x1f800000} & \code{0x9f800000} & \code{0xbf800000} & 1K
& Scratchpad \\
\code{0x1f801000} & \code{0x9f801000} & \code{0xbf801000} & 8K
& Hardware registers \\
\code{0x1fc00000} & \code{0x9fc00000} & \code{0xbfc00000} & 512K
& BIOS ROM \\
\end{tabular}
\caption{Playstation memory map}
\label{tab:mmap}
\end{table}
Let's take the time to parse through this.
We can see that most peripherals in table~\ref{tab:mmap} are mapped at
several addresses. For instance if we look at the PC reset
value \code{0xbfc00000} corresponds to the beginning of the BIOS range in
region KSEG1. However we can also reach the same location through
addresses \code{0x1fc00000}(KUSEG) and \code{0x9fc00000}(KSEG0).
What's the point of having those mirrored regions? What's the
difference between KUSEG and KSEG1 for instance? Those are memory
regions which are used to specify certain attributes of the memory
access. On the Playstation hardware it's mostly used to specify
whether the access is cached or not.
For now we're going to ignore regions and treat all mappings the same,
we'll study them more closely later on.
\begin{table}[ht]
\centering
\begin{tabular}{ l | r | l }
KSEG2 & Length & Description \\
\hline
\code{0xfffe0000} & 512B & I/O Ports \\
\end{tabular}
\caption{KSEG2 memory map}
\label{tab:kseg2}
\end{table}
Table~\ref{tab:kseg2} shows the last region: KSEG2. It's a bit
different from the others. It doesn't mirror the other regions,
instead it gives access to a unique set of registers. As far as I know
the only important register there is the cache control but there might
be others I haven't encountered yet.
\subsubsection{Implementing the memory map}
In order to implement the PlayStation memory map in our emulator we
will need an interconnect to dispatch the load/store operations to the
correct peripheral.
I don't know if the PlayStation really has a hardware
interconnect. The CPU could just "broadcast" the read/write operations
on the system bus and the peripherals would check the address and only
answer if it's for them. However this design would be inefficient in
software: we'd need to iterate over the peripherals for each
transaction until we find the correct receiver.
Instead we're just going to implement a "switchboard" that will match
the address to the correct peripheral and forward it there.
Since the first thing the emulator will run is the BIOS we'll use it
as our first peripheral.
\subsection{The BIOS}
On the PlayStation the BIOS displays the first screens (with the logos
and that memorable sweeping tune) and starts the game from the CD
drive. If no CD is present it displays a menu that can be used to
manage the memory cards and play CDs. As a player that's probably the
only time you'd know there was a BIOS running.
But that's just the tip of the iceberg! The BIOS remains loaded at all
time and provides a Basic Input/Output System to the running
game. That means that the game can call into the BIOS to do things
like allocating memory, reading the memory card, common libc functions
(qsort, memset...) and many other things.
We won't be implementing the BIOS ourselves. It's possible (and it's
been done) but that's a lot of work and probably something you'd want
to do once you have a working emulator. It might also hurt
compatibility since many games are known to patch the BIOS at
runtime. The
\href{http://problemkaputt.de/psx-spx.htm#biospatches}{Nocash specs}
have more info.
We could dump the BIOS of a console but that requires access to the
actual hardware and the know-how to access the BIOS
memory. Fortunately some nice people have done it for us and these
days it's easy to find BIOS files on the web.
There are many BIOS versions: they change depending on the region, the
hardware revision and patches. Any good dump should work (after all,
they all do more or less the same thing) but if you're following this
guide it's probably better that we use the same file.
\begin{table}[ht]
\centering
\begin{tabular}{ l | l }
Algorithm & Hash \\
\hline
MD5 & \code{924e392ed05558ffdb115408c263dccf} \\
SHA-1 & \code{10155d8d6e6e832d6ea66db9bc098321fb5e8ebf} \\
\end{tabular}
\caption{\code{SCPH1001.BIN} BIOS checksums}
\label{tab:checksums}
\end{table}
I've decided to go for the version named \code{SCPH1001.BIN}. The file
should be \emph{exactly} 512KB big. Check table~\ref{tab:checksums} to make
sure you got the right one.
\subsection{Loading the BIOS}
Once we got our BIOS the rest is pretty straightforward. We just read
the file into a 512KB buffer:
\begin{lstlisting}
/// BIOS image
pub struct Bios {
/// BIOS memory
data: Vec<u8>
}
impl Bios {
/// Load a BIOS image from the file located at `path`
pub fn new(path: &Path) -> Result<Bios> {
let file = try!(File::open(path));
let mut data = Vec::new();
// Load the BIOS
try!(file.take(BIOS_SIZE).read_to_end(&mut data));
if data.len() == BIOS_SIZE as usize {
Ok(Bios { data: data })
} else {
Err(Error::new(ErrorKind::InvalidInput,
"Invalid BIOS size"))
}
}
}
/// BIOS images are always 512KB in length
const BIOS_SIZE: u64 = 512 * 1024;
\end{lstlisting}
We also need to be able to read data from the BIOS. The CPU wants to
read 32bit of data to load the instructions so let's start by
implementing load32:
\begin{lstlisting}
impl Bios {
// ...
/// Fetch the 32bit little endian word at `offset`
pub fn load32(&self, offset: u32) -> u32 {
let offset = offset as usize;
let b0 = self.data[offset + 0] as u32;
let b1 = self.data[offset + 1] as u32;
let b2 = self.data[offset + 2] as u32;
let b3 = self.data[offset + 3] as u32;
b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
}
}
\end{lstlisting}
A few things to note: \code{offset}, as its name implies, is not the
absolute address used by the CPU, it's just the offset in the BIOS
memory range. Remember that the BIOS is mapped in multiple regions so
we'll handle that in the generic interconnect code. Each peripheral
will just handle offsets in its address range.
In the comment I mention that we read the word in \emph{little
endian}. That's important. If you've never had to worry about
\href{https://en.wikipedia.org/wiki/Endianness}{endianess} issues
before let me give you the gist.
The basic unit of memory is a byte (8 bits in our case). You cannot
address anything smaller than that. However sometimes you need to
store data over multiple bytes. For instance we've seen that our
instructions are 4byte long. We have multiple way to store 4byte words
in our "array of bytes".
Let's take an example: you have the 32bit word
\code{0x12345678}. You have multiple way to store that value in 4
consecutive bytes. We can store [0x12, 0x34, 0x56, 0x78] or
[0x78, 0x56, 0x34, 0x12] for instance. The former is called
\emph{big-endian} because we store the most significant byte
first. The latter is \emph{little-endian} because we store the least
significant byte first. There are other endian types with weirder
patterns but they're not often used is modern computers. Check
wikipedia if you want more details.
The PlayStation is little-endian so we're in the 2nd case: when
reading or writing multi-byte values the least significiant byte goes
first. If we do it the other way around we'll end up with garbage.
Now we can implement our interconnect to let the CPU communicate with
the BIOS.
\subsection{The interconnect}
We now have an embryo of a CPU and our first device ready to talk to
each other. We just need to figure out how to link them together.
At that point we could have the CPU talk directly to the BIOS, after
all it's our only device. Obviously that won't work for very long
however, we need to be able to dispatch the CPU's loads and stores to
the correct peripheral depending on the address range.
I'm not quite sure how this is handled on the actual hardware. For
simple buses it's very possible that the CPU just "broadcasts" the
address to all the peripherals and each of them just checks if it's
within their address range and simply ignores the transaction if they
see it's not for them. It's fast in hardware because all peripherals
work in parallel so there's no delay induced: they can all receive and
decode the address at the same moment.
Unfortunately we can't really do that in software: the closest
equivalent would be to spawn a thread for each peripheral. The problem
is that memory transactions are very common (several millions per
second potentially) and having to send data and resynchronize across
threads would kill our performances.
Multihreading emulators in general is a very tough issue: for
threading to be really efficient you need to reduce data exchange and
resynchronization as much as possible to let each thread live its
life. When we emulate however we want to mimick the original hardware
behaviour and speed as much as possible which requires very frequent
resynchronization and we have plenty of shared state. The two
endeavors are somewhat at odds. That's not to say multithreading is
impossible in emulators, just that it's hard. We can't just spawn
threads willy-nilly.
Anyway, back to our interconnect: since threads are out it means we'll
have to sequentially match the address against each mapping until we
get a match. Then we can let the selected peripheral handle the
transaction.
Let's do just that:
\begin{lstlisting}
/// Global interconnect
pub struct Interconnect {
/// Basic Input/Output memory
bios: Bios,
}
impl Interconnect {
pub fn new(bios: Bios) -> Interconnect {
Interconnect {
bios: bios,
}
}
}
\end{lstlisting}
I've decided to store the BIOS directly in the interconnect
\code{struct}. We'll append the other peripherals there as we
implement them. We are going to store the interconnect inside the
\code{struct Cpu} which will give us a device tree with the CPU at
the top. It makes the data paths pretty simple: everything goes \emph{from}
the CPU \emph{to} the peripherals. It's easier to reason about than a full
``everybody sees everybody'' architecture in my opinion but it might
prove limiting as we progress. We'll see if we need to revise that
later.
Now we can finally implement the \code{load32} function that the CPU
will be using. I don't like having hardcoded constants all over the
place so I'm going to tie the address ranges to nice symbolic names:
\begin{lstlisting}
mod map {
struct Range(u32, u32);
impl Range {
/// Return `Some(offset)` if addr is contained in `self`
pub fn contains(self, addr: u32) -> Option<u32> {
let Range(start, length) = self;
if addr >= start && addr < start + length {
Some(addr - start)
} else {
None
}
}
}
pub const BIOS: Range = Range(0xbfc00000, 512 * 1024);
}
\end{lstlisting}
If you're not familiar with rust what this does is create a new type
\code{Range} which is a tuple of two values: the start address and length
of the mapping.
I also declare a \code{contains} methods which takes an address and
returns \code{Some(offset)} if the address is within the range,
\code{None} otherwise. You can think of it as a form of multiple
return values with some nice type-safety on top.
Finally I declare our first range for the BIOS.
Now for the load32 function:
\begin{lstlisting}
impl Interconnect {
//...
/// Load 32bit word at `addr`
pub fn load32(&self, addr: u32) -> u32 {
if let Some(offset) = map::BIOS.contains(addr) {
return self.bios.load32(offset);
}
panic!("unhandled fetch32 at address {:08x}", addr);
}
}
\end{lstlisting}
The \code{if let} syntax is an other rust nicety: if the
\code{contains} function returns \code{Some(offset)} we enter the
body of the if with \code{offset} bound to a temporary variable. If
\code{contains} returns \code{None} on the other hand the
\code{if} is refuted and we don't enter the body and go straight to
the \code{panic!} command which will make our emulator crash.
\subsection{Gluing the interconnect to the CPU}
The only thing left before we can finally build our code is gluing the
Interconnect with the Cpu.
We add an \code{inter} member to the \code{struct Cpu} and take an
\code{Interconnect} object in the constructor:
\begin{lstlisting}
/// CPU state
pub struct Cpu {
/// The program counter register
pc: u32,
/// Memory interface
inter: Interconnect,
}
impl Cpu {
pub fn new(inter: Interconnect) -> Cpu {
Cpu {
// PC reset value at the beginning of the BIOS
pc: 0xbfc00000,
inter: inter,
}
}
// ...
}
\end{lstlisting}
We can also implement the \code{load32} function for the CPU which
will just call the interconnect.
\begin{lstlisting}
impl Cpu {
//...
/// Load 32bit value from the interconnect
fn load32(&self, addr: u32) -> u32 {
self.inter.load32(addr)
}
}
\end{lstlisting}
We're still lacking the \code{decode\_and\_execute} function, let's use a
placeholder function that just panics for now:
\begin{lstlisting}
impl Cpu {
//...
fn decode_and_execute(&mut self, instruction: u32) {
panic!("Unhandled instruction {:08x}",
instruction);
}
}
\end{lstlisting}
Finally we can instantiate everything in our \code{main} function:
\begin{lstlisting}
fn main() {
let bios = Bios::new(&Path::new("roms/SCPH1001.BIN")).unwrap();
let inter = Interconnect::new(bios);
let mut cpu = Cpu::new(inter);
loop {
cpu.run_next_instruction();
}
}
\end{lstlisting}
I've hardcoded the BIOS path for now. It would be better to read it
from the command line, a config file or even some fancy dialog window
but it'll do nicely for now.
We should now be able to build the code. When I run it, assuming that
the BIOS file was found at the correct location I get:
\begin{verbatim}
thread `<main>' panicked at 'Unhandled instruction 3c080013'
\end{verbatim}
As expected the \code{decode\_and\_execute} function died on us but
we managed to fetch an instruction. If you've been using the same BIOS
file as me you should have exactly the same value of
\code{0x3c080013}. If you got an other value something is wrong with
your code. In particular if you end up with \code{0x1300083c} it
means you're erroneously reading in big-endian.
\subsection{Instruction decoding}
We've now fetched our first instruction from the BIOS:
\code{0x3c080013}. What do we do with this?
In order to be able to run this instruction we need to decode it to
figure out what it means. Instruction encoding is of course CPU
dependent so we need to interpret this value in the context of the
Playstation \href{https://en.wikipedia.org/wiki/R3000}{R3000}
processor. Once again the
\href{http://problemkaputt.de/psx-spx.htm#cpuspecifications}{Nocash
specs} have our back and list the format of the
instruction. \href{https://en.wikipedia.org/wiki/MIPS_instruction_set}{MIPS}
is a common architecture used outside of the playstation and you can
find plenty of resources online describing its \href{https://en.wikipedia.org/wiki/Instruction_set}{instruction set}.
Let's decode this one by hand to see how this works. If we look at the
``Opcode/Parameter Encoding'' table in Nocash's docs we see that we
need to look at the bits [31:26] of the operation to see what type it
is. In our case they are \code{001111}. That means the operation is
a \code{LUI} or ``Load Upper Immediate''. \emph{Immediate} means
that the value loaded is directly in the instruction, not indirectly
somewhere else in memory. \emph{Upper} means that it's loading this
immediate value into the high 16 bits of the target register. The 16
low bits are cleared (set to 0).
But what are the register and the value used by the instruction? Well
we need to finish decoding it to figure it out: for a \code{LUI}
bits [20:16] give us the target register: in our case it's
\code{01000} which means it's register 8. Finally bits [15:0]
contain the immediate value: \code{0000 0000 0001 0011} or 19 in
decimal. Bits [25:21] are not used and their value doesn't matter.
In other words this instruction puts \code{0x13} in the 16 high bits
of the register 8. In MIPS assembly\footnote{I'm using the GNU
assembler syntax in this guide unless otherwise noted.} it would be
equivalent to:
\begin{lstlisting}[language=assembly]
lui $8, 0x13
\end{lstlisting}
Enough babbling, let's implement decoding. First I'll wrap the raw
instruction in a nice interface that will let us extract the fields
without doing the bitshifts and masking everywhere. If you look at the
encoding for other MIPS instructions you'll see that it's fairly
regular, for instance immediate values are always stored in the LSBs:
\begin{lstlisting}
struct Instruction(u32);
impl Instruction {
/// Return bits [31:26] of the instruction
fn function(self) -> u32 {
let Instruction(op) = self;
op >> 26
}
/// Return register index in bits [20:16]
fn t(self) -> u32 {
let Instruction(op) = self;
(op >> 16) & 0x1f
}
/// Return immediate value in bits [16:0]
fn imm(self) -> u32 {
let Instruction(op) = self;
op & 0xffff
}
}
\end{lstlisting}
The names for the accessor functions match those I've seen used in the
various references to name the various fields.
We can now leverage that fancy interface in
\code{decode\_and\_execute}:
\begin{lstlisting}
impl Cpu {
// ...
fn decode_and_execute(&mut self, instruction: Instruction) {
match instruction.function() {
0b001111 => self.op_lui(instruction),
_ => panic!("Unhandled instruction {:x}", instruction.0),
}
}
/// Load Upper Immediate
fn op_lui(&mut self, instruction: Instruction) {
let i = instruction.imm();
let t = instruction.t();
panic!("what now?");
}
}
\end{lstlisting}
We're very close to finally run our first instruction in full but
we're still missing something: we see that the register field in this
instruction is 5bits, that means it can index 32 registers. But for
now we only have one register in our CPU: the PC. We need to
introduce the rest of them.
\subsection{General purpose registers}
\begin{table}[ht]
\centering
\begin{tabular}{ l | l | l }
Register & Name & Conventional use \\
\hline
\$0 & \$zero & Always zero \\
\$1 & \$at & Assembler temporary \\
\$2, \$3 & \$v0, \$v1 & Function return values \\
\$4\dots{}\$7 & \$a0\dots{}\$a3 & Function arguments \\
\$8\dots{}\$15 & \$t0\dots{}\$t7 & Temporary registers \\
\$16\dots{}\$23 & \$s0\dots{}\$s7 & Saved registers \\
\$24, \$25 & \$t8, \$t9 & Temporary registers \\
\$26, \$27 & \$k0, \$k1 & Kernel reserved registers \\
\$28 & \$gp & Global pointer \\
\$29 & \$sp & Stack pointer \\
\$30 & \$fp & Frame pointer \\
\$31 & \$ra & Function return address \\
\end{tabular}
\caption{R3000 CPU general purpose registers}
\label{tab:cpuregs}
\end{table}
Table~\ref{tab:cpuregs} lists the registers in the Playstation MIPS
R3000 CPU (ignoring the coprocessors for now). They're all 32bit
wide.
You can see that we have 32 registers (\$0 to \$31) which are the
\emph{general purpose} registers. They're all given a mnemonic used
when writing assembly. For instance, by convention, \$29 is the
\href{https://en.wikipedia.org/wiki/Call_stack}{stack pointer}(\$sp)
and \$30 holds the
\href{https://en.wikipedia.org/wiki/Call_stack#FRAME-POINTER}{frame
pointer} (\$fp).
It's important to understand that those are just a convention between
developers, in the hardware there's no difference between \$29 and
\$30. The point of those
\href{https://en.wikipedia.org/wiki/Calling_convention}{calling
conventions} is to make it possible to make code generated from
different compilers or written in assembly by different coders remain
interoperable. If you write MIPS assembly and want to call third party
functions (like the BIOS functions for instance) you'll have to adhere
to this convention.
Only two general purpose registers are given a special meaning by the
hardware itself: \$zero and \$ra.
\subsubsection{The \$zero register}
\$zero (\$0) is \emph{always} equal to 0. If an instruction attempts
to load a value in this register it doesn't do anything, the register
will still be 0 afterwards.
Having a constant 0 register is useful to reduce the size of the
instruction set. For instance if you want to move the value of the
register \$v0 in \$a0 you can write this:
\begin{lstlisting}[language=assembly]
move $a0, $v0
\end{lstlisting}
However this ``move'' instruction is not actually part of the MIPS
instruction set, it's just a convenient shorthand understood by the
assembler which will generate the equivalent instruction:
\begin{lstlisting}[language=assembly]
addu $a0, $v0, $zero