Store16 #1
Comments
Thank you for your feedback. Don't hesitate to let me know if parts of the guide are unclear or poorly written: English is not my mother tongue and it's the first time I've written something like this.

Your code would work for the RAM and "RAM-like" peripherals. Unfortunately, not all devices handle all access widths properly. NoCash has a more exhaustive list: http://problemkaputt.de/psx-spx.htm#unpredictablethings You can see that some devices treat 16-bit or 8-bit writes like 32-bit writes (I assume by setting the MSBs to 0?), so your code wouldn't be accurate for those.

Another case I encountered a few days ago is reading the timer registers: the registers are 16 bits wide, but if you load them with a LW you end up with the value of the next instruction in the high 16 bits. What seems to be happening is that the CPU fetches the next instruction, then executes the LW on the timer peripheral. This peripheral sets the low 16 bits to the register value but doesn't touch the high bits, which still contain the value of the previously fetched instruction, so we end up reading that. Of course it's useless and no game should rely on that value being there, but if you want to be completely accurate you have to handle 32-bit loads from the timer registers with some special code.

Another thing to consider is that reading from a register can sometimes have a side effect. For instance, reading from the timer mode registers clears certain bits: http://problemkaputt.de/psx-spx.htm#timers (see bits 11 and 12 in the Counter Mode register). In your implementation you read the current value when doing a store16, but doing so will trigger your "register read" code, which will clear those bits, and that's not accurate. You would need a special read function for those "fake" reads, and that makes your code more complex.

For these reasons it's not really possible to write generic read and write code for all peripherals, as far as I can see.
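The side-effect problem can be made concrete with a minimal sketch. Everything named here (`ModeRegister`, `merge16`, `store16_via_read`, `store16_via_peek`) is hypothetical and not taken from the guide or from any emulator. The sketch also assumes that bits 11 and 12 are read-only status bits that a write cannot set, and it treats the register as one 32-bit slot for simplicity.

```cpp
#include <cstdint>

// Bits 11 and 12 of the Counter Mode register are cleared by a read
// (see the psx-spx timer notes linked above).
constexpr uint32_t kClearOnRead = (1u << 11) | (1u << 12);

// Hypothetical model of one register slot with a read side effect.
struct ModeRegister {
    uint32_t value = 0;

    // Bus read: returns the current value, then clears the read-sensitive bits.
    uint32_t read32() {
        uint32_t v = value;
        value &= ~kClearOnRead;
        return v;
    }

    // Emulator-internal read with no side effect.
    uint32_t peek32() const { return value; }

    // Assumption of this sketch: the status bits are read-only, so a write
    // keeps whatever the hardware currently holds in them.
    void write32(uint32_t v) {
        value = (v & ~kClearOnRead) | (value & kClearOnRead);
    }
};

// Merge a 16-bit value into the half-word selected by bit 1 of the address.
inline uint32_t merge16(uint32_t word, uint32_t addr, uint16_t half) {
    const unsigned shift = (addr & 2u) * 8u;
    return (word & ~(0xffffu << shift)) | (static_cast<uint32_t>(half) << shift);
}

// Generic read-modify-write store16: the bus read clears the status bits,
// so the store silently loses them.
inline void store16_via_read(ModeRegister &reg, uint32_t addr, uint16_t half) {
    reg.write32(merge16(reg.read32(), addr, half));
}

// Same store using the side-effect-free read: the status bits survive.
inline void store16_via_peek(ModeRegister &reg, uint32_t addr, uint16_t half) {
    reg.write32(merge16(reg.peek32(), addr, half));
}
```

With bit 11 set beforehand, `store16_via_read` leaves it cleared while `store16_via_peek` leaves it set. The cost is exactly the one described above: every peripheral with read side effects needs a second, "fake" read path.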
@simias, the behavior you're referring to is called 'open bus', where a peripheral doesn't update some or all of the data bus. This is a side effect of capacitance in the wires connecting to the data bus (any wire, really): the last value put on the wire lingers for a short amount of time before decaying back to 0. Typically in an emulator, this is implemented with a single variable that you pass by reference to a read function:

    uint32_t mdr; // memory data register: last value seen on the data bus

    void read(uint32_t address, uint32_t &data) {
        switch (address) {
        // ...
        case 0x1f800000: // example address of a register that uses open bus behavior
            data &= 0xffff0000;           // keep the high-order bits
            data |= open_bus_register_16; // update the low-order bits
            break;
        // ...
        }
    }

    // ...
    read(address, mdr);
    regs[rt] = mdr; // named "regs" because "register" is a C/C++ keyword

The reason why the data is combined with the value of the next instruction is that the CPU has just finished reading that instruction, and the data bus still holds the instruction's value due to capacitance. I am curious what happens in this situation during an instruction cache hit: would the data bus hold the current instruction? One from 4 instructions back? A partially or fully decayed value? I imagine a long loop that doesn't access memory and uses fully cached instructions would cause the bus to decay to 0, but that's just a guess.
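To make the fetch-then-load sequence concrete, here is a small self-contained sketch. The `Bus` type, its members and the instruction word used in the usage note are hypothetical, and the model assumes the bus never decays between the fetch and the load.

```cpp
#include <cstdint>

// Hypothetical bus model: mdr remembers the last value driven on the data lines.
struct Bus {
    uint32_t mdr = 0;          // last value on the data bus (assumed not to decay)
    uint16_t timer_value = 0;  // stand-in for a 16-bit timer register

    // An instruction fetch drives all 32 data lines.
    uint32_t fetch(uint32_t instruction_word) {
        mdr = instruction_word;
        return mdr;
    }

    // A 32-bit load from the 16-bit register drives only the low 16 lines,
    // so the high half still shows the previously fetched instruction.
    uint32_t load32_timer() {
        mdr = (mdr & 0xffff0000u) | timer_value;
        return mdr;
    }
};
```

For example, with `timer_value = 0x1234`, calling `fetch(0x8fa20010)` followed by `load32_timer()` yields `0x8fa21234`: the low half comes from the timer, the high half from the instruction word.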
I wasn't sure if it was open bus or if it was just a predictable garbage value. In my experience open bus tends to be a little more random than that; here the values seem predictable even across consoles. That being said, I'm used to high-Z on input pads, not within the IC itself. That's why I preferred to be vague about it rather than risk saying something inaccurate.
It definitely reeks of open bus. The ARM7TDMI does the same thing, although it only has a 3 stage pipeline. Since the data buses are shared for data/instructions this is something you'd expect to see in a pipelined architecture if a component didn't drive all the bus lines. PS: Open bus is usually predictable 😄
That does make sense. Since you seem more knowledgeable about those issues, don't hesitate to edit the guide if you feel like adding these details. I'm also thinking about writing a similar guide about PocketStation emulation; it's a lot simpler than the PlayStation and might be a better fit for newcomers to emulation. Do you have docs about the ARM7TDMI's pipelining? It's one of the things I haven't implemented yet (all my instructions take the same number of cycles to run for now).
I just might do that. As far as ARM7 documentation goes, the pipeline is pretty simple: The stages behave exactly as you'd imagine, and there is a register file that is read in the decode stage and written in the execute stage. However, when emulating, this detail is insignificant. Since the last stage is where I/O occurs, there are no bubbles in the pipeline. (It would be painful to look at since ARM's NOP isn't 0 AFAIR 😄) The pipeline is only significant for emulation of the ARM7 with regard to timing and PC calculations. Are you planning to emulate the R3051 pipeline at all?
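As an illustration of the "PC calculations" point: because of the 3-stage pipeline, an ARM7TDMI instruction that reads the program counter sees a value ahead of its own address, 8 bytes in ARM state and 4 bytes in Thumb state. A minimal sketch, with hypothetical names (`CpuState`, `visible_pc`), ignoring the instruction forms that are documented as seeing a value further ahead:

```cpp
#include <cstdint>

enum class CpuState { Arm, Thumb };

// Value an instruction at instr_addr observes when it reads r15:
// two instructions ahead, i.e. +8 in ARM state and +4 in Thumb state.
constexpr uint32_t visible_pc(uint32_t instr_addr, CpuState state) {
    return instr_addr + (state == CpuState::Arm ? 8u : 4u);
}
```

An emulator that doesn't model the pipeline stages can apply this offset whenever r15 is used as an operand, which is usually all the pipeline awareness that PC-relative code needs.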
The problem is that emulating the PlayStation CPU very accurately would be pretty slow. Mednafen emulates the "load absorb" feature to some extent, where if you load something to a certain register the CPU can keep running as long as the following instructions don't depend on that register. If you wanted to be very accurate you'd also have to implement the write buffer and probably a few other things. Ryphecha told me that there were a few games which had compatibility issues on Mednafen PSX because of the inaccurate pipeline emulation. Sounds like a tough problem to tackle if you want your emulator to run on commodity hardware.
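The dependency tracking described above can be sketched roughly as follows. This is not Mednafen's code: `LoadTracker`, its members and the latency passed to `start_load` are hypothetical, and the sketch tracks a single outstanding load only.

```cpp
#include <cstdint>

constexpr int kNoReg = -1;

// Hypothetical tracker for one outstanding load.
struct LoadTracker {
    int pending_reg = kNoReg;  // register targeted by the outstanding load
    uint64_t ready_cycle = 0;  // cycle at which the loaded value becomes available

    // Record a load issued at cycle `now` that completes after `latency` cycles.
    void start_load(int reg, uint64_t now, uint64_t latency) {
        pending_reg = reg;
        ready_cycle = now + latency;
    }

    // Returns the cycle at which an instruction reading src_a and src_b can run.
    // Independent instructions run immediately; a dependent one waits for the load.
    uint64_t issue(uint64_t now, int src_a, int src_b) {
        if (pending_reg != kNoReg && now >= ready_cycle) {
            pending_reg = kNoReg;  // load already finished in the background
        }
        if (pending_reg != kNoReg && (src_a == pending_reg || src_b == pending_reg)) {
            now = ready_cycle;     // stall until the value arrives
            pending_reg = kNoReg;
        }
        return now;
    }
};
```

The per-instruction cost is a couple of comparisons, which is cheap; the harder part of accurate timing is knowing the real latency of each access, which this sketch simply takes as a parameter.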
I don't know how much I believe that. I think it can be done efficiently enough that most computers could run it at full speed. CEN64 was approaching full speed while emulating a much more powerful system (2 MIPS CPUs with pipeline emulation). Since a PSX emulator only has 1 pipelined CPU, the performance hit wouldn't be as big as you might think. Do you have an IRC room? I'd like to pick your brain on a few things, and offer my experience for your PocketStation project.
Sure, I hang around on freenode, I just made a #psx channel. My nick is simias, obviously.
Hi,
First of all, great work! Your guide is very well written, kudos. I really hope you'll be able to finish it: good emulation guides are quite hard to find, and those as clean as yours are even rarer.
I have a small question about the store32/16/8 instructions:
Do you have more information on this, maybe references? For example, can't we just do something like this (assuming C code, which doesn't have generics)? Why?