
Titanfall 2 fails to start (Unimplemented Opcode) #210

Closed
r3muxd opened this issue Jan 15, 2022 · 83 comments

Comments

@r3muxd

r3muxd commented Jan 15, 2022

6443|0x180008561: Unimplemented Opcode (C6) F0 0F C0 01 0F B6 D8 48 8B C6 8B CB 48 D3 E0

on boot, in tier0.dll

tier0.zip

@pg9182

pg9182 commented Jan 15, 2022

Also, note that this was with the Northstar dedicated server + wine64 + d3d11 and gfsdk stubs in the game dir (with d3d11 set to native in winecfg).

ptitSeb added a commit that referenced this issue Feb 6, 2022
@pg9182

pg9182 commented Feb 13, 2022

Thanks @ptitSeb, we'll try running it again soon (I didn't notice your commit earlier).

@pg9182

pg9182 commented Feb 13, 2022

OK, another one: 1444394|0x188a17fd: Unimplemented Opcode (3B) CA 76 04 49 8B C3 C3 72 57 4D 3B 43 40 75 51 (retf).

@pg9182

pg9182 commented Mar 15, 2022

@ptitSeb?

@ptitSeb
Owner

ptitSeb commented Mar 15, 2022

The opcode is a RET FAR. It may be linked to a 64-bit <-> 32-bit call/ret, and needs some more refactoring. I'll probably start working on it next dev. cycle (so in 0.1.9)

@Kuratius

Current release is 0.2.0, what's the current status on it?

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Current status of what?

I don't have Titanfall 2 in my collection so I cannot test it myself. I haven't advanced on the 64<->32 call work as I have no program currently using it.

@Kuratius

If you tell me your email or a steam account I can buy Titanfall 2 for you as a steam gift, if you're still interested in working on this. It's currently on sale, so it's a good time to do so.

@Happyllama25

If you just want to test the server, it requires only a few of the actual game files; there's a script somewhere that downloads the correct ones. You wouldn't be able to open the game and join the server for full testing though, that's the caveat.

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

My steam account is _ptitSeb_. Is this game big?

@Happyllama25

Happyllama25 commented Dec 11, 2022 via email

@Kuratius

Kuratius commented Dec 11, 2022

> My steam account is _ptitSeb_. Is this game big?

I've sent you a friend request on steam. For some reason I can't send a gift using just a name or an email, at least according to the web interface.

This is the project where this was tested:
https://github.com/pg9182/northstar-dedicated

According to the readme it's possible to trim the game to 2GB-4GB if necessary, but the base game is fairly big.
The idea is to make it possible to run a headless server on ARM.

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

I'll probably need to remove Doom from my ARM dev. machine then, as it's about the same size. Or Dota2, it's probably big too. Both are working fine now, so it's not a problem to switch to something else.

@pg9182

pg9182 commented Dec 11, 2022

  • For the dedicated server only (which is the important part -- the goal is to be able to run servers on the Oracle Free Tier, and on Raspberry Pis), you can use https://gist.github.com/pg9182/9a962adbfc27e93237cd14e4523c9da8 to download the 2.5GB of files I've optimized. You'll need to download Northstar and extract it over the downloaded files.
  • You will need 3 cores, or it will likely hang during startup due to limitations in the game (1: the game checks for 3 cores in a few places, and while we've managed to get it to start on 2 with a lot of coaxing, it's not reliable and we haven't included the patch in Northstar; 2: there's some screwy threading stuff around rpak loading which we haven't figured out, which makes it deadlock if you force it to start on 1 core).
  • I recommend Wine 7.0, but anything 7.0+ should work (however, note that ~7.8+ (possibly as early as 7.3) has performance issues with the server, although that doesn't matter if you're just testing).
  • My custom wine build in the Docker image is not mandatory. You should disable ShowCrashDialog in the registry. winedbg will probably not work.
  • My nswrap wrapper can be compiled standalone, and works on Linux 5.4+. I highly recommend using it, as it'll manage Xvfb and clean up the Wine output. You don't need Xvfb if you have a real X server. To enable Xvfb, use DISPLAY=xvfb (nswrap will handle the env var).
  • To test it, you can use a command like ./nswrap /path/to/northstar/files -dedicated with wine64 (you don't need WOW64 support in wine, btw) and Xvfb in your PATH; see the consolidated example after this list.
  • If Northstar v0.11.0 gets released, I still recommend staying on v0.10.x since the crash handler changes may make it more difficult to test.
  • The R2Northstar directory needs write permissions.
  • Not all logged errors are actually a problem; basically just look for a "registered to master server" and a "mapspawn" log with some ASCII art and a bunch of warnings/errors about AINs without a crash to know it's successful.
  • You don't need a real or emulated GPU; my D3D11 stubs automatically loaded in Northstar 1.6+ will do the job.
  • Don't hesitate to contact me here or on Discord if you have trouble getting it set up.
  • If it's easier for @ptitSeb, they can use the full Titanfall 2 build on Steam and simply test the vanilla client, but I wouldn't recommend it since it's actually more complicated (due to Origin/EA issues), and we are mostly just aiming to run the Northstar dedicated server (which also happens to patch out a lot of code paths).
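Putting those two bullets together, a hypothetical invocation could look like this (paths are placeholders):

```
# wine64 and Xvfb must be in PATH; DISPLAY=xvfb tells nswrap to manage Xvfb itself
DISPLAY=xvfb ./nswrap /path/to/northstar/files -dedicated
```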

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Ok, I'll start grabbing the optimized server (my dev. ARM machine has 8 cores, and I have a wine 7.5 & a wine 7.22, both 64-bit, ready to use)

@Kuratius

@ptitSeb I sent you a steam gift with Titanfall 2, I hope it helps.

@pg9182

pg9182 commented Dec 11, 2022

also cc @GeckoEidechse

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Thank you @Kuratius
Should I try the optimized server for now, or switch to the full version directly?

@Kuratius

Kuratius commented Dec 11, 2022

> Thank you @Kuratius Should I try the optimized server for now, or switch to the full version directly?

I'd say try out the optimized server, as getting that working is probably more useful.

But @pg9182 probably has a better grasp on what to do.

@pg9182

pg9182 commented Dec 11, 2022

And if you want to connect to the server once it runs, you'll need to forward tcp/8081 and udp/37015 (you can change these, though), and you'll find it in the Northstar server list (see the wiki for client setup instructions for Steam).

> I'd say try out the optimized server, as getting that working is probably more useful.

Yes, and even if you do plan to get the full client working, it's a bit easier to test the server.

Oh, and if you do try the client and get an Invalid Name error, that's a known issue caused by changes on EA's side; send me your Origin UID or a request ID from the logs, and I can manually apply a workaround for you.

> But @pg9182 probably has a better grasp on what to do.

Yes; I did a lot of the work to get the server running on Linux (Docker image, nswrap, file optimization, D3D stubs)

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Ok, so I downloaded the server.
Launching wine64 NorthstarLauncher.exe -dedicated starts the thing, and ends with a crash for read 0x0000000000000001
Is that what you have? I haven't tried nswrap yet as I'm trying to see how/where it crashes with the minimum stuff loaded in memory, to ease the debugging for now.

@pg9182

pg9182 commented Dec 11, 2022

> with the minimum stuff loaded in memory

nswrap will make debugging easier; it helps normalize stuff.

> ends with a crash for read 0x0000000000000001

I'd need to see the logs around it; I haven't tried running it on ARM since Feb.

P.S. I might take a little longer to respond (if I respond) for the next two hours.

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Using emulated /home/seb/wine/lib/wine/x86_64-unix/crypt32.so
[2022-12-11 17:50:40.250] [info] Profile was not found in command line arguments. Using default: R2Northstar
[17:50:40] [info] Enabling hook _GetCommandLineA
[17:50:40] [info] Enabling hook _LoadLibraryExA
[17:50:40] [info] Enabling hook _LoadLibraryA
[17:50:40] [info] Enabling hook _LoadLibraryExW
[17:50:40] [info] Enabling hook _LoadLibraryW
[17:50:40] [info] Command line: "Z:\home\seb\Games\x86_64\Titanfall2_server\NorthstarLauncher.exe" -dedicated +setplaylist private_match
[17:50:40] [info] NorthstarLauncher version: 1.10.9.0
[17:50:40] [info] Loading resource from library
[17:50:40] [info] Succesfully loaded R2Northstar/plugins\DiscordRPC.dll
[*] Loading launcher.dll
0108:fixme:ver:GetCurrentPackageId (0000000019D0FE10 0000000000000000): stub
[*] Launching the game...
Failed to instantiate discord core! (err 4)
[17:50:41] [info] Enabling hook ReadFileFromVPK
[17:50:41] [info] Enabling hook CBaseFileSystem__OpenEx
[17:50:41] [info] Enabling hook AddSearchPathHook
[17:50:41] [info] Enabling hook ReadFromCacheHook
[17:50:41] [info] Enabling hook MountVPKHook
[17:51:54] [error] Northstar has crashed! a minidump has been written and exception info is available below:
[17:51:54] [error] Cause: Access Violation
Attempted to read from: 0x0000000000000000
[17:51:54] [error] At: filesystem_stdio.dll + 0xe890a
[17:51:54] [error]     Northstar.dll + 0x52df4 (0x179b2df4)
[17:51:54] [error]     ntdll.dll + 0x27fd6 (0x170027fd6)
[17:51:54] [error]     ntdll.dll + 0x60a25 (0x170060a25)
[17:51:54] [error]     ntdll.dll + 0x5e5ae (0x17005e5ae)

It was just a quick test. I'll build nswrap and try it properly...

@pg9182

pg9182 commented Dec 11, 2022

Ugh, haven't seen that one before... I'll look into it later today if you can't get it to work. Might also be a good idea to try running it unemulated so you can compare the output. @BobTheBob9 might also be able to help.

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Here with nswrap (just the end)

[18:15:37] [info] Registering ConCommand reload_mods
[18:15:37] [info] CreateInterface ENGINE VCvarQuery001
[18:15:38] [info] Enabling hook D3D11CreateDevice
[18:15:38] [info] CreateInterface ENGINE VAvi001
[18:15:38] [info] CreateInterface ENGINE VBik001
[18:15:38] [info] CreateInterface ENGINE VENGINE_LAUNCHER_API_VERSION004
[18:15:38] [info] CreateInterface ENGINE VDataCache003
[18:15:38] [info] CreateInterface ENGINE VPrecacheSystem001
d3d11: D3D11CreateDevice: initializing d3d11 stub for northstar (github.com/R2Northstar/NorthstarStubs)
Using emulated /home/seb/wine/lib/wine/x86_64-unix/opengl32.so
0154:fixme:nvapi:unimplemented_stub function 0x7f9b368 is unimplemented!
0104:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
[18:15:38] [info] MountVPK vpk/client_frontend.bsp
[18:15:38] [error] Northstar has crashed! a minidump has been written and exception info is available below:
[18:15:38] [error] Cause: Access Violation
Attempted to read from: 0x0000000000000000
[18:15:38] [error] At: filesystem_stdio.dll + 0x84e90
[18:15:38] [error]     Northstar.dll + 0x52df4 (0x179b2df4)
[18:15:38] [error]     ntdll.dll + 0x27fd6 (0x170027fd6)
[18:15:38] [error]     ntdll.dll + 0x60a25 (0x170060a25)
[18:15:38] [error]     ntdll.dll + 0x5e5ae (0x17005e5ae)
[18:15:38] [error]     filesystem_stdio.dll + 0x84e90 (0x1e3c4e90)
[18:15:38] [error]     filesystem_stdio.dll + 0x61df0 (0x1e3a1df0)
[18:15:38] [error]     filesystem_stdio.dll + 0x5d15c (0x1e39d15c)
[18:15:38] [error]     filesystem_stdio.dll + 0x180b8 (0x1e3580b8)
[18:15:38] [error]     filesystem_stdio.dll + 0x182a6 (0x1e3582a6)
[18:15:38] [error]     filesystem_stdio.dll + 0x18cf5 (0x1e358cf5)
[18:15:38] [error]     filesystem_stdio.dll + 0x1837a (0x1e35837a)
[18:15:38] [error]     engine.dll + 0x1516c1 (0x364a16c1)
[18:15:38] [error]     engine.dll + 0x1511f8 (0x364a11f8)
[18:15:38] [error]     engine.dll + 0x150b55 (0x364a0b55)
[18:15:38] [error]     engine.dll + 0x1346c5 (0x364846c5)
[18:15:38] [error]     engine.dll + 0x1c7d2a (0x36517d2a)
[18:15:38] [error]     launcher.dll + 0xb9d1 (0x19d6b9d1)
[18:15:38] [error]     launcher.dll + 0x15205 (0x19d75205)
[18:15:38] [error]     launcher.dll + 0x15ae9 (0x19d75ae9)
[18:15:38] [error]     launcher.dll + 0x15afd (0x19d75afd)
[18:15:38] [error]     launcher.dll + 0xd386 (0x19d6d386)
[18:15:38] [error]     NorthstarLauncher.exe + 0x48dd (0x1400048dd)
[18:15:38] [error]     NorthstarLauncher.exe + 0x8068 (0x140008068)
[18:15:38] [error]     kernel32.dll + 0x29a89 (0x7b629a89)
[18:15:38] [error]     ntdll.dll + 0x66ecc (0x170066ecc)
[18:15:38] [error]     NorthstarLauncher.exe + 0xfffffffec0000000 (0x0)
[18:15:38] [error]     NorthstarLauncher.exe + 0x80d8 (0x1400080d8)
[18:15:38] [error]     NorthstarLauncher.exe + 0x7fef0000 (0x7fef0000)
[18:15:38] [error]     NorthstarLauncher.exe + 0xfffffffec0000000 (0x0)
[18:15:38] [error] RAX: 0x5e7
[18:15:38] [error] RBX: 0x64728b0
[18:15:38] [error] RCX: 0x7d92400
[18:15:38] [error] RDX: 0xfffffffff826dc00
[18:15:38] [error] RSI: 0x1e428980
[18:15:38] [error] RDI: 0x64728b0
[18:15:38] [error] RBP: 0x1
[18:15:38] [error] RSP: 0x86e258
[18:15:38] [error] R8: 0x5e7
[18:15:38] [error] R9: 0x2f
[18:15:38] [error] R10: 0x0
[18:15:38] [error] R11: 0x7d92400
[18:15:38] [error] R12: 0x0
[18:15:38] [error] R13: 0x1e428900
[18:15:38] [error] R14: 0x0
[18:15:38] [error] R15: 0x5e7
wine: Unhandled page fault on read access to 0000000000000000 at address 000000001E3C4E90 (thread 0104), starting debugger...

Yeah, I'll check unemulated to see what it does

@ptitSeb
Owner

ptitSeb commented Dec 11, 2022

Mmm, so, the program is crashing in a portion of code that is obfuscated. I guess there is still a bug somewhere in the dynarec to find. I activated BOX64_DYNAREC_SAFEFLAGS=2 and it seems to help, but it's not enough. Need to find that bug. That might take time (lots of time :( )

@p0358

p0358 commented Dec 11, 2022

If the offset from the Northstar log is correct, it does appear it crashes inside the standard memmove implementation in filesystem_stdio.dll while trying to execute the instruction mov rax, [rdx+rcx]

@p0358

p0358 commented Mar 10, 2023

Turned out my UDP socket doesn't work because the Respawn engine uses the AF_INET6 family, and apparently the ARM device's kernel that I flashed has IPv6 support completely disabled for some reason (💀). Due to how the kernel is built for that device, I'd basically have to rebuild it to undo that, so I won't be able to test any further for now until I find a workaround or this issue is fixed: hexdump0815/imagebuilder#15 ;_;

(forcing AF_INET makes the socket open, but the game ignores packets, probably because the C structures are not exactly inter-compatible, such as the address structure when trying to send a response)

At least we know it's not a box64 issue though...

(which is good news, because I guess it means it should (without having tested that) generally work for tfods, and then only Northstar is left to be figured out...)

@p0358

p0358 commented Mar 10, 2023

I come back with good news. After sorting out my kernel, the Titanfall Online dedicated server now actually fully works!
image

Below some thoughts on performance results from my quick test for anyone who's interested in running these servers.

When it comes to performance, it's worth noting that Oracle's Ampere A1 should be quite a bit more powerful than my poor MediaTek MT8173C (2x Cortex-A72 + 2x Cortex-A53, 1.8-2.1 GHz), while the Ampere A1 allegedly runs at a consistent 3.0 GHz. I also didn't disable any CPU security vulnerability mitigations. And yet, for some reason, my idle server frame time was more like 3.5 ms instead of the 2.5 ms or even 1.65 ms I'd get before, idk why. So, potentially, consider these results the worst-case scenario.

When the server first booted up, it spawned the biggest initial batch of AI, and sadly the server usage then went above 100%, which meant the server couldn't keep up with the amount of frames it has to generate. Yet it really didn't make the game feel unplayable; perhaps the server was able to process multiple ticks per frame without completely falling behind.

Then, when the AIs had all killed each other, the server usage dropped back to 70-80% CPU, which made it quite reasonable again (but then again, with only me as a single player jumping around). But there was another problem: overage frames. AI processing likes to cause frametime spikes even on x64 CPUs, but usually the percentage of overage frames doesn't exceed 0.5%. In the case of our emulated ARM it was 5.5%, which is definitely far above ideal (though still better than the initial >100% CPU usage for sure).

Afterwards I ran script disable_npcs() to get rid of the AI. Without AI, the server CPU usage dropped to 55%. To be honest, I expected a more drastic drop, but then again 55% is a very reasonable score. Overage frames also became less frequent and definitely fell well below 5%, which means that all in all this resulted in a perfectly smooth and playable server that almost never went over its frame budget.

Some quick conclusions then:

  • server without AI = should be perfectly playable even on weaker hardware, think an old ARM laptop or a Raspberry Pi (but admittedly I did not test with 12+ players etc, so keep that in mind...)
  • server with AI = performance is spotty and there are a lot of overage frames. But it exceeds the frame budget just barely: ~110%. Which means that on an Ampere A1 at Oracle cloud it could actually still be fine!!! We might need to integrate some profiler and see if anything in the frame stands out as too performance-intensive on emulated ARM and whether we can do anything to optimize it. (for example, on first startup, parsing TFOData (TOML+CSV) takes 100x longer than on x64, compared to the average 7x general slowdown -- perhaps these things could be specifically optimized or maybe even moved to a native aarch64 lib??????)

So overall these results are honestly still very promising. I don't remember exactly what kind of numbers @pg9182 was getting when trying other emulators, but IMO this is something we can absolutely work with. In the worst-case scenario the servers just won't have 100% flawless performance, or would run without AI, but they'd be far from unplayable. I could also give a snarky remark here that it still might run very decently compared to official Respawn servers kek.

Oh, also, we could potentially still improve performance further later on by tinkering with some BOX64 env options. (Also Northstar is still broken, but it's 3 am so I'm not figuring that out today xd)

Big, big thanks to ptitSeb for all his work on this project and for all the fixes that eventually led to this moment.

@Kuratius

Kuratius commented Mar 10, 2023

@p0358 Are you sure you were reading the CPU usage indicator correctly?
Try htop instead of top if you want to see individual cores; otherwise it'll just add all of them together.
110% could be anything from two cores at 55%, to one core at 100% and a second one at 10%, to four cores at 27.5%, etc.

3.5 ms frametime is 285 fps; I don't think a game server normally runs that high of a tickrate, it's usually around 20-120 ticks depending on the game.

Also what is an overage frame?

@ptitSeb
Owner

ptitSeb commented Mar 10, 2023

First things first. It needs to run fine before we look at possible optimisations.

For optimisations:

There are the box64 env. vars to play with. Also, you can use ~/.box64rc to set env. vars for a specific process name. And once you have a good set of settings, it can be reported to /etc/box64.box64 (via a PR, for example) so everyone can benefit from it.

Also, you can try running with BOX64_DYNAREC_LOG=1 and see if some opcodes are missing from the dynarec.
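For example, a per-process section in ~/.box64rc looks like this (the process name and values here are hypothetical, just to show the format):

```
# ~/.box64rc -- settings applied only when the named process runs
[NorthstarLauncher.exe]
BOX64_DYNAREC_SAFEFLAGS=2
BOX64_DYNAREC_LOG=1
```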

@p0358

p0358 commented Mar 10, 2023

> Are you sure you were reading the CPU usage indicator correctly?

Actually no, fair point. Source should use WinAPI to measure process CPU time since the last check, so above 100% could mean other threads caused it, rather than the main loop going over the mark. Still, the frame time was approaching the frame budget at that time.

> 3.5 ms frametime is 285 fps; I don't think a game server normally runs that high of a tickrate, it's usually around 20-120 ticks depending on the game.
> Also what is an overage frame?

So the servers run by default (in Titanfall 2 and Online) at a 60 Hz tickrate, which gives 16.666... ms (1/60 s) as the maximum time for processing a single tick (this value is called INTERVAL_PER_TICK). Frames normally take much less time than that; the main loop measures how long a frame took to generate and then sleeps for the remaining time up to INTERVAL_PER_TICK, telling the kernel to yield the processor to other programs, so that the average framerate equals the tickrate.

An overage frame is a frame that exceeded this maximum time, which means the server has to start computing the next frame as soon as it has finished the previous one. What happens then is that it might either try to simulate two ticks in one frame, or rewind the clients' simulation back in time. A small percentage of overage frames isn't a tragedy, but it certainly impacts the ideal network experience.

As the percentage rises, or the average frame time exceeds INTERVAL_PER_TICK, trouble arises: the server can't keep up with simulating the game world in real time, and it will have to keep rewinding the clients back in time constantly, completely ruining the experience for players, as they can visibly keep moving backwards on their screens during their movement.

So a 3.5 ms frametime doesn't equal 285 fps, because the dedi is supposed to sleep for the remaining (16.66... - 3.5) ms.
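To illustrate, here's a minimal sketch of that loop (not Respawn's actual code; everything except the INTERVAL_PER_TICK concept is made up):

```cpp
#include <chrono>
#include <thread>

// One tick of engine work: physics, AI, scripts, snapshots... (hypothetical stub)
inline void SimulateTick() { /* ... */ }

// Sketch of a fixed-tickrate dedicated-server loop as described above.
void RunServerLoop(bool& running) {
    using clock = std::chrono::steady_clock;
    // 1/60 s = 16.666... ms budget per tick (the INTERVAL_PER_TICK value).
    const std::chrono::duration<double> kIntervalPerTick{1.0 / 60.0};

    while (running) {
        const auto frame_start = clock::now();
        SimulateTick();
        const std::chrono::duration<double> frame_time = clock::now() - frame_start;

        if (frame_time < kIntervalPerTick) {
            // Normal frame: sleep away the rest of the budget, yielding the CPU,
            // so the average framerate equals the tickrate.
            std::this_thread::sleep_for(kIntervalPerTick - frame_time);
        }
        // else: an overage frame. No sleep; the next frame starts immediately,
        // and the server may have to simulate two ticks in one frame or rewind
        // the clients' simulation to catch up.
    }
}
```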

On the note of performance, I've read some random comment on the Internet claiming the Ampere A1 has 3x better performance than a Raspberry Pi 4B. If my CPU can be considered comparable to that RPi (technically the same Cortex-A72 cores, but clocked a bit higher), then that's excellent news, as it could basically mean our performance concerns are solved; the performance should then already be good enough. I should probably look into registering there soon to actually try it out too...

Also, for weaker devices, as a worst-case scenario that keeps AI usable, we could turn the tickrate back from 60 to 30 like Titanfall 1 did. Not ideal, but perhaps that'd still result in an overall better experience than running with an average of 5% overage frames (idk). Also, I imagine the PvE gamemode Frontier Defense could be run at a 30 tickrate there for a smooth experience, without any moral concerns about it then xd

> Also, you can try running with BOX64_DYNAREC_LOG=1 and see if some opcodes are missing from the dynarec.

Right, if these are disabled by default now, then I should probably run with that to ensure nothing was missed and ignored (though such an occurrence has a low chance of not resulting in a crash, so hopefully we shouldn't encounter any missing opcodes anymore).

@p0358

p0358 commented Mar 10, 2023

So the first performance thing in my case is ensuring proper core affinity, so that I run on the Cortex-A72 cores rather than the A53s. It seems taskset -cp 2-3 results in ~2.3 ms frame time on mp_lobby, while taskset -cp 0-1 gives 3.0 ms. Not setting affinity gives around 2.5 ms, but less consistently. So that's a good starting point in my case on a big.LITTLE CPU design. After setting the process niceness to -20 (highest priority), the difference shifted even slightly further in favor of the better cores.
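(For illustration, a hypothetical way to apply both settings at launch time, rather than to an already-running pid as I did; the exact command line is a placeholder:)

```
# pin to the Cortex-A72 cores (2-3 on this SoC) and raise the scheduling priority
sudo nice -n -20 taskset -c 2-3 wine64 tfods.exe
```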

Then, BOX64_DYNAREC_SAFEFLAGS=0 seems to quite reliably improve that frametime by a further ~0.3 ms. On the other hand, BOX64_DYNAREC_BIGBLOCK=2 and BOX64_DYNAREC_FORWARD=512 seem to have changed pretty much nothing (in the idle frames at least...). And BOX64_DYNAREC_CALLRET=1 actually worsened the performance by 0.2 ms. Similarly, BOX64_DYNAREC_FASTPAGE=1 apparently worsens the performance by 0.3-0.5 ms (enabling bigblock=2 and forward=512 makes it 0.5 rather than 0.3, interestingly).

All these tests were on an idle server on mp_lobby at the lowest possible process niceness, as those were the most consistent conditions to test under. It also seems the server is a bit more memory hungry, eating ~30% more RAM, but that really doesn't matter when Oracle gives 12 GB of RAM to use per 2 cores kek

Honestly, it's hard to tell which options cause what effect, as mid-testing I started getting inconsistent results, so I had to cross out my previous findings. All in all, with affinity set and all the options below randomly enabled, the performance in a simple Attrition match on Box with some grunts seemed better than in my very first test.

[tfods.exe]
BOX64_DYNAREC_SAFEFLAGS=0
BOX64_DYNAREC_BIGBLOCK=2
BOX64_DYNAREC_FORWARD=512
#BOX64_DYNAREC_CALLRET=1
BOX64_DYNAREC_BLEEDING_EDGE=0
BOX64_PREFER_WRAPPED=1
BOX64_NOPULSE=1
BOX64_NOGTK=1
BOX64_NOVULKAN=1
BOX64_X11GLX=0
BOX64_DYNAREC_FASTPAGE=1

(I made a far-fetched assumption that BOX64_DYNAREC_FASTPAGE would be safe, as MinHook calls the WinAPI function to clear the processor instruction cache after doing its hooks, and we don't call the patched/hooked code at all before modifying it for the first time)

However, at least on my hardware, Frontier Defense (probably the most CPU-intensive gamemode) is certainly a no-go at 60 tickrate:
image

At 30 tickrate it should generally be fine though, as it only hit something like 45 ms frametime just once. Though who knows what that'd look like if 20 titans dropped in some later waves. In any case, I now place my highest hopes on the Ampere CPUs' performance to save us, rather than on my 15 € Chromebook xd (who knows if it didn't start thermal throttling during my testing or something lol)

@p0358

p0358 commented Mar 10, 2023

It seems the Northstar server does actually boot up with BOX64_DYNAREC_SAFEFLAGS=2!!

[16:15:22] [SCRIPT SV] [info] Code Script: _init
[16:15:22] [NATIVE SV] [info] Created new format snapshot class baseline with 78 classes, 7601 properties and 36KB of prop data.
[16:15:22] [NORTHSTAR] [info] loading took 58.0886827s

@Kuratius

Kuratius commented Mar 10, 2023

> So a 3.5 ms frametime doesn't equal 285 fps, because the dedi is supposed to sleep for the remaining (16.66... - 3.5) ms.

That sounds like either the server has multiple threads running on a single core (not optimal), or it is not actually a dedicated server and is instead running multiple servers. 3.5 ms seems like a magic number to me that might be a leftover from when a single server was supposed to host multiple matches, or do x matches per core, or something like that. I know Valorant servers run that way.
https://technology.riotgames.com/news/valorants-128-tick-servers
Being focused on sleeping 80% of the time makes that very likely.

The screenshot you posted has more sensible numbers, though I'm unsure how a 33 ms (30 fps) average frametime gets you to 48 fps. I assume it probably dropped ticks to get that result, assuming that's possible. I still think it's probably not problematic; TF clients are only supposed to run at 20 tickrate anyway, so it might not be an issue if the server can't make 60.

@Kuratius

Kuratius commented Mar 10, 2023

https://youtu.be/0SQx3b9iGhg

image

With this model, a server running 40/20 for example is probably still fine, assuming that's configurable.

40 Hz also has the interesting property that it's exactly in the middle between 60 Hz and 30 Hz in terms of smoothness (a frametime of (1/60 + 1/30)/2 is 1/40), so it should still be better than TF1.

@p0358

p0358 commented Mar 11, 2023

> I'm unsure how a 33 ms (30 fps) average frametime gets you to 48 fps.

The server was lagging, behind the 60 fps target at which it's normally supposed to sit, that's why. Notice how there's 17% overage frames, which is a pretty big number. It seemed that dropping an enemy titan in FD was the nail in the coffin on that CPU and tickrate, at least.

> 3.5 ms seems like a magic number to me that might be a leftover

No. The server's actual frames are supposed to take the least time possible, and then it sleeps until it's time to generate the next frame, according to the tickrate goal. On x64, frame generation takes < 1 ms on an idle server (0.3 ms on an idle mp_lobby with no players, to be precise). 1/60, so 16.66 ms, is just the tick interval, the MAXIMUM amount of time a frame generation can take before you're running behind schedule; then you're in trouble, and players start warping back or having noticeable delay in shooting etc. (though tbh for most of the gameplay it surprisingly still wasn't THAT bad somehow, definitely much less bad than when the servers would get DoS'd by any of the past methods and actors in the wild xD)

> With this model, a server running 40/20 for example is probably still fine, assuming that's configurable.

Maybe; Source is apparently somewhat flexible in its tickrate. But currently something prevents you from even touching the updaterate and changing it to anything but 10 without causing severe stutter issues, probably caused by wrong interp calculations, something that even Nexon didn't fix. So Titanfall Online runs at a 60 Hz tickrate, but with 30/10 cmdrate/updaterate (cmdrate can be changed just fine, though).

Tickrate is configurable by a convar in Titanfall 2, while for TFO I have made a bunch of patches you can enable by using the -tickrate <value> command line option; it works pretty much the same way.
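For example (the invocation around it is hypothetical; only the -tickrate option itself comes from those patches):

```
# run the TFO dedicated server at Titanfall 1's 30 Hz instead of the default 60
wine64 tfods.exe -tickrate 30
```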

Also, cmdrate changes are rather cheap for the server, since a usercmd is generated every client frame anyway. They're just throttled and queued up to be sent in the next packet, so the server processes the same amount of data (so yes, technically, you having more client fps slows down the server lol).

And updaterate changes are more expensive, because every time at least one client needs an update, a snapshot of world changes must be taken. However, it's still much cheaper than simulating a tick, where physics and all positions and optionally scripts need to be updated.

So cmdrate/updaterate imo don't really matter that much for the compute performance of a server; they mostly matter for network optimization (not counting one quirk of constant cmdrate in the Respawn engine that I'm not going to explain here rn). Tickrate is what matters for CPU time.

(on an unrelated note, I really wish I could bump the updaterate in r1 up from the default 10, but figuring out the inlined and hardcoded interp calculations made me only wish to commit sudoku and didn't bring any progress whatsoever xd)

(I'm salty about the networking improvements Respawn made in r2)

@Kuratius

> No. The server's actual frames are supposed to take the least time possible, and then it sleeps until it's time to generate the next frame, according to the tickrate goal. On x64, frame generation takes < 1 ms on an idle server (0.3 ms on an idle mp_lobby with no players, to be precise). 1/60, so 16.66 ms, is just the tick interval, the MAXIMUM amount of time a frame generation can take before you're running behind schedule; then you're in trouble, and players start warping back or having noticeable delay in shooting etc. (though tbh for most of the gameplay it surprisingly still wasn't THAT bad somehow, definitely much less bad than when the servers would get DoS'd by any of the past methods and actors in the wild xD)

That's still avoiding the question. Without an answer to why a 3.5 ms target is necessary, it's still a magic number. I'm going to assume it's a leftover from trying to run multiple matches per core, otherwise it doesn't make sense.
See the Valorant blog post above; OS overhead is not 80+%.

@p0358

p0358 commented Mar 11, 2023

3.5 ms is not a magic number; it was just an example, since you asked about the 3.5 ms that was on my screenshot. It's not a magic number or a target, it was simply the average frame time over the last second at the moment I took that screenshot, nothing more.

The only goal is to run with as low an average tick time as possible, to stay as far away from the quota as possible. Which is much more of a problem here than normally, due to the emulation overhead.

The Valorant blog post is good, but in our situation we can't go as deep into optimization as easily as they did, without access to the source code and the deep insight into the engine's inner workings that comes with it. Their target was to utilize every core to the fullest possible extent, which saves them real money at scale. The problem with Titanfall servers, however, is that they (at least in r1) like to have frametime spikes, in which case it's easy for one such server to heavily affect all the others. Which is why Titanfall 1 servers originally had 2 dedicated vCores, and Titanfall 2, iirc, would have one main core per server plus a helper core shared between multiple servers (they do some operations in a threaded fashion on the thread pool, such as bone processing and stuff like that here and there, plus network packet sending/receiving). But yeah, the concept of a frame deadline (in their case 1/128) is kinda what applies here for us too.

@Kuratius

Kuratius commented Mar 11, 2023

Thank you for explaining, it seems I misread your previous comment.

@Kuratius

Kuratius commented Mar 11, 2023

This is probably also interesting: it's a comment on the tickrate by a Titanfall server engineer.

https://old.reddit.com/r/titanfall/comments/4v36zj/tickrate_is_about_ai_and_not_about_player/

It seems that even running 20Hz tickrate would be acceptable (?).

@p0358

p0358 commented Mar 11, 2023

Good find, I was just thinking of scrolling through his reddit comments to find something related to that.

> A tick in our case is a frame loop on the server. That's it.

Uhhhh, that's not exactly true. I hope he oversimplified here: the frame rate on a dedi is normally supposed to be limited to the same value as the tickrate, and a lot of things in the engine assume that's the case, but that doesn't mean it's guaranteed. In abnormal circumstances a single frame run can simulate two ticks at once, or none!

> Usercmds don't have any notion of a tick in Titanfall (in Valve games they do have a link to ticks - this is something we changed on Titanfall 1). 144hz clients send 144 usercmds to the server to simulate, not 60 or 20 or anything else. When the server runs those commands, it does them at client rates, not tick rates

Very true. I kinda wish I'd been aware of these comments before I figured that part out on my own kek. This is also why cmdrate is an artificial limitation/leftover from Valve's Source. It actually might cause issues: every time a packet from the client is scheduled to be sent (with multiple usercmds), it has a single CLC_ClientTick attached with the current tick and current simulation times, and I kinda wonder if the latter aren't making the simulation of usercmds inaccurate. They allegedly fixed this in Apex Legends and made every usercmd actually send a packet, and it's a thing I added as an experimental option to TFO; it seems to improve things a bit. It only ever made sense to keep cmdrate in case someone's client connection couldn't handle, say, 60 or 144 packets per second being sent instead of 30 (60 in r2); otherwise it practically only works against the player (at least compared to players without this limitation, who can get their packets to the server faster and thus gain a kind of "ping advantage").

> Snapshots are not sent every tick. We have a snapshot rate of 20hz on Titanfall 2, and a tick rate of 60.

On that note, I noticed something weird while experimenting with updaterate. I think the function that evaluates whether it's time to send a world snapshot to a player is called many times per single frame. If I'm correct about that, it could mean the server wastes some performance by generating multiple separate world snapshots for players, instead of reusing the same snapshot from the same tick for multiple players like the original Source did. I would definitely try to optimize it if I didn't have the interp issue ;_;

> If we dropped the tick rate from 60 to 20, the only difference would be that AI could wake up less often and decide what to do next. And if we raised it, all that would happen is that AI could potentially wake up more often, but not necessarily (if the AI are all waiting for 1/60th of a second intervals, the server would just run a frame that would do no work).

That's an interesting perspective. Hmm... He does have a point, but at the same time I'm not sure that a lower tickrate, and thus simulating the world in bigger increments, wouldn't have its side effects, especially regarding the timing of who kills whom at which moment etc. AI also isn't really simulated every frame either, methinks. And if what he said were really true, it wouldn't make sense to have a 30 tickrate instead of 10 in the first place, nor to raise it to 60 in Titanfall 2, would it now?

At the same time, the low updaterate kinda does work against the accuracy of the gameplay. Only every 3rd tick (tickrate 60 / updaterate 20) is a snapshot of all changes generated and sent, and clients need to roll back their time and interpolate the movements in between, which isn't accurate. I wonder what originally led to this decision. Servers supposedly have a lot of bandwidth, and clients should on average have much faster download than upload speeds. It kinda seems like this is the worst of both worlds; either lower the tickrate as he suggested here, or raise the updaterate, so that clients get real, accurate updates for their local simulation.

If tight client bandwidth optimization was their primary goal/paranoia back in 2013 (as it sounded from his talks), I think we could safely lift this updaterate constraint now, 10 years later...

I honestly really wish I could talk to that guy and ask him some further questions about all of this 10 years later, especially regarding his newer experience with Titanfall 2 and Apex; not sure if he'd respond to me nowadays, now that he's not at EA anymore either...

@Kuratius

Kuratius commented Mar 11, 2023

> On that note, I noticed something weird while experimenting with updaterate. I think the function that evaluates whether it's time to send a world snapshot to a player is called many times per single frame. If I'm correct about that, it could mean the server wastes some performance by generating multiple separate world snapshots for players, instead of reusing the same snapshot from the same tick for multiple players like the original Source did. I would definitely try to optimize it if I didn't have the interp issue ;_;
> At the same time, the low updaterate kinda does work against the accuracy of the gameplay. Only every 3rd tick (tickrate 60 / updaterate 20) is a snapshot of all changes generated and sent, and clients need to roll back their time and interpolate the movements in between, which isn't accurate.

This is just speculation, but if making a game-world snapshot is sufficiently expensive, and you make a different snapshot for every player to limit wallhacks/ESP instead of sending everyone the same snapshot, then that could have something to do with it. Even if the current game doesn't do that, it could have been a design consideration.

@GeckoEidechse

> I wonder what originally led to this decision. Servers supposedly have a lot of bandwidth, and clients should on average have much faster download than upload speeds.

IIRC from the Respawn talks I watched many years back, server->client messages are a lot bigger, as they contain the entire world state, while client->server messages contain only that client's input. So even though there are 3x as many client->server messages as server->client ones, I think they are still smaller in total than one server->client message.

(We're getting quite off topic from the original issue here btw :P)

@ptitSeb
Owner

ptitSeb commented Jul 4, 2023

Hey there! The _dl_find_object function has been partially implemented. It should be enough for most use cases. There have also been a few stability improvements, so it would be worth trying the server again with box64.

@GeckoEidechse

Someone in our Matrix server did a quick test, and based on their claims we can at the very least make it all the way to the in-game lobby now. We weren't able to test further yet, but this is already quite promising :D

@Hacker1245

Hacker1245 commented Jul 14, 2023

Yeah, the server appears to work fine, but there are some ping spikes, and I'm not sure whether they are caused by my PC or the server. Loading into a game also works fine (tested with two players). Used Proton 8.0 for testing.

@Jan200101

Tested on a Raspberry Pi 4 (4 GB RAM) using the full 64-bit Raspberry Pi OS.
The wine version used was from Debian and was downloaded using the script found in X64WINE.md.

The only real issues appear to be caused by the unfit hardware of the Pi, but it worked regardless.

image

@p0358

p0358 commented Jul 14, 2023

I don't think Northstar actually has any frame-time performance metrics shown anywhere, unlike TFORevive; it'd be pretty useful here to see how it fares.

@Kuratius

Kuratius commented Aug 11, 2023

These people seem to have tested box64 with the full Titanfall 2 client on a Switch.
Not playable (15 to 25 fps, audio issues), but given the underpowered hardware of the Switch, a full client probably has OK performance on a modern desktop CPU.
https://youtu.be/TnFM3msATio

image

And the same on a smartphone

https://youtu.be/ZCwVvynRmG4

I do wonder what's causing the audio issues.

@Hacker1245

> I do wonder what's causing the audio issues.

Maybe high frame times?

@Kuratius

> I do wonder what's causing the audio issues.

> Maybe high frame times?

I'd like to think that audio playback isn't directly related to frametime, because it could run on a separate thread from the render thread, but I guess technically it's possible that some part of it is CPU-bound and it just doesn't have any headroom on any core whatsoever. Or there's some audio driver library being emulated where a native library would do better.

@rajdakin
Collaborator

rajdakin commented Jul 8, 2024

@r3muxd If this issue is not fixed, please reopen it. However, since you have not answered since your first post and other people claim the issue is fixed, I'm closing this for now.

@rajdakin rajdakin closed this as completed Jul 8, 2024