Performance Profiling with Tracy

Zach Toogood edited this page Aug 22, 2022 · 13 revisions

Tracy

https://github.com/wolfpld/tracy

A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.

Intro

We can't know how good/bad our performance is until we measure it.

Tracy is made up of two parts:

  • The client, which you build into your program and which broadcasts your performance information.
  • The server, an external program (available in the Tracy release) which receives that information and lets you analyze it.

Setup

If building on the command line, add -DTRACY_ENABLE=ON to your CMake configuration arguments. CMake will then download the Tracy client and server, and build the Tracy client into xi_map for you.

If building from Visual Studio, select one of the -Tracy build configurations and build as normal.

image

1> Working directory: C:\ffxi\server\build\x64-Release-Tracy
1> [CMake] -- C:/ProgramData/chocolatey/bin/ccache.exe found and enabled
1> [CMake] -- CMAKE_SOURCE_DIR: C:/ffxi/server
1> [CMake] -- CMAKE_SIZEOF_VOID_P == 8: 64-bit build
1> [CMake] -- ENABLE_FAST_MATH: ON
1> [CMake] -- TRACY_ENABLE: ON
1> [CMake] -- Downloading Tracy development library
1> [CMake] x tracy-0.8.2/
1> [CMake] x tracy-0.8.2/.github/
1> [CMake] x tracy-0.8.2/.github/FUNDING.yml
1> [CMake] x tracy-0.8.2/.github/sponsor.png
1> [CMake] x tracy-0.8.2/.github/workflows/
... 
1> [CMake] -- Downloading Tracy client
1> [CMake] x capture.exe
1> [CMake] x csvexport.exe
1> [CMake] x import-chrome.exe
1> [CMake] x Tracy.exe
1> [CMake] x update.exe
1> [CMake] -- Modifying C:/ffxi/server/ext/tracy/tracy-0.8.2/client/TracyProfiler.hpp
...
1> [CMake] -- Configuring done
1> [CMake] -- Generating done
1> [CMake] -- Build files have been written to: C:/ffxi/server/build/x64-Release-Tracy

Tracy.exe will be placed in your repo root.

Usage

Run your Tracy-enabled xi_map.exe and launch Tracy.exe. You will see it connect and start profiling. The launch order doesn't matter: Tracy.exe can be started before or after xi_map.exe.

It is usually better to wait until startup has completed before you attach Tracy, as the startup routine isn't a good indicator of the server's runtime performance.

Once connected, you should see something like this:

image

If you want to record a trace for later use, click on the Wi-Fi symbol and you'll be given the option to save the current trace.

image

WARNING: Traces can be very large! Plan accordingly!

Usage (Headless)

If you need to capture a trace without launching the GUI (on a remote VM, a resource constrained system, etc.), Tracy comes with capture.exe.

You can capture a trace using this command line utility. It accepts the following parameters:

• -o output.tracy – the file name of the resulting trace (required).
• -a address – the IP address (or domain name) of the client application (defaults to localhost).
• -p port – the network port to use (optional).
• -f – force overwrite if the output file already exists.
• -s seconds – number of seconds to capture before automatically disconnecting (optional).

If no client is running at the given address, the utility will wait until it can make a connection. When launched from the command line, it displays progress information during the capture:

PS C:\ffxi\server> .\capture.exe -o trace.tracy -f -s 60
Connecting to 127.0.0.1:8086...
Queue delay: 0 ns
Timer resolution: 100 ns
   1.32 Kbps /138.5% =   0.00 Mbps | Tx: 41.34 MB | 330.28 MB | 1:32.9
Frames: 26
Time span: 1:32.9
Zones: 941,349
Elapsed time: 1:00.1
Saving trace... done!
Trace size 40.59 MB (24.26% ratio)
PS C:\ffxi\server> 

You can open the resulting trace in the Tracy.exe GUI at a later time.
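
If captures are scripted (for example from a scheduled job on a remote VM), the flags listed above can be assembled programmatically. The following is a minimal Python sketch and is not part of the repository: the helper names build_capture_command and run_capture, and the default executable path, are assumptions made for illustration. It only uses the options documented above.

```python
import subprocess
from typing import List, Optional


def build_capture_command(
    output: str,
    address: Optional[str] = None,
    port: Optional[int] = None,
    seconds: Optional[int] = None,
    force: bool = False,
    executable: str = r".\capture.exe",  # assumed location: the repo root
) -> List[str]:
    """Assemble the argument list for Tracy's capture utility.

    Only the documented flags are used: -o, -a, -p, -f and -s.
    """
    cmd = [executable, "-o", output]
    if address is not None:
        cmd += ["-a", address]
    if port is not None:
        cmd += ["-p", str(port)]
    if force:
        cmd.append("-f")
    if seconds is not None:
        cmd += ["-s", str(seconds)]
    return cmd


def run_capture(cmd: List[str]) -> int:
    """Run the capture utility, block until it exits, return its exit code."""
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Mirrors the invocation shown above: 60 seconds, overwrite trace.tracy.
    print(build_capture_command("trace.tracy", seconds=60, force=True))
```

Building the argument list separately from running it makes the command easy to log or inspect before a long capture starts.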

Finding Problems

Searchable statistics are under the Statistics header, and log messages are under Messages. You can click, drag, and zoom around the main timeline window for information about what's going on. You can "re-attach" to the most active frames by clicking on the Pause/Resume header and using the options there.

image

image

If you click on the entries in the Statistics menu, you can drill down into that function and look at it in more detail.

image

Gotchas

Remember that there are a lot of things that can affect performance.

  • Platform (Windows, Linux, macOS)
  • Architecture (x86, x86_64)
  • Type of build (Debug, RelWithDebInfo, Release, MinSizeRel)
  • Compiler (MSVC, Clang, GCC)
  • Your system specs (CPU Speed, Available Memory, Memory Latency, HDD R/W speed etc.)
  • Other programs using your system's resources
  • Virtualization/Containerization (VMWare, WSL, Docker)

If you're performing before/after testing, keep the conditions as close to identical as you can for both runs, and change as little as possible between them. It also helps to take multiple readings, with many samples per reading, to get an accurate view of performance.
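
One way to make a before/after comparison less sensitive to a single noisy run is to compare the medians of several readings and check the difference against the spread. The following is a rough Python sketch under stated assumptions: you have already collected one number per run (for example a mean zone time in milliseconds read from the Statistics view), the function names are hypothetical, and the "exceeds noise" check is a crude heuristic, not a statistical test.

```python
from statistics import median, stdev
from typing import Dict, Sequence, Union


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Median and sample standard deviation of one set of readings."""
    return {
        "median": median(samples),
        # stdev needs at least two samples; treat a single reading as no spread.
        "stdev": stdev(samples) if len(samples) > 1 else 0.0,
    }


def compare(
    before: Sequence[float], after: Sequence[float]
) -> Dict[str, Union[float, bool]]:
    """Compare two sets of readings taken under the same conditions."""
    b, a = summarize(before), summarize(after)
    diff = a["median"] - b["median"]
    noise = max(b["stdev"], a["stdev"])
    return {
        "before_median": b["median"],
        "after_median": a["median"],
        # Negative means the "after" runs were faster.
        "change_percent": diff / b["median"] * 100.0,
        # Crude heuristic: the shift is larger than the run-to-run spread.
        "exceeds_noise": abs(diff) > noise,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not real measurements.
    print(compare(before=[10.0, 10.0, 12.0], after=[5.0, 5.0, 6.0]))
```

If exceeds_noise is False, the safest reading is that the change has not been shown to matter yet; take more readings before drawing a conclusion.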

Known bottlenecks

  • Expensive pathing and navmesh access... all the time... every tick... every mob... everywhere...
  • The parse routine is slow.