Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

ptsochantaris · 2024-01-17T20:47:36Z

There are two minor changes:

The autorelease pool in the graph compute method doesn't seem to do anything, as nothing is allocated outside of the dispatch block; the dispatch block in turn uses the private queue's own autorelease pool. Removing the autorelease block and running using Xcode's leaks profile shows no leaks and no memory growth. (Sanity check by more experienced contributors may be appropriate for this just in case?)
In the init code, a retained list of devices is copied out via Copy C method, and following the CF conventions this needs to be released in non-ARC code (and indeed the leak does show up in the leaks profiler). This PR changes it to:
1. Release the list after it is used
2. Only bother to do any of this if GGML_METAL_NDEBUG is not defined -- otherwise the GGML_METAL_LOG_INFO won't do anything in the first place.
3. Removing the function-wide *s and *device references to avoid stale references into the devices list (which didn't break in the current code because devices was leaked, but would break now that it is properly released)

… autorelease pool in graph compute

ggerganov · 2024-01-17T21:10:31Z

Thanks!

Btw, do you have any idea why the Swift build is failing after the recent changes:

/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:235:9: error: unknown type name 'ggml_backend_sched_eval_callback'
        ggml_backend_sched_eval_callback cb_eval;
        ^

I can't figure it out - the function prototype is declared in ggml-backend.h but llama.h does not see it for some reason. cc @jhen0409 for ideas

ptsochantaris · 2024-01-17T21:29:05Z

It's definitely not a syntactic issue, as that exact same file compiles fine in my project (https://github.com/ptsochantaris/emeltal/blob/dd3173e53781def9bb757fa06a2c8d7ab05f381b/Emeltal/llama-cpp/llama.h#L235) so I strongly suspect it may have to do with some compile-time defines. I'll take a look to see if I can figure out a way to "break" it in my build tree...

ptsochantaris · 2024-01-17T21:36:18Z

I see the SPM package uses an spm-headers directory, which currently only contains a symlink to llama.h - I wonder if perhaps a symlink to ggml-backend.h may also be needed now, as I believe that #include line was only recently added. It may be silently failing during the compile?

ptsochantaris · 2024-01-17T21:40:58Z

I've added the symlinks in this branch to see if this improves things.

ggerganov · 2024-01-17T21:42:40Z

Even if it works, it's not desirable - I expect the ggml headers to come from the ggml SPM. Similar to how ggml.h comes from there.

ptsochantaris · 2024-01-17T21:47:41Z

Indeed, that makes sense - I'm not too familiar with the SPM package TBH and how it handles its dependencies. I can remove the symlinks from this PR if you'd prefer, since it's out of scope anyway. For what its worth, it does seem to be working, so for whatever reason the SPM build is probably missing those extra files - could it be the GGML SPM doesn't export those extra header files currently?

ptsochantaris · 2024-01-17T21:53:28Z

(Actually, I've removed them anyway, since it's an unrelated change, to keep things tidy and in-scope for now, but it did prove that it's a question of missing include files, possibly in the ggml package)

ggerganov · 2024-01-17T22:01:55Z

I think these warnings are related:

/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-backend.h'

^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-alloc.h'

^

Need to understand what they mean

ptsochantaris · 2024-01-17T22:09:54Z

An umbrella header is usually a manually created file that contains all the include files in whatever Swift module is being created. The file is usually specified using another file called a "modulemap" which tells the importing tool to "just import framework.h and it will contain all the includes" - however since I believe the ggml package does not have a manual one specified, I suspect it is somehow autogenerated, and there my knowledge of SPM is a bit limited, since I haven't worked a lot with C/C++ integration via SPM. My gut instinct is that possibly creating a module map file in the ggml package may do the trick...

ptsochantaris · 2024-01-17T22:12:21Z

The section Importing Headers from a C++ Package Target here may be of interest: https://www.swift.org/documentation/cxx-interop/project-build-setup/

ggerganov · 2024-01-17T22:20:24Z

I just asked my local 8-bit Mixtral for some insight and I think it narrowed it down! 😄

make -j && ./main -m models/mixtral-instruct-8x7b-fast/ggml-model-q8_0.gguf -f build/task-swift-2.txt -n -1 -c 16384 -s 1 --temp 0 --repeat-penalty 1.0 --no-penalize-nl

[INST] I have the following ggml repo structure:

./include/ggml/ggml.h
./include/ggml/ggml-alloc.h
./include/ggml/ggml-backend.h
./src/ggml.c
./src/ggml-alloc.c
./src/ggml-backend.c

I've created the following Swift Package.swift file:

// swift-tools-version: 5.5

import PackageDescription

let package = Package(
    name: "ggml",
    platforms: [
        .macOS(.v12),
        .iOS(.v14),
        .watchOS(.v4),
        .tvOS(.v14)
    ],
    products: [
        .library(name: "ggml", targets: ["ggml"]),
    ],
    targets: [
        .target(
            name: "ggml",
            path: ".",
            exclude: [],
            sources: [
                "src/ggml.c",
                "src/ggml-alloc.c",
                "src/ggml-backend.c",
                "src/ggml-quants.c",
                "src/ggml-metal.m",
            ],
            resources: [
                .process("src/ggml-metal.metal")
            ],
            publicHeadersPath: "include/ggml",
            cSettings: [
                .unsafeFlags(["-Wno-shorten-64-to-32", "-O3", "-DNDEBUG"]),
                .define("GGML_USE_ACCELERATE"),
                .unsafeFlags(["-fno-objc-arc"]),
                .define("GGML_USE_METAL"),
                // NOTE: NEW_LAPACK will required iOS version 16.4+
                // We should consider add this in the future when we drop support for iOS 14
                // (ref: ref: https://developer.apple.com/documentation/accelerate/1513264-cblas_sgemm?language=objc)
                // .define("ACCELERATE_NEW_LAPACK"),
                // .define("ACCELERATE_LAPACK_ILP64")
            ],
            linkerSettings: [
                .linkedFramework("Accelerate")
            ]
        )
    ],
    cxxLanguageStandard: .cxx11
)

When I use the ggml package as a dependency in another project called "llama", I get the following errors:

/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/Build/Intermediates.noindex/batched_swift.build/Debug/batched_swift.build/Objects-normal/x86_64/batched_swift.swiftmodule -emit-abi-descriptor-path /Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/Build/Intermediates.noindex/batched_swift.build/Debug/batched_swift.build/Objects-normal/x86_64/batched_swift.abi.json
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-backend.h'

^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-alloc.h'

^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:235:9: error: unknown type name 'ggml_backend_sched_eval_callback'
        ggml_backend_sched_eval_callback cb_eval;
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/Sources/main.swift:2:8: error: could not build Objective-C module 'llama'
import llama

What do these errors mean and how do I fix them? [/INST] The errors you're encountering are related to the way Swift Package Manager (SPM) handles C and Objective-C headers. Here's a breakdown of the errors and some suggestions to resolve them:

umbrella header for module 'ggml' does not include header 'ggml-backend.h' and umbrella header for module 'ggml' does not include header 'ggml-alloc.h':

These warnings indicate that the umbrella header for the 'ggml' module (ggml.h) does not directly include 'ggml-backend.h' and 'ggml-alloc.h'. SPM might not be able to find the necessary headers to build the module correctly.

To fix this, you can modify your Swift Package Manager configuration to include these headers in the umbrella header (ggml.h) or create a module.modulemap file to explicitly list the required headers.

Create a module.modulemap file in the include/ggml directory with the following content:

module ggml [system] {
  header "/include/ggml/ggml.h"
  export *
  umbrella "/include/ggml"
}

Then, update your Swift Package Manager configuration to include the module.modulemap file:

// swift-tools-version: 5.5

import PackageDescription

let package = Package(
    // ...
    targets: [
        .target(
            name: "ggml",
            path: ".",
            exclude: [],
            sources: [
                "src/ggml.c",
                "src/ggml-alloc.c",
                "src/ggml-backend.c",
                "src/ggml-quants.c",
                "src/ggml-metal.m",
            ],
            resources: [
                .process("src/ggml-metal.metal")
            ],
            publicHeadersPath: "include/ggml",
            cSettings: [
                // ...
            ],
            linkerSettings: [
                // ...
            ],
            moduleMap: "include/ggml/module.modulemap" // Add this line
        )
    ],
    // ...
)

error: unknown type name 'ggml_backend_sched_eval_callback':

This error indicates that the type 'ggml_backend_sched_eval_callback' is not recognized. It's possible that the necessary header file containing the type definition is not being included properly.

Since you've already addressed the header inclusion issue with the module.modulemap file, you should also ensure that the type definition for 'ggml_backend_sched_eval_callback' is available in one of the included headers. If not, you may need to update your C code to include the appropriate header file containing the type definition.

error: could not build Objective-C module 'llama':

This error is likely a result of the previous issues. Once you've resolved the warnings and errors related to the 'ggml' module, this error should be resolved as well.

If the error persists, ensure that your 'llama' module is correctly configured and that all required headers are properly imported. [end of text]

llama_print_timings: load time = 55003.60 ms
llama_print_timings: sample time = 18.14 ms / 768 runs ( 0.02 ms per token, 42328.04 tokens per second)
llama_print_timings: prompt eval time = 5508.99 ms / 1499 tokens ( 3.68 ms per token, 272.10 tokens per second)
llama_print_timings: eval time = 23249.99 ms / 767 runs ( 30.31 ms per token, 32.99 tokens per second)
llama_print_timings: total time = 28859.46 ms / 2266 tokens
ggml_metal_free: deallocating

Thanks @ptsochantaris for the assistance - will resolve this stuff tomorrow and merge the PR

ptsochantaris · 2024-01-17T22:21:52Z

I suspect what needs to happen is something along these lines:

Create an spm-headers dir like in the llama.cpp package
Symlink ggml-backend.h, ggml-alloc.h inside it.
Symlink ggml.h as ggml-cpp.h or something similar (because you need an actual ggml.h as the umbrella header, which needs to be the same as the module name, which is ggml)
Create a ggml.h header that imports the above 3 files, in the same directory.
Add the publicHeadersPath: "spm-headers", to the Package.swift file

ggerganov · 2024-01-18T08:46:25Z

I've settled for this fix:

ggerganov/ggml@c287543

…ov#5007) * Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute * SPM header potential fix * Reverting symlinks

Metal memory: Small memory leak on init, dangling pointer, and unused…

f470042

… autorelease pool in graph compute

SPM header potential fix

130eac8

Reverting symlinks

1c98f99

ggerganov approved these changes Jan 18, 2024

View reviewed changes

ggerganov merged commit 1e605f4 into ggerganov:master Jan 18, 2024
40 of 44 checks passed

ptsochantaris deleted the minor-memory-leak branch January 18, 2024 14:13

ggerganov mentioned this pull request Jan 19, 2024

Resolve ggml_backend_sched_eval_callback visibility (#5029) #5030

Closed

irbull mentioned this pull request Feb 10, 2024

Use @autoreleasepool to avoid memory leaks #5437

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024 •

edited

Loading

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024 •

edited

Loading

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 18, 2024

Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

Conversation

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024 • edited Loading

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024 • edited Loading

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 17, 2024

ptsochantaris commented Jan 17, 2024

ggerganov commented Jan 18, 2024

ptsochantaris commented Jan 17, 2024 •

edited

Loading

ptsochantaris commented Jan 17, 2024 •

edited

Loading