Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute #5007

Merged
merged 3 commits into from
Jan 18, 2024
Merged

Conversation

ptsochantaris
Copy link
Collaborator

There are two minor changes:

  • The autorelease pool in the graph compute method doesn't seem to do anything, as nothing is allocated outside of the dispatch block; the dispatch block in turn uses the private queue's own autorelease pool. Removing the autorelease block and running using Xcode's leaks profile shows no leaks and no memory growth. (Sanity check by more experienced contributors may be appropriate for this just in case?)
  • In the init code, a retained list of devices is copied out via Copy C method, and following the CF conventions this needs to be released in non-ARC code (and indeed the leak does show up in the leaks profiler). This PR changes it to:
    1. Release the list after it is used
    2. Only bother to do any of this if GGML_METAL_NDEBUG is not defined -- otherwise the GGML_METAL_LOG_INFO won't do anything in the first place.
    3. Removing the function-wide *s and *device references to avoid stale references into the devices list (which didn't break in the current code because devices was leaked, but would break now that it is properly released)

@ggerganov
Copy link
Owner

Thanks!

Btw, do you have any idea why the Swift build is failing after the recent changes:

/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:235:9: error: unknown type name 'ggml_backend_sched_eval_callback'
        ggml_backend_sched_eval_callback cb_eval;
        ^

I can't figure it out - the function prototype is declared in ggml-backend.h but llama.h does not see it for some reason. cc @jhen0409 for ideas

@ptsochantaris
Copy link
Collaborator Author

It's definitely not a syntactic issue, as that exact same file compiles fine in my project (https://github.com/ptsochantaris/emeltal/blob/dd3173e53781def9bb757fa06a2c8d7ab05f381b/Emeltal/llama-cpp/llama.h#L235) so I strongly suspect it may have to do with some compile-time defines. I'll take a look to see if I can figure out a way to "break" it in my build tree...

@ptsochantaris
Copy link
Collaborator Author

ptsochantaris commented Jan 17, 2024

I see the SPM package uses an spm-headers directory, which currently only contains a symlink to llama.h - I wonder if perhaps a symlink to ggml-backend.h may also be needed now, as I believe that #include line was only recently added. It may be silently failing during the compile?

@ptsochantaris
Copy link
Collaborator Author

I've added the symlinks in this branch to see if this improves things.

@ggerganov
Copy link
Owner

Even if it works, it's not desirable - I expect the ggml headers to come from the ggml SPM. Similar to how ggml.h comes from there.

@ptsochantaris
Copy link
Collaborator Author

Indeed, that makes sense - I'm not too familiar with the SPM package TBH and how it handles its dependencies. I can remove the symlinks from this PR if you'd prefer, since it's out of scope anyway. For what its worth, it does seem to be working, so for whatever reason the SPM build is probably missing those extra files - could it be the GGML SPM doesn't export those extra header files currently?

@ptsochantaris
Copy link
Collaborator Author

ptsochantaris commented Jan 17, 2024

(Actually, I've removed them anyway, since it's an unrelated change, to keep things tidy and in-scope for now, but it did prove that it's a question of missing include files, possibly in the ggml package)

@ggerganov
Copy link
Owner

I think these warnings are related:

/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-backend.h'

^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-alloc.h'

^

Need to understand what they mean

@ptsochantaris
Copy link
Collaborator Author

An umbrella header is usually a manually created file that contains all the include files in whatever Swift module is being created. The file is usually specified using another file called a "modulemap" which tells the importing tool to "just import framework.h and it will contain all the includes" - however since I believe the ggml package does not have a manual one specified, I suspect it is somehow autogenerated, and there my knowledge of SPM is a bit limited, since I haven't worked a lot with C/C++ integration via SPM. My gut instinct is that possibly creating a module map file in the ggml package may do the trick...

@ptsochantaris
Copy link
Collaborator Author

The section Importing Headers from a C++ Package Target here may be of interest: https://www.swift.org/documentation/cxx-interop/project-build-setup/

@ggerganov
Copy link
Owner

I just asked my local 8-bit Mixtral for some insight and I think it narrowed it down! 😄

make -j && ./main -m models/mixtral-instruct-8x7b-fast/ggml-model-q8_0.gguf -f build/task-swift-2.txt -n -1 -c 16384 -s 1 --temp 0 --repeat-penalty 1.0 --no-penalize-nl

[INST] I have the following ggml repo structure:

./include/ggml/ggml.h
./include/ggml/ggml-alloc.h
./include/ggml/ggml-backend.h
./src/ggml.c
./src/ggml-alloc.c
./src/ggml-backend.c

I've created the following Swift Package.swift file:

// swift-tools-version: 5.5

import PackageDescription

let package = Package(
    name: "ggml",
    platforms: [
        .macOS(.v12),
        .iOS(.v14),
        .watchOS(.v4),
        .tvOS(.v14)
    ],
    products: [
        .library(name: "ggml", targets: ["ggml"]),
    ],
    targets: [
        .target(
            name: "ggml",
            path: ".",
            exclude: [],
            sources: [
                "src/ggml.c",
                "src/ggml-alloc.c",
                "src/ggml-backend.c",
                "src/ggml-quants.c",
                "src/ggml-metal.m",
            ],
            resources: [
                .process("src/ggml-metal.metal")
            ],
            publicHeadersPath: "include/ggml",
            cSettings: [
                .unsafeFlags(["-Wno-shorten-64-to-32", "-O3", "-DNDEBUG"]),
                .define("GGML_USE_ACCELERATE"),
                .unsafeFlags(["-fno-objc-arc"]),
                .define("GGML_USE_METAL"),
                // NOTE: NEW_LAPACK will required iOS version 16.4+
                // We should consider add this in the future when we drop support for iOS 14
                // (ref: ref: https://developer.apple.com/documentation/accelerate/1513264-cblas_sgemm?language=objc)
                // .define("ACCELERATE_NEW_LAPACK"),
                // .define("ACCELERATE_LAPACK_ILP64")
            ],
            linkerSettings: [
                .linkedFramework("Accelerate")
            ]
        )
    ],
    cxxLanguageStandard: .cxx11
)

When I use the ggml package as a dependency in another project called "llama", I get the following errors:

/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/Build/Intermediates.noindex/batched_swift.build/Debug/batched_swift.build/Objects-normal/x86_64/batched_swift.swiftmodule -emit-abi-descriptor-path /Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/Build/Intermediates.noindex/batched_swift.build/Debug/batched_swift.build/Objects-normal/x86_64/batched_swift.abi.json
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-backend.h'

^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:10: note: while building module 'ggml' imported from /Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:4:
#include "ggml.h"
         ^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/build/SourcePackages/checkouts/ggml/include/ggml/ggml.h:2279:1: warning: umbrella header for module 'ggml' does not include header 'ggml-alloc.h'

^
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h"
        ^
/Users/runner/work/llama.cpp/llama.cpp/spm-headers/llama.h:235:9: error: unknown type name 'ggml_backend_sched_eval_callback'
        ggml_backend_sched_eval_callback cb_eval;
        ^
/Users/runner/work/llama.cpp/llama.cpp/examples/batched.swift/Sources/main.swift:2:8: error: could not build Objective-C module 'llama'
import llama

What do these errors mean and how do I fix them? [/INST] The errors you're encountering are related to the way Swift Package Manager (SPM) handles C and Objective-C headers. Here's a breakdown of the errors and some suggestions to resolve them:

  1. umbrella header for module 'ggml' does not include header 'ggml-backend.h' and umbrella header for module 'ggml' does not include header 'ggml-alloc.h':

These warnings indicate that the umbrella header for the 'ggml' module (ggml.h) does not directly include 'ggml-backend.h' and 'ggml-alloc.h'. SPM might not be able to find the necessary headers to build the module correctly.

To fix this, you can modify your Swift Package Manager configuration to include these headers in the umbrella header (ggml.h) or create a module.modulemap file to explicitly list the required headers.

Create a module.modulemap file in the include/ggml directory with the following content:

module ggml [system] {
  header "/include/ggml/ggml.h"
  export *
  umbrella "/include/ggml"
}

Then, update your Swift Package Manager configuration to include the module.modulemap file:

// swift-tools-version: 5.5

import PackageDescription

let package = Package(
    // ...
    targets: [
        .target(
            name: "ggml",
            path: ".",
            exclude: [],
            sources: [
                "src/ggml.c",
                "src/ggml-alloc.c",
                "src/ggml-backend.c",
                "src/ggml-quants.c",
                "src/ggml-metal.m",
            ],
            resources: [
                .process("src/ggml-metal.metal")
            ],
            publicHeadersPath: "include/ggml",
            cSettings: [
                // ...
            ],
            linkerSettings: [
                // ...
            ],
            moduleMap: "include/ggml/module.modulemap" // Add this line
        )
    ],
    // ...
)
  1. error: unknown type name 'ggml_backend_sched_eval_callback':

This error indicates that the type 'ggml_backend_sched_eval_callback' is not recognized. It's possible that the necessary header file containing the type definition is not being included properly.

Since you've already addressed the header inclusion issue with the module.modulemap file, you should also ensure that the type definition for 'ggml_backend_sched_eval_callback' is available in one of the included headers. If not, you may need to update your C code to include the appropriate header file containing the type definition.

  1. error: could not build Objective-C module 'llama':

This error is likely a result of the previous issues. Once you've resolved the warnings and errors related to the 'ggml' module, this error should be resolved as well.

If the error persists, ensure that your 'llama' module is correctly configured and that all required headers are properly imported. [end of text]

llama_print_timings: load time = 55003.60 ms
llama_print_timings: sample time = 18.14 ms / 768 runs ( 0.02 ms per token, 42328.04 tokens per second)
llama_print_timings: prompt eval time = 5508.99 ms / 1499 tokens ( 3.68 ms per token, 272.10 tokens per second)
llama_print_timings: eval time = 23249.99 ms / 767 runs ( 30.31 ms per token, 32.99 tokens per second)
llama_print_timings: total time = 28859.46 ms / 2266 tokens
ggml_metal_free: deallocating

Thanks @ptsochantaris for the assistance - will resolve this stuff tomorrow and merge the PR

@ptsochantaris
Copy link
Collaborator Author

I suspect what needs to happen is something along these lines:

  • Create an spm-headers dir like in the llama.cpp package
  • Symlink ggml-backend.h, ggml-alloc.h inside it.
  • Symlink ggml.h as ggml-cpp.h or something similar (because you need an actual ggml.h as the umbrella header, which needs to be the same as the module name, which is ggml)
  • Create a ggml.h header that imports the above 3 files, in the same directory.
  • Add the publicHeadersPath: "spm-headers", to the Package.swift file

@ggerganov
Copy link
Owner

I've settled for this fix:

ggerganov/ggml@c287543

@ggerganov ggerganov merged commit 1e605f4 into ggerganov:master Jan 18, 2024
40 of 44 checks passed
@ptsochantaris ptsochantaris deleted the minor-memory-leak branch January 18, 2024 14:13
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
…ov#5007)

* Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute

* SPM header potential fix

* Reverting symlinks
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…ov#5007)

* Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute

* SPM header potential fix

* Reverting symlinks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants