DETAILS FOR UPGRADE from llama.cpp sha b841d0 to 615212

cpp_paths

main_.cpp

# from folder: llama_cpp_canister/src

# To do the actual changes
meld main_.cpp llama_cpp_onicai_fork/examples/main/main.cpp

# To check what has changed between <git-sha-new> and <git-sha-old>
meld llama_cpp_onicai_fork/examples/main/main.cpp llama_cpp_onicai_fork_<git-sha-old>/examples/main/main.cpp
  • use main_ instead of main
  • a few items related to console, Ctrl+C & threading need to be commented out (see the sketch after this list)
  • add logic for running in a canister across multiple update calls
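
For illustration only (the exact call sites in main_.cpp differ), the console/Ctrl+C related code is disabled with the usual ICPP-PATCH markers, e.g. a signal-handler registration:

```cpp
// ICPP-PATCH-START
// No console / Ctrl+C handling inside a canister
// struct sigaction sigint_action;
// sigint_action.sa_handler = sigint_handler;
// sigemptyset(&sigint_action.sa_mask);
// sigint_action.sa_flags = 0;
// sigaction(SIGINT, &sigint_action, NULL);
// ICPP-PATCH-END
```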

llama_cpp_onicai_fork/src/llama.cpp

# from folder: llama_cpp_canister/src
# To do the actual changes
meld llama_cpp_onicai_fork/src/llama.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama.cpp
  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap (see the sketch after this list)
  • comment out try-catch blocks. The program will abort if an exception is thrown.
  • comment out threading-related items
  • comment out these functions completely:
    • llama_tensor_quantize_internal
    • llama_model_quantize_internal
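
The throw replacement uses the same ICPP-PATCH marker pattern everywhere; a minimal, illustrative sketch (the error message is made up, actual call sites differ):

```cpp
#include "ic_api.h"

// ICPP-PATCH-START
// throw is not supported in a canister; IC_API::trap aborts the call with a message
// throw std::runtime_error("unknown model architecture");
IC_API::trap("unknown model architecture");
// ICPP-PATCH-END
```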

llama_cpp_onicai_fork/src/llama-vocab.cpp

# from folder: llama_cpp_canister/src
meld llama_cpp_onicai_fork/src/llama-vocab.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama-vocab.cpp
  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-grammar.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks (see the sketch below). The program will abort if an exception is thrown.
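
Commenting out a try-catch looks the same in every file; an illustrative sketch (load_grammar() is a hypothetical call, kept only to show that the body stays active):

```cpp
// ICPP-PATCH-START
// try/catch removed: a thrown exception now traps (aborts) the canister call
// try {
    load_grammar(); // hypothetical call, kept as-is
// } catch (const std::exception & err) {
//     LLAMA_LOG_ERROR("%s: error: %s\n", __func__, err.what());
//     return false;
// }
// ICPP-PATCH-END
```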

llama_cpp_onicai_fork/src/llama-sampling.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-impl.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-context.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-arch.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/unicode-data.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/unicode.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::invalid_argument with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-kv-cache.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-chat.cpp

  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-mmap.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-model.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-batch.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-adapter.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-model-loader.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out all uses of validation_result:
      // ICPP-PATCH-START
      // we do not support check_tensors. It requires threading.
      // std::vector<std::future<std::pair<ggml_tensor *, bool>>> validation_result;
      // ICPP-PATCH-END
      ... several other references to validation_result
  • comment out all uses of getenv (see the sketch below)
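
These getenv removals follow the same pattern as the ggml-backend.cpp example further down; an illustrative sketch (the environment variable and flag are hypothetical):

```cpp
// ICPP-PATCH-START
// getenv is not available in a canister; fall back to the default
// const char * check_env = getenv("LLAMA_EXAMPLE_ENV"); // hypothetical variable
// bool flag = check_env ? atoi(check_env) : false;
bool flag = false;
// ICPP-PATCH-END
```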

llama_cpp_onicai_fork/src/llama-hparams.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/common/arg.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::invalid_argument with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.
  • comment out args that require std::thread
  • comment out the call to ggml_backend_load_all(). We are not loading the dynamic backends, because dlopen results in undefined symbols during linking. We can skip it because the CPU backend is already registered at compile time (see the sketch after this list).
  • comment out all calls to std::getenv
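
A sketch of the ggml_backend_load_all() change, showing only the marker pattern:

```cpp
// ICPP-PATCH-START
// Do not load dynamic backends: dlopen leads to undefined symbols at link time,
// and the CPU backend is already registered at compile time.
// ggml_backend_load_all();
// ICPP-PATCH-END
```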

llama_cpp_onicai_fork/common/json-schema-to-grammar.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::out_of_range with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/common/build-info.cpp

  • run this command to create it:
make build-info-cpp-wasm
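
The generated file only defines a few build constants, roughly as below (the values are placeholders; the make target fills in the real ones):

```cpp
int LLAMA_BUILD_NUMBER = 0;
char const * LLAMA_COMMIT = "615212";
char const * LLAMA_COMPILER = "clang (wasm32 target)";
char const * LLAMA_BUILD_TARGET = "wasm32-wasi";
```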

llama_cpp_onicai_fork/common/sampling.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/common/common.cpp

  • add right below #include "llama.h":

      // ICPP-PATCH-START
      #include "ic_api.h"
      extern llama_model ** g_model; // The global variable from main_.cpp
      // ICPP-PATCH-END
  • In common_init_result, skip loading the model if the --model parameter is not provided:

      // ICPP-PATCH-START
      // Skip loading the model if the --model parameter is not provided
      if (!params.model.empty()) {
      // ICPP-PATCH-END
    
      ... 
      model = ...
      ...
    
      // ICPP-PATCH-START
      // Skip loading the model if the --model parameter is not provided
      } else {
          // Access the model through g_model and assign it to the local variable
          model = *g_model;
      }
      // ICPP-PATCH-END
  • In common_init_result, do NOT transfer ownership of the model pointer:

      // ICPP-PATCH-START: 
      // 'reset' transfers ownership of the model pointer to the std::unique_ptr iparams.model
      // We do NOT want the model to be freed when the unique_ptr goes out of scope
      // iparams.model.reset(model);
      // ICPP-PATCH-END
  • replace throw std::runtime_error with IC_API::trap

  • replace throw std::invalid_argument with IC_API::trap

  • comment out try-catch blocks. The program will abort if an exception is thrown.

  • comment out calls to std::getenv. Compare to the changes made last time (!)

  • comment out all code related to <pthread.h>. Compare to the changes made last time (!)

    • cpu_get_num_physical_cores (see the sketch after this list)
  • comment out the #ifdef LLAMA_USE_CURL sections. Compare to the changes made last time (!)

  • comment out the set_process_priority function
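
One way the pthread-dependent core detection can end up after patching; a sketch only, assuming a single reported core is acceptable inside a canister:

```cpp
int32_t cpu_get_num_physical_cores() {
    // ICPP-PATCH-START
    // <pthread.h> and /proc based detection are not available in a canister
    return 1;
    // ICPP-PATCH-END
}
```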

llama_cpp_onicai_fork/common/log.cpp

  • Add function common_log_remove_file to the public API

    // ICPP-PATCH-START
    // We need to add a public function to remove the log file from the canister
    void common_log_remove_file(struct common_log * log) {
        log->remove_file();
    }
    // ICPP-PATCH-END
  • Add a public member function remove_file to the struct common_log (see the sketch below).

  • Remove all threading logic, including the threading-related #include directives (e.g. #include <thread>)
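
A minimal sketch of remove_file, assuming the struct keeps the open FILE handle and the path it was opened with (the member names file and path are hypothetical):

```cpp
// ICPP-PATCH-START
// Close the log file and delete it from the canister's (virtual) filesystem.
void remove_file() {
    if (file) {
        fclose(file);
        file = nullptr;
    }
    if (!path.empty()) {
        std::remove(path.c_str()); // needs <cstdio>
        path.clear();
    }
}
// ICPP-PATCH-END
```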

llama_cpp_onicai_fork/ggml/src/ggml-backend.cpp

  • comment out all uses of getenv:
      // ICPP-PATCH-START
      // const char * GGML_SCHED_DEBUG = getenv("GGML_SCHED_DEBUG");
      // sched->debug = GGML_SCHED_DEBUG ? atoi(GGML_SCHED_DEBUG) : 0;
      sched->debug = 0;
      // ICPP-PATCH-END

llama_cpp_onicai_fork/ggml/src/ggml-threading.cpp

  • comment out all code related to threading (see the sketch below)
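
This file essentially wraps a mutex around ggml's critical sections; with threading removed they become no-ops, roughly:

```cpp
#include "ggml-threading.h"

// ICPP-PATCH-START
// Single-threaded inside a canister: the critical section no longer needs a mutex
// #include <mutex>
// static std::mutex ggml_critical_section_mutex;
// ICPP-PATCH-END

void ggml_critical_section_start() {
    // ggml_critical_section_mutex.lock();   // ICPP-PATCH: no-op
}

void ggml_critical_section_end() {
    // ggml_critical_section_mutex.unlock(); // ICPP-PATCH: no-op
}
```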

llama_cpp_onicai_fork/ggml/src/ggml-backend-reg.cpp

  • Update dl_handle_deleter to avoid a call to dlclose that should never happen. The linker ends up with undefined symbols if we don't comment it out:
    #include "ic_api.h"
    struct dl_handle_deleter {
      void operator()(void * handle) {
          // ICPP-PATCH-START
          // We are NOT dynamically loading any backend
          // SO WE SHOULD NEVER GET HERE
          // Avoid linker error by outcommenting this, but inserting a runtime trap
          // dlclose(handle);
          IC_API::trap("THIS SHOULD NEVER HAPPEN - dl_handle_deleter::operator() called");
          // ICPP-PATCH-END
        }
    };

llama_cpp_onicai_fork/ggml/src/gguf.cpp

  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu.cpp

  • comment out all code related to signals & threading:
    • #include "ggml-threading.h"
    • #include <signal.h>

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu-traits.cpp

No updates needed for icpp-pro


c_paths

llama_cpp_onicai_fork/ggml/src/ggml.c

  • comment out all code related to signals & threading:
    • #include "ggml-threading.h"
    • #include <signal.h>

llama_cpp_onicai_fork/ggml/src/ggml-alloc.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-quants.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu-quants.c

No updates needed for icpp-pro


headers to modify

llama_cpp_onicai_fork/src/llama-model-loader.h

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/minja.hpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • re-define two functions:
      // ICPP-PATCH-START
      // throw not supported, using IC_API::trap instead, which expects a string
      // std::runtime_error unexpected(const TemplateToken & token) const {
      //   return std::runtime_error("Unexpected " + TemplateToken::typeToString(token.type)
      //     + error_location_suffix(*template_str, token.location.pos));
      // }
      // std::runtime_error unterminated(const TemplateToken & token) const {
      //   return std::runtime_error("Unterminated " + TemplateToken::typeToString(token.type)
      //     + error_location_suffix(*template_str, token.location.pos));
      // }
      std::string unexpected(const TemplateToken & token) const {
        return ("Unexpected " + TemplateToken::typeToString(token.type)
          + error_location_suffix(*template_str, token.location.pos));
      }
      std::string unterminated(const TemplateToken & token) const {
        return ("Unterminated " + TemplateToken::typeToString(token.type)
          + error_location_suffix(*template_str, token.location.pos));
      }
      // ICPP-PATCH-END
  • replace throw unterminated(**start) with IC_API::trap(unterminated(**start))
  • replace throw unexpected(**(it-1)) with IC_API::trap(unexpected(**(it-1)))
  • replace throw unexpected(**(it)) with IC_API::trap(unexpected(**(it)))
  • comment out try-catch blocks

llama_cpp_onicai_fork/common/common.h

  • Modify these:
    // ICPP-PATCH-START
    // bool use_mmap          = true;  // use mmap for faster loads
    bool use_mmap          = false;  // not in a canister...
    // ICPP-PATCH-END

    // ICPP-PATCH-START
    // We do NOT load a default model into the canister
    // #define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"
    #define DEFAULT_MODEL_PATH ""
    // ICPP-PATCH-END

llama_cpp_onicai_fork/common/chat-template.hpp

  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/ggml/include/ggml.h

  • #define GGML_DEFAULT_N_THREADS 1
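
This follows the usual marker pattern (the commented-out value 4 is assumed to be the upstream default):

```cpp
// ICPP-PATCH-START
// A canister call executes single-threaded
// #define GGML_DEFAULT_N_THREADS 4
#define GGML_DEFAULT_N_THREADS 1
// ICPP-PATCH-END
```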

TODOs:

(-) LOG & LOG_TEE have been replaced by LOG, LOG_ERR, LOG_WRN, LOG_INF, LOG_CNT. LOG is now used only for console/stream output, while the LOG_xxx variants are used for ERR, WRN, INF, CNT. Not sure yet where this goes...

Q4: Update the README about downloading different LOG files?

(-) llama-vocab.cpp --- The function below is no longer there. Is tinystories still working?

We had added a check on `llama_token_bos(model)`, else the llama2.c models never stop generating:
  ```
  bool llama_token_is_eog_impl(const struct llama_vocab & vocab, llama_token token) {
      return token != -1 && (
          token == llama_token_eos_impl(vocab) ||
          token == llama_token_eot_impl(vocab) || 
          token == llama_token_bos_impl(vocab) // ICPP-PATCH: the llama2.c model predicts bos without first predicting an eos
      );
  }
  ```

(-) TODO: Monitor memory, and make sure that ctx is freed up... See the free_ctx() method that has been commented out in main_.cpp


NOTES:

(-) main_.cpp includes a new file: llama_cpp_onicai_fork/common/chat-template.hpp

(-) All the LLM architectures supported by llama_cpp_canister are listed in src/llama_cpp_onicai_fork/src/llama-arch.cpp

(-) NOTE: common/grammar-parser.cpp is no longer there. It appears to be fully included in src/llama-grammar.cpp

(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-backend.cpp used to be llama_cpp_onicai_fork/ggml/src/ggml-backend.c

(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-aarch64.c no longer exists. (Previous update: no updates needed for icpp-pro.)

(-) NOTE: llama_cpp_onicai_fork/common/log.h: no update was needed this time. (Previous update: #include <thread> and some other threading code.)

(-) NOTE: llama_cpp_onicai_fork/common/common.h: no update was needed this time. (Previous update: #include <thread>.)