```
# from folder: llama_cpp_canister/src
# To do the actual changes
meld main_.cpp llama_cpp_onicai_fork/examples/main/main.cpp

# To check what has changed between <git-sha-new> and <git-sha-old>
meld llama_cpp_onicai_fork/examples/main/main.cpp llama_cpp_onicai_fork_<git-sha-old>/examples/main/main.cpp
```
- use `main_` instead of `main` (see the sketch after this list)
- A few items related to console, ctrl+C & threading need to be outcommented
- Added logic for running in a canister with multiple update calls
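For reference, a minimal sketch of the rename, in the ICPP-PATCH style used throughout this document; the upstream signature is assumed to be the standard `int main(int argc, char ** argv)`, and the actual patch in main_.cpp may differ:
```
// ICPP-PATCH-START
// The canister entry points call main_ directly; there is no process-level main
// int main(int argc, char ** argv) {
int main_(int argc, char ** argv) {
// ICPP-PATCH-END
```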
```
# from folder: llama_cpp_canister/src
# To do the actual changes
meld llama_cpp_onicai_fork/src/llama.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama.cpp
```
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap` (see the sketch after this list)
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- outcomment threading related items
- outcomment these functions completely:
  - `llama_tensor_quantize_internal`
  - `llama_model_quantize_internal`
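This include-and-replace pattern recurs in most files below. As a hedged illustration only (the error message and the surrounding statements are invented, not actual llama.cpp code), the patches look roughly like this:
```
#include "ic_api.h" // added near the top of the file

// ICPP-PATCH-START
// throw std::runtime_error("unsupported tensor type");
IC_API::trap("unsupported tensor type");
// ICPP-PATCH-END

// ICPP-PATCH-START
// try - catch is not supported; the program aborts on a thrown exception
// try {
    load_tensors();
// } catch (const std::exception & err) {
//     ...handle error...
// }
// ICPP-PATCH-END
```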
```
# from folder: llama_cpp_canister/src
meld llama_cpp_onicai_fork/src/llama-vocab.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama-vocab.cpp
```
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- no modifications needed for the IC
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- no modifications needed for the IC
- no modifications needed for the IC
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- replace `throw std::invalid_argument` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- no modifications needed for the IC
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- no modifications needed for the IC
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment all uses of `validation_result`:
```
// ICPP-PATCH-START
// we do not support check_tensors. It requires threading.
// std::vector<std::future<std::pair<ggml_tensor *, bool>>> validation_result;
// ICPP-PATCH-END
```
  ... and several other references to `validation_result`
- outcomment all uses of `getenv`
- no modifications needed for the IC
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- replace `throw std::invalid_argument` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- outcomment args that require `std::thread`
- outcomment the call to `ggml_backend_load_all();` (see the sketch after this list). We are not loading the dynamic backends, because that call uses dlopen, which results in undefined symbols during linking. We can skip it, because we already register the CPU backend via a compile flag.
- outcomment all calls to `std::getenv`
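A sketch of the `ggml_backend_load_all()` change, following the ICPP-PATCH convention used elsewhere in this document (the comment wording is illustrative):
```
// ICPP-PATCH-START
// Do NOT load dynamic backends in the canister: it calls dlopen, which leaves
// undefined symbols at link time. The CPU backend is already registered via a
// compile flag, so this call can safely be skipped.
// ggml_backend_load_all();
// ICPP-PATCH-END
```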
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- replace `throw std::out_of_range` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- run this command to create it: `make build-info-cpp-wasm`
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- add right below `#include "llama.h"`:
```
// ICPP-PATCH-START
#include "ic_api.h"
extern llama_model ** g_model; // The global variable from main_.cpp
// ICPP-PATCH-END
```
- In `common_init_result`, skip loading the model if the `--model` parameter is not provided:
```
// ICPP-PATCH-START
// Skip loading the model if the --model parameter is not provided
if (!params.model.empty()) {
// ICPP-PATCH-END
    ...
    model = ...
    ...
// ICPP-PATCH-START
// Skip loading the model if the --model parameter is not provided
} else {
    // Access the model through g_model and assign it to the local variable
    model = *g_model;
}
// ICPP-PATCH-END
```
- In `common_init_result`, do NOT transfer ownership of the model pointer:
```
// ICPP-PATCH-START:
// 'reset' transfers ownership of the model pointer to the std::unique_ptr iparams.model
// We do NOT want the model to be freed when the unique_ptr goes out of scope
// iparams.model.reset(model);
// ICPP-PATCH-END
```
- replace `throw std::runtime_error` with `IC_API::trap`
- replace `throw std::invalid_argument` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- outcomment `std::getenv`. Compare to changes made last time (!)
- outcomment all code related to `<pthread.h>`. Compare to changes made last time (!) (a hedged sketch follows this list)
  - `cpu_get_num_physical_cores`
- outcomment `#ifdef LLAMA_USE_CURL`. Compare to changes made last time (!)
- outcomment the `set_process_priority` function
- Add function `common_log_remove_file` to the public API:
```
// ICPP-PATCH-START
// We need to add a public function to remove the log file from the canister
void common_log_remove_file(struct common_log * log) {
    log->remove_file();
}
// ICPP-PATCH-END
```
- Add public function `remove_file` to the struct `common_log` (a hedged sketch follows this list)
- Remove all threading logic (the threading-related `#include` directives and the code that uses them)
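For the `<pthread.h>` item, one simple way to keep `cpu_get_num_physical_cores` compiling is to hard-code a single core, since the canister runs single-threaded anyway. This is only a sketch (compare against the changes made last time for the actual patch), and the `int32_t` return type is assumed from upstream common.cpp:
```
// ICPP-PATCH-START
// The canister is single-threaded and <pthread.h> is not available, so report one core
int32_t cpu_get_num_physical_cores() {
    return 1;
}
// ICPP-PATCH-END
```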
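And a minimal sketch of the `remove_file` member. It assumes `common_log` keeps its open handle in a `FILE *` member named `file` (that member name is an assumption; adapt it to the actual struct, and add an `std::remove` on the file path if the log file should also be deleted from the canister's filesystem):
```
// ICPP-PATCH-START
// Public method so common_log_remove_file can close the log file from canister code
void remove_file() {
    if (file) {
        fclose(file);
        file = nullptr;
    }
}
// ICPP-PATCH-END
```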
- outcomment all uses of `getenv`:
```
// ICPP-PATCH-START
// const char * GGML_SCHED_DEBUG = getenv("GGML_SCHED_DEBUG");
// sched->debug = GGML_SCHED_DEBUG ? atoi(GGML_SCHED_DEBUG) : 0;
sched->debug = 0;
// ICPP-PATCH-END
```
- outcomment all code related to threading
- Update `dl_handle_deleter` to avoid a call to `dlclose` that should never happen. The linker ends up with undefined symbols if we don't outcomment it:
```
#include "ic_api.h"

struct dl_handle_deleter {
    void operator()(void * handle) {
        // ICPP-PATCH-START
        // We are NOT dynamically loading any backend
        // SO WE SHOULD NEVER GET HERE
        // Avoid linker error by outcommenting this, but inserting a runtime trap
        // dlclose(handle);
        IC_API::trap("THIS SHOULD NEVER HAPPEN - dl_handle_deleter::operator() called");
        // ICPP-PATCH-END
    }
};
```
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- outcomment all code related to signals & threading:
  - `#include "ggml-threading.h"`
  - `#include <signal.h>`
No updates needed for icpp-pro
- outcomment all code related to signals & threading:
  - `#include "ggml-threading.h"`
  - `#include <signal.h>`
No updates needed for icpp-pro
No updates needed for icpp-pro
No updates needed for icpp-pro
No updates needed for icpp-pro
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- add `#include "ic_api.h"`
- replace `throw std::runtime_error` with `IC_API::trap`
- re-define two functions:
```
// ICPP-PATCH-START
// throw not supported, using IC_API::trap instead, which expects a string
// std::runtime_error unexpected(const TemplateToken & token) const {
//     return std::runtime_error("Unexpected " + TemplateToken::typeToString(token.type)
//         + error_location_suffix(*template_str, token.location.pos));
// }
// std::runtime_error unterminated(const TemplateToken & token) const {
//     return std::runtime_error("Unterminated " + TemplateToken::typeToString(token.type)
//         + error_location_suffix(*template_str, token.location.pos));
// }
std::string unexpected(const TemplateToken & token) const {
    return ("Unexpected " + TemplateToken::typeToString(token.type)
        + error_location_suffix(*template_str, token.location.pos));
}
std::string unterminated(const TemplateToken & token) const {
    return ("Unterminated " + TemplateToken::typeToString(token.type)
        + error_location_suffix(*template_str, token.location.pos));
}
// ICPP-PATCH-END
```
- replace `throw unterminated(**start)` with `IC_API::trap(unterminated(**start))`
- replace `throw unexpected(**(it-1))` with `IC_API::trap(unexpected(**(it-1)))`
- replace `throw unexpected(**(it))` with `IC_API::trap(unexpected(**(it)))`
- outcomment try-catch
- Modify these:
```
// ICPP-PATCH-START
// bool use_mmap = true; // use mmap for faster loads
bool use_mmap = false; // not in a canister...
// ICPP-PATCH-END

// ICPP-PATCH-START
// We do NOT load a default model into the canister
// #define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"
#define DEFAULT_MODEL_PATH ""
// ICPP-PATCH-END
```
- replace `throw std::runtime_error` with `IC_API::trap`
- outcomment `try - catch`. The program will abort in case of thrown exceptions.
- `#define GGML_DEFAULT_N_THREADS 1`
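Written as a patch in the same ICPP-PATCH style; note that the original value of 4 is taken from upstream ggml.h and may differ in the version being merged:
```
// ICPP-PATCH-START
// Run single-threaded inside the canister
// #define GGML_DEFAULT_N_THREADS 4
#define GGML_DEFAULT_N_THREADS 1
// ICPP-PATCH-END
```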
TODOs:
(-) LOG & LOG_TEE have been replaced by LOG, LOG_ERR, LOG_WRN, LOG_INF, LOG_CNT
    -> LOG is used just for console/stream output
    -> LOG_xxx is used for ERR, WRN, INF, CNT
    --> Not sure yet where this goes...
Q4: Update the README about downloading different LOG files?
(-) llama-vocab.cpp --- The function below is no longer there. Is tinystories still working?
We had added a check on `llama_token_bos(model)`, else the llama2.c models never stop generating:
```
bool llama_token_is_eog_impl(const struct llama_vocab & vocab, llama_token token) {
return token != -1 && (
token == llama_token_eos_impl(vocab) ||
token == llama_token_eot_impl(vocab) ||
token == llama_token_bos_impl(vocab) // ICPP-PATCH: the llama2.c model predicts bos without first predicting an eos
);
}
```
(-) TODO: Monitor memory, and make sure that ctx is freed up... See free_ctx() method that has been outcommented in main_.cpp
NOTES:
(-) main_.cpp includes a new file: llama_cpp_onicai_fork/common/chat-template.hpp
(-) All the LLM architectures supported by llama_cpp_canister are listed in
src/llama_cpp_onicai_fork/src/llama-arch.cpp
(-) NOTE: common/grammar-parser.cpp is no longer there. It appears to be fully included in src/llama-grammar.cpp
(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-backend.cpp used to be llama_cpp_onicai_fork/ggml/src/ggml-backend.c
(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-aarch64.c no longer exists. Previous update: No updates needed for icpp-pro
(-) NOTE: llama_cpp_onicai_fork/common/log.h: no update was needed this time. Previous update:
    - #include <thread>
    - Some other threading code
(-) NOTE: llama_cpp_onicai_fork/common/common.h: no update was needed this time. Previous update:
    - #include <thread>