DETAILS FOR UPGRADE from llama.cpp sha b841d0 to 615212

cpp_paths

main_.cpp

# from folder: llama_cpp_canister/src

# To do the actual changes
meld main_.cpp llama_cpp_onicai_fork/examples/main/main.cpp

# To check what has changed between <git-sha-new> and <git-sha-old>
meld llama_cpp_onicai_fork/examples/main/main.cpp llama_cpp_onicai_fork_<git-sha-old>/examples/main/main.cpp
  • use main_ instead of main
  • a few items related to console, Ctrl+C & threading need to be commented out (see the sketch after this list)
  • add logic for running in a canister across multiple update calls
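
For illustration only (the exact call sites in main_.cpp differ), the console/Ctrl+C related code is disabled with the usual ICPP-PATCH markers, e.g. a signal-handler registration:

```cpp
// ICPP-PATCH-START
// No console / Ctrl+C handling inside a canister
// struct sigaction sigint_action;
// sigint_action.sa_handler = sigint_handler;
// sigemptyset(&sigint_action.sa_mask);
// sigint_action.sa_flags = 0;
// sigaction(SIGINT, &sigint_action, NULL);
// ICPP-PATCH-END
```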

llama_cpp_onicai_fork/src/llama.cpp

# from folder: llama_cpp_canister/src
# To do the actual changes
meld llama_cpp_onicai_fork/src/llama.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama.cpp
  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap (see the sketch after this list)
  • comment out try-catch blocks. The program will abort if an exception is thrown.
  • comment out threading-related items
  • comment out these functions completely:
    • llama_tensor_quantize_internal
    • llama_model_quantize_internal
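
The throw replacement uses the same ICPP-PATCH marker pattern everywhere; a minimal, illustrative sketch (the error message is made up, actual call sites differ):

```cpp
#include "ic_api.h"

// ICPP-PATCH-START
// throw is not supported in a canister; IC_API::trap aborts the call with a message
// throw std::runtime_error("unknown model architecture");
IC_API::trap("unknown model architecture");
// ICPP-PATCH-END
```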

llama_cpp_onicai_fork/src/llama-vocab.cpp

# from folder: llama_cpp_canister/src
meld llama_cpp_onicai_fork/src/llama-vocab.cpp llama_cpp_onicai_fork_<git-sha-old>/src/llama-vocab.cpp
  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-grammar.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks (see the sketch below). The program will abort if an exception is thrown.
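
Commenting out a try-catch looks the same in every file; an illustrative sketch (load_grammar() is a hypothetical call, kept only to show that the body stays active):

```cpp
// ICPP-PATCH-START
// try/catch removed: a thrown exception now traps (aborts) the canister call
// try {
    load_grammar(); // hypothetical call, kept as-is
// } catch (const std::exception & err) {
//     LLAMA_LOG_ERROR("%s: error: %s\n", __func__, err.what());
//     return false;
// }
// ICPP-PATCH-END
```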

llama_cpp_onicai_fork/src/llama-sampling.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-impl.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-context.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-arch.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/unicode-data.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/unicode.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::invalid_argument with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-kv-cache.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-chat.cpp

  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-mmap.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/llama-model.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-batch.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/src/llama-adapter.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/src/llama-model-loader.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • comment out all uses of validation_result:
      // ICPP-PATCH-START
      // we do not support check_tensors. It requires threading.
      // std::vector<std::future<std::pair<ggml_tensor *, bool>>> validation_result;
      // ICPP-PATCH-END
      ... several other references to validation_result
  • comment out all uses of getenv (see the sketch below)
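
These getenv removals follow the same pattern as the ggml-backend.cpp example further down; an illustrative sketch (the environment variable and flag are hypothetical):

```cpp
// ICPP-PATCH-START
// getenv is not available in a canister; fall back to the default
// const char * check_env = getenv("LLAMA_EXAMPLE_ENV"); // hypothetical variable
// bool flag = check_env ? atoi(check_env) : false;
bool flag = false;
// ICPP-PATCH-END
```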

llama_cpp_onicai_fork/src/llama-hparams.cpp

  • no modifications needed for the IC

llama_cpp_onicai_fork/common/arg.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::invalid_argument with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.
  • comment out args that require std::thread
  • comment out the call to ggml_backend_load_all(). We are not loading the dynamic backends, because dlopen results in undefined symbols during linking. We can skip it because the CPU backend is already registered at compile time (see the sketch after this list).
  • comment out all calls to std::getenv
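
A sketch of the ggml_backend_load_all() change, showing only the marker pattern:

```cpp
// ICPP-PATCH-START
// Do not load dynamic backends: dlopen leads to undefined symbols at link time,
// and the CPU backend is already registered at compile time.
// ggml_backend_load_all();
// ICPP-PATCH-END
```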

llama_cpp_onicai_fork/common/json-schema-to-grammar.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • replace throw std::out_of_range with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/common/build-info.cpp

  • run this command to create it:
make build-info-cpp-wasm
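
The generated file only defines a few build constants, roughly as below (the values are placeholders; the make target fills in the real ones):

```cpp
int LLAMA_BUILD_NUMBER = 0;
char const * LLAMA_COMMIT = "615212";
char const * LLAMA_COMPILER = "clang (wasm32 target)";
char const * LLAMA_BUILD_TARGET = "wasm32-wasi";
```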

llama_cpp_onicai_fork/common/sampling.cpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/common/common.cpp

  • add right below #include "llama.h":

      // ICPP-PATCH-START
      #include "ic_api.h"
      extern llama_model ** g_model; // The global variable from main_.cpp
      // ICPP-PATCH-END
  • In common_init_result, skip loading the model if the --model parameter is not provided:

      // ICPP-PATCH-START
      // Skip loading the model if the --model parameter is not provided
      if (!params.model.empty()) {
      // ICPP-PATCH-END
    
      ... 
      model = ...
      ...
    
      // ICPP-PATCH-START
      // Skip loading the model if the --model parameter is not provided
      } else {
          // Access the model through g_model and assign it to the local variable
          model = *g_model;
      }
      // ICPP-PATCH-END
  • In common_init_result, do NOT transfer ownership of the model pointer:

      // ICPP-PATCH-START: 
      // 'reset' transfers ownership of the model pointer to the std::unique_ptr iparams.model
      // We do NOT want the model to be freed when the unique_ptr goes out of scope
      // iparams.model.reset(model);
      // ICPP-PATCH-END
  • replace throw std::runtime_error with IC_API::trap

  • replace throw std::invalid_argument with IC_API::trap

  • comment out try-catch blocks. The program will abort if an exception is thrown.

  • comment out calls to std::getenv. Compare to the changes made last time (!)

  • comment out all code related to <pthread.h>. Compare to the changes made last time (!)

    • cpu_get_num_physical_cores (see the sketch after this list)
  • comment out the #ifdef LLAMA_USE_CURL sections. Compare to the changes made last time (!)

  • comment out the set_process_priority function
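
One way the pthread-dependent core detection can end up after patching; a sketch only, assuming a single reported core is acceptable inside a canister:

```cpp
int32_t cpu_get_num_physical_cores() {
    // ICPP-PATCH-START
    // <pthread.h> and /proc based detection are not available in a canister
    return 1;
    // ICPP-PATCH-END
}
```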

llama_cpp_onicai_fork/common/log.cpp

  • Add function common_log_remove_file to the public API

    // ICPP-PATCH-START
    // We need to add a public function to remove the log file from the canister
    void common_log_remove_file(struct common_log * log) {
        log->remove_file();
    }
    // ICPP-PATCH-END
  • Add a public member function remove_file to the struct common_log (see the sketch below).

  • Remove all threading logic, including the threading-related #include directives (e.g. #include <thread>)
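
A minimal sketch of remove_file, assuming the struct keeps the open FILE handle and the path it was opened with (the member names file and path are hypothetical):

```cpp
// ICPP-PATCH-START
// Close the log file and delete it from the canister's (virtual) filesystem.
void remove_file() {
    if (file) {
        fclose(file);
        file = nullptr;
    }
    if (!path.empty()) {
        std::remove(path.c_str()); // needs <cstdio>
        path.clear();
    }
}
// ICPP-PATCH-END
```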

llama_cpp_onicai_fork/ggml/src/ggml-backend.cpp

  • comment out all uses of getenv:
      // ICPP-PATCH-START
      // const char * GGML_SCHED_DEBUG = getenv("GGML_SCHED_DEBUG");
      // sched->debug = GGML_SCHED_DEBUG ? atoi(GGML_SCHED_DEBUG) : 0;
      sched->debug = 0;
      // ICPP-PATCH-END

llama_cpp_onicai_fork/ggml/src/ggml-threading.cpp

  • comment out all code related to threading (see the sketch below)
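
This file essentially wraps a mutex around ggml's critical sections; with threading removed they become no-ops, roughly:

```cpp
#include "ggml-threading.h"

// ICPP-PATCH-START
// Single-threaded inside a canister: the critical section no longer needs a mutex
// #include <mutex>
// static std::mutex ggml_critical_section_mutex;
// ICPP-PATCH-END

void ggml_critical_section_start() {
    // ggml_critical_section_mutex.lock();   // ICPP-PATCH: no-op
}

void ggml_critical_section_end() {
    // ggml_critical_section_mutex.unlock(); // ICPP-PATCH: no-op
}
```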

llama_cpp_onicai_fork/ggml/src/ggml-backend-reg.cpp

  • Update dl_handle_deleter to avoid a call to dlclose that should never happen. The linker ends up with undefined symbols if we don't comment it out:
    #include "ic_api.h"
    struct dl_handle_deleter {
      void operator()(void * handle) {
          // ICPP-PATCH-START
          // We are NOT dynamically loading any backend
          // SO WE SHOULD NEVER GET HERE
          // Avoid linker error by outcommenting this, but inserting a runtime trap
          // dlclose(handle);
          IC_API::trap("THIS SHOULD NEVER HAPPEN - dl_handle_deleter::operator() called");
          // ICPP-PATCH-END
        }
    };

llama_cpp_onicai_fork/ggml/src/gguf.cpp

  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu.cpp

  • comment out all code related to signals & threading:
    • #include "ggml-threading.h"
    • #include <signal.h>

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu-traits.cpp

No updates needed for icpp-pro


c_paths

llama_cpp_onicai_fork/ggml/src/ggml.c

  • comment out all code related to signals & threading:
    • #include "ggml-threading.h"
    • #include <signal.h>

llama_cpp_onicai_fork/ggml/src/ggml-alloc.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-quants.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu.c

No updates needed for icpp-pro

llama_cpp_onicai_fork/ggml/src/ggml-cpu/ggml-cpu-quants.c

No updates needed for icpp-pro


headers to modify

llama_cpp_onicai_fork/src/llama-model-loader.h

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap

llama_cpp_onicai_fork/src/minja.hpp

  • add #include "ic_api.h"
  • replace throw std::runtime_error with IC_API::trap
  • re-define two functions:
      // ICPP-PATCH-START
      // throw not supported, using IC_API::trap instead, which expects a string
      // std::runtime_error unexpected(const TemplateToken & token) const {
      //   return std::runtime_error("Unexpected " + TemplateToken::typeToString(token.type)
      //     + error_location_suffix(*template_str, token.location.pos));
      // }
      // std::runtime_error unterminated(const TemplateToken & token) const {
      //   return std::runtime_error("Unterminated " + TemplateToken::typeToString(token.type)
      //     + error_location_suffix(*template_str, token.location.pos));
      // }
      std::string unexpected(const TemplateToken & token) const {
        return ("Unexpected " + TemplateToken::typeToString(token.type)
          + error_location_suffix(*template_str, token.location.pos));
      }
      std::string unterminated(const TemplateToken & token) const {
        return ("Unterminated " + TemplateToken::typeToString(token.type)
          + error_location_suffix(*template_str, token.location.pos));
      }
      // ICPP-PATCH-END
  • replace throw unterminated(**start) with IC_API::trap(unterminated(**start))
  • replace throw unexpected(**(it-1)) with IC_API::trap(unexpected(**(it-1)))
  • replace throw unexpected(**(it)) with IC_API::trap(unexpected(**(it)))
  • comment out try-catch blocks

llama_cpp_onicai_fork/common/common.h

  • Modify these:
    // ICPP-PATCH-START
    // bool use_mmap          = true;  // use mmap for faster loads
    bool use_mmap          = false;  // not in a canister...
    // ICPP-PATCH-END

    // ICPP-PATCH-START
    // We do NOT load a default model into the canister
    // #define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"
    #define DEFAULT_MODEL_PATH ""
    // ICPP-PATCH-END

llama_cpp_onicai_fork/common/chat-template.hpp

  • replace throw std::runtime_error with IC_API::trap
  • comment out try-catch blocks. The program will abort if an exception is thrown.

llama_cpp_onicai_fork/ggml/include/ggml.h

  • #define GGML_DEFAULT_N_THREADS 1
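
This follows the usual marker pattern (the commented-out value 4 is assumed to be the upstream default):

```cpp
// ICPP-PATCH-START
// A canister call executes single-threaded
// #define GGML_DEFAULT_N_THREADS 4
#define GGML_DEFAULT_N_THREADS 1
// ICPP-PATCH-END
```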

TODOs:

(-) LOG & LOG_TEE have been replaced by LOG, LOG_ERR, LOG_WRN, LOG_INF, LOG_CNT. LOG is now used only for console/stream output, while the LOG_xxx variants are used for ERR, WRN, INF, CNT. Not sure yet where this goes...

Q4: Update the README about downloading different LOG files?

(-) llama-vocab.cpp --- The function below is no longer there. Is tinystories still working?

We had added a check on `llama_token_bos(model)`, else the llama2.c models never stop generating:
  ```
  bool llama_token_is_eog_impl(const struct llama_vocab & vocab, llama_token token) {
      return token != -1 && (
          token == llama_token_eos_impl(vocab) ||
          token == llama_token_eot_impl(vocab) || 
          token == llama_token_bos_impl(vocab) // ICPP-PATCH: the llama2.c model predicts bos without first predicting an eos
      );
  }
  ```

(-) TODO: Monitor memory, and make sure that ctx is freed up... See the free_ctx() method that has been commented out in main_.cpp


NOTES:

(-) main_.cpp includes a new file: llama_cpp_onicai_fork/common/chat-template.hpp

(-) All the LLM architectures supported by llama_cpp_canister are listed in src/llama_cpp_onicai_fork/src/llama-arch.cpp

(-) NOTE: common/grammar-parser.cpp is no longer there. It appears to be fully included in src/llama-grammar.cpp

(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-backend.cpp used to be llama_cpp_onicai_fork/ggml/src/ggml-backend.c

(-) NOTE: llama_cpp_onicai_fork/ggml/src/ggml-aarch64.c no longer exists. (Previous update: no updates needed for icpp-pro.)

(-) NOTE: llama_cpp_onicai_fork/common/log.h: no update was needed this time. (Previous update: #include <thread> and some other threading code.)

(-) NOTE: llama_cpp_onicai_fork/common/common.h: no update was needed this time. (Previous update: #include <thread>.)