Failed to train-text-from-scratch #4791

Closed
notooth1 opened this issue Jan 6, 2024 · 8 comments

notooth1 commented Jan 6, 2024

I failed to run train-text-from-scratch. Can anyone help?

$ ./train-text-from-scratch --vocab-model Nous-Hermes-llama-2-7b.gguf --train-data "shakespeare.txt"
...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
@murillo128

I'm having the same problem here:

Breakpoint 3, main (argc=<optimized out>, argv=<optimized out>)
    at examples/train-text-from-scratch/train-text-from-scratch.cpp:1147
1147            size_t max_compute_size = ggml_allocr_max_size(alloc) + tensor_alignment;
(gdb) print best_compute_size
$3 = 18446744073709551615
(gdb) down
Bottom (innermost) frame selected; you cannot go down.
(gdb) s
ggml_allocr_max_size (alloc=0x55555576fc40) at ggml-alloc.c:759
759     size_t ggml_allocr_max_size(ggml_allocr_t alloc) {
(gdb) s
760         return ggml_tallocr_max_size(alloc->talloc);
(gdb) s
ggml_tallocr_max_size (alloc=0x5555558d3b90) at ggml-alloc.c:326
326     size_t ggml_tallocr_max_size(ggml_tallocr_t alloc) {
(gdb) s
327         return alloc->max_size;
(gdb) print alloc->max_size
$4 = 140737056866336
(gdb) print alloc
$5 = (ggml_tallocr_t) 0x5555558d3b90
(gdb) print *alloc
$6 = {buffer = 0x5555558dcbe0, buffer_owned = true, base = 0x1000, alignment = 32, n_free_blocks = 16, free_blocks = {{
      addr = 0x1220, size = 18874368}, {addr = 0x12c1220, size = 262144}, {addr = 0x13c2e20, size = 254976}, {
      addr = 0x19ca9240, size = 82837504}, {addr = 0x1eca9240, size = 28311552}, {addr = 0x207e9240,
      size = 9223372036309614015}, {addr = 0x7fff903a5020, size = 4096}, {addr = 0x7fff96550020, size = 307224576}, {
      addr = 0x7fffd7065020, size = 7340032}, {addr = 0x7fffd7925020, size = 7340032}, {addr = 0x7fffde28f020,
      size = 86943744}, {addr = 0x7fffe38b9c20, size = 1024}, {addr = 0x7fffe3bfa420, size = 1024}, {addr = 0x7fffe427b420,
      size = 2048}, {addr = 0x7fffe44bc020, size = 2359296}, {addr = 0x7fffe6481020, size = 3408896}, {
      addr = 0x7fffe6481020, size = 3408896}, {addr = 0x7fffe6481020, size = 3408896}, {addr = 0x7fffe6481020,
      size = 3408896}, {addr = 0x7fffe6481020, size = 3408896}, {addr = 0x7fffe6481020, size = 3408896}, {
      addr = 0x7fffe6481020, size = 3408896}, {addr = 0x7fffe6481020, size = 3408896}, {addr = 0x0,
      size = 0} <repeats 233 times>}, max_size = 140737056866336, measure = true}
(gdb) n
main (argc=<optimized out>, argv=<optimized out>) at examples/train-text-from-scratch/train-text-from-scratch.cpp:1148
1148            if (max_compute_size < best_compute_size) {
(gdb) print max_compute_size
$7 = 140737056866368
(gdb) c
Continuing.

Breakpoint 3, main (argc=<optimized out>, argv=<optimized out>)
    at examples/train-text-from-scratch/train-text-from-scratch.cpp:1147
1147            size_t max_compute_size = ggml_allocr_max_size(alloc) + tensor_alignment;
(gdb) c
Continuing.
main: compute_size = 140737005371456 bytes (134217264.0 MB)
main: evaluation order = RIGHT_TO_LEFT
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842866 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78268b7 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7ca4f06 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6  0x00007ffff7cb6e6c in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7  0x00007ffff7cb6ed7 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8  0x00007ffff7cb7138 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x7ffff7e5bba8 <typeinfo for std::bad_alloc>,
    dest=0x7ffff7cb5400 <std::bad_alloc::~bad_alloc()>) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:98
#9  0x00007ffff7ca49da in operator new (sz=sz@entry=140737005371456) at ../../../../src/libstdc++-v3/libsupc++/new_op.cc:54
#10 0x0000555555578139 in std::__new_allocator<unsigned char>::allocate (this=0x7fffffffdaf0, __n=140737005371456)
    at /usr/include/c++/13/bits/new_allocator.h:147
#11 std::allocator_traits<std::allocator<unsigned char> >::allocate (__n=140737005371456, __a=...)
    at /usr/include/c++/13/bits/alloc_traits.h:482
#12 std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_allocate (__n=140737005371456, this=0x7fffffffdaf0)
    at /usr/include/c++/13/bits/stl_vector.h:378
#13 std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append (__n=140737005371456, this=0x7fffffffdaf0)
    at /usr/include/c++/13/bits/vector.tcc:663
#14 std::vector<unsigned char, std::allocator<unsigned char> >::resize (this=this@entry=0x7fffffffdaf0,
    __new_size=__new_size@entry=140737005371456) at /usr/include/c++/13/bits/stl_vector.h:1013
#15 0x000055555556c721 in main (argc=<optimized out>, argv=<optimized out>)
    at examples/train-text-from-scratch/train-text-from-scratch.cpp:1163

I have enabled the alloc debugging (removing the assertion so I could get the full trace):

main: seed: 1704296566
llama_model_loader: loaded meta data with 17 key-value pairs and 0 tensors from ../llama.cpp/models/ggml-vocab-llama.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  12:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  13:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  14:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  15:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  16:            tokenizer.ggml.unknown_token_id u32              = 0
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = all F32 (guessed)
llm_load_print_meta: model params     = 0.00 B
llm_load_print_meta: model size       = 0.00 MiB (-nan BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llama_model_load: vocab only - skipping tensors
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
main: init model
max_size = 7.81 MB: tensors: token_embd.weight (7.81 MB)
max_size = 7.81 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB)
max_size = 15.63 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB)
max_size = 15.63 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB)
max_size = 15.64 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB)
max_size = 15.66 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB)
max_size = 15.67 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB)
max_size = 15.69 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB)
max_size = 15.69 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB)
max_size = 15.88 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB)
max_size = 16.06 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB)
max_size = 16.25 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB)
max_size = 16.25 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB)
max_size = 16.27 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB)
max_size = 16.28 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB)
max_size = 16.30 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB)
max_size = 16.31 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB)
max_size = 16.31 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB)
max_size = 16.50 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB)
max_size = 16.69 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB)
max_size = 16.88 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB)
max_size = 24.69 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB)
max_size = 24.69 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB)
max_size = 32.50 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB)
max_size = 32.50 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB)
max_size = 32.52 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB)
max_size = 32.53 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB)
max_size = 32.55 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB)
max_size = 32.56 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB)
max_size = 32.56 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB)
max_size = 32.75 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB)
max_size = 32.94 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB)
max_size = 33.13 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB)
max_size = 33.13 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB)
max_size = 33.14 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB)
max_size = 33.16 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB)
max_size = 33.17 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB)
max_size = 33.19 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB) blk.1.attn_output.weight (grad) (0.02 MB)
max_size = 33.19 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB) blk.1.attn_output.weight (grad) (0.02 MB) blk.1.ffn_norm.weight (grad) (0.00 MB)
max_size = 33.38 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB) blk.1.attn_output.weight (grad) (0.02 MB) blk.1.ffn_norm.weight (grad) (0.00 MB) blk.1.ffn_gate.weight (grad) (0.19 MB)
max_size = 33.56 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB) blk.1.attn_output.weight (grad) (0.02 MB) blk.1.ffn_norm.weight (grad) (0.00 MB) blk.1.ffn_gate.weight (grad) (0.19 MB) blk.1.ffn_down.weight (grad) (0.19 MB)
max_size = 33.75 MB: tensors: token_embd.weight (7.81 MB) output_norm.weight (0.00 MB) output.weight (7.81 MB) blk.0.attn_norm.weight (0.00 MB) blk.0.attn_q.weight (0.02 MB) blk.0.attn_k.weight (0.02 MB) blk.0.attn_v.weight (0.02 MB) blk.0.attn_output.weight (0.02 MB) blk.0.ffn_norm.weight (0.00 MB) blk.0.ffn_gate.weight (0.19 MB) blk.0.ffn_down.weight (0.19 MB) blk.0.ffn_up.weight (0.19 MB) blk.1.attn_norm.weight (0.00 MB) blk.1.attn_q.weight (0.02 MB) blk.1.attn_k.weight (0.02 MB) blk.1.attn_v.weight (0.02 MB) blk.1.attn_output.weight (0.02 MB) blk.1.ffn_norm.weight (0.00 MB) blk.1.ffn_gate.weight (0.19 MB) blk.1.ffn_down.weight (0.19 MB) blk.1.ffn_up.weight (0.19 MB) token_embd.weight (grad) (7.81 MB) output_norm.weight (grad) (0.00 MB) output.weight (grad) (7.81 MB) blk.0.attn_norm.weight (grad) (0.00 MB) blk.0.attn_q.weight (grad) (0.02 MB) blk.0.attn_k.weight (grad) (0.02 MB) blk.0.attn_v.weight (grad) (0.02 MB) blk.0.attn_output.weight (grad) (0.02 MB) blk.0.ffn_norm.weight (grad) (0.00 MB) blk.0.ffn_gate.weight (grad) (0.19 MB) blk.0.ffn_down.weight (grad) (0.19 MB) blk.0.ffn_up.weight (grad) (0.19 MB) blk.1.attn_norm.weight (grad) (0.00 MB) blk.1.attn_q.weight (grad) (0.02 MB) blk.1.attn_k.weight (grad) (0.02 MB) blk.1.attn_v.weight (grad) (0.02 MB) blk.1.attn_output.weight (grad) (0.02 MB) blk.1.ffn_norm.weight (grad) (0.00 MB) blk.1.ffn_gate.weight (grad) (0.19 MB) blk.1.ffn_down.weight (grad) (0.19 MB) blk.1.ffn_up.weight (grad) (0.19 MB)
print_params: n_vocab: 32000
print_params: n_ctx:   32
print_params: n_embd:  64
print_params: n_head:  8
print_params: n_ff:    768
print_params: n_layer: 2
print_params: n_rot:   8
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: model_size = 35408832 bytes (33.8 MB)
main: opt_size  = 53089248 bytes (50.6 MB)
main: opt iter 0
main: input_size = 4096160 bytes (3.9 MB)
max_size = 0.00 MB: tensors:  (0.00 MB)
max_size = 3.91 MB: tensors:  (0.00 MB)  (3.91 MB)
max_size = 0.00 MB: tensors:  (0.00 MB)
max_size = 0.00 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB)
max_size = 0.01 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB)
max_size = 0.02 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB)
max_size = 0.02 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB)
max_size = 0.03 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB)
max_size = 0.04 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB)
max_size = 0.05 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB)
max_size = 3.95 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB)
max_size = 3.95 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB)
tried to free tensor output_norm.weight not found
max_size = 3.96 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB)
max_size = 3.97 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB)
max_size = 3.98 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB)
max_size = 3.98 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB)
tried to free tensor token_embd.weight not found
max_size = 3.99 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB) t02 (0.01 MB)
max_size = 4.00 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB)
max_size = 4.01 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB)
max_size = 4.02 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB)
max_size = 4.02 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t03 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB)
max_size = 4.03 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB)
max_size = 4.04 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB)
max_size = 4.05 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t22 (0.01 MB)
max_size = 4.13 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t26 (0.09 MB)
max_size = 4.23 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t03 (0.01 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t26 (0.09 MB) t25 (0.09 MB)
max_size = 4.27 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t23 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t26 (0.09 MB) t25 (0.09 MB)
max_size = 7.98 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_87 (3.91 MB) t16 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB)
tried to free tensor targets not found
max_size = 15.79 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_87 (3.91 MB) t16 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB)
tried to free tensor output.weight not found
tried to free tensor blk.1.ffn_norm.weight not found
tried to free tensor blk.1.attn_norm.weight not found
max_size = 134019446.48 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t16 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB)
max_size = 134019446.49 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t16 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t05 (clone) (0.01 MB)
max_size = 134019446.58 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB)
max_size = 134019446.67 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB)
max_size = 134019446.77 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) t25 (clone) (0.09 MB)
max_size = 134019446.86 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) t25 (clone) (0.09 MB) t28 (clone) (0.09 MB)
max_size = 134019447.05 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) t25 (clone) (0.09 MB) t28 (clone) (0.09 MB) node_96 (0.19 MB)
tried to free tensor blk.1.ffn_down.weight not found
max_size = 134019447.14 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) t25 (clone) (0.09 MB) node_98 (0.09 MB) node_96 (0.19 MB) node_100 (0.09 MB)
max_size = 134019447.33 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) node_101 (0.19 MB) t27 (clone) (0.09 MB) node_98 (0.09 MB) node_96 (0.19 MB) node_100 (0.09 MB)
tried to free tensor blk.1.ffn_up.weight not found
tried to free tensor blk.1.ffn_gate.weight not found
max_size = 134019517.27 MB: tensors: leaf_1 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t23 (clone) (0.01 MB) node_95 (0.01 MB) t03 (clone) (0.01 MB) t02 (clone) (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) node_93 (0.00 MB) t20 (clone) (0.01 MB) t22 (clone) (0.01 MB) t16 (0.01 MB) t05 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) t07 (0.01 MB) t24 (clone) (0.01 MB) t10 (0.01 MB) t11 (0.01 MB) t18 (0.01 MB) node_89 (7.81 MB) t04 (clone) (0.01 MB) node_101 (0.19 MB) t27 (clone) (0.09 MB) node_104 (0.01 MB) node_109 (0.00 MB) node_96 (0.19 MB)
tried to free tensor blk.1.attn_output.weight not found
tried to free tensor blk.1.attn_v.weight not found
tried to free tensor blk.1.attn_k.weight not found
tried to free tensor blk.1.attn_q.weight not found
tried to free tensor blk.0.ffn_norm.weight not found
tried to free tensor blk.0.attn_norm.weight not found
tried to free tensor blk.0.ffn_down.weight not found
tried to free tensor blk.0.ffn_up.weight not found
tried to free tensor blk.0.ffn_gate.weight not found
tried to free tensor blk.0.attn_output.weight not found
tried to free tensor blk.0.attn_v.weight not found
tried to free tensor blk.0.attn_k.weight not found
tried to free tensor blk.0.attn_q.weight not found
tried to free tensor tokens_input not found
max_size = 0.00 MB: tensors:  (0.00 MB)
max_size = 0.00 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB)
max_size = 0.01 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB)
max_size = 0.02 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB)
max_size = 0.02 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB)
max_size = 0.03 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB)
max_size = 0.04 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB)
max_size = 0.05 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB)
max_size = 3.95 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB)
max_size = 3.95 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB)
tried to free tensor token_embd.weight not found
max_size = 3.96 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t02 (0.01 MB)
max_size = 3.97 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t02 (0.01 MB) t03 (0.01 MB)
max_size = 3.98 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t03 (0.01 MB) t08 (0.01 MB)
max_size = 3.98 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t03 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB)
max_size = 3.99 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t03 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB)
max_size = 4.00 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB)
max_size = 4.01 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB)
max_size = 4.02 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB)
max_size = 4.02 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t22 (0.01 MB)
max_size = 4.03 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t22 (0.01 MB) t23 (0.01 MB)
max_size = 4.13 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t25 (0.09 MB) t23 (0.01 MB)
max_size = 4.22 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t25 (0.09 MB) t23 (0.01 MB) t26 (0.09 MB)
max_size = 4.27 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t20 (0.01 MB) t25 (0.09 MB) t23 (0.01 MB) t26 (0.09 MB)
tried to free tensor output_norm.weight not found
max_size = 7.97 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB)
tried to free tensor targets not found
max_size = 134019446.48 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB)
tried to free tensor output.weight not found
max_size = 134019446.49 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) t30 (0.01 MB) t31 (0.01 MB) t32 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB)
max_size = 134019446.59 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB)
tried to free tensor blk.1.ffn_down.weight not found
tried to free tensor blk.1.attn_norm.weight not found
max_size = 134019446.59 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB)
max_size = 134019446.60 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t11 (clone) (0.01 MB)
tried to free tensor blk.1.ffn_norm.weight not found
max_size = 134019446.70 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB)
max_size = 134019446.79 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB)
max_size = 134019446.88 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB)
max_size = 134019446.89 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB) node_97 (0.01 MB)
max_size = 134019446.98 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB) node_97 (0.01 MB) t25 (clone) (0.09 MB)
tried to free tensor blk.1.ffn_up.weight not found
max_size = 134019447.08 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) t30 (0.01 MB) node_94 (0.09 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t08 (0.01 MB) t10 (0.01 MB) t05 (0.01 MB) t16 (0.01 MB) t18 (0.01 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB) node_97 (0.01 MB) t25 (clone) (0.09 MB) node_99 (0.09 MB)
tried to free tensor blk.1.ffn_gate.weight not found
tried to free tensor blk.1.attn_output.weight not found
tried to free tensor blk.1.attn_v.weight not found
tried to free tensor blk.1.attn_k.weight not found
tried to free tensor blk.1.attn_q.weight not found
tried to free tensor blk.0.ffn_down.weight not found
tried to free tensor blk.0.attn_norm.weight not found
tried to free tensor blk.0.ffn_norm.weight not found
max_size = 134019516.99 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) node_137 (0.09 MB) node_101 (0.01 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB)  (view) (reshaped) (permuted) (cont) (0.01 MB) node_134 (0.01 MB) node_120 (0.01 MB) t02 (clone) (0.01 MB) t22 (clone) (0.01 MB) node_130 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) node_104 (0.01 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB) node_127 (0.01 MB) t25 (clone) (0.09 MB) node_99 (0.09 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB)
max_size = 134019517.08 MB: tensors: leaf_2 (0.00 MB) leaf_3 (0.00 MB) t01 (0.01 MB) node_137 (0.09 MB) node_101 (0.01 MB) t31 (0.01 MB) t02 (clone) (0.01 MB) t33 (0.01 MB) t34 (3.91 MB) t36 (0.00 MB) t11 (0.01 MB) t07 (0.01 MB) t20 (clone) (0.01 MB) t10 (0.01 MB) t22 (clone) (0.01 MB) t23 (clone) (0.01 MB) t18 (0.01 MB)  (view) (reshaped) (permuted) (cont) (0.01 MB) node_134 (0.01 MB) node_120 (0.01 MB) t02 (clone) (0.01 MB) t22 (clone) (0.01 MB) node_130 (0.01 MB) t18 (0.01 MB) node_87 (3.91 MB) node_90 (0.01 MB) node_92 (0.01 MB) t03 (clone) (0.01 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) node_104 (0.01 MB) t27 (clone) (0.09 MB) node_95 (0.09 MB) node_127 (0.01 MB) t25 (clone) (0.09 MB) node_99 (0.09 MB) t04 (clone) (0.01 MB) t24 (clone) (0.01 MB) t26 (clone) (0.09 MB) t27 (clone) (0.09 MB)
tried to free tensor blk.0.ffn_up.weight not found
tried to free tensor blk.0.ffn_gate.weight not found
tried to free tensor blk.0.attn_output.weight not found
tried to free tensor blk.0.attn_v.weight not found
tried to free tensor blk.0.attn_k.weight not found
tried to free tensor blk.0.attn_q.weight not found
tried to free tensor tokens_input not found
main: compute_size = 140529649144640 bytes (134019520.0 MB)
main: evaluation order = RIGHT_TO_LEFT
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

@jtatman

jtatman commented Jan 9, 2024

Same here. No matter the data size or how I chunk it up, data that previously processed fine can no longer be processed. Fine-tuning works, so either something is fundamentally different between the references and headers of those two binaries, or the "train" binary itself is broken.

This is what I see in a diff between non-working version and working version:

diff train-text-from-scratch.cpp train-text-from-scratch.cpp.old
372c372,375
< const float kv_scale = 1.0f/sqrtf(float(n_embd)/n_head);
---
> struct ggml_tensor * kv_scale = NULL;
> if (!enable_flash_attn) {
>     kv_scale = ggml_new_f32(ctx, 1.0f/sqrtf(float(n_embd)/n_head));
> }
443a447
>     struct ggml_tensor * one = ggml_new_f32(ctx, 1.0f);
445,446c449,450
< ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t35, 1.0f));
< ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t36, 1.0f));
---
>     ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t35, one));
>     ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t36, one));
448c452
< ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t36->grad, 1.0f));
---
>     ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, t36->grad, one));
450c454
< ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, KQ_pos, 1.0f));
---
>     ggml_build_forward_expand(gb, ggml_scale_inplace(ctx, KQ_pos, one));
1291a1296,1299
> }
>
> if (alloc) {
>     ggml_allocr_free(alloc);

And an strace of the non-working, which has some dicey info itself, but no smoking gun from what I can see:

strace ./train-text-from-scratch --vocab-model models/ggml-vocab-llama.gguf --train-data shakespeare.txt
execve("./train-text-from-scratch", ["./train-text-from-scratch", "--vocab-model", "models/ggml-vocab-llama.gguf", "--train-data", "shakespeare.txt"], 0x7ffd983effb0 /* 43 vars */) = 0
brk(NULL) = 0x5616ab91c000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcafd9e5f0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/usr/local/lib/tls/x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls/x86_64/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/tls/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/x86_64/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/x86_64", 0x7ffcafd9d840) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/local/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "tls/x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libopenblas.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=93692, ...}) = 0
mmap(NULL, 93692, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f38ac160000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libopenblas.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\20\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=35123568, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38ac1ad000
mmap(NULL, 35178392, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9fd0000
mprotect(0x7f38aa0c7000, 33996800, PROT_NONE) = 0
mmap(0x7f38aa0c7000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf7000) = 0x7f38aa0c7000
mmap(0x7f38aa0d0000, 32411648, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xff000) = 0x7f38aa0d0000
mmap(0x7f38abfb9000, 1544192, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1fe8000) = 0x7f38abfb9000
mmap(0x7f38ac133000, 122880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2161000) = 0x7f38ac133000
mmap(0x7f38ac151000, 47000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f38ac151000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \341\t\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1956992, ...}) = 0
mmap(NULL, 1972224, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9de8000
mprotect(0x7f38a9e7e000, 1290240, PROT_NONE) = 0
mmap(0x7f38a9e7e000, 987136, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x96000) = 0x7f38a9e7e000
mmap(0x7f38a9f6f000, 299008, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x7f38a9f6f000
mmap(0x7f38a9fb9000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d0000) = 0x7f38a9fb9000
mmap(0x7f38a9fc7000, 10240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f38a9fc7000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\323\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1369384, ...}) = 0
mmap(NULL, 1368336, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9c98000
mmap(0x7f38a9ca5000, 684032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7f38a9ca5000
mmap(0x7f38a9d4c000, 626688, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb4000) = 0x7f38a9d4c000
mmap(0x7f38a9de5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14c000) = 0x7f38a9de5000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3405\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=104984, ...}) = 0
mmap(NULL, 107592, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9c78000
mmap(0x7f38a9c7b000, 73728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f38a9c7b000
mmap(0x7f38a9c8d000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f38a9c8d000
mmap(0x7f38a9c91000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f38a9c91000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220q\0\0\0\0\0\0"..., 832) = 832
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\f\4K\246\21\256\356\256\273\203t\346\6\0374"..., 68, 824) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=157224, ...}) = 0
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\f\4K\246\21\256\356\256\273\203t\346\6\0374"..., 68, 824) = 68
mmap(NULL, 140408, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9c50000
mmap(0x7f38a9c56000, 69632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f38a9c56000
mmap(0x7f38a9c67000, 24576, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f38a9c67000
mmap(0x7f38a9c6d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c000) = 0x7f38a9c6d000
mmap(0x7f38a9c6f000, 13432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f38a9c6f000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300A\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\356\276]_K\213\212S\354Dkc\230\33\272"..., 68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2029592, ...}) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\356\276]_K\213\212S\354Dkc\230\33\272"..., 68, 880) = 68
mmap(NULL, 2037344, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9a58000
mmap(0x7f38a9a7a000, 1540096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f38a9a7a000
mmap(0x7f38a9bf2000, 319488, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19a000) = 0x7f38a9bf2000
mmap(0x7f38a9c40000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f38a9c40000
mmap(0x7f38a9c46000, 13920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f38a9c46000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libgfortran.so.5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libgfortran.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\361\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=2911456, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38ac1ab000
mmap(NULL, 2914304, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9790000
mprotect(0x7f38a97ad000, 2785280, PROT_NONE) = 0
mmap(0x7f38a97ad000, 2576384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7f38a97ad000
mmap(0x7f38a9a22000, 204800, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x292000) = 0x7f38a9a22000
mmap(0x7f38a9a55000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2c4000) = 0x7f38a9a55000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libquadmath.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libquadmath.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200:\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=297784, ...}) = 0
mmap(NULL, 299712, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f38a9740000
mmap(0x7f38a9743000, 184320, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f38a9743000
mmap(0x7f38a9770000, 98304, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x30000) = 0x7f38a9770000
mmap(0x7f38a9788000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x47000) = 0x7f38a9788000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38ac1a9000
mmap(NULL, 73728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a972e000
arch_prctl(ARCH_SET_FS, 0x7f38a973d740) = 0
mprotect(0x7f38a9c40000, 16384, PROT_READ) = 0
mprotect(0x7f38a9de5000, 4096, PROT_READ) = 0
mprotect(0x7f38a9788000, 4096, PROT_READ) = 0
mprotect(0x7f38a9c91000, 4096, PROT_READ) = 0
mprotect(0x7f38a9a55000, 4096, PROT_READ) = 0
mprotect(0x7f38a9c6d000, 4096, PROT_READ) = 0
mprotect(0x7f38a9fb9000, 45056, PROT_READ) = 0
mprotect(0x7f38ac133000, 24576, PROT_READ) = 0
mprotect(0x5616a9ced000, 8192, PROT_READ) = 0
mprotect(0x7f38ac1a5000, 4096, PROT_READ) = 0
munmap(0x7f38ac160000, 93692) = 0
set_tid_address(0x7f38a973da10) = 364343
set_robust_list(0x7f38a973da20, 24) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f38a9c56bf0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f38a9c64420}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f38a9c56c90, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f38a9c64420}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL) = 0x5616ab91c000
brk(0x5616ab93d000) = 0x5616ab93d000
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(3, /* 21 entries */, 32768) = 624
getdents64(3, /* 0 entries */, 32768) = 0
close(3) = 0
sched_getaffinity(0, 128, [0, 1, 2, 3]) = 8
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f38a8f2d000
mprotect(0x7f38a8f2e000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f38a971dfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[364344], tls=0x7f38a972d700, child_tidptr=0x7f38a972d9d0) = 364344
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f38a672c000
mprotect(0x7f38a672d000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f38a6f1cfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[364345], tls=0x7f38a6f2c700, child_tidptr=0x7f38a6f2c9d0) = 364345
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f38a3f2b000
mprotect(0x7f38a3f2c000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f38a471bfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[364346], tls=0x7f38a472b700, child_tidptr=0x7f38a472b9d0) = 364346
futex(0x7f38a9fc76bc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f38a9fc76c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(0x5616ab95e000) = 0x5616ab95e000
brk(0x5616ab97f000) = 0x5616ab97f000
brk(0x5616ab9a0000) = 0x5616ab9a0000
brk(0x5616ab9c1000) = 0x5616ab9c1000
brk(0x5616ab9e2000) = 0x5616ab9e2000
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x4), ...}) = 0
write(1, "main: seed: 1704828505\n", 23main: seed: 1704828505
) = 23
openat(AT_FDCWD, "models/ggml-vocab-llama.gguf", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=723676, ...}) = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=723676, ...}) = 0
lseek(3, 720896, SEEK_SET) = 720896
read(3, "\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1"..., 2780) = 2780
lseek(3, 0, SEEK_SET) = 0
openat(AT_FDCWD, "models/ggml-vocab-llama.gguf", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=723676, ...}) = 0
read(4, "GGUF\3\0\0\0\0\0\0\0\0\0\0\0\21\0\0\0\0\0\0\0\24\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 516096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1ead000
read(4, "\0\0\0\0\0\0<0xFC>\6\0\0\0\0\0\0\0<0xFD>\6\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0estion\3\0\0\0\0\0\0\0ime\5\0\0\0\0\0\0\0\342\226"..., 4096) = 4096
read(4, "\0\0\0\0ree\2\0\0\0\0\0\0\0}}\7\0\0\0\0\0\0\0\342\226\201time"..., 4096) = 4096
brk(0x5616aba03000) = 0x5616aba03000
read(4, "\0\0\0\0\0Ex\4\0\0\0\0\0\0\0ress\2\0\0\0\0\0\0\0ST\4\0\0"..., 4096) = 4096
read(4, "c\4\0\0\0\0\0\0\0ices\3\0\0\0\0\0\0\0The\4\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\0box\3\0\0\0\0\0\0\0gen\6\0\0\0\0\0\0\0\342\226\201"..., 4096) = 4096
read(4, "\0\0\0\0\0\342\226\201\320\262\320\270\4\0\0\0\0\0\0\0ably\n\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\342\226\201\303\274\3\0\0\0\0\0\0\0lex\7\0\0\0\0\0\0\0\342\226\201turn\7"..., 4096) = 4096
read(4, "lete\7\0\0\0\0\0\0\0\342\226\201Will\5\0\0\0\0\0\0\0\342\226\201Em"..., 4096) = 4096
read(4, "\342\226\201August\2\0\0\0\0\0\0\0It\t\0\0\0\0\0\0\0\342\226\201pl"..., 4096) = 4096
read(4, "\0\0\0\0ras\3\0\0\0\0\0\0\0jor\7\0\0\0\0\0\0\0\342\226\201\320\261\320"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\0\342\226\201\321\206\4\0\0\0\0\0\0\0abor\3\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "anch\6\0\0\0\0\0\0\0inding\10\0\0\0\0\0\0\0\342\226\201est"..., 4096) = 4096
read(4, "ed\10\0\0\0\0\0\0\0\342\226\201contr\5\0\0\0\0\0\0\0aries\t"..., 4096) = 4096
read(4, "\0\0\342\226\201switch\t\0\0\0\0\0\0\0\342\226\201vector\7\0\0\0"..., 4096) = 4096
read(4, "ne\6\0\0\0\0\0\0\0\321\207\320\265\320\275\2\0\0\0\0\0\0\0UP\2\0\0\0\0\0"..., 4096) = 4096
read(4, "\226\201reading\4\0\0\0\0\0\0\0home\5\0\0\0\0\0\0\0zei"..., 4096) = 4096
brk(0x5616aba24000) = 0x5616aba24000
read(4, "\0\0\0\342\226\201VI\5\0\0\0\0\0\0\0\303\252tre\4\0\0\0\0\0\0\0ile"..., 4096) = 4096
read(4, "\0\342\226\201mount\3\0\0\0\0\0\0\0)$,\7\0\0\0\0\0\0\0\342\226\201d"..., 4096) = 4096
read(4, "ngs\7\0\0\0\0\0\0\0\342\226\201norm\7\0\0\0\0\0\0\0\342\226\201run"..., 4096) = 4096
read(4, "\0\342\226\201College\5\0\0\0\0\0\0\0athol\2\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\342\226\201''\6\0\0\0\0\0\0\0\342\226\201Mos\f\0\0\0\0\0\0\0\342\226"..., 4096) = 4096
read(4, "\0Items\4\0\0\0\0\0\0\0look\7\0\0\0\0\0\0\0connec"..., 4096) = 4096
read(4, "\0\0users\3\0\0\0\0\0\0\0GET\7\0\0\0\0\0\0\0\342\226\201del"..., 4096) = 4096
read(4, "\0\320\273\320\260\321\201\321\214\5\0\0\0\0\0\0\0\342\226\201cm\6\0\0\0\0\0\0\0\320\263"..., 4096) = 4096
read(4, "\3\0\0\0\0\0\0\0sey\5\0\0\0\0\0\0\0hline\10\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\201Mount\v\0\0\0\0\0\0\0\342\226\201students\6\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\342\226\201sudden\6\0\0\0\0\0\0\0ometer\6\0\0\0\0\0"..., 4096) = 4096
read(4, "nsport\4\0\0\0\0\0\0\0emas\2\0\0\0\0\0\0\0FC\v\0\0\0"..., 4096) = 4096
read(4, "a\10\0\0\0\0\0\0\0\320\272\321\202\320\270\320\262\6\0\0\0\0\0\0\0\342\226\201sir\3"..., 4096) = 4096
read(4, "\0\303\244hrend\7\0\0\0\0\0\0\0licated\3\0\0\0\0\0\0\0D"..., 4096) = 4096
brk(0x5616aba45000) = 0x5616aba45000
read(4, "\0anning\10\0\0\0\0\0\0\0\342\226\201panel\10\0\0\0\0\0\0\0\342"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\342\226\201arbitr\10\0\0\0\0\0\0\0\342\226\201roman\6"..., 4096) = 4096
read(4, "\201diese\n\0\0\0\0\0\0\0\321\201\321\202\320\270\321\202\321\203\3\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "ri\t\0\0\0\0\0\0\0\342\226\201seinem\v\0\0\0\0\0\0\0\342\226\201f\303"..., 4096) = 4096
read(4, "\7\0\0\0\0\0\0\0\342\226\201pela\5\0\0\0\0\0\0\0Media\5\0\0\0"..., 4096) = 4096
read(4, "\n\0\0\0\0\0\0\0\342\226\201Captain\10\0\0\0\0\0\0\0\342\226\201tod"..., 4096) = 4096
read(4, "contained\5\0\0\0\0\0\0\0Close\3\0\0\0\0\0\0\0ru"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\342\226\201aggreg\10\0\0\0\0\0\0\0\342\226\201weeks\t"..., 4096) = 4096
read(4, "\0\342\226\201municipal\6\0\0\0\0\0\0\0\342\226\201Ori\7\0\0\0\0"..., 4096) = 4096
read(4, "all\5\0\0\0\0\0\0\0\342\226\201Uk\t\0\0\0\0\0\0\0\342\226\201stree"..., 4096) = 4096
read(4, "ceState\t\0\0\0\0\0\0\0\342\226\201\320\261\321\200\320\260\7\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\226\201Audiodateien\6\0\0\0\0\0\0\0Delete\3\0\0\0"..., 4096) = 4096
read(4, "\4\0\0\0\0\0\0\0inds\7\0\0\0\0\0\0\0\342\226\201ging\2\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\342\226\201protection\6\0\0\0\0\0\0\0taient\v\0"..., 4096) = 4096
read(4, "\320\270\320\273\321\214\4\0\0\0\0\0\0\0cido\4\0\0\0\0\0\0\0exit\4\0"..., 4096) = 4096
read(4, "\0\0\0luss\v\0\0\0\0\0\0\0\342\226\201Histoire\f\0\0\0\0\0"..., 4096) = 4096
brk(0x5616aba66000) = 0x5616aba66000
read(4, "\0\0\0\0\0\0arters\3\0\0\0\0\0\0\0--"\7\0\0\0\0\0\0\0\342"..., 4096) = 4096
read(4, "\0\0\0\0omorphism\7\0\0\0\0\0\0\0details\6\0\0\0"..., 4096) = 4096
read(4, "\0\342\226\201Gew\n\0\0\0\0\0\0\0\342\226\201optimal\r\0\0\0\0\0\0"..., 4096) = 4096
read(4, "n\4\0\0\0\0\0\0\0\303\241ra\4\0\0\0\0\0\0\0mals\7\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0\0Look\7\0\0\0\0\0\0\0\342\226\201eran\7\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\6\0\0\0\0\0\0\0\342\226\201cod\6\0\0\0\0\0\0\0\320\264\321\226\320\262\n\0\0\0"..., 4096) = 4096
read(4, "reed\3\0\0\0\0\0\0\0pur\t\0\0\0\0\0\0\0\342\226\201shadow"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\342\226\201bol\4\0\0\0\0\0\0\0isis\10\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\6\0\0\0\0\0\0\0accept\n\0\0\0\0\0\0\0\342\226\201miejsce"..., 4096) = 4096
read(4, "\0\0\0\0\342\226\201Represent\5\0\0\0\0\0\0\0sites\7\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\342\226\201Cha\6\0\0\0\0\0\0\0chmark\4\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0dam\6\0\0\0\0\0\0\0\342\226\201fog\6\0\0\0\0\0\0\0\320\272\320\276\321\200"..., 4096) = 4096
read(4, "\0\0\0\0\0\342\226\201SSL\t\0\0\0\0\0\0\0\342\226\201Kaiser\t\0\0\0"..., 4096) = 4096
read(4, "l\5\0\0\0\0\0\0\0isson\3\0\0\0\0\0\0\0\303\263b\3\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0\342\226\201lavor\t\0\0\0\0\0\0\0Extension\t\0"..., 4096) = 4096
brk(0x5616aba87000) = 0x5616aba87000
read(4, "\6\0\0\0\0\0\0\0sizeof\n\0\0\0\0\0\0\0\342\226\201gem\303\244\303\237"..., 4096) = 4096
read(4, "\226\201gegr\303\274nd\3\0\0\0\0\0\0\0etr\5\0\0\0\0\0\0\0\342\226\201"..., 4096) = 4096
read(4, "fred\4\0\0\0\0\0\0\0week\t\0\0\0\0\0\0\0\342\226\201bronz"..., 4096) = 4096
read(4, "\0\320\261\320\273\320\265\v\0\0\0\0\0\0\0\342\226\201strictly\f\0\0\0\0\0"..., 4096) = 4096
read(4, "\320\275\320\270\320\272\321\226\320\262\4\0\0\0\0\0\0\0\320\234\320\260\v\0\0\0\0\0\0\0\342\226"..., 4096) = 4096
read(4, "\0\342\226\201wp\2\0\0\0\0\0\0\0/{\t\0\0\0\0\0\0\0\342\226\201amazo"..., 4096) = 4096
read(4, "ines\t\0\0\0\0\0\0\0\342\226\201agents\5\0\0\0\0\0\0\0\342\226\201"..., 4096) = 4096
read(4, "engl\7\0\0\0\0\0\0\0\342\226\201coff\7\0\0\0\0\0\0\0\342\226\201du"..., 4096) = 4096
read(4, "\271\321\201\321\202\320\262\320\270\3\0\0\0\0\0\0\0cnt\6\0\0\0\0\0\0\0lich"..., 4096) = 4096
read(4, "\0\0\0\0\0fection\4\0\0\0\0\0\0\0r\303\255a\6\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\342\226\201\321\201\320\270\321\201\321\202\320\265\320\274\n\0\0\0\0\0\0\0\342\226\201Ki"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\0ichtung\22\0\0\0\0\0\0\0\342\226\201straigh"..., 4096) = 4096
read(4, "eutral\21\0\0\0\0\0\0\0\342\226\201\320\272\320\276\321\202\320\276\321\200\320\260\321\217\3"..., 4096) = 4096
read(4, "\321\201\321\202\321\200\320\265\f\0\0\0\0\0\0\0\342\226\201EventArgs\6\0\0\0"..., 4096) = 4096
read(4, "es\6\0\0\0\0\0\0\0\342\226\201Cow\16\0\0\0\0\0\0\0\342\226\201engin"..., 4096) = 4096
read(4, "h\7\0\0\0\0\0\0\0\342\226\201zast\6\0\0\0\0\0\0\0\342\226\201Lig\n\0"..., 4096) = 4096
brk(0x5616abaa8000) = 0x5616abaa8000
read(4, "------\n\0\0\0\0\0\0\0\342\226\201maggior\17\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\342\226\201vm\10\0\0\0\0\0\0\0\342\226\201ordin\r\0\0\0\0\0\0\0\342\226\201"..., 4096) = 4096
read(4, "\226\201gross\f\0\0\0\0\0\0\0\342\226\201entertain\n\0\0\0\0"..., 4096) = 4096
read(4, "\226\201stages\21\0\0\0\0\0\0\0\342\226\201\320\262\321\200\320\265\320\274\320\265\320\275\320"..., 4096) = 4096
read(4, "hen\t\0\0\0\0\0\0\0\342\226\201\320\247\320\265\320\274\2\0\0\0\0\0\0\0Tu\7\0"..., 4096) = 4096
read(4, "\0\0\0\0\0\0geben\f\0\0\0\0\0\0\0\342\226\201Kilometer\t"..., 4096) = 4096
read(4, "l\10\0\0\0\0\0\0\0\342\226\201Miami\r\0\0\0\0\0\0\0\342\226\201type"..., 4096) = 4096
read(4, "rence\r\0\0\0\0\0\0\0\342\226\201\321\207\320\270\321\201\320\273\320\265\n\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0\0\0\342\226\201\320\267\320\260\320\274\320\265\t\0\0\0\0\0\0\0\342\226\201fur"..., 4096) = 4096
read(4, "\201Kant\t\0\0\0\0\0\0\0\342\226\201Austin\7\0\0\0\0\0\0\0\342\226"..., 4096) = 4096
read(4, "\0\0\342\226\201\320\243\320\272\321\200\320\260\320\270\320\275\321\213\6\0\0\0\0\0\0\0forme"..., 4096) = 4096
read(4, "ndly\n\0\0\0\0\0\0\0\342\226\201Chamber\4\0\0\0\0\0\0\0s\303"..., 4096) = 4096
read(4, "\2\0\0\0\0\0\0\0<>\7\0\0\0\0\0\0\0\342\226\201surg\10\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\7\0\0\0\0\0\0\0\342\226\201Luke\t\0\0\0\0\0\0\0\342\226\201suffix"..., 4096) = 4096
read(4, "\0\0\0\0\342\226\201uncertainty\5\0\0\0\0\0\0\0onian\10"..., 4096) = 4096
read(4, "Gi\5\0\0\0\0\0\0\0\303\252que\10\0\0\0\0\0\0\0makeText\f"..., 4096) = 4096
brk(0x5616abac9000) = 0x5616abac9000
read(4, "\0\0\0\0\0\0\0lla\303\247os\v\0\0\0\0\0\0\0\342\226\201Academi"..., 4096) = 4096
read(4, "\0\0~~~~~~~~\t\0\0\0\0\0\0\0\342\226\201\320\240\320\260\320\267\n\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\0\0ccc\r\0\0\0\0\0\0\0\342\226\201quantities\3\0\0"..., 4096) = 4096
read(4, "\320\270\321\207\10\0\0\0\0\0\0\0\342\226\201dying\10\0\0\0\0\0\0\0sect"..., 4096) = 4096
read(4, "t\3\0\0\0\0\0\0\0xic\6\0\0\0\0\0\0\0\320\273\321\226\321\227\n\0\0\0\0\0"..., 4096) = 4096
read(4, "sets\6\0\0\0\0\0\0\0\342\226\201DAT\n\0\0\0\0\0\0\0\342\226\201dou"..., 4096) = 4096
read(4, "\201Kost\16\0\0\0\0\0\0\0\342\226\201eredetib\305\221l\5\0\0\0\0"..., 4096) = 4096
read(4, "\0\342\226\201rok\5\0\0\0\0\0\0\0cards\n\0\0\0\0\0\0\0\320\264\320\265"..., 4096) = 4096
read(4, "\320\234\320\265\320\272\321\201\320\270\321\207\320\272\320\260\v\0\0\0\0\0\0\0\342\226\201\320\267\320\260\320"..., 4096) = 4096
read(4, "\342\226\201Writing\7\0\0\0\0\0\0\0ifiable\5\0\0\0\0\0\0"..., 4096) = 4096
read(4, "\n\0\0\0\0\0\0\0\342\226\201enables\f\0\0\0\0\0\0\0\342\226\201red"..., 4096) = 4096
read(4, "lasgow\n\0\0\0\0\0\0\0\342\226\201lecture\f\0\0\0\0\0\0\0"..., 4096) = 4096
read(4, "n\21\0\0\0\0\0\0\0\342\226\201\321\202\320\265\321\207\320\265\320\275\320\270\320\265\n\0\0\0\0\0"..., 4096) = 4096
read(4, "\0\0\0\342\226\201informaci\303\263n\v\0\0\0\0\0\0\0\342\226\201Air"..., 4096) = 4096
read(4, "\200\236\2\0\0\0\0\0\0\0\320\233\2\0\0\0\0\0\0\0\321\215\2\0\0\0\0\0\0\0\303\275"..., 4096) = 4096
read(4, "\3\0\0\0\0\0\0\0\340\244\225\3\0\0\0\0\0\0\0\340\246\276\3\0\0\0\0\0\0\0\345\260"..., 4096) = 4096
brk(0x5616abaea000) = 0x5616abaea000
read(4, "\0\0\0\0\312\224\3\0\0\0\0\0\0\0\353\246\254\3\0\0\0\0\0\0\0\352\270\260\3\0\0\0"..., 4096) = 4096
read(4, "\0\350\247\243\3\0\0\0\0\0\0\0\343\200\234\3\0\0\0\0\0\0\0\347\224\267\3\0\0\0\0\0"..., 4096) = 4096
read(4, "\3\0\0\0\0\0\0\0\354\204\270\3\0\0\0\0\0\0\0\346\200\235\3\0\0\0\0\0\0\0\346\255"..., 4096) = 4096
read(4, "\3\0\0\0\0\0\0\0\350\214\266\3\0\0\0\0\0\0\0\350\264\245\3\0\0\0\0\0\0\0\340\264"..., 4096) = 4096
brk(0x5616abb14000) = 0x5616abb14000
read(4, "\34\304\0\0\35\304\0@\35\304\0\200\35\304\0\300\35\304\0\0\36\304\0@\36\304\0\200\36\304\0\300"..., 122880) = 122880
read(4, "\364\306\0\350\364\306\0\352\364\306\0\354\364\306\0\356\364\306\0\360\364\306\0\362\364\306\0\364\364\306\0\366"..., 4096) = 4096
read(4, "\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1"..., 122880) = 122880
read(4, "\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1"..., 4096) = 2780
lseek(4, 720896, SEEK_SET) = 720896
read(4, "\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1"..., 4096) = 2780
lseek(4, 4, SEEK_CUR) = 723680
close(4) = 0
write(2, "llama_model_loader: loaded meta "..., 136llama_model_loader: loaded meta data with 17 key-value pairs and 0 tensors from models/ggml-vocab-llama.gguf (version GGUF V3 (latest))
) = 136
write(2, "llama_model_loader: Dumping meta"..., 98llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
) = 98
write(2, "llama_model_loader: - kv 0: "..., 98llama_model_loader: - kv 0: general.architecture str = llama
) = 98
write(2, "llama_model_loader: - kv 1: "..., 101llama_model_loader: - kv 1: general.name str = LLaMA v2
) = 101
write(2, "llama_model_loader: - kv 2: "..., 97llama_model_loader: - kv 2: llama.context_length u32 = 4096
) = 97
write(2, "llama_model_loader: - kv 3: "..., 97llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
) = 97
write(2, "llama_model_loader: - kv 4: "..., 95llama_model_loader: - kv 4: llama.block_count u32 = 32
) = 95
write(2, "llama_model_loader: - kv 5: "..., 98llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
) = 98
write(2, "llama_model_loader: - kv 6: "..., 96llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
) = 96
write(2, "llama_model_loader: - kv 7: "..., 95llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
) = 95
write(2, "llama_model_loader: - kv 8: "..., 95llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
) = 95
write(2, "llama_model_loader: - kv 9: "..., 101llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
) = 101
write(2, "llama_model_loader: - kv 10: "..., 98llama_model_loader: - kv 10: tokenizer.ggml.model str = llama
) = 98
brk(0x5616abb35000) = 0x5616abb35000
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1e8c000
brk(0x5616abb34000) = 0x5616abb34000
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1e4b000
munmap(0x7f38a1e8c000, 135168) = 0
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1dca000
munmap(0x7f38a1e4b000, 266240) = 0
mmap(NULL, 339968, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1e5a000
munmap(0x7f38a1dca000, 528384) = 0
munmap(0x7f38a1e5a000, 339968) = 0
write(2, "llama_model_loader: - kv 11: "..., 133llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
) = 133
brk(0x5616abb73000) = 0x5616abb73000
brk(0x5616abbb3000) = 0x5616abbb3000
brk(0x5616abc33000) = 0x5616abc33000
brk(0x5616abb34000) = 0x5616abb34000
write(2, "llama_model_loader: - kv 12: "..., 133llama_model_loader: - kv 12: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
) = 133
brk(0x5616abb73000) = 0x5616abb73000
write(2, "llama_model_loader: - kv 13: "..., 133llama_model_loader: - kv 13: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
) = 133
write(2, "llama_model_loader: - kv 14: "..., 94llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32 = 1
) = 94
write(2, "llama_model_loader: - kv 15: "..., 94llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32 = 2
) = 94
write(2, "llama_model_loader: - kv 16: "..., 94llama_model_loader: - kv 16: tokenizer.ggml.unknown_token_id u32 = 0
) = 94
mmap(NULL, 1282048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38a1d74000
brk(0x5616abba2000) = 0x5616abba2000
brk(0x5616abbc3000) = 0x5616abbc3000
brk(0x5616abbe4000) = 0x5616abbe4000
brk(0x5616abc11000) = 0x5616abc11000
brk(0x5616abc32000) = 0x5616abc32000
brk(0x5616abc53000) = 0x5616abc53000
brk(0x5616abc74000) = 0x5616abc74000
brk(0x5616abc95000) = 0x5616abc95000
brk(0x5616abcf5000) = 0x5616abcf5000
brk(0x5616abd16000) = 0x5616abd16000
brk(0x5616abd37000) = 0x5616abd37000
brk(0x5616abd58000) = 0x5616abd58000
brk(0x5616abd79000) = 0x5616abd79000
write(2, "llm_load_vocab: special tokens d"..., 74llm_load_vocab: special tokens definition check successful ( 259/32000 ).
) = 74
write(2, "llm_load_print_meta: format "..., 57llm_load_print_meta: format = GGUF V3 (latest)
) = 57
write(2, "llm_load_print_meta: arch "..., 46llm_load_print_meta: arch = llama
) = 46
write(2, "llm_load_print_meta: vocab type "..., 44llm_load_print_meta: vocab type = SPM
) = 44
write(2, "llm_load_print_meta: n_vocab "..., 46llm_load_print_meta: n_vocab = 32000
) = 46
write(2, "llm_load_print_meta: n_merges "..., 42llm_load_print_meta: n_merges = 0
) = 42
write(2, "llm_load_print_meta: n_ctx_train"..., 45llm_load_print_meta: n_ctx_train = 4096
) = 45
write(2, "llm_load_print_meta: n_embd "..., 45llm_load_print_meta: n_embd = 4096
) = 45
write(2, "llm_load_print_meta: n_head "..., 43llm_load_print_meta: n_head = 32
) = 43
write(2, "llm_load_print_meta: n_head_kv "..., 43llm_load_print_meta: n_head_kv = 32
) = 43
write(2, "llm_load_print_meta: n_layer "..., 43llm_load_print_meta: n_layer = 32
) = 43
write(2, "llm_load_print_meta: n_rot "..., 44llm_load_print_meta: n_rot = 128
) = 44
write(2, "llm_load_print_meta: n_embd_head"..., 44llm_load_print_meta: n_embd_head_k = 128
) = 44
write(2, "llm_load_print_meta: n_embd_head"..., 44llm_load_print_meta: n_embd_head_v = 128
) = 44
write(2, "llm_load_print_meta: n_gqa "..., 42llm_load_print_meta: n_gqa = 1
) = 42
write(2, "llm_load_print_meta: n_embd_k_gq"..., 45llm_load_print_meta: n_embd_k_gqa = 4096
) = 45
write(2, "llm_load_print_meta: n_embd_v_gq"..., 45llm_load_print_meta: n_embd_v_gqa = 4096
) = 45
write(2, "llm_load_print_meta: f_norm_eps "..., 48llm_load_print_meta: f_norm_eps = 0.0e+00
) = 48
write(2, "llm_load_print_meta: f_norm_rms_"..., 48llm_load_print_meta: f_norm_rms_eps = 1.0e-05
) = 48
write(2, "llm_load_print_meta: f_clamp_kqv"..., 48llm_load_print_meta: f_clamp_kqv = 0.0e+00
) = 48
write(2, "llm_load_print_meta: f_max_alibi"..., 48llm_load_print_meta: f_max_alibi_bias = 0.0e+00
) = 48
write(2, "llm_load_print_meta: n_ff "..., 46llm_load_print_meta: n_ff = 11008
) = 46
write(2, "llm_load_print_meta: n_expert "..., 42llm_load_print_meta: n_expert = 0
) = 42
write(2, "llm_load_print_meta: n_expert_us"..., 42llm_load_print_meta: n_expert_used = 0
) = 42
write(2, "llm_load_print_meta: rope scalin"..., 47llm_load_print_meta: rope scaling = linear
) = 47
write(2, "llm_load_print_meta: freq_base_t"..., 48llm_load_print_meta: freq_base_train = 10000.0
) = 48
write(2, "llm_load_print_meta: freq_scale_"..., 42llm_load_print_meta: freq_scale_train = 1
) = 42
write(2, "llm_load_print_meta: n_yarn_orig"..., 45llm_load_print_meta: n_yarn_orig_ctx = 4096
) = 45
write(2, "llm_load_print_meta: rope_finetu"..., 48llm_load_print_meta: rope_finetuned = unknown
) = 48
write(2, "llm_load_print_meta: model type "..., 43llm_load_print_meta: model type = 7B
) = 43
write(2, "llm_load_print_meta: model ftype"..., 58llm_load_print_meta: model ftype = all F32 (guessed)
) = 58
write(2, "llm_load_print_meta: model param"..., 47llm_load_print_meta: model params = 0.00 B
) = 47
write(2, "llm_load_print_meta: model size "..., 61llm_load_print_meta: model size = 0.00 MiB (-nan BPW)
) = 61
write(2, "llm_load_print_meta: general.nam"..., 49llm_load_print_meta: general.name = LLaMA v2
) = 49
write(2, "llm_load_print_meta: BOS token "..., 48llm_load_print_meta: BOS token = 1 ''
) = 48
write(2, "llm_load_print_meta: EOS token "..., 49llm_load_print_meta: EOS token = 2 '
'
) = 49
write(2, "llm_load_print_meta: UNK token "..., 50llm_load_print_meta: UNK token = 0 ''
) = 50
write(2, "llm_load_print_meta: LF token "..., 52llm_load_print_meta: LF token = 13 '<0x0A>'
) = 52
write(2, "llama_model_load: vocab only - s"..., 48llama_model_load: vocab only - skipping tensors
) = 48
munmap(0x7f38a1ead000, 516096) = 0
close(3) = 0
write(2, "llama_new_context_with_model: n_"..., 47llama_new_context_with_model: n_ctx = 512
) = 47
write(2, "llama_new_context_with_model: fr"..., 51llama_new_context_with_model: freq_base = 10000.0
) = 51
write(2, "llama_new_context_with_model: fr"..., 45llama_new_context_with_model: freq_scale = 1
) = 45
write(1, "main: init model\n", 17main: init model
) = 17
openat(AT_FDCWD, "checkpoint.gguf", O_RDONLY) = -1 ENOENT (No such file or directory)
mmap(NULL, 240193536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3893863000
mmap(NULL, 360292352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f387e0c9000
write(1, "print_params: n_vocab: 32000\n", 29print_params: n_vocab: 32000
) = 29
write(1, "print_params: n_ctx: 128\n", 27print_params: n_ctx: 128
) = 27
write(1, "print_params: n_embd: 256\n", 27print_params: n_embd: 256
) = 27
write(1, "print_params: n_head: 8\n", 25print_params: n_head: 8
) = 25
write(1, "print_params: n_ff: 768\n", 27print_params: n_ff: 768
) = 27
write(1, "print_params: n_layer: 16\n", 26print_params: n_layer: 16
) = 26
write(1, "print_params: n_rot: 32\n", 26print_params: n_rot: 32
) = 26
write(1, "main: total train_iterations 0\n", 31main: total train_iterations 0
) = 31
write(1, "main: seen train_samples 0\n", 31main: seen train_samples 0
) = 31
write(1, "main: seen train_tokens 0\n", 31main: seen train_tokens 0
) = 31
write(1, "main: completed train_epochs 0\n", 31main: completed train_epochs 0
) = 31
write(1, "main: model_size = 240309120 byt"..., 46main: model_size = 240309120 bytes (229.2 MB)
) = 46
write(1, "main: opt_size = 360288480 byte"..., 45main: opt_size = 360288480 bytes (343.6 MB)
) = 45
write(1, "main: opt iter 0\n", 17main: opt iter 0
) = 17
write(1, "main: input_size = 131076128 byt"..., 46main: input_size = 131076128 bytes (125.0 MB)
) = 46
mmap(NULL, 131080192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f38763c7000
mmap(NULL, 15077376, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3875566000
munmap(0x7f3875566000, 15077376) = 0
brk(0x5616acbe4000) = 0x5616acbe4000
write(1, "main: compute_size = 13988103494"..., 60main: compute_size = 139881034948672 bytes (133400952.0 MB)
) = 60
write(1, "main: evaluation order = RIGHT_T"..., 39main: evaluation order = RIGHT_TO_LEFT
) = 39
mmap(NULL, 139881034952704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0xd54f433e6000) = 0x5616acbe4000
mmap(NULL, 139881035083776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f386e3c7000
munmap(0x7f386e3c7000, 29593600) = 0
munmap(0x7f3874000000, 37515264) = 0
mprotect(0x7f3870000000, 135168, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 139881034952704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
futex(0x7f38a9c921e0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "terminate called after throwing "..., 48terminate called after throwing an instance of ') = 48
write(2, "std::bad_alloc", 14std::bad_alloc) = 14
write(2, "'\n", 2'
) = 2
write(2, " what(): ", 11 what(): ) = 11
write(2, "std::bad_alloc", 14std::bad_alloc) = 14
write(2, "\n", 1
) = 1
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 364343
gettid() = 364343
tgkill(364343, 364343, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=364343, si_uid=1000} ---
+++ killed by SIGABRT +++
Aborted

Lacking time at the moment, I haven't traced this back to the originating C++ file, but might this have something to do with the final `ggml_allocr_free(alloc)` call that has disappeared? There are many, many references to the `ggml_allocr` calls in both the new and old files; perhaps the old version has one more.

@slaren
Collaborator

slaren commented Jan 9, 2024

The issue is probably related to changes in ggml-alloc as explained in #3548 (comment).

If you want a quick workaround until this is fixed, removing the calls to ggml_allocr_free should be enough to make it work. It will leak a small amount of memory, but that shouldn't be an issue.

@murillo128

murillo128 commented Jan 12, 2024

Yes, commenting out the ggml_allocr_free calls prevented the issue. Is this the permanent fix, or are changes to ggml_allocr_free expected to land to prevent the leak? I can send a PR for train-text-from-scratch if needed.

@slaren
Collaborator

slaren commented Jan 13, 2024

It is not a permanent fix; the leak would still need to be addressed. The correct fix is to ensure that allocators aren't freed while the tensors allocated with them are still in use, which is the source of the issue.

@jtatman

jtatman commented Jan 16, 2024

I've played with this a bit across previous (pre-GGUF) versions, and the only obvious fallout seems to be a segfault when it can't detect which tensors have been freed. The really neat thing about this program is that it usually faults after writing a checkpoint and model to disk, and both are usually recoverable for restarting training without losing progress. I should have figured this was a parallel issue. Thank you kindly for the help with this.

bzuzo added a commit to bzuzo/llama.cpp that referenced this issue Jan 19, 2024
ggerganov pushed a commit that referenced this issue Jan 19, 2024
* Fix issue with alloc causing max_compute_size to be calculated

* remove ggml_allocr_free as suggested in issue #4791
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this issue Feb 3, 2024
* Fix issue with alloc causing max_compute_size to be calculated

* remove ggml_allocr_free as suggested in issue ggerganov#4791
Contributor

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this issue Apr 1, 2024
* Fix issue with alloc causing max_compute_size to be calculated

* remove ggml_allocr_free as suggested in issue ggerganov#4791
Contributor

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024