Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llama : change yarn_ext_factor placeholder to -1 #3922

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

cebtenzzre
Copy link
Collaborator

@cebtenzzre cebtenzzre commented Nov 3, 2023

I have been unable to reproduce any strange results from this code on gcc 12.3.0 or clang 16.0.6 - cparams.yarn_ext_factor is always correctly set to 0.0f for me.

However, with some compilers and system configurations and the -Ofast compiler option, the default of NaN apparently does not work. I was thinking that a good placeholder should be impossible to confuse for a valid value, but apparently the result of std::isnan cannot be relied upon.

ref: #2268 (comment)

Copy link
Collaborator

@LostRuins LostRuins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested on windows, seems to work.

@ggerganov ggerganov merged commit 3fdbe6b into ggerganov:master Nov 3, 2023
32 checks passed
@vlapra
Copy link

vlapra commented Nov 3, 2023

With this change it got stuck on mac after prompt entered. When I revert the change, it works.

Prompt Hello, got stuck that wya:
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 200.63 MB
llama_new_context_with_model: max tensor size = 128.17 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 8802.34 MB, ( 8802.97 / 21845.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1600.02 MB, (10402.98 / 21845.34)
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 194.02 MB, (10597.00 / 21845.34)

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.700
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0

Hello

@ggerganov
Copy link
Owner

@cebtenzzre Not sure if you know that branches in this repo are monitored by https://github.com/ggml-org/ci and will run extra CI on commits that have the string ggml-ci in the commit message:

image

image

These tests are more like integration-level tests and require extra resources, so they run on rented infrastructure instead of the default Github Actions. Ideally, all PRs is good to go through that CI, but it requires collaborators to create the branches in llama.cpp. It's obviously optional to do so, but it would help catch such issues earlier

The CI that runs on these instances is implemented in ci/run.sh and can be ran also locally by developers to make sure everything is good. It downloads some extra data in a temp folder though, but it is one-time download.

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants