
common : refactor cli arg parsing #7675

Merged — 15 commits from gg/gpt-params-refactor merged into master on Jun 4, 2024

Conversation

@ggerganov (Owner) commented May 31, 2024

TODO

  • remove params.instruct
  • remove params.chatml
  • params.escape = true by default
  • params.n_ctx = 0 by default
  • merge server params in gpt_params
  • merge retrieval params in gpt_params
  • merge passkey params in gpt_params
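
A minimal sketch of how the merged gpt_params could look after these changes — field names and defaults here are illustrative only, inferred from the items above and the existing example defaults, not the final layout:

    // common/common.h (sketch only)
    struct gpt_params {
        // general
        int32_t n_ctx  = 0;     // 0 = take the context size from the model
        bool    escape = true;  // process escape sequences ("\n", "\t", ...) by default

        // formerly server-only
        std::string hostname = "127.0.0.1";
        int32_t     port     = 8080;

        // formerly retrieval-only
        int32_t chunk_size = 64;

        // formerly passkey-only
        int32_t n_junk = 250;   // number of filler lines around the passkey
        int32_t i_pos  = -1;    // -1 = random passkey position
    };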

@ggerganov ggerganov changed the title from "common : gpt_params_parse do not print usage" to "common : refactor cli arg parsing" May 31, 2024
@mofosyne mofosyne added the "Review Complexity : Low" label (Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix) May 31, 2024
@github-actions github-actions bot added the "script" (Script related), "python" (python script changes), and "server" labels Jun 3, 2024
github-actions bot (Contributor) commented Jun 4, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 532 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8809.09ms p(95)=20316.63ms fails=, finish reason: stop=475 truncated=57
  • Prompt processing (pp): avg=95.21tk/s p(95)=392.2tk/s
  • Token generation (tg): avg=46.1tk/s p(95)=48.31tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/gpt-params-refactor commit=e87c104dfd5c0710166fb5f7193c4a81128829b2

prompt_tokens_seconds — chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 532 iterations", y-axis llamacpp:prompt_tokens_seconds
predicted_tokens_seconds — chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 532 iterations", y-axis llamacpp:predicted_tokens_seconds

kv_cache_usage_ratio — chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 532 iterations", y-axis llamacpp:kv_cache_usage_ratio
requests_processing — chart omitted: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 532 iterations", y-axis llamacpp:requests_processing

@ggerganov ggerganov marked this pull request as ready for review June 4, 2024 10:08
@ggerganov ggerganov merged commit 1442677 into master Jun 4, 2024
75 checks passed
@ggerganov ggerganov deleted the gg/gpt-params-refactor branch June 4, 2024 18:23
ggerganov pushed a commit that referenced this pull request Jun 5, 2024
-ins and --instruct were removed in #7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
@bartowski1182 (Contributor) commented Jun 5, 2024

Seems this broke some imatrix options; trying to run imatrix with -o or --output-file for an output file yields:

error: unknown argument: -o
usage: ./imatrix [options]

error: unknown argument: --output-file
usage: ./imatrix [options]

Though based on the changes you made, my guess is that it was never properly supported in the first place; an invalid use of gpt_params_parse just let it slip through.
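
For reference, a minimal sketch (an assumed helper, not the actual imatrix code) of how an example can keep its own flags such as -o/--output-file while still delegating the shared ones to gpt_params_parse — consume the example-specific arguments first, then hand the remainder to the common parser:

    // #include "common.h"   // gpt_params, gpt_params_parse
    #include <string>
    #include <vector>

    // sketch only: pre-scan example-specific flags, pass the rest to gpt_params_parse
    static bool parse_example_args(int argc, char ** argv, std::string & out_file, gpt_params & params) {
        std::vector<char *> rest;
        rest.push_back(argv[0]);
        for (int i = 1; i < argc; ++i) {
            const std::string arg = argv[i];
            if (arg == "-o" || arg == "--output-file") {
                if (++i >= argc) { return false; }  // missing value
                out_file = argv[i];
            } else {
                rest.push_back(argv[i]);            // defer to the common parser
            }
        }
        return gpt_params_parse((int) rest.size(), rest.data(), params);
    }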

@ericcurtin (Contributor)
@ggerganov will --instruct be reintroduced in the future? That was the feature I found most useful in the main CLI binary. Or are there other options that can achieve the same result?

@Green-Sky (Collaborator)
@ericcurtin look at -i (--interactive) and -if (--interactive-first) combined with -r (--reverse-prompt)
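
For reference, one way such an invocation might look (the model path, reverse-prompt string, and prefix/suffix are placeholders, not a prescribed recipe):

    ./main -m models/model.gguf -if -r "User:" --in-prefix "User: " --in-suffix "Assistant: "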

@ericcurtin (Contributor)
@Green-Sky -i, -if, and -r don't seem to work when trying to build a basic CLI Ollama/ChatGPT-like assistant... --instruct works great for this...

@wtarreau (Contributor) commented Jul 6, 2024

I too am quite bothered by the removal of "-ins". Previously it was really interactive: you had a shell-like ">" prompt inviting you to type, and you could distinguish user inputs from outputs when copy-pasting the session. Now I cannot find an equivalent. I've tried --interactive-first (which is much less convenient to type than -ins, by the way), but there is no prompt anymore. For me this is a significant functional regression that will force me to stick to tag b3086 for a while.

I really don't understand why features get removed. Unintended breakage I can understand, of course; it happens to all of us. But if the breakage is intentional, I don't understand the reasoning.

@wtarreau (Contributor) commented Jul 6, 2024

In addition, the commit is huge (a 3800-line patch) and practically impossible to review; too bad it was merged as one big change rather than in smaller pieces :-(

@ericcurtin (Contributor)

I do kinda have the feature forked here:

https://github.com/ericcurtin/podman-llm

You can run like:

podman-llm run granite

But preferably the feature would live upstream here in llama.cpp...

@Green-Sky (Collaborator)
@ericcurtin and conversation mode does not meet your needs? (I'm personally not using it.)

  -cnv,  --conversation           run in conversation mode, does not print special tokens and suffix/prefix
                                  if suffix/prefix are not specified, default chat template will be used
                                  (default: false)
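
For example, a minimal invocation of that mode (the model path is a placeholder) would be something like:

    ./main -m models/model.gguf -cnv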

@wtarreau (Contributor) commented Jul 6, 2024

Thank you @Green-Sky, it does the job like before indeed. So in the end it's a breaking change caused by renaming a command-line argument. At the very least -ins and -cml should still be accepted transparently, or should emit a deprecation warning ("no longer supported, use -cnv"). The help output is so long now that you have little chance of stumbling over the new syntax when looking for the old one (which is what I did before coming here).
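
As an illustration of that suggestion, a minimal sketch (not llama.cpp's actual parser; it assumes gpt_params has a 'conversation' flag) of how removed flags could be accepted transparently with a warning:

    // #include "common.h"   // gpt_params
    #include <cstdio>
    #include <string>

    // sketch only: translate removed flags into their -cnv replacement and warn
    static bool handle_deprecated_arg(const std::string & arg, gpt_params & params) {
        if (arg == "-ins" || arg == "--instruct" || arg == "-cml" || arg == "--chatml") {
            fprintf(stderr, "warning: %s is no longer supported, use -cnv/--conversation instead\n", arg.c_str());
            params.conversation = true;  // assumed field; -cnv behaviour differs from the old -ins
            return true;                 // argument consumed
        }
        return false;                    // not a deprecated flag, parse as usual
    }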

@ericcurtin (Contributor) commented Jul 6, 2024

@Green-Sky it sort of works, but I get stranger output with it:

$ podman --root /home/curtine/.local/share/podman-llm/storage run --rm -it --security-opt=label=disable -v/home/curtine:/home/curtine -v/tmp:/tmp -v/home/curtine/.cache/huggingface/:/root/.cache/huggingface/ granite llama-main -m /root/granite-3b-code-instruct.Q4_K_M.gguf --log-disable -cnv

Tell me about podman
http://i.imgur.com/68hI9.png

$ podman --root /home/curtine/.local/share/podman-llm/storage run --rm -it --security-opt=label=disable -v/home/curtine:/home/curtine -v/tmp:/tmp -v/home/curtine/.cache/huggingface/:/root/.cache/huggingface/ granite llama-main -m /root/granite-3b-code-instruct.Q4_K_M.gguf --log-disable --instruct

Tell me about podman
Podman is a free, open-source container engine that uses cgroups to control group of Linux containers. It is similar to Docker, but it offers more security features and is designed to be more user-friendly. It also has a simpler command-line interface than Docker.

@Green-Sky (Collaborator) commented Jul 7, 2024

--instruct was hardcoded to use the Alpaca prompt template, while conversation mode loads the template from the model (by default). Experiment with different prompt templates, or explicitly set:

         --in-prefix STRING       string to prefix user inputs with (default: empty)
         --in-suffix STRING       string to suffix after user inputs with (default: empty)
         --chat-template JINJA_TEMPLATE
                                  set custom jinja chat template (default: template taken from model's metadata)
                                  if suffix/prefix are specified, template will be disabled
                                  only commonly used templates are accepted:
                                  https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

you can always use:

--verbose-prompt         print a verbose prompt before generation (default: false)

to debug what you get.
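
As a rough example (hedged — the exact prefix/suffix strings used by the old Alpaca-style instruct mode may differ), something close to the removed -ins behaviour can be approximated with:

    ./main -m models/model.gguf -if -r "### Instruction:" --in-prefix "\n\n### Instruction:\n\n" --in-suffix "\n\n### Response:\n\n"

With escape processing now enabled by default (per the TODO list above), the "\n" sequences in the prefix/suffix should be expanded to newlines.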

@ericcurtin (Contributor) commented Jul 25, 2024

@Green-Sky in commit d94c6e0 --conversation seems to be completely broken:

llama_new_context_with_model: graph splits = 1
main: chat template example: <|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant


system_info: n_threads = 6 / 12 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 |
error: input is empty

and even when I fixed that, it was no good.

I had written a tool that was daemonless/serverless, and used the main binary like so:

$ ./ramalama run granite
> Tell me about the Jackson 5
5 is the fifth studio album by the Jackson 5, released on August 1, 1995. It was produced by Rick Rubin and features guest appearances from the late DJ DJ, the late MC Hammer, and the late MC Hammer. The album is notable for its experimental production style and its use of samples from other records. The album's sound incorporates elements of house, techno, and hip hop, and it is widely regarded as a pioneering work in the development of these genres. The album's success was largely due to the release of the single "B.I." in 1991, which peaked at number 1 on the Billboard Hot 100. The album has sold over 200 million copies worldwide and is considered a classic in the world of dance and electronic music.
>

There seems to be no way to match this behaviour now, which is quite frustrating.

Labels
  • examples
  • python — python script changes
  • Review Complexity : Low — Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix
  • script — Script related
  • server
Projects: None yet

6 participants