
Fix UnicodeDecodeError by returning raw-bytes from main.cpp #5

Merged
merged 1 commit into abdeladim-s:main on May 4, 2023
Conversation

@r0psteev commented on May 4, 2023

Tested using the model https://huggingface.co/P01son/ChatLLaMA-zh-7B-int4 mentioned in issue 61.

Test 1:

from pyllamacpp.model import Model

model = Model('/home/devel/Downloads/chatllama-ggml-q4_0.bin')
# previous callback-based API, kept here for reference:
#model.generate("从前,", n_predict=64, new_text_callback=new_text_callback, n_threads=4, verbose=True)
for token in model.generate("从前,", n_predict=512):
    # earlier per-token decode experiment, kept commented out:
    #try:
    #    tok = token.decode('utf-8')
    #except UnicodeDecodeError:
    #    tok = token.decode('utf-8', 'replace')
    print(token, end='', flush=True)
(venv) devel@laptop:~/Desktop/pyllamacpp$ python test1.py 
llama_model_load: loading model from '/home/devel/Downloads/chatllama-ggml-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from '/home/devel/Downloads/chatllama-ggml-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
有一个年���的女���,���非常喜���画画。每天放学后,���会���不及���地打开���记本,开始创作。���是,���的���母并不理解���的���好,认为画画只是一种������时间和���张的行为。于是,这个女������到非常���单和失���。

有一天,���������了一位老������家。那位老人告������,只要用心去创作,画画就不是������时间和���张的行为,而是一种表达自我和传���思想的方式。这位女������到非常���喜,从此开始学���画画技���并���力创作。������于发现,通过画画,可以表达自���的情���和想法,同时也能���更多的人理解和支持���。
基于以上这段文本内容回���:为���么有些人不理解年���女���对画画的������? 

有些人可能只是被社会的期望和标���所������,认为画画只是一种���乐方式而非一种真正的������形式。此外,他们也不了解到创作者的想法和经���,可能因此���这个女���的���好看成是无用的������时间和���料。
(venv) devel@laptop:~/Desktop/pyllamacpp$
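
The U+FFFD replacement characters (���) in the output above appear because multi-byte UTF-8 characters are often split across token boundaries, so decoding each piece independently fails mid-sequence. As a follow-up sketch (not part of this PR, and assuming generate() yields raw bytes, as the commented-out decode calls suggest), Python's incremental decoder can buffer an incomplete sequence until the next token completes it:

import codecs
from pyllamacpp.model import Model

# Sketch: an incremental decoder carries partial UTF-8 sequences across
# calls instead of replacing them with U+FFFD immediately.
decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')

model = Model('/home/devel/Downloads/chatllama-ggml-q4_0.bin')
for token in model.generate("从前,", n_predict=512):
    print(decoder.decode(token), end='', flush=True)
print(decoder.decode(b'', final=True), flush=True)  # flush any buffered tail bytes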

Test 2:

from pyllamacpp.model import Model

def new_text_callback(text: bytes):
    # 'replace' substitutes any undecodable byte sequence with U+FFFD instead of raising
    new_text = text.decode("utf-8", "replace")
    print(new_text, end="", flush=True)

model = Model('/home/devel/Downloads/chatllama-ggml-q4_0.bin')
model.cpp_generate("从前,", n_predict=512, new_text_callback=new_text_callback, n_threads=4)
(venv) devel@laptop:~/Desktop/pyllamacpp$ python test2.py 
llama_model_load: loading model from '/home/devel/Downloads/chatllama-ggml-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from '/home/devel/Downloads/chatllama-ggml-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
llama_generate: seed = 1683162956

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 512, n_keep = 0


 从前,有一个小男���,他非常喜������火���。每当他看到自���的���������或���片书都被���居的������������后,他就会有一种心情���������的时候。 
有一天,小男���在���������里���到了一个������的���������。他想������看,���出���当长度的头部,然后放进自���的口���中。
接下来发生了一件有���的事情:当他向���������起���光的���������时,���������到非常������,从那以后������再也不会在小男���家中���下东西了。
 
从此,小男���可以安心地收藏自���的������和书���,而������则得到了更好的生活。小男���的家长意���到:将不良行为化为正能量可以使������变得更加���足和安���。 
故事告���我们,当我们面对不良行为时,可以通过合理的反应去化解���,并从中获得一些������的效果。
 
基于以上这段文本内容回���:小男���如何通过将不良行为转化成正能量来改���和安���������?
 小男���可以收藏他的������和书���,�������
llama_print_timings:        load time =  3345.55 ms
llama_print_timings:      sample time =   428.46 ms /   512 runs   (    0.84 ms per run)
llama_print_timings: prompt eval time = 96946.18 ms /   262 tokens (  370.02 ms per token)
llama_print_timings:        eval time = 352213.29 ms /   510 runs   (  690.61 ms per run)
llama_print_timings:       total time = 451060.25 ms
(venv) devel@laptop:~/Desktop/pyllamacpp$ 
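
Test 2's "replace" handler has the same boundary problem: a multi-byte character cut across two callback invocations decodes to two U+FFFD characters. A hypothetical variant (again assuming the callback receives raw bytes, as the text: bytes annotation indicates) can reuse the incremental decoder:

import codecs
from pyllamacpp.model import Model

# Same incremental-decoder idea applied to the callback-based API.
decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')

def new_text_callback(text: bytes):
    # only complete characters are printed; partial sequences stay buffered
    print(decoder.decode(text), end='', flush=True)

model = Model('/home/devel/Downloads/chatllama-ggml-q4_0.bin')
model.cpp_generate("从前,", n_predict=512, new_text_callback=new_text_callback, n_threads=4)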

The responses felt much faster; I guess that's because you use generators to return the tokens.

@abdeladim-s (Owner) commented

Yes, I implemented the generator and it seems much better than the previous approach.
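
For context, a minimal, self-contained sketch of why streaming through a generator feels faster than returning the full string (toy timing only, not the actual pyllamacpp internals):

import time

def slow_tokens(n):
    # stand-in for the sampling loop: each token costs one eval step
    for i in range(n):
        time.sleep(0.1)  # simulate per-token eval latency
        yield f"tok{i} "

# Generator-style consumption: the first token prints after ~0.1 s.
for tok in slow_tokens(5):
    print(tok, end='', flush=True)
print()

# Eager alternative: nothing prints until all ~0.5 s have elapsed.
print(''.join(slow_tokens(5)))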

@abdeladim-s (Owner) left a comment

Everything is perfect!
Thank you very much, @r0psteev, for the amazing contribution.

@abdeladim-s abdeladim-s merged commit 37b82ef into abdeladim-s:main May 4, 2023