Langchain integration #111

Merged: 112 commits, May 15, 2023
Conversation

@pseudotensor (Collaborator) commented May 4, 2023

  • basic isolated demo with OpenAI or HF/sentence_transformers (see the sketch after this list)
  • use better embedding and LLM (ours) to see if comparable to OpenAI
  • incorporate into chatbot with PDF upload
  • If sources are already well-chunked, extra chunking is not a good idea, except to keep chunks below a (perhaps slightly larger) limit than chunk_size
  • Add pre-DB building (in-memory for now), done up front rather than when the user asks for a response
  • Avoid re-downloading GitHub repos etc. each time
  • Make Chroma work; need to understand how to avoid duplicate docs and why it fails on plain load (missing items)
  • How to choose cut_distance?
  • Context from the chat is not added when finding sources. Maybe that's OK-ish, but we need to do better.
  • Add docs: https://huggingface.co/databricks/dolly-v2-12b#langchain-usage
  • https://huggingface.co/datasets/Cohere/wikipedia-22-12/viewer/en/train
  • Fix context so it doesn't do a hard chop; chop at the human-bot boundary instead.
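
For reference, a minimal sketch of the isolated-demo bullet, assuming langchain with HF sentence_transformers embeddings and Chroma; the model name, source file, and chunk size are illustrative assumptions, not this PR's exact code:

```
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

docs = TextLoader("nlp_settings.rst").load()

# Per the chunking bullet: only re-split if sources are not already
# well-chunked, and keep chunk_size only slightly above the native chunk length.
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
chunks = splitter.split_documents(docs)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")
hits = db.similarity_search_with_score("Which config.toml enables pytorch for NLP?", k=4)
```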

Future PRs:

  • Make URL fetching work
  • Ground the report better by showing matching words as well as matching n-grams from the actual sources (see the sketch after this list)
  • Add ability to do new separate chat and go back to other chats
  • Add ability to download chat(s).
  • Consume URLs from the chat itself and do the lookup
  • https://github.com/h2oai/makersaurus
  • Make "All" just mean look at all DBs
  • Allow ingesting any GitHub link when not on HF. When on HF, default to h2oGPT and DAI docs.
  • Unsure how to work around chroma-core/chroma#412 ([Bug]: Failed to Deploy in HuggingFace Spaces)
  • Make Flag mean Good or Bad, and make Compare ask which response is better
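
A plain-Python sketch of the n-gram grounding bullet above: show which words and n-grams in an answer literally appear in a retrieved source. The function names and example strings are illustrative, not code from this PR:

```
def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as a set of tuples.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def grounding_report(answer, source, n=3):
    a, s = answer.lower().split(), source.lower().split()
    return {
        "matching_words": sorted(set(a) & set(s)),
        "matching_ngrams": sorted(" ".join(g) for g in ngrams(a, n) & ngrams(s, n)),
    }

print(grounding_report(
    "enable_pytorch_nlp_model = true enables PyTorch for NLP",
    "set enable_pytorch_nlp_model = true in config.toml to enable PyTorch for NLP",
))
```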

@pseudotensor (Collaborator, Author) commented May 9, 2023

Using a sentence transformer + our 6.9B model gives consistent answers.

query: Which config.toml enables pytorch for NLP?
answer: The following config.toml file enables PyTorch for NLP:

```
[nlp]
enable_pytorch_nlp_model = true
enable_pytorch_nlp_transformer = true
```

This file can be found in the `config` directory of the DriverlessAI documentation.
sources: ['nlp_settings.rst', 'nlp.rst', 'nlp_settings.rst', 'nlp_settings.rst']
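
For reference, a hedged sketch of how such a query/answer/sources output can be produced, continuing from a Chroma `db` like the one sketched earlier; the model id, threshold value, and prompt template are assumptions (the open `cut_distance` question from the task list shows up here as the score filter):

```
from langchain.llms import HuggingFacePipeline

# Assumed HF model id standing in for "our 6.9B model".
llm = HuggingFacePipeline.from_model_id(
    model_id="h2oai/h2ogpt-oig-oasst1-512-6.9b", task="text-generation"
)

query = "Which config.toml enables pytorch for NLP?"
hits = db.similarity_search_with_score(query, k=4)  # (Document, distance) pairs

cut_distance = 1.0  # open question above: how to pick this threshold
kept = [(doc, dist) for doc, dist in hits if dist < cut_distance]

context = "\n\n".join(doc.page_content for doc, _ in kept)
prompt = f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
print(llm(prompt))
print("sources:", [doc.metadata.get("source") for doc, _ in kept])
```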

A similar direct prompt in llama (like the previous one, with sources mentioned in the prompt but not actually present anymore) does well too:

(screenshot)

@pseudotensor pseudotensor marked this pull request as ready for review May 15, 2023 15:26
@pseudotensor pseudotensor requested a review from arnocandel May 15, 2023 15:27
@pseudotensor pseudotensor requested a review from arnocandel May 15, 2023 21:12
@pseudotensor pseudotensor mentioned this pull request May 15, 2023 (36 tasks)
@pseudotensor (Collaborator, Author) commented:
```
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_manual_test.py::test_chat_context - NotImplementedError: MANUAL TEST FOR NOW
=================================================================== 1 failed, 13 passed, 7 skipped, 1 xpassed, 9 warnings in 364.21s (0:06:04) ====================================================================
(h2ollm) jon@pseudotensor:~/h2o-llm$
```

and ran the manual test by hand.
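
For context, the one failure is a deliberate manual-only stub; a test that fails this way typically looks like the minimal sketch below (assumed shape, not the repo's exact code):

```
# tests/test_manual_test.py (assumed shape)
def test_chat_context():
    # Placeholder until automated; pytest reports it as FAILED with this
    # exception, exactly as in the summary above.
    raise NotImplementedError("MANUAL TEST FOR NOW")
```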
