Replies: 4 comments 1 reply
-
To further this: despite instructions, ChatGPT is not good at consuming large files. Sometimes it greps them, sometimes it searches, but more often than not it does something unhelpful like reading the head of the file or sampling random lines. Using a CSV is not reliable whatsoever, and even telling it exactly what code to use to search is no guarantee it will. When testing with a pipeline that forces it to search, the functionality is far superior. Consistency can be forced by limiting what interactions it can have with the data. If a result is too large, returning no response and forcing it to use more restrictive filters would be more beneficial than pagination. I'll fork and start working on this.
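The "no response if too large" idea above can be sketched as a simple gate in front of the query. This is a hypothetical illustration, not existing ABS code; the names `search_books` and `MAX_RESULTS` are mine:

```python
# Hypothetical sketch: instead of paginating an oversized result, refuse it
# and tell the model to narrow its filters. The cap is an assumption.
MAX_RESULTS = 25

def search_books(rows, keyword):
    """Return matching rows, or a refusal that forces narrower filters."""
    matches = [r for r in rows if keyword.lower() in r["description"].lower()]
    if len(matches) > MAX_RESULTS:
        return {
            "error": "too_many_results",
            "count": len(matches),
            "hint": "Add more restrictive filters (genre, author, tag) and retry.",
        }
    return {"results": matches}
```

The hint string matters: it is the only feedback channel steering the model toward a narrower query instead of another broad one.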
-
Just as an update: I cannot make the performance reliable and repeatable unless I use the paid API. I will keep experimenting with this, but at this stage it requires both a custom API to fetch data in an AI-compatible format, and the AI needs to create its own summary of each book for it to search. For a 2000-book library like mine, that costs about US$4 to generate, but the resulting output is fantastic, and worth having even if nothing else works in the short term.
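The per-book summary pass described above could look roughly like the loop below. The model call is injected as a callable (`summarize`) so the loop itself is testable; in practice that callable would wrap a paid-API client. The prompt wording and field names are my assumptions, not ABS's schema:

```python
# Sketch of a one-summary-per-book pass. `summarize` is any callable that
# takes a prompt string and returns the model's summary text.
def build_prompt(book):
    return (
        "Summarize this audiobook in 2-3 searchable sentences.\n"
        f"Title: {book['title']}\n"
        f"Author: {book['author']}\n"
        f"Description: {book['description']}"
    )

def generate_summaries(books, summarize):
    """Run one summary request per book; returns {title: summary}."""
    return {b["title"]: summarize(build_prompt(b)) for b in books}
```

Injecting the model call also makes it easy to swap providers or dry-run the batch to estimate cost before spending anything.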
-
In my opinion, if we add such an endpoint, it should be flexible regarding what to fetch. For example, an endpoint where you can specify the selected attributes yourself (I don't think we currently have such an API endpoint), so it can be used for more than just an LLM. However, an endpoint built solely for an LLM that provides exactly this is, in my opinion, far beyond the scope of this project.
-
Reshaping ABS to suit LLMs is looking at the problem from the wrong direction. From your own description, giving it access to the entire DB, and even spoon-feeding the data via CSV, didn't quite give you what you were looking for. IMO, returning the same spoon-fed data via HTTP isn't going to result in anything better. You'd be better off focusing on updating your prompt to encourage the LLM to analyze the existing data better, and using intermediate prompts to get some idea of why your approach isn't working. Basically, if the LLM can't give you what you want when pointed directly at the sqlite database, then the issue lies in the prompt or the LLM itself. Fortunately, there is some variety in available LLMs, and updates are frequent. And you can keep working on better prompting over time.
-
Describe the feature/enhancement
I have been experimenting with OpenAI's GPTs to search my library for book suggestions.
While I had some success with the existing APIs, there seem to be unknown, arbitrary limits on what GPTs will actually look at. Maybe not arbitrary; it could be a percentage of token capacity, but the issue remains that the existing responses are too verbose.
To try to better understand how it works, I let it access the sqlite DB directly, about 90MB in my case, and while it seemed to work, it was not finding books it should have. Unfortunately, it was unclear how it was using the DB, as it provided no debug output.
I then made a simple CSV with just what it should care about.
SELECT books.title AS Title, authors.name AS Author, books.narrators AS Narrators, books.description AS Description, books.asin, books.isbn FROM books JOIN bookAuthors ON bookAuthors.bookId = books.id JOIN authors ON bookAuthors.authorId = authors.id;
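For reference, a query like the above can be exported to CSV with a few lines of Python's stdlib `sqlite3` and `csv` modules. The table and column names simply follow the query above; adjust them to your actual ABS schema:

```python
import csv
import sqlite3

QUERY = """
SELECT books.title AS Title, authors.name AS Author,
       books.narrators AS Narrators, books.description AS Description,
       books.asin, books.isbn
FROM books
JOIN bookAuthors ON bookAuthors.bookId = books.id
JOIN authors ON bookAuthors.authorId = authors.id;
"""

def export_csv(db_path, csv_path):
    """Dump the query results, with a header row, to a CSV file."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(QUERY)
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
    finally:
        con.close()
```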
Now, in testing, the description is both a positive and a negative. Including it makes a 24MB CSV, and the GPT ends up searching for keywords to find books.
If I exclude descriptions, then it must use its own knowledge of books, either internally or by searching Bing. This is not a positive, because the CSV, even with all descriptions removed, is still too large for it to use directly (yes, it did work using the new GPT-4 Turbo API, but I am trying to use the consumer ChatGPT GPTs).
Depending on how it decides to approach the issue, it either only gets a small part of the library, or searches for specific books, potentially finding no matches.
Adding in categories was a mixed bag, due to it not always using keywords that narrowed the list as it should. It really needs to be able to use categories, but more so as a sort than a filter... maybe?
So, this is what I am proposing (and am willing to potentially do myself and upstream): an API specifically for ChatGPT.
A really dumb search, so any keyword that ChatGPT uses searches Description, Subtitle, Tags, and Genre.
A really minimalist response: only Title, Author, and Narrators. And an alternative hook it can use which includes Description.
The less in the response, the more it can use. Any unnecessary information only increases the chance it won't be analysed.
A potential additional filter, where it can add a username to return only books the user has or has not watched, would also greatly increase the usefulness. I have also tested that with a CSV, using only title and author, and it had no issues.
I would also like it to return the book id, so it can provide the URL, but I think that would need to be another API, again due to the need to keep the responses as small as possible.
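To make the proposal concrete, here is a sketch of the search behaviour I have in mind. The field names, response shape, and `include_description` switch are my assumptions for discussion, not an existing ABS endpoint:

```python
# Proposed "dumb search": match any keyword against a few text fields and
# return only the minimal attributes, to keep responses small for the GPT.
SEARCH_FIELDS = ("description", "subtitle", "tags", "genre")

def dumb_search(books, keywords, include_description=False):
    """Return minimal rows for books matching any of the keywords."""
    results = []
    for book in books:
        haystack = " ".join(str(book.get(f, "")) for f in SEARCH_FIELDS).lower()
        if any(kw.lower() in haystack for kw in keywords):
            row = {
                "title": book["title"],
                "author": book["author"],
                "narrators": book.get("narrators", ""),
            }
            if include_description:  # the alternative, more verbose hook
                row["description"] = book.get("description", "")
            results.append(row)
    return results
```

The same function body would back both hooks; only the flag differs, which keeps the two responses guaranteed-consistent.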
Discussion?