-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add html parsing and web scraping #86
Conversation
Thanks for the PR for such a cool feature. I'll take a look this morning! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! This fits really well into the existing parser structure.
I tried it with the Wikipedia article on LLMs, and it worked.
Seems like it will fit into Document well enough too; are you thinking of making filename
optional, or maybe renaming it to title
and trying to populate it with the title of the page?
Thanks @nickthecook. I want to work on this a little more, but unfortunately the
|
Yep, that's my fault. I introduced a bug in Setting that keeps creating duplicate settings. I just ran into it myself. Go to the admin ui, then Settings, then delete any settings with nil values. I'll work on a fix. |
I think the root of the problem is that I added validation to Setting so it wouldn't create duplicates, but some duplicates already existed in our databases because of a previous bug.
|
Great. Works now. P.S. There are a couple of tests in the chunker spec that are failing, so I commented them out for now. I don't think that I broke them, but you might want to check. |
Actually don't merge this yet. |
Fixed this. Everthing seems good now. |
Great! I'll have another look, and comment on code organization/UI stuff this time, since it's out of draft. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Just a few comments.
app/views/documents/_form.html.erb
Outdated
<div class="flex-col"> | ||
|
||
<%= f.file_field :file, class: "w-full p-2 rounded-lg my-2 px-5 py-2 border border-secondary-400 dark:border-secondary-600" %> | ||
|
||
<%= f.url_field "link", class: "w-full p-2 rounded-lg my-2 px-5 py-2 border border-secondary-400 dark:border-secondary-600" %> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good in light mode, but in dark mode the text is light but the bg is white.
Can you theme it like the search bar on the same page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be good if there was some hint for the user as to what the link field was. It does say "Add document or URL" above the file upload box, but maybe if you added placeholder text it would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good in dark mode:
<%= f.url_field "link", class: "w-full p-2 rounded-lg my-2 px-5 py-2 border border-secondary-400 dark:border-secondary-600" %> | |
<%= f.url_field "link", | |
placeholder: "Enter a URL", | |
class: "w-full p-2 rounded-lg my-2 px-5 py-2 border border-secondary-400 dark:bg-secondary-925 dark:border-secondary-600" | |
%> |
31a777c
to
17f0550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @mattlindsey !
I think somehow the change you made to set the document title was lost. Moving the fetching logic to a new method is also gone. I know these were there when I reviewed and before you force-pushed! Any chance you can find that in a branch and push it again? |
For issue #68.
Still work in progress, but hoping for feedback. Try importing a page from a site. Need to fix tests.