[moebooru] extract 'notes' #3094
Again, thank you for your quick response, but why not return notes in the same format as gelbooru or danbooru?
Width/height/x/y can be taken from CSS attributes, I'd think.

    <div class="note-box" style="width: 314px; height: 588px; top: 438px; left: 900px;" id="note-box-7095">
    <div class="note-corner" id="note-corner-7095"></div>
    </div>
    <div class="note-body" id="note-body-7095" title="Click to edit">The facts that I love playing games</div>
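As a rough illustration of reading the geometry out of the inline style, here is a minimal standalone sketch. It uses only Python's `re` module; `parse_note_geometry` is a hypothetical helper name, not part of gallery-dl, and it assumes every value is an integer pixel count as in the note-box element quoted above.

```python
import re

# Sample markup copied from the note-box element quoted above.
NOTE_BOX = (
    '<div class="note-box" style="width: 314px; height: 588px; '
    'top: 438px; left: 900px;" id="note-box-7095">'
)


def parse_note_geometry(tag):
    """Hypothetical helper: read width/height/x/y from an inline style."""
    style = re.search(r'style="([^"]*)"', tag).group(1)
    # Collect every "name: <int>px" declaration as an integer.
    props = {
        name: int(value)
        for name, value in re.findall(r"([\w-]+):\s*(-?\d+)px", style)
    }
    # CSS "left"/"top" correspond to the note's x/y position.
    return {
        "width": props["width"],
        "height": props["height"],
        "x": props["left"],
        "y": props["top"],
    }


geometry = parse_note_geometry(NOTE_BOX)
print(geometry)
```

A regex keeps the sketch independent of declaration order inside `style`, which a purely sequential scan would not be.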
help, doing

    """
    how do I extract_iter this '</div>' and then
    remove_html from 'title="Click to edit">' to the end (now that the '</div>' is gone after the iter)?

    <div class="note-box" style="width: 314px; height: 588px; top: 438px; left: 900px;" id="note-box-7095">
    <div class="note-corner" id="note-corner-7095"></div>
    </div>
    <div class="note-body" id="note-body-7095" title="Click to edit">The facts that I love playing games</div>   <-- this '</div>'
    """
You can fetch the entire note container and split it on each note-box:

    note_container = text.extract(page, 'id="note-container"', '<img alt=')[0]
    if not note_container:
        return

    notes = []
    for note in note_container.split('class="note-box"')[1:]:
        extr = text.extract_from(note)
        notes.append({
            "width" : int(extr("width:", "p")),
            "height": int(extr("height:", "p")),
            "y"     : int(extr("top:", "p")),
            "x"     : int(extr("left:", "p")),
            "id"    : int(extr('id="note-box-', '"')),
            "body"  : extr('class="note-body', "</div>").partition(">")[2],
        })
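To try the loop above outside of gallery-dl, here is a minimal self-contained sketch. The `extract_from` defined below is a simplified stand-in written for this example (an assumption about how a sequential extractor behaves, not gallery-dl's actual implementation), `parse_notes` is a hypothetical wrapper name, and the sample markup is assembled from the HTML quoted earlier in the thread.

```python
def extract_from(txt, pos=0, default=""):
    """Stand-in sequential extractor (assumption, not gallery-dl's code)."""
    def extr(begin, end):
        nonlocal pos
        try:
            first = txt.index(begin, pos) + len(begin)
            last = txt.index(end, first)
        except ValueError:
            return default
        # Continue the next search after the end marker just consumed.
        pos = last + len(end)
        return txt[first:last]
    return extr


# Sample container built from the HTML quoted in this thread.
SAMPLE = (
    '<div id="note-container">'
    '<div class="note-box" style="width: 314px; height: 588px; '
    'top: 438px; left: 900px;" id="note-box-7095">'
    '<div class="note-corner" id="note-corner-7095"></div>'
    '</div>'
    '<div class="note-body" id="note-body-7095" title="Click to edit">'
    'The facts that I love playing games</div>'
    '</div>'
)


def parse_notes(container):
    """Hypothetical wrapper around the loop shown above."""
    notes = []
    # Everything before the first note-box is skipped by [1:].
    for note in container.split('class="note-box"')[1:]:
        extr = extract_from(note)
        notes.append({
            "width" : int(extr("width:", "p")),   # " 314" -> 314
            "height": int(extr("height:", "p")),
            "y"     : int(extr("top:", "p")),
            "x"     : int(extr("left:", "p")),
            "id"    : int(extr('id="note-box-', '"')),
            "body"  : extr('class="note-body', "</div>").partition(">")[2],
        })
    return notes


notes = parse_notes(SAMPLE)
print(notes)
```

Because each `extr` call resumes where the previous one stopped, the keys must be requested in the order they appear in the markup.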
Since some sites contain <p> tags:

    -     "body"  : extr('class="note-body', "</div>").partition(">")[2],
    +     "body"  : text.remove_html(extr('class="note-body', "</div>").partition(">")[2]),

We can also do this, since note-box- and note-body- have the same id:

    -     "id"    : int(extr('id="note-box-', '"')),
    -     "body"  : extr('class="note-body', "</div>").partition(">")[2],
    +     "id"    : int(extr('id="note-body-', '"')),
    +     "body"  : extr(">", "</div>"),

    ... id="note-box-5225">
    <div class="note-body" id="note-body-5225" title="Click to edit"><p>Heheh</p>
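The two suggestions above can be combined: take the id from `note-body-` and strip tags from the body. The sketch below is a hedged illustration only; `remove_html` and `extract_from` are local stand-ins written for this example rather than gallery-dl's `text` helpers, `parse_note` is a hypothetical name, and the closing `</div>` tags plus the `note-corner` element in the sample are assumed from the earlier markup.

```python
import re


def remove_html(txt, repl=" ", sep=" "):
    """Stand-in tag stripper (assumption): drop tags, collapse whitespace."""
    return sep.join(re.sub(r"<[^>]+>", repl, txt).split())


def extract_from(txt, pos=0, default=""):
    """Stand-in sequential extractor (assumption, not gallery-dl's code)."""
    def extr(begin, end):
        nonlocal pos
        try:
            first = txt.index(begin, pos) + len(begin)
            last = txt.index(end, first)
        except ValueError:
            return default
        pos = last + len(end)
        return txt[first:last]
    return extr


# Note fragment as it would look after splitting on 'class="note-box"'.
NOTE = (
    ' style="width: 10px; height: 20px; top: 30px; left: 40px;" '
    'id="note-box-5225">'
    '<div class="note-corner" id="note-corner-5225"></div></div>'
    '<div class="note-body" id="note-body-5225" title="Click to edit">'
    '<p>Heheh</p></div>'
)


def parse_note(note):
    """Hypothetical helper combining both diffs from the thread."""
    extr = extract_from(note)
    return {
        "width" : int(extr("width:", "p")),
        "height": int(extr("height:", "p")),
        "y"     : int(extr("top:", "p")),
        "x"     : int(extr("left:", "p")),
        # note-box-N and note-body-N share the same N.
        "id"    : int(extr('id="note-body-', '"')),
        # The next ">" closes the note-body opening tag.
        "body"  : remove_html(extr(">", "</div>")),
    }


note = parse_note(NOTE)
print(note)
```

Reading the id from `note-body-` leaves the cursor inside the opening tag of the body, so a plain `extr(">", "</div>")` is enough and the `partition(">")` step is no longer needed.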
Sure. It seems that only lolibooru uses HTML tags inside its notes, and only
Yeah, that's better. Good catch.
related to #3093
notes[translation][] [...], a simple notes_translation[] [...] be enough for this [...] space [...], but the website uses \n\n