Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Bing search asynchronous #570

Merged
merged 1 commit into from
Jan 31, 2025

Conversation

parradam
Copy link

This PR removes the template tags for images, and replaces them with a JS call once the DOM has loaded. Relates to #563

Just wanted to push at this stage as:

  1. This seems to work some of the time, but not always. I think I've narrowed it down to the Flask route bing_search because it seems like the images sources aren't always passed into the template
  2. Would it be better if I looked at moving all of the parsing logic into the JS? Essentially porting the code below to JS. Maybe that's what you meant :-)
    # dump("searching for " + text + " in " + language.getLgName())
    search = urllib.parse.quote(text)
    params = searchstring.replace("[LUTE]", search)
    params = params.replace("###", search)  # TODO remove_old_###_placeholder: remove
    url = "https://www.bing.com/images/search?" + params
    content = ""
    error_msg = ""
    try:
        with urllib.request.urlopen(url) as s:
            content = s.read().decode("utf-8")
    except urllib.error.URLError as e:
        content = ""
        error_msg = e.reason

    # Sample data returned by bing image search:
    # <img class="mimg vimgld" ... data-src="https:// ...">
    # or
    # <img class="mimg rms_img" ... src="https://tse4.mm.bing ..." >

    def is_search_img(img):
        return not ('src="/' in img) and ("rms_img" in img or "vimgld" in img)

    def build_struct(image):
        src = "missing"
        normalized_source = image.replace("data-src=", "src=")
        m = re.search(r'src="(.*?)"', normalized_source)
        if m:
            src = m.group(1)
        return {"html": image, "src": src}

    raw_images = list(re.findall(r"(<img .*?>)", content, re.I))

    images = [build_struct(i) for i in raw_images if is_search_img(i)]

    # Reduce image load count so we don't kill subpage loading.
    # Also bing seems to throttle images if the count is higher (??).
    images = images[:25]

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 17, 2025

Hey @parradam - thanks for giving this one a shot :-) appreciated.

Re point 2, moving it into JS: I wasn't thinking of that, b/c the python code to parse the returned results and get the real image sources works, and it ought to be relatively easy to still call that method from ajax, but modify it so that it returns just a list of image URLs. I don't think that moving that call and the parsing to JS reduces any complexity, it just moves code and rewrites it. So, I feel that something like this should work:

# ... everything the same in the above, up to
images = images[:25]
return jsonify(images)

and that struct would then be handled by the ajax call from the template. Does that make sense?

Re point 1: "This seems to work some of the time, but not always." -- sometimes I get weird blocky things in my image list, and I haven't bothered to look into it :-)

👋

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 17, 2025

(assigning to you b/c you're still hacking at it, just so I see it on the project board)

@parradam
Copy link
Author

parradam commented Jan 18, 2025

Hey @jzohrab ! Happy to keep hacking away :-) hopefully with more sleep behind me this time!

Thanks for your patience and for your input! So here is my thought process, based on our conversation so far, just to ensure we are on the same page. The bing_search route does two things at the moment:

  1. It collects the image srcs from Bing
  2. Then it renders the imagesearch/index.html template, passing them in
  • To decouple these, I presume we'll need a separate route that could be called something like image_search_page with a path of /search_page/<int:langid>/<string:text>/<string:searchstring>, which accepts the existing URL params and renders imagesearch/index.html, passing in the URL that will be needed to call Bing once the DOM has loaded
  • Then we need to update imagesearch/index.html to implement the AJAX call using the URL passed in
  • Update bing_search to render the jsonified image data as you suggested - everything else in this route remains pretty much the same

On the plus side, it looks like Bing is talking to me again. I did some debugging yesterday on that, and will see if there's anything I can do with the headers (such as User-Agent) to prevent an empty <body></body> being returned.

Hope this is useful, and if it sounds sensible, I'm happy to get this done tomorrow!

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 19, 2025

Hey @parradam - that sounds like what I was thinking too. Thx for chatting about it to clarify -- writing a good spec takes time and sometimes I skip it :-) Cheers!

@parradam parradam force-pushed the make-bing-search-async branch from 0ced891 to 9322479 Compare January 19, 2025 13:16
@parradam
Copy link
Author

Hey @jzohrab , no problem :-)

I've force pushed an updated approach - see what you think! Usual issues with Bing not playing ball, but I was able to experiment returning an object with an src property from Flask.

Any thoughts on testing? I guess we could mock the Bing response and check the image renders, might be a pain though?

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 19, 2025

Will take a look soon, am sick. 🤒

Yeah testing is a big hassle and I didn’t bother :-). During testing there’s a variable set to disable images actually.

Copy link
Collaborator

@jzohrab jzohrab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @parradam - Haven't tried the code but it looks great! One small comment tweak requested. Is this still draft, or do you think it's good to go?

lute/bing/routes.py Outdated Show resolved Hide resolved
@jzohrab jzohrab changed the base branch from master to develop January 22, 2025 16:10
@jzohrab
Copy link
Collaborator

jzohrab commented Jan 22, 2025

(PRs go to the dev branch, master is for launched code)

@parradam
Copy link
Author

parradam commented Jan 24, 2025

@jzohrab Hope you're feeling better!

Appreciate the feedback! Have incorporated your comment, I'd just like to make sure that Bing behaves as expected before I mark it as ready. Should be able to confirm that in the next day or two.

Thanks for fixing the base branch, I realised and rebased it locally but didn't update the PR - oops!

@parradam
Copy link
Author

parradam commented Jan 26, 2025

@jzohrab quick update! I've been experimenting with this and I get a blank page back (again) which is quite annoying:

<!DOCTYPE html><html xml:lang="en" xmlns=http://www.w3.org/1999/xhtml xmlns:Web=http://schemas.live.com/Web><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Bing</title><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /><meta name="viewport" content="width=device-width, initial-scale=1"></head><body></body></html>

I've been messing around with different headers (user agent, referer, accept-language, accept) but no luck so far. I can return a dummy image by subbing in images = [{"src": "url.com/image_location"}] and it behaves as expected. Wonder if I'm blacklisted forever!

Thought I'd raise a flag now before I go too far down this rabbit hole!

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 27, 2025

Thanks @parradam , I’ll try it out today. I’ve had times when the URL went dark, maybe periodic blacklisting. Clears up eventually, so far. Will let you know, thanks!

@jzohrab
Copy link
Collaborator

jzohrab commented Jan 27, 2025

@parradam - it's working fine for me, are you still seeing a blank screen? Looks great!

image

@parradam
Copy link
Author

@jzohrab amazing! Yes, still dark for me. Possibly due to the amount of experimentation I did! Thanks so much for testing it!

I'll just push my commit (from memory it was just the docstring but will check) and then I'll mark it for review. :-)

@parradam parradam force-pushed the make-bing-search-async branch from 9322479 to fa48e7d Compare January 27, 2025 21:12
@parradam
Copy link
Author

I've rebased onto Lute's development branch and force pushed.

Double checked and all tests seem to be passing.

Hope it's ok! :-)

@parradam parradam marked this pull request as ready for review January 27, 2025 21:38
Copy link
Collaborator

@jzohrab jzohrab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'll fix the conflicts which I just created now. :-) Cheers!

@jzohrab jzohrab merged commit a32b8c3 into LuteOrg:develop Jan 31, 2025
@jzohrab
Copy link
Collaborator

jzohrab commented Jan 31, 2025

Thanks very much @parradam, super!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants