Extraction requests fail / Blocked from Audible (Whoops EGG) #110

Closed
joonaspaakko opened this issue Dec 18, 2022 · 9 comments
Labels
bug Something isn't working Priority: High

Comments

@joonaspaakko
Owner

joonaspaakko commented Dec 18, 2022

Extraction may appear to freeze, and afterwards every Audible page is blocked with the message "Whoops. We're not sure what happened, but something went wrong.":

image

Sometimes the extraction does push through eventually, but even then it will be missing some or a lot of data, because Audible has started blocking your access and the extractor can't do its thing. It may also stop the extraction entirely.

If you wait about 10 minutes, you should be able to access Audible again. It might be pointless to retry the extraction, though, as it's likely to trigger the same issue.

I'm still not super clear on why this is happening. I used to think they had started doing some kind of rate limiting, which may play some part in it, since throttling the extraction by a lot seemed to get rid of it... But now it's back. It always seems to happen when extracting series, and I'm beginning to think it's not just that the extractor has made too many rapid requests to the server by that point, but that some specific behavior related to series triggers Audible to start blocking it. I also suspected it has something to do with repeatedly trying to access pages that can't be accessed, but I'm not sure, because I tried to mitigate that before and the issue is back.
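For illustration only, here is a minimal Python sketch of the two mitigations discussed above: pausing between requests, and cooling down when the "Whoops" page shows up. It is not the extension's actual code. The names `paced_fetch`, `BLOCK_MARKER`, and the default delays are assumptions; only the roughly 10 minute wait comes from this thread.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed way to recognise the block page: a substring of its message.
BLOCK_MARKER = "something went wrong"


def paced_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 2.0,   # assumed pause between requests, in seconds
    cooldown: float = 600.0,     # roughly the 10 minutes mentioned above
    max_retries: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch URLs one at a time, pausing after each request.

    If a response looks like the block page, wait for `cooldown` seconds
    and retry the same URL. URLs still blocked after `max_retries`
    retries are left out of the result.
    """
    pages: Dict[str, str] = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            body = fetch(url)
            sleep(min_interval)
            if BLOCK_MARKER not in body.lower():
                pages[url] = body
                break
            if attempt < max_retries:
                sleep(cooldown)  # blocked: back off before retrying
    return pages
```

`fetch` and `sleep` are passed in so the pacing logic can be tried without touching the network. Whether pacing alone avoids the block is exactly what is still unclear here.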

@joonaspaakko joonaspaakko added bug Something isn't working Priority: High labels Dec 18, 2022
@joonaspaakko joonaspaakko changed the title Extraction requests fail ( Bad Request 503) Extraction requests fail / Blocked from Audible Dec 19, 2022
@wizardfish2

wizardfish2 commented Apr 10, 2023

I will also add that I've been seeing this more and more as my Audible library has grown. If I attempt a "full" extraction now, it is 100% impossible. It generally takes me 5-10 tries just to get the library, then another 5-10 tries to add the collections, then another 5-10 tries to go back and add the ISBN numbers.

I suspect that the Audible firewall is seeing the huge number of page requests as an attack, and locking down the source IP for a limited time.

My library now has over 1,700 purchased books, so it ends up being a huge number of "hits" against the Audible site. Your suggestion of lowering the extraction rate would be slower, but it is definitely worth it to avoid tripping the Audible firewall lockout. :-)

P.S. -- This widget is awesome. I only found out about it a couple of months ago, and I've been a daily user of it on my phone, to sort the series (next book). It became especially useful after Audible removed that feature from their own player app, for some unknown (and probably unholy) reason. THANK YOU for this extension.

-Keith

@joonaspaakko
Owner Author

joonaspaakko commented Apr 10, 2023

@wizardfish2, are you using the current official release 0.2.8 or the v.0.2.9 draft that is in the GitHub repository? I updated it just now. It's still a bit unfinished, but you might have more luck with it.

This issue is actually still a bit of a mystery in terms of what the limits are. Sometimes it feels like throttling the requests doesn't really matter and it blocks you anyway... and other times it goes through without a hitch. The weirdest thing is that when I'm using the Brave browser, it always stops at the tail end of my series extraction. But in a plain old Chrome installation using the same amount of throttling, it goes through fine.

@joonaspaakko joonaspaakko changed the title Extraction requests fail / Blocked from Audible Extraction requests fail / Blocked from Audible (Whoops EGG) Apr 10, 2023
@wizardfish2

wizardfish2 commented Apr 10, 2023

I have/had been using the v.0.2.8 version. I just loaded the v.0.2.9 draft and am running that now. It is definitely getting farther (on a full update) than 0.2.8 could get in a single pass.

Will update progress shortly.

UPDATE: Nice! I did the full update (Lib + Collections + ISBN) in a single pass, with no failures. It took quite a bit longer, at around 40 minutes, but that IS PERFECTLY FINE since it actually worked all the way through.

I also will mention that the new "extension tools" pulldown is very convenient.

Thanks for this -- it's awesome!

@joonaspaakko
Owner Author

Wow... 40 minutes is super slow. Was ISBN extraction most of that?

ISBN extraction is always going to be super slow due to rate limiting in the Google API. But yeah, basically, if it gets that far it's going to be fine, because by then it's done extracting stuff from Audible.

@wizardfish2

Yes, ISBN was probably 75% of the time. I'm doing a couple of additional runs (full and incremental), and so far Audible has NOT locked me out for the 10 minute timeout. Did I say "very nice!" yet? Because very nice. :-)

The incremental re-downloads took about 3 minutes for Library + Collections... and then 15 minutes for the ISBN re-check. So yeah, you're certainly right about the ISBN limiting.

Just for informational purposes, I am on a 1 GB WAN feed, which also backs up your statement about Google API being the limiter, rather than network.

@joonaspaakko
Owner Author

Good :)

I did an extraction test as well, where I extracted everything but ISBNs:

  • Wishlist with 859 books: 2 minutes (doesn't extract data for series)
  • Library with 1135 books: ~6 minutes

That isn't fast by any means, but I guess I can live with that if I have to.

The ISBN handling in the extension could be better though. I've kinda neglected it... If you need to do a full extraction, which happens, you always have to throw away previously extracted ISBNs, but that could be avoided.

Just a few ideas for myself:

  1. To avoid the issue where removing library data gets rid of ISBNs, it could pick up all ISBNs when the user removes library data and store them separately until the next time ISBNs are extracted.

    • ...or maybe they could always be stored separately, which in hindsight would've avoided this issue, but it might take a lot of work to change the data structure. (Though maybe not if the data is stored separately in the extension but then combined when used in the gallery)
  2. Maybe it would be helpful if you could import just the ISBN numbers from a previous raw data export, so if you did delete ISBNs for whatever reason, you could bring them back without having to do an extraction (or at least an extraction that takes super long...).

  3. Right now after you've extracted ISBNs once, the partial extraction tries to find ISBNs for any books that don't have them already...

    • But the thing is, certain books never get any ISBNs, so at that point it's basically doing a whole lot of unnecessary checking without really any results.
    • There is the off chance that a book that didn't get matched previously now after some time does get a match, but I think it would be better to just ignore that possibility and...
    • Instead the ISBN extraction could do the same as library and wishlist, which is to only extract ISBNs for newly added books.
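A rough Python sketch of ideas 1 and 3 above, assuming ISBNs live in their own store keyed by a book ID and are only combined with library data when displayed. The field names `asin` and `isbns`, the `attempted` set, and both function names are hypothetical and not taken from the extension.

```python
from typing import Dict, List, Set

Book = Dict[str, object]


def books_needing_isbn(
    library: List[Book],
    isbn_store: Dict[str, List[str]],
    attempted: Set[str],
) -> List[str]:
    """Return IDs of books that have no stored ISBN and were never looked up.

    Books in `attempted` were already checked once without a match, so
    they are skipped instead of being re-checked on every extraction.
    """
    return [
        str(book["asin"])
        for book in library
        if book["asin"] not in isbn_store and book["asin"] not in attempted
    ]


def merge_isbns(library: List[Book], isbn_store: Dict[str, List[str]]) -> List[Book]:
    """Attach stored ISBNs to copies of the library records for display."""
    return [
        {**book, "isbns": isbn_store.get(str(book["asin"]), [])}
        for book in library
    ]
```

Because `isbn_store` is separate from `library`, clearing the library for a full extraction would leave the ISBNs untouched in this layout.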

@wizardfish2

This is just an opinion from the peanut gallery, but I think your last option is probably the most reasonable. It will work for 99% of the cases (either no new ISBNs are required, or they aren't going to be found on later attempts anyway).

And for the 1% -- those who KNOW an ISBN is available, or know what it is already -- have two options.

  1. Delete the saved data and force a whole new download, which will also re-scan for ISBNs.
  2. Optionally, export the JSON data, MANUALLY edit/add the ISBNs themselves, then re-import the JSON data.

In my opinion, it wouldn't make sense to spend a huge amount of time on something that few will need/want at all, especially if there is already the roundabout way to deal with it.
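For what it's worth, the export / edit / re-import route from option 2 could be sketched in Python roughly like this. The export shape (a top-level `books` list with `asin` and `isbns` fields) and the function name are guesses for illustration, not the extension's real format.

```python
import json
from typing import Dict, List


def import_isbns(export_json: str, isbn_store: Dict[str, List[str]]) -> int:
    """Copy ISBNs from an exported JSON string into isbn_store.

    Existing entries in isbn_store are kept as they are. Returns the
    number of books whose ISBNs were restored from the export.
    """
    data = json.loads(export_json)
    restored = 0
    for book in data.get("books", []):
        asin = book.get("asin")
        isbns = book.get("isbns")
        if asin and isbns and asin not in isbn_store:
            isbn_store[asin] = list(isbns)
            restored += 1
    return restored
```

Books without ISBNs in the export are ignored, so a hand-edited file only needs entries for the books someone actually cares about.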

@joonaspaakko
Owner Author

joonaspaakko commented Apr 10, 2024

Reopening because I seem to be getting this again...

Only when doing a full extraction (removing all data and extracting). While updating existing data it seems to pull through just fine.

@joonaspaakko
Owner Author

joonaspaakko commented Jan 14, 2025

Added some fixes that would hopefully help with it. I haven't seen this page in a while personally...

  1. Slowing down series extraction more than other parts: a05e5ad
  2. Slight adjustment to the prior change → Round up to an even number (6 requests per minute instead of 5.5): 570831d
  3. Excluding broken series request URLs: 4e2086e
  4. Here I decided to speed up the series extraction from step 1: 6c900fc
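As a back-of-the-envelope illustration of the changes listed above (not the code in those commits), a per-phase request rate can be turned into a delay between requests, and URLs known to be broken can be filtered out before requesting them. Only the 6 requests per minute figure comes from item 2, and item 4 changed the series speed again afterwards; the library rate and all names below are placeholders.

```python
from typing import Dict, Iterable, List, Set

# Requests per minute for each extraction phase. Only the series figure
# of 6 is from the list above; the library value is a made-up placeholder.
REQUESTS_PER_MINUTE: Dict[str, float] = {"library": 30.0, "series": 6.0}


def delay_seconds(phase: str) -> float:
    """Seconds to wait between two requests in the given phase."""
    return 60.0 / REQUESTS_PER_MINUTE[phase]


def skip_broken(urls: Iterable[str], broken: Set[str]) -> List[str]:
    """Drop URLs already known to fail so they are not requested again."""
    return [url for url in urls if url not in broken]
```

At 6 requests per minute the delay is 10 seconds between requests; at 5.5 it would be about 10.9 seconds.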

If this still happens, try extracting the library and wishlist one at a time and/or try the new toggle button between slow and fast extraction. There is a possibility that the slower extraction doesn't trigger it:

Image
