Commit 4039d5b

Author: Yaroslav Kargin

* Since each URL from the website must be retrieved only once, it's easier to use a set() to store them.
* Added filtering for '#' in URLs.

Signed-off-by: Yaroslav Kargin <ykargin@outlook.com>

1 parent: e88cfbc

File tree: 1 file changed, +7 -0 lines changed


wbot.py (+7)
@@ -42,11 +42,18 @@ def retrieve(drv, domain, url):
         logging.warning(f'could not load links for {url}')
         return
 
+    scopeset = set()
+
     for l in links:
         u = l.get_attribute('href')
         if not u:
             continue
+        elif '#' in u:
+            u = u.split('#')[0]
+        scopeset.add(u)
+
 
+    for u in scopeset:
         # Get the status with requests library and then retrieve the URL again
         # recursively with Selenium driver. We need the double requests for now
         # because it's not easy to get response status from selenium.
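The change above can be sketched in isolation. The snippet below mirrors the loop added in this commit; the helper name collect_urls and the sample inputs are illustrative and do not appear in wbot.py. It shows why the set matters: an href with a '#' fragment collapses to the same entry as its fragment-free form, so each page is fetched only once.

```python
def collect_urls(hrefs):
    """Deduplicate hrefs, dropping empty values and '#' fragments."""
    scopeset = set()
    for u in hrefs:
        if not u:
            # Skip None / empty hrefs, as the committed code does.
            continue
        elif '#' in u:
            # Keep only the part before the fragment identifier.
            u = u.split('#')[0]
        scopeset.add(u)
    return scopeset

urls = collect_urls([
    'https://example.com/a',
    'https://example.com/a#top',   # same page, different fragment
    None,                          # element without an href
    'https://example.com/b',
])
# 'https://example.com/a' and its '#top' variant collapse to one entry.
```

For reference, the standard library offers urllib.parse.urldefrag, which splits off the fragment in one call; the commit's split('#')[0] achieves the same effect for this purpose.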
