[nodriver]How to select the frame or click 'Verify you are human' #1943

Open
TonyLiooo opened this issue Jul 10, 2024 · 10 comments

Comments

@TonyLiooo

url: https://masiro.me
I tried the following code:
checkbox = await page.wait_for(text="cf-chl-widget-", timeout=10)
print(checkbox)
await checkbox.mouse_move()
await checkbox.mouse_click()
print(checkbox)
The element information is:
<input type="hidden" name="cf-turnstile-response" id="cf-chl-widget-805o7_response"></input>
However, checkbox.mouse_move() reported the following error:
could not find position for <input type="hidden" name="cf-turnstile-response" id="cf-chl-widget-805o7_response"></input>
I also tried:
iframes = await page.select_all('iframe')
but it failed with this error:
time ran out while waiting for iframe

@sohaib17

That is most probably because the iframe is hidden under a shadow root.
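One way to check this is to request the full DOM over the DevTools protocol with piercing enabled (`DOM.getDocument` with `depth: -1` and `pierce: true`) and walk the returned tree, including each node's `shadowRoots` and `contentDocument`. The sketch below illustrates that walk without a browser. `iter_nodes`, `find_iframes`, and the sample tree are hypothetical names and data, and the dictionary keys assume the node shape that CDP returns.

```python
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all descendants, descending into shadow roots and frame documents."""
    yield node
    for key in ("children", "shadowRoots"):
        for child in node.get(key, []):
            yield from iter_nodes(child)
    content_document = node.get("contentDocument")
    if content_document:
        yield from iter_nodes(content_document)


def find_iframes(root: Node) -> List[Node]:
    """Collect every IFRAME node, including those nested under shadow roots."""
    return [n for n in iter_nodes(root) if n.get("nodeName", "").upper() == "IFRAME"]


# Hypothetical tree in the shape of a DOM.getDocument result:
# the iframe sits inside a shadow root attached to a DIV.
sample_root = {
    "nodeId": 1,
    "nodeName": "#document",
    "children": [{
        "nodeId": 2,
        "nodeName": "DIV",
        "shadowRoots": [{
            "nodeId": 3,
            "nodeName": "#document-fragment",
            "children": [{"nodeId": 4, "nodeName": "IFRAME", "frameId": "ABC"}],
        }],
    }],
}

iframes = find_iframes(sample_root)
print([n["nodeId"] for n in iframes])
```

A walk that only follows `children` would miss node 4 here, which matches the symptom of a selector timing out on `iframe`.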

@marioeivissa

Great! Is there any way to get past this shadow root? Thanks.

@pythonlw

import nodriver as uc
from nodriver import loop


async def switch_to_frame(browser, frame):
    """
    Return the tab whose target id matches the iframe element's frame id.

    Rough JavaScript equivalent:
        let iframe = document.querySelector("YOUR_IFRAME_SELECTOR")
        let iframe_tab = iframe.contentWindow.document.body;
    """
    iframe_tab: uc.Tab = next(
        filter(
            lambda x: str(x.target.target_id) == str(frame.frame_id), browser.targets
        )
    )
    return iframe_tab


async def main():
    browser_args = ['--disable-web-security']
    browser = await uc.start(browser_args=browser_args)
    tab = await browser.get("http://www.yescaptcha.cn/auth/login")
    for _ in range(10):
        await tab.scroll_down(50)
    await tab
    await tab.sleep(15)

    # Ordinary (same-origin) iframe:
    # query_selector = await tab.select_all('button[class="widgetLabel moveFromRightLabel-enter-done"]', include_frames=True)
    # print('query_selector:', query_selector)
    # if len(query_selector) == 1:
    #     await query_selector[0].click()

    # Cross-origin iframe:
    recaptcha0 = await tab.select('iframe[title="reCAPTCHA"]')
    print('recaptcha0:', recaptcha0.frame_id)
    # for tar in browser.targets:
    #     print('target_id:', tar.target.target_id)
    iframe_tab = await switch_to_frame(browser, recaptcha0)
    print('iframe_tab websocket_url:', iframe_tab.websocket_url)
    # Connect to the iframe target through the "page" endpoint instead of "iframe"
    iframe_tab.websocket_url = iframe_tab.websocket_url.replace("iframe", "page")
    button = await iframe_tab.select("span#recaptcha-anchor")
    await button.click()
    input('stop')


if __name__ == "__main__":
    loop().run_until_complete(main())
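Note that `next(filter(...))` raises `StopIteration` when no target matches the frame id, for example when the iframe is not exposed as a separate target. Below is a minimal sketch of the same lookup with a `None` fallback. `FakeTarget`, `FakeTab`, and `find_tab_by_frame_id` are hypothetical stand-ins, not nodriver classes, so the snippet runs without a browser.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeTarget:
    """Stand-in for a target info object; only the id is modelled."""
    target_id: str


@dataclass
class FakeTab:
    """Stand-in for an entry of browser.targets."""
    target: FakeTarget


def find_tab_by_frame_id(targets: Iterable[FakeTab], frame_id: str) -> Optional[FakeTab]:
    """Return the first tab whose target id equals frame_id, or None if there is none."""
    return next(
        (t for t in targets if str(t.target.target_id) == str(frame_id)),
        None,
    )


targets = [FakeTab(FakeTarget("PAGE-1")), FakeTab(FakeTarget("FRAME-2"))]
print(find_tab_by_frame_id(targets, "FRAME-2"))
print(find_tab_by_frame_id(targets, "MISSING"))
```

With the fallback, the caller can check for `None` and report a clear error instead of an unexpected `StopIteration`.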

@today2004

May I ask if you have resolved it?

@ultrafunkamsterdam
Owner

For many elements, a position cannot be determined because they are either nested or in a different origin.

@TonyLiooo
Author

Thank you, I've found a solution and can now determine the element's position.

@ultrafunkamsterdam
Owner

Good that you found it. It would be better to share your solution :)

@YogaSakti

Glad to hear that the problem is solved. Please share the solution :)

@TonyLiooo
Author

Sorry, I haven't had much time lately. I'll provide the code I wrote two months ago; I just tried it, and it still works.

import re

import nodriver as uc


def xpath_to_css(xpath: str) -> str:
    """
    Convert an XPath expression to a CSS selector.
    
    Args:
        xpath (str): The XPath expression to convert.
    
    Returns:
        str: The equivalent CSS selector.
    """
    
    # Convert XPath axis and node tests to CSS selectors
    css = xpath
    
    # Convert predicate expressions (e.g., [1], [@attr='value']) to CSS attribute selectors
    css = re.sub(r'\[@([^\]]+)=["\']([^"\']+)["\']\]', r'[\1="\2"]', css)
    
    # Convert positional predicates (e.g., [1]) to :nth-of-type(), the closest CSS equivalent
    css = re.sub(r'\[(\d+)\]', r':nth-of-type(\1)', css)
    
    # Remove the XPath axis from the beginning of the XPath expression
    css = re.sub(r'^//', '', css)
    
    # Replace remaining double slashes with a space (the CSS descendant combinator)
    css = re.sub(r'//', ' ', css)
    
    # Remove unnecessary leading and trailing spaces
    css = css.strip()

    # Ensure CSS selector is properly formatted
    css = re.sub(r'(\s+)', ' ', css)  # Replace multiple spaces with a single space

    # Clean up any residual syntax errors or unnecessary parts
    css = css.replace('[1]', '')

    # Replace common XPath functions and expressions
    css = re.sub(r'\[contains\(@class,["\']([^"\']+)["\']\)\]', r'.\1', css)
    css = re.sub(r'\[contains\(@id,["\']([^"\']+)["\']\)\]', r'#\1', css)
    css = re.sub(r'\[contains\(@name,["\']([^"\']+)["\']\)\]', r'[name="\1"]', css)
    css = re.sub(r'\[@id=["\']([^"\']+)["\']\]', r'#\1', css)
    css = re.sub(r'\[@class=["\']([^"\']+)["\']\]', r'.\1', css)

    return css

async def switch_to_frame(browser: uc.Browser, frame_id):
    """
    Return the tab whose target id matches frame_id, or None if there is no such target.

    Rough JavaScript equivalent:
        let iframe = document.querySelector("YOUR_IFRAME_SELECTOR")
        let iframe_tab = iframe.contentWindow.document.body;
    """
    iframe_tab: uc.Tab = next(
        filter(
            lambda x: str(x.target.target_id) == str(frame_id), browser.targets
        ),
        None,
    )
    return iframe_tab

def cdp_generator(method, params):
    """Wrap a raw CDP method name and params in the generator form that tab.send() accepts."""
    cmd_dict = {"method": method, "params": params}
    response = yield cmd_dict
    return response

async def find_and_click_element(tab, selector):
    async def describe_node(_tab, node_id):
        return await _tab.send(cdp_generator("DOM.describeNode", {
            "nodeId": node_id,
            # "depth": -1,
            "pierce": True
        }))
    
    async def find_element_in_node(_tab, node_id, selector):
        result = await _tab.send(cdp_generator("DOM.querySelector", {
            "nodeId": node_id,
            "selector": xpath_to_css(selector)
        }))
        return result

    async def process_node(_tab, node):
        node_id = node['nodeId'] if 'nodeId' in node else None
        if not node_id:
            return _tab, None

        result = await find_element_in_node(_tab, node_id, selector)
        if result and result.get('nodeId'):
            return _tab, result

        if 'shadowRoots' in node:
            for shadow_root in node['shadowRoots']:
                process_tab, result = await process_node(_tab, shadow_root)
                if result and result.get('nodeId'):
                    return process_tab, result

        iframe_result = await find_element_in_node(_tab, node_id, 'iframe')
        if iframe_result and iframe_result.get('nodeId'):
            iframe_node_id = iframe_result['nodeId']
            process_tab, result = await process_iframe(_tab, iframe_node_id)
            if result and result.get('nodeId'):
                return process_tab, result

        if 'children' in node:
            process_tab, result = await process_child(_tab, node)
            if result and result.get('nodeId'):
                return process_tab, result
        
        return _tab, None

    async def process_iframe(_tab, node_id):
        iframe_response = await describe_node(_tab, node_id)
        frame_id = iframe_response['node']['frameId']
        iframe_tab = await switch_to_frame(_tab.browser, frame_id)
        
        # Without a matching iframe target there is no document to search
        if not iframe_tab:
            return _tab, None

        iframe_document = await iframe_tab.send(cdp_generator("DOM.getDocument", {"depth": -1, "pierce": True}))
        process_tab, result = await process_node(iframe_tab, iframe_document['root'])
        if result and result.get('nodeId'):
            return process_tab, result
        return _tab, None

    async def process_child(_tab, node):
        if 'children' in node:
            for child in node.get('children'):
                if 'shadowRoots' in child:
                    for shadow_root in child['shadowRoots']:
                        process_tab, result = await process_node(_tab, shadow_root)
                        if result and result.get('nodeId'):
                            return process_tab, result
                if 'children' in child:
                    process_tab, result = await process_child(_tab, child)
                    if result and result.get('nodeId'):
                        return process_tab, result
        return _tab, None
    
    document = await tab.send(cdp_generator("DOM.getDocument", {"depth": -1, "pierce": True}))
    process_tab, result = await process_node(tab, document['root'])

    if result is None or not result.get('nodeId'):
        raise Exception(f"Element with selector '{selector}' not found.")

    node_id = result['nodeId']
    box_model = await process_tab.send(cdp_generator('DOM.getBoxModel', {
        'nodeId': node_id
    }))
    
    if 'model' in box_model and 'content' in box_model['model']:
        content = box_model['model']['content']
        x_min, y_min = content[0], content[1]
        x_max, y_max = content[4], content[5]
        x_center = (x_min + x_max) / 2
        y_center = (y_min + y_max) / 2

        await process_tab.send(cdp_generator('DOM.scrollIntoViewIfNeeded', {
            'nodeId': node_id
        }))

        await process_tab.send(cdp_generator('Input.dispatchMouseEvent', {
            'type': 'mouseMoved',
            'x': x_center,
            'y': y_center,
            'button': 'none'
        }))

        await process_tab.send(cdp_generator('Input.dispatchMouseEvent', {
            'type': 'mousePressed',
            'x': x_center,
            'y': y_center,
            'button': 'left',
            'clickCount': 1
        }))
        await process_tab.send(cdp_generator('Input.dispatchMouseEvent', {
            'type': 'mouseReleased',
            'x': x_center,
            'y': y_center,
            'button': 'left',
            'clickCount': 1
        }))
    else:
        raise Exception(f"Failed to get box model for element with selector '{selector}'.")

You can use the following code to click 'Verify you are human'.
await find_and_click_element(page, 'input[type="checkbox"]')
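For reference, the click point in the code above is the midpoint of the content quad returned by `DOM.getBoxModel`, which lists four corner points as eight numbers (x1, y1, ..., x4, y4). Below is a small, browser-free sketch of that arithmetic. `quad_center` is a hypothetical helper name, and averaging all four corners is a slight generalization of the code above, which uses only the first and third corners.

```python
from typing import Sequence, Tuple


def quad_center(quad: Sequence[float]) -> Tuple[float, float]:
    """Return the centre of a quad given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(quad) != 8:
        raise ValueError("expected 8 numbers (four x, y pairs)")
    xs = quad[0::2]  # x1, x2, x3, x4
    ys = quad[1::2]  # y1, y2, y3, y4
    return sum(xs) / 4, sum(ys) / 4


# Made-up content quad for a 24x24 checkbox whose top-left corner is at (100, 200)
content = [100, 200, 124, 200, 124, 224, 100, 224]
print(quad_center(content))  # (112.0, 212.0)
```

For an axis-aligned box this gives the same point as averaging the top-left and bottom-right corners.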
