Running RPA-Python in Docker - some thoughts and caveats I have #140

Closed
ck81 opened this issue May 3, 2020 · 27 comments

ck81 commented May 3, 2020

Hi @kensoh,

Was wondering if you have tried, or know of any users, running RPA-Python in Docker?

I have conducted two runs of RPA classes for our Master's students at the National University of Singapore, using TagUI as the RPA tool to illustrate the key concepts of RPA. In class, students used Windows 10, Mac and Linux, and in both classes some students had problems setting up TagUI, e.g. missing DLL, missing lib, missing Java library, problems with usernames containing spaces, etc. Some students even had problems setting up Python!

That's why for my 3rd run of the RPA class, I want to set up a self-contained environment using VirtualBox or Docker - with Python and RPA-Python all set up and ready to run. I have no problem setting up VirtualBox. But I'm very new to Docker. I have heard many good things about Docker and wanted to give it a try.

I saw that there are 5 TagUI Docker images on Docker Hub: https://hub.docker.com/search?q=tagui&type=image

Are you aware of them? Have you tried any of them?

Also, as I'm new to Docker, it seems that when running TagUI in Docker, it's mostly in headless mode. Is that right?

Are you aware of any way to set up a Docker container with the standard GUI, i.e. running a Chrome browser inside Docker with the ability to interactively use the Chrome Developer Tools - just like the way we use VirtualBox?

@kensoh kensoh added the query label May 4, 2020
Member

kensoh commented May 4, 2020

Hi @ck81 I'm afraid I have not tried that and have no in depth experience with Docker images.

It looks like this image has the highest download count, you may want to start with that to try -
https://hub.docker.com/r/hmascend/tagui

Some thoughts -

  1. Docker images are usually Linux-based, as macOS and Windows need a valid paid license
  2. visual automation with SikuliX needs special setup on Linux to work - see this guide
  3. A Docker image can most likely serve web-app use cases only, because thick-client desktop apps would most likely need a paid license and won't be scalable this way through an image.
  4. In theory, I believe if the image is a Linux distribution that already has a GUI system with it, it should be possible to VNC into that system to see a desktop environment to run Chrome and do the usual desktop stuff
  5. On Linux, if any browser is preinstalled, I think it would be Chromium. Chrome will need to be installed separately to use it

Overall, I think it's a great idea to conduct your class with a standardised image, so that the environment is already there and you don't have to deal with env setup problems (which aren't the focus of your class).

Yes, I conducted a class once last year, and I was surprised that installing Python was an uphill, almost impossible task for some attendees! Their company IT policy had certain firewall or app restrictions that made it really hard to install something we assume is straightforward, like Python.

@kensoh kensoh changed the title Running RPA-Python in Docker Running RPA-Python in Docker - some thoughts and caveats May 4, 2020
@kensoh kensoh changed the title Running RPA-Python in Docker - some thoughts and caveats Running RPA-Python in Docker - some thoughts and caveats I have May 4, 2020
Author

ck81 commented May 8, 2020

Hi @kensoh,

Thanks for the many pointers!

Yes, I intend to use Docker to run only your RPA-Python within Linux, and for web-apps only.

Will give it a try and share with you more later if I can get it to work.

@kensoh kensoh closed this as completed May 25, 2020

dpnthanh commented Nov 3, 2020

Hi all, I made a Docker image for this RPA project; you can try it on Docker Hub.
Link: https://hub.docker.com/r/nhth199x/rpa-python
I wrote an example on the Overview page.

@Inaldomarinho

Hello @dpnthanh, I need to run rpa-python on a Python version higher than 3 and wanted to know what steps you used to create the image. If you could point me in the right direction or help, I would be very grateful.

Thanks for listening.


Nam-T commented Mar 27, 2021

Hi @Inaldomarinho, one of my projects needs to use RPA-Python on AWS SAM. My leader and I built a Dockerfile and I pushed its image to Docker Hub. It uses Python 3.8 and Chrome.
https://hub.docker.com/repository/docker/namthp99/python3.8-rpa-aws-sam
Hope it will help you!

@jamesmnixon

@kensoh or @dpnthanh

I am trying to run RPA Web automation with airflow from within a docker container.

@dpnthanh I saw and used your image, and you were able to make RPA work within a Docker container. But I didn't see a Dockerfile that shows what's actually inside the container. For security purposes, I would like to re-create it. My end goal is to add the necessary dependencies onto an existing image that holds my Airflow and other services.

@kensoh Do you have any insight into this, and could you potentially show what I would need? After installing Chrome on my container, then installing and importing RPA, it waits endlessly in r.init(). When I interrupt it with a keyboard interrupt, this is the trace (I added the "headless" and "read" print statements for testing):

image

@jamesmnixon

@Nam-T

Is there any way to share your Dockerfile? It is not included in your image repo.

Member

kensoh commented May 18, 2021

Hi guys, nice discussion here on running with Docker and on Linux! Recently, I created a working example using Google Colab. You can check out or make a copy of the notebook below to see some of the things done to make it work there.

Google Colab - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

Namely, if you use Chromium instead of Chrome, you need to change a setting in the TagUI engine.

If running in headless mode (without display and monitor), you can now do it with the headless option added in v1.34.

Chromium/Chrome doesn't allow running as root for security reasons, so a change in the run flag is needed in that case.

Other than the above, Ubuntu requires installing PHP because it does not come with PHP. And using computer vision and OCR requires installing OpenCV and Tesseract - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html

Also, this RPA for Python package is based on a forked version of TagUI open-source RPA tool.
Feel free to join Telegram community group chat to post any questions - https://t.me/rpa_chat
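
The launcher tweaks described in the comment above can be sketched as plain string edits. This is a minimal sketch, assuming the TagUI launcher script contains the literal strings `"google-chrome"` and `$headless_switch` that are quoted elsewhere in this thread. The helper name `patch_launcher` and its parameters are hypothetical and not part of RPA-Python; `--no-sandbox` is the Chrome/Chromium switch commonly used when running as root.

```python
def patch_launcher(text, browser_command="chromium-browser", run_as_root=True):
    """Return launcher-script text adjusted for Chromium and/or running as root.

    Assumes (per this thread) that the script names the browser as the
    quoted string "google-chrome" and launches it with $headless_switch.
    """
    # Point the chrome_command setting at the installed browser binary.
    patched = text.replace('"google-chrome"', f'"{browser_command}"')
    if run_as_root and "--no-sandbox" not in patched:
        # Chrome refuses to start as root unless the sandbox is disabled.
        patched = patched.replace("$headless_switch", "$headless_switch --no-sandbox")
    return patched


# Tiny stand-in for the relevant lines of ~/.tagui/src/tagui (not the real file).
sample = (
    'chrome_command="google-chrome"\n'
    '$chrome_command $chrome_switches $window_size $headless_switch > /dev/null 2>&1 &\n'
)
patched_sample = patch_launcher(sample, browser_command="chromium")
print(patched_sample)
```

With the package installed, the same function could presumably be applied to the real file through the `load` and `dump` helpers used later in this thread, followed by `r.init(headless_mode=True)` for a headless run.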

@jamesmnixon

@kensoh

Thank you for your amazingly quick reply. I used your notebook and followed your instructions, with the exception of having to use 'apt install chromium' instead of 'apt install chromium-browser' as I'm using Debian.

The setup and the replacing of the strings in the tagui file worked, but when it came to initializing with r.init(), it hung and required a keyboard interrupt to see where it was stuck.

Do you know what I'm missing or what else I can try? Here is a screenshot of both my terminal, showing success in the installs, and my notebook:

image

Member

kensoh commented May 19, 2021

Hi @jamesmnixon a few ideas to try -

  1. try adding r.debug(True) before r.init() to see if there is any clue from the logs
  2. edit /root/.tagui/src/tagui and search for the line below
    $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" $chrome_switches $window_size $headless_switch > /dev/null 2>&1 &
    and add the line below just before it. This prints the exact command used to start Chrome. Then try running Chrome manually with that command from the terminal, to see whether a problem there is what hangs the execution
    echo $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" $chrome_switches $window_size $headless_switch
  3. this looks like an issue that would be happening for the upstream TagUI project, there's a weekly Zoom 1-to-1 call every Thursday from 4-5pm SGT (UTC+8), see if you can join to look at it together - Weekly Zoom Q&A - free 1-to-1 call for any questions or blockers aisingapore/TagUI#914
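
Step 2 above can also be scripted instead of edited by hand. The following is a sketch only: it assumes the launch line in the launcher starts with `$chrome_command --user-data-dir=` as quoted above, and the function name `add_launch_echo` is made up for this illustration.

```python
LAUNCH_PREFIX = "$chrome_command --user-data-dir="


def add_launch_echo(text):
    """Insert an echo of the Chrome launch command just before the launch line."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(LAUNCH_PREFIX):
            indent = line[: len(line) - len(stripped)]
            # Drop the output redirect and trailing '&' so echo prints only the command.
            command = stripped.split(" > /dev/null")[0].rstrip()
            echo_line = f"{indent}echo {command}\n"
            # Skip the insert if a previous run already added the echo.
            if not out or out[-1] != echo_line:
                out.append(echo_line)
        out.append(line)
    return "".join(out)


# Stand-in for a fragment of the launcher script (not the real file).
sample = (
    'echo "starting"\n'
    '    $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" '
    "$chrome_switches $window_size $headless_switch > /dev/null 2>&1 &\n"
)
patched = add_launch_echo(sample)
print(patched)
```

Reading the real file, passing its text through the function and writing it back would give the same result as the manual edit. The echoed command can then be copied into a terminal to see where Chrome itself stalls.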

Member

kensoh commented May 19, 2021

Adding on, some time back a user had an unknown issue starting Chrome because his company network policy blocked Chrome from serving a local web socket connection. The TagUI engine requires that web socket connection as a backdoor to control Chrome. For him, the issue was fixed by tweaking the network policy or adding an exception.
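
One way to check whether a local connection like this is being blocked is to test whether the port accepts TCP connections at all. The sketch below uses only the standard library. The helper name is hypothetical, and the port to probe is an assumption: TagUI is understood to drive Chrome over its remote-debugging port, commonly 9222, but that number does not come from this thread.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a throwaway local listener, so the sketch runs anywhere.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
reachable_while_listening = port_accepts_connections("127.0.0.1", free_port)
listener.close()
reachable_after_close = port_accepts_connections("127.0.0.1", free_port)
print(reachable_while_listening, reachable_after_close)
```

While `r.init()` appears to hang, calling `port_accepts_connections("127.0.0.1", 9222)` from a second Python session in the same container would show whether Chrome is listening and reachable.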

Member

kensoh commented May 20, 2021

Want to update here that James joined the call and the problem was resolved -

  • issue with Chromium browser running on the system (a Chromium snap install error) --> switch to Google Chrome
  • issue with python command not found --> change ~/.tagui/src/casperjs/bin/casperjs to point to python3 on his system
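
Both fixes above come down to a command not being found on the system. A quick pre-flight check along these lines can surface that before `r.init()` is called. It is a sketch using the standard library's `shutil.which`; the list of command names is only an example drawn from this thread, and `missing_commands` is a hypothetical helper.

```python
import shutil


def missing_commands(names):
    """Return the subset of command names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Commands mentioned in this thread; adjust the list to match your image.
wanted = ["python", "python3", "php", "google-chrome"]
print("not on PATH:", missing_commands(wanted))
```

If `python` is reported missing but `python3` is present, that matches the situation described above, where the casperjs launcher had to be pointed at `python3`.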

@jamesmnixon

@kensoh
Thank you for your help on the call. Very insightful. One area that also pertains to running inside a container is dealing with ReCaptcha. I couldn't figure out why my RPA wasn't working in the container for this site, even though other sites worked, until I added a snap after each line. This is what I found: it got stuck on the ReCaptcha page. Do you have any recommended way to get past this?

image

Member

kensoh commented May 22, 2021

Hi @jamesmnixon I see, it looks like the website has anti-automation checks when running in the container. I've heard good reviews of 2captcha, a very affordable service provider that can automate solving captchas through an API. You can see if the link below is useful - https://2captcha.com/recaptchav2_eng_instruction

Alternatively, I heard that some folks set up Xvfb to create a virtual display to run Chrome or visual automation on their Linux instances. You can also try setting up Xvfb and running Chrome in the normal visible mode through it, to see whether such a setup still prompts for this captcha check. I haven't tried Xvfb myself, but the link below gives an idea of what it involves - https://gist.github.com/addyosmani/5336747
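
For the virtual display idea, a rough sketch of starting Xvfb (the X virtual framebuffer) from Python is shown below. It assumes the `Xvfb` binary is installed in the container; the display number `:99`, the screen geometry and both function names are arbitrary choices for illustration. Whether this setup avoids the captcha prompt is untested.

```python
import os
import shutil
import subprocess


def xvfb_command(display=":99", geometry="1280x1024x24"):
    """Build the argument list that starts Xvfb on the given display."""
    return ["Xvfb", display, "-screen", "0", geometry]


def start_virtual_display(display=":99", geometry="1280x1024x24"):
    """Start Xvfb if installed and point DISPLAY at it; return the process or None."""
    if shutil.which("Xvfb") is None:
        return None  # Xvfb is not installed in this environment
    process = subprocess.Popen(xvfb_command(display, geometry))
    os.environ["DISPLAY"] = display  # programs started later inherit this
    return process


# Show the command that would be run, without starting anything.
print(" ".join(xvfb_command()))
```

The idea would be to call `start_virtual_display()` before `r.init()`, leave headless mode off so Chrome runs in its normal visible mode against the virtual display, and call `terminate()` on the returned process afterwards.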


richylyq commented May 25, 2021

Hi all,
I have been trying to get RPA-Python to run in the Docker container that I am building, but I am facing issues when the RPA is running. It is currently built with PyWebIO as the UI, running RPA-Python with Google Chrome and headless_mode=True.
I've added r.debug(True) to see what went wrong, and the error I saw was
[RPA][ERROR] - following happens when starting TagUI...
Terminated
/root/.tagui/src/tagui: line 51: pwd: write error: Broken pipe

Anyway, I am pretty lost on where to find /root/.tagui/src/tagui in a Docker image, so I would also appreciate it if anyone can show me the light 😅

Member

kensoh commented May 25, 2021

I haven't heard of this from users, it seems to be some Docker issue affecting different apps - broken pipe error on Docker

The line that triggered error is below in /root/.tagui/src/tagui file.

if [ "$tagui_baseline_mode" == false ]; then set -- "$(cd "$(dirname "$1")"; pwd)/$(basename "$1")" "${@:2}"

You can try changing it to the line below to see if that works. But if there is some root cause related to Docker such that running the pwd command triggers errors, there may be many more similar errors before TagUI can run successfully.

if [ "$tagui_baseline_mode" == false ]; then set -- "/root/.tagui/src/tagui" "${@:2}"

@richylyq

Thanks for the prompt update! I tried the change you mentioned above and encountered a fresh new error, which I will try to solve if possible.

ERROR - for nested conditions, loops, popup, frame, set { and } explicitly
ERROR - add { before this line and add } accordingly - if [ -f "$online_flowname" ]; then if grep -iq "404\|400" "$online_flowname"; then rm "$online_flowname"
ERROR - automation aborted due to above

Member

kensoh commented May 26, 2021

Hi @richylyq I'm sorry, I made a mistake: the value in the change below should be the full path and filename of the script that you are running. Can you add this to your Python program: import os; print(os.getcwd()), so that you know where the generated rpa_python file is? After finding out the pathname, you can form the full path and file name to replace -

if [ "$tagui_baseline_mode" == false ]; then set -- "/full_path/rpa_python" "${@:2}"

Hopefully that will give more clues about what is going on. I suspect the way the tool is being used isn't doable out of the box, because when TagUI runs normal shell commands using $(), like pwd, it gets a broken pipe error. It might be an OS or environment-related issue. But you can first try changing how the package and TagUI work to see if that helps.
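
The path lookup and the replacement line above can be put together in a few lines of Python. This is a sketch only: the function names are hypothetical, and the generated line mirrors the one quoted in this comment rather than the full line in the real launcher, which may continue beyond what is shown here.

```python
import os
import posixpath


def flow_file_path(working_dir=None, flow_name="rpa_python"):
    """Full path of the generated flow file, assumed to sit in the working directory."""
    # Container paths are POSIX, so join with posixpath for predictable output.
    return posixpath.join(working_dir or os.getcwd(), flow_name)


def baseline_line(flow_path):
    """Build the replacement launcher line with the flow path hard-coded."""
    return (
        'if [ "$tagui_baseline_mode" == false ]; '
        f'then set -- "{flow_path}" "${{@:2}}"'
    )


print(baseline_line(flow_file_path()))
```

Running this inside the same Python program that calls the package prints the exact line to paste into the launcher for that working directory.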


richylyq commented May 27, 2021

Hi @kensoh, I made the switch, and the new broken pipe error now comes from the line that changes the current directory to the TagUI directory:
TAGUI_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"; cd "$TAGUI_DIR"
When pwd is used in this context, does it mean it is trying to get the path where the tagui file is located, and the broken pipe error occurs because permission to access the root folder isn't given?

Or could this be affected by the base image I used for my Docker container, if it's an environment-related issue? 🤔

What did the other TagUI users who successfully integrated Docker with TagUI use to build?

Member

kensoh commented May 27, 2021

A file permission issue is possible. You can run chmod -R 777 on the current working directory where rpa_python is generated and on the ~/.tagui folder to see if that helps. If you run as root, make sure the package is installed as root. If you run as a normal user, try installing the package as a normal user.

The $() syntax is normal bash syntax, so it might be a permission or environment issue that causes errors whenever this syntax is used. https://askubuntu.com/questions/833833/what-does-command-do

There are many more uses of $() inside the TagUI launcher script in tagui/src/tagui. If the root cause is not found, I can imagine you having to do a lot of hacking just to prevent that error.
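
Before reaching for chmod, it may help to confirm which locations the current user actually cannot write to. The sketch below uses `os.access` from the standard library. The helper name is hypothetical, and the two candidate paths are the ones mentioned in this comment.

```python
import os
import tempfile


def unwritable_paths(paths):
    """Return the paths the current user cannot write to (or that do not exist)."""
    return [p for p in paths if not os.access(p, os.W_OK)]


# Small self-check: a fresh temp directory is writable, a missing path is not.
with tempfile.TemporaryDirectory() as scratch:
    missing = os.path.join(scratch, "does-not-exist")
    demo_result = unwritable_paths([scratch, missing])

# The working directory (where rpa_python is generated) and the TagUI folder.
candidates = [os.getcwd(), os.path.expanduser("~/.tagui")]
print("user id:", os.getuid() if hasattr(os, "getuid") else "n/a")
print("not writable:", unwritable_paths(candidates))
```

A user id of 0 means the program runs as root, which is also the case where the package should have been installed as root.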

This is an example of the package running on Ubuntu on Google Colab - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

This is a working Docker example with both TagUI and RPA for Python. It's created by @skadefro as an image to provision TagUI instances on his open-source OpenFlow app - https://hub.docker.com/r/openiap/nodered-tagui

@skadefro

You should probably link to the Dockerfile too
https://github.com/open-rpa/openflow/blob/master/OpenFlowNodeRED/Dockerfiletagui

@richylyq

hey @kensoh @skadefro
thanks for the inputs, i will take a look at the Dockerfile, and pray that it works for my build as well!

Member

kensoh commented May 27, 2021

Thanks @skadefro this is very helpful. I forgot where to find this Docker file for your image, now I know.

@lanSeFangZhou

Can python-rpa run on Linux? Do you have a full demo?

Member

kensoh commented Jun 30, 2022

Yes, see this working Google Colab example running on Ubuntu Linux - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

@nicotiendamia

Hello, I tried RPA locally on my computer and it worked correctly. Now I want to execute it inside a Docker container, through Airflow's PythonOperator, and I'm having serious trouble setting it up. I've been trying things for many days now!!

The latest thing I've tried is this https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing#scrollTo=kl58MzRLyNgb with the only difference being that instead of installing chromium-browser I install chromium, and when doing the dump, I replace "google-chrome" with "chromium".

This is my dump code:

current_dir = os.path.realpath(os.path.dirname(__file__))

# rewrite the TagUI launcher in place: swap the browser command and replace $headless_switch with --no-sandbox
self.robot.dump(
    self.robot.load(f'{current_dir}/.tagui/src/tagui').replace('"google-chrome"', '"chromium"').replace('$headless_switch', '--no-sandbox'),
    f'{current_dir}/.tagui/src/tagui'
)

I have error and debug set to True. This is the output I get:

[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /opt/airflow
[RPA][INFO] - done. syncing TagUI with stable cutting edge version
[RPA][INFO] - TagUI now ready for use in your Python environment
[RPA][INFO] - visual automation (optional) requires special setup on Linux,
[RPA][INFO] - see the link below to install OpenCV and Tesseract libraries
[RPA][INFO] - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html
finished dump
[RPA][ERROR] - following happens when starting TagUI...

The following command is executed to start TagUI -
"/opt/airflow/.tagui/src/tagui" rpa_python chrome

It leads to following output when starting TagUI -
/opt/airflow/.tagui/src/tagui: line 304: type: google-chrome: not found
ERROR - cannot find Chrome command "google-chrome"
update chrome_command setting in tagui/src/tagui and make sure symlink to command is created

Exception initializing RPA:
[RPA][ERROR] - [RPA][ERROR] - unknown error encountered

I'm doing this because I want to replicate what I will need to do in production, where I have Airflow running on an AWS EC2 instance with Ubuntu.

Member

kensoh commented Aug 25, 2022

From above log, it looks like for some reason, the file you edited somehow did not get updated to use Chromium.

You can edit this file manually /opt/airflow/.tagui/src/tagui to check where it has "google-chrome" and replace it with "chromium" or whatever command needed to start your Chromium browser. Hopefully that helps! Let me know if it doesn't.
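
To confirm whether the edit actually landed in the file TagUI runs, the launcher text can be scanned for the old command name. This is a minimal sketch: the helper name is hypothetical, and the sample text is a stand-in for the script, not the real launcher.

```python
def lines_containing(text, needle):
    """Return (line_number, line) pairs for lines that contain needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


# Stand-in for the launcher script; read the real file in practice.
sample = (
    "#!/bin/bash\n"
    'chrome_command="google-chrome"\n'
    'if ! type "$chrome_command" > /dev/null; then echo "ERROR"; fi\n'
)
hits = lines_containing(sample, "google-chrome")
print(hits)
```

Running the same check on the contents of /opt/airflow/.tagui/src/tagui right after the dump would show whether any reference to "google-chrome" remains. A remaining hit suggests the patch was written to a different path than the one TagUI executes.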
