Provide a Rally package with all dependencies for offline install #226

Closed
sokoow opened this issue Feb 10, 2017 · 9 comments
Labels
enhancement Improves the status quo :Usability Makes Rally easier to use

Comments

@sokoow

sokoow commented Feb 10, 2017

Can you work a bit on dependency pulling? We see that esrally, after it's installed, interacts with github and tries to pull from it. That would be a total show stopper for environments that don't have github access for some reason.

@danielmitterdorfer
Member

Can you please elaborate what you mean exactly by "work a bit on dependency pulling"? :) What should Rally do exactly?

after it's installed, interacts with github and tries to pull from it

This is correct. Please see the FAQ item: Do I need an Internet connection?.

Rally uses GitHub in two places:

  1. It tries to connect to www.github.com to detect whether you have a working Internet connection.
  2. It will fetch track data from GitHub (unless you tell it not to by specifying a different track repository or by taking it offline with --offline).

I think you don't need to worry too much about 1. because it should simply fail in your case and you should see something like:

No Internet connection detected. Automatic download of track data sets etc. is disabled.

For 2., it depends on how you want to use it:

  • If you want to use the default Rally tracks, you can simply fetch them somewhere else and place them in ~/.rally/benchmarks/tracks/default.
  • If you don't want to use the default tracks, just define them in your own track repository. There is no need to share it or to push it to a remote server.
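
For the first option, a minimal sketch of moving the default tracks onto a machine without GitHub access could look like the following. The function name `install_default_tracks` and the archive name are made up for illustration; only the target directory `~/.rally/benchmarks/tracks/default` comes from the description above. It assumes the archive was created from a clone of the track repository (including its `.git` directory) on a machine with Internet access.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rally): unpack an archive of a track
# repository clone into the directory where Rally looks for default tracks.
#
# On a machine with Internet access, the archive could be created with:
#   git clone https://github.com/elastic/rally-tracks.git default
#   tar -czf rally-tracks.tar.gz default
install_default_tracks() {
    local archive="$1"                      # e.g. rally-tracks.tar.gz
    local rally_home="${2:-$HOME/.rally}"   # Rally's home directory
    local tracks_dir="$rally_home/benchmarks/tracks"

    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$tracks_dir" || return 1
    # The archive is assumed to contain a single top-level "default" directory.
    tar -xzf "$archive" -C "$tracks_dir" || return 1
    # Succeed only if the expected directory is now in place.
    [ -d "$tracks_dir/default" ]
}
```

On the offline machine this would be called as `install_default_tracks rally-tracks.tar.gz`, followed by a Rally invocation with --offline.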

@danielmitterdorfer added the :Usability (Makes Rally easier to use) and feedback needed (An open question blocks progress) labels on Feb 10, 2017
@sokoow
Author

sokoow commented Feb 12, 2017

Hey :) Sorry I wasn't too clear. What I mean is that the esrally install process is completely impossible when the target machine is offline :)

@danielmitterdorfer
Member

I see. I assumed you had Rally already installed, because of:

We see that esrally after it's installed, interacts with github and tries to pull from it

How did you install it originally? I assume that you did something like:

git clone https://github.com/elastic/rally.git
cd rally
./rally --version

(well, maybe you did the first step on another machine and copied the cloned repo).

So, if I understand you correctly, what would help you is a self-contained Rally package that you can download and then just install on the target machine (which does not have an Internet connection)?

A second question: Rally manages a lot for you (it downloads Elasticsearch releases, benchmark data from S3, etc.). You can manage this yourself, but it is a bit cumbersome at the moment. It would be great if you could tell me a little about your intended use case so I can see what we can do to make this easier for you.

@prajwalkumar83

I have the same problem, as my environment (like most corporate servers that sit behind a firewall) does not have Internet access. I have downloaded esrally and its dependencies from PyPI and installed esrally successfully. How do I run tests in offline mode by downloading the required test files offline? The --offline option does not help, as the tracks still need Internet access. Any suggestions/workarounds?

@danielmitterdorfer
Member

danielmitterdorfer commented Feb 16, 2017

I hear you @prajwalkumar83 and @sokoow.

The only machine that needs access to the Internet is the machine where you invoke Rally (and this should normally be a different machine than the one(s) you want to hit with the benchmark).

So these are actually two issues:

  1. More convenient installation of Rally when no direct Internet access is available. To be honest, I did not consider this yet. I'll look into this but this will take a bit of time.
  2. "Manual" management of tracks. I'd consider this a separate issue to the installation topic and this will be covered in Simplify usage of Rally for offline-only use #231. But to help you right now, see below what you need to do.

The easiest option is to install Rally on a machine that has Internet access, e.g. a developer notebook and then copy the relevant data. For the rest of this comment, I will make the following assumptions:

  • DEV_NOTEBOOK is a machine within your company network but with Internet access. I assume you have Rally already installed and properly configured (i.e. you have a ~/.rally/rally.ini).
  • LOAD_DRIVER_SERVER is the machine where you will invoke Rally to run the benchmark. This machine is within your company network but it does not have access to the Internet. I will also assume that you managed to install Rally on it and it is properly configured (i.e. you have a ~/.rally/rally.ini).
  • BENCHMARK_TARGET_SERVERS is the Elasticsearch cluster, consisting of one or more machines, that you want to benchmark. You don't need to install Rally on them. (This will change with the next release, 0.5.0, where you can optionally install Rally on these machines too to get additional help.)

Let's assume you want to run a specific track on LOAD_DRIVER_SERVER, say geopoint. Invoke Rally on DEV_NOTEBOOK as follows:

esrally --track=geopoint --distribution-version=5.2.1

Note: The actual distribution version does not really matter; any version will do. You just need a version to run the benchmark.

After the benchmark has run, you need to copy a few directories from DEV_NOTEBOOK to LOAD_DRIVER_SERVER. Just be sure that you keep the directory structure; otherwise it will not work:

  • ~/.rally/benchmarks/data/geopoint: This directory contains the data that Rally has downloaded from S3 for this benchmark.
  • ~/.rally/benchmarks/tracks/default: In this directory all the tracks are stored. It is a 1:1 clone of https://github.com/elastic/rally-tracks

If you want to run the benchmark on LOAD_DRIVER_SERVER against BENCHMARK_TARGET_SERVERS then that's all you need to do. If you want to run local benchmarks on LOAD_DRIVER_SERVER, you also need to copy ~/.rally/benchmarks/distributions/ so Rally finds the Elasticsearch distribution(s) to use.

Now you just need to add --offline on LOAD_DRIVER_SERVER and Rally should not require an Internet connection.
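
The copy step above can be sketched as a small helper that bundles the listed directories into one archive with paths relative to the Rally home directory, so the directory structure survives the transfer. The function name `pack_rally_offline_data` and the archive path are hypothetical; the directory names are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rally): bundle the directories listed above
# into one archive. Paths are stored relative to the Rally home directory so
# that the directory structure is preserved when unpacking.
pack_rally_offline_data() {
    local track="$1"                        # e.g. geopoint
    local archive="$2"                      # e.g. /tmp/rally-offline.tar.gz
    local rally_home="${3:-$HOME/.rally}"

    # Data downloaded for this track and the clone of the track repository.
    set -- "benchmarks/data/$track" "benchmarks/tracks/default"
    # Only needed for local benchmarks; include it when it exists.
    if [ -d "$rally_home/benchmarks/distributions" ]; then
        set -- "$@" "benchmarks/distributions"
    fi
    # tar fails (non-zero exit) if one of the required directories is missing.
    tar -czf "$archive" -C "$rally_home" "$@"
}

# Usage, with the machine names assumed above:
#   on DEV_NOTEBOOK:        pack_rally_offline_data geopoint /tmp/rally-offline.tar.gz
#   on LOAD_DRIVER_SERVER:  mkdir -p ~/.rally && tar -xzf rally-offline.tar.gz -C ~/.rally
```

Unpacking with `-C ~/.rally` recreates the same relative layout on LOAD_DRIVER_SERVER, which is the part that must not change.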

@danielmitterdorfer added the enhancement (Improves the status quo) label and removed the feedback needed (An open question blocks progress) label on Feb 16, 2017
@danielmitterdorfer added this to the 0.7.0 milestone on Feb 16, 2017
@danielmitterdorfer changed the title from "work on dependency pulling" to "Provide a Rally package with all dependencies for offline install" on Feb 16, 2017
@danielmitterdorfer
Member

@sokoow I changed the ticket title to better reflect the intent.

@jakommo

jakommo commented Sep 27, 2017

Here are the steps I used to get this working on CentOS 7:

machine with internet:

# yum install python34-pip.noarch python34-devel.x86_64

# mkdir downloads && cd downloads

# cat > requirements.txt << EOF
MarkupSafe==0.23
urllib3==1.20
certifi==2016.9.26
elasticsearch==5.3.0
Jinja2==2.9.5
jsonschema==2.5.1
psutil==4.1.0
py-cpuinfo==3.2.0
tabulate==0.7.5
thespian==3.8.0
esrally==0.7.2
EOF

# pip3 download -r requirements.txt

# scp -r ../downloads USER@OFFLINEHOST:.

Offline machine

# yum install git gcc python34-pip.noarch python34-devel.x86_64

# cd downloads

# pip3 install -r requirements.txt --no-index --find-links file:///path/to/downloads_dir/

# esrally configure

# mkdir ~/.rally/benchmarks

machine with internet:

# pip3 install esrally

# esrally configure

# esrally --track=logging --distribution-version=5.5.0 (downloads the data for one track, "logging" in this case)

// cancel once it has finished downloading / extracting

# scp -r .rally/benchmarks/data/ .rally/benchmarks/tracks/ USER@OFFLINEHOST:~/.rally/benchmarks/

Offline machine

# esrally --offline --track=logging --target-host ...
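
Before the final step it can help to confirm that the copied directories actually arrived. The following is a minimal sketch of such a pre-flight check; `check_offline_ready` is a hypothetical name and not a Rally command, and the directories it checks are the ones copied with scp in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Rally): verify that the copied
# directories are in place before starting an offline benchmark.
check_offline_ready() {
    local track="$1"                        # track whose data was copied
    local rally_home="${2:-$HOME/.rally}"
    local missing=0
    local dir

    for dir in "benchmarks/data/$track" "benchmarks/tracks"; do
        if [ ! -d "$rally_home/$dir" ]; then
            echo "missing: $rally_home/$dir" >&2
            missing=1
        fi
    done
    # 0 if everything is present, 1 otherwise.
    return "$missing"
}
```

On the offline machine, `check_offline_ready logging` should succeed for the track whose data was downloaded ("logging" in the steps above) and report a missing data directory for any track that was never downloaded.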

@danielmitterdorfer
Member

Many thanks @jakommo for creating these step-by-step instructions.

@danielmitterdorfer modified the milestones: Backlog, 0.8.x on Oct 27, 2017
@danielmitterdorfer modified the milestones: 0.8.x, 0.7.4 on Nov 8, 2017
@danielmitterdorfer
Member

With the next release, we will provide a .tar.gz package that contains Rally and all its dependencies plus an installation helper script. Users are still required to install all prerequisites (Python et al.) though.
