issues when trying to run chainspace on ec2 #47

Open
GiannaWeb opened this issue Jun 18, 2020 · 18 comments
Comments

@GiannaWeb

Hi, I was trying to run tester.py clientlatency with all the EC2 instances set up properly, but it failed:

inputs: ()
reference_inputs: ()
POST http://127.0.0.1:5000/api/1.0/transaction/process HTTP/1.1
{"transaction": {"inputIDs": [], "methodID": "init", "parameters": [], "outputs": ["o"], "returns": [], "dependencies": [], "referenceInputIDs": [], "contractID": "simulator"}, "store": {}}
Traceback (most recent call last):
File "tester.py", line 356, in <module>
print t.measure_client_latency(min_batch, max_batch, batch_step, runs)
File "tester.py", line 83, in measure_client_latency
dumper.simulation_batched(num_transactions, inputs_per_tx=1, batch_size=batch_size, batch_sleep=1)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/chainspacemeasurements/dumper.py", line 106, in simulation_batched
process(init_tx)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/chainspacemeasurements/dumper.py", line 22, in process
client.process_transaction(transaction)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/chainspaceapi/api.py", line 17, in process_transaction
r = requests.post(endpoint, json=transaction)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/venv/lib/python2.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/venv/lib/python2.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/venv/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/venv/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/Users/GiGi/PycharmProjects/chainspace-prototype/venv/lib/python2.7/site-packages/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/1.0/transaction/process (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x106654550>: Failed to establish a new connection: [Errno 61] Connection refused',))

Do you know how to fix this?

@musalbas
Collaborator

You have to run runclientservice.sh on the node that you're running the measurement script from. Do this after setting up all the instances with install_deps and install_core, and after running config_core and config_me from instances.py.

@GiannaWeb
Author

@musalbas Hi, thanks for the quick response. I did run runclientservice.sh after setting up all the instances, but it still gave me the error above.
So I decided to run runclientservice.sh on its own. Below is what I got:

Reading config from [/Users/GiGi/PycharmProjects/chainspace-prototype/chainspacecore/ChainSpaceClientConfig/config.txt]
[s0n0] 1592627930937 [thread-1] initializeShardClients: Shard 0 Config /Users/GiGi/PycharmProjects/chainspace-prototype/chainspacecore/ChainSpaceClientConfig/shards/s0
Connecting to replica 0 at /172.31.43.152:3001
Impossible to connect to 0
Connecting to replica 1 at /172.31.34.209:3001
Impossible to connect to 1
Connecting to replica 2 at /172.31.42.230:3001
re-connecting to replica 0 at /172.31.43.152:3001
re-connecting to replica 1 at /172.31.34.209:3001
Impossible to connect to 2
Connecting to replica 3 at /172.31.42.141:3001
re-connecting to replica 2 at /172.31.42.230:3001
Impossible to connect to 3
[s0n0] 1592627971493 [thread-1] initializeShardClients: NEW port of client 0 in shard 0 is 3001
[s0n0] 1592627971494 [thread-1] initializeShardClients: Created new client proxy ID 479208866 for shard 0 with config /Users/GiGi/PycharmProjects/chainspace-prototype/chainspacecore/ChainSpaceClientConfig/shards/s0
[s0n0] 1592627971494 [thread-1] initializeShardClients: The view of client 479208866 for shard 0 is: ID:0; F:1; Processes:0(/172.31.43.152:3001),1(/172.31.34.209:3001),2(/172.31.42.230:3001),3(/172.31.42.141:3001),

... (the same messages repeat)

and

Starting Chainspace...

re-connecting to replica 0 at /172.31.46.7:3001

Chainspace Client API service is running @ http://192.168.43.98:5000/api/1.0/

Reading from local database @ ../chainspacecore-0-0/database

...

java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:298)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:431)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at spark.embeddedserver.jetty.EmbeddedJettyServer.ignite(EmbeddedJettyServer.java:130)
at spark.Service.lambda$init$2(Service.java:504)
at java.lang.Thread.run(Thread.java:748)

Could you tell me how to fix this issue and launch http://127.0.0.1:5000/api/1.0/ ?

My guess is that the service at http://127.0.0.1:5000/api/1.0/ can't start properly, so tester.py raises the error:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/1.0/transaction/process (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x106654550>: Failed to establish a new connection: [Errno 61] Connection refused',))
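
The java.net.BindException: Address already in use shown above means that some process is already bound to the port Jetty tried to open, possibly an earlier runclientservice.sh that is still running. Below is a minimal sketch for checking this before starting the service. It assumes the port in question is 5000 (the port in the service URL), and port_is_free is a hypothetical helper, not part of chainspace:

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Hypothetical helper for illustration; it is not part of chainspace.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        # Typically "Address already in use": another process holds the port.
        return False
    finally:
        sock.close()


# Example (assumed port 5000, taken from the service URL in the log):
# if not port_is_free(5000):
#     print("port 5000 is taken; stop the old client service first")
```

If this reports that the port is taken, stopping the earlier process before re-running runclientservice.sh would be the first thing to try.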

------edited------
I increased the time.sleep after calling start_client() to 200,
and then it showed:

inputs: ()
reference_inputs: ()
POST http://127.0.0.1:5000/api/1.0/transaction/process HTTP/1.1
{"transaction": {"inputIDs": [], "methodID": "init", "parameters": [], "outputs": ["o"], "returns": [], "dependencies": [], "referenceInputIDs": [], "contractID": "simulator"}, "store": {}}
HTTP/1.1 502 Bad Gateway
{u'outcome': u'SUBMIT_T_SYSTEM_ERROR', u'success': u'False'}
inputs: ('o',)
reference_inputs: ()
POTENTIAL ERROR: 'create' method has no checker.
POST http://127.0.0.1:5000/api/1.0/transaction/process HTTP/1.1
{"transaction": {"inputIDs": ["74d46976804fb441e20d6a1f6d041de7ee352eba6a5681c2d2a558e7b70c373e"], "methodID": "create", "parameters": ["600", "963735"], "outputs": ["o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", 
"o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o", "o"], "returns": [], "dependencies": [], "referenceInputIDs": [], "contractID": "simulator"}, "store": {"74d46976804fb441e20d6a1f6d041de7ee352eba6a5681c2d2a558e7b70c373e": "o"}}

and then it got stuck.
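
Instead of a fixed time.sleep after start_client(), one option is to poll until the client service accepts TCP connections on 127.0.0.1:5000. This is only a sketch: wait_for_port is a hypothetical helper, and an open port does not guarantee that the service can reach the shards (the 502 above suggests it could not).

```python
import socket
import time


def wait_for_port(host, port, timeout=200.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds pass first. Hypothetical helper; not part of chainspace.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except socket.error:
            if time.time() >= deadline:
                return False
            time.sleep(interval)


# Example: wait for the client API instead of sleeping a fixed 200 seconds.
# if not wait_for_port("127.0.0.1", 5000):
#     raise RuntimeError("client service did not come up on port 5000")
```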

@wlo1999

wlo1999 commented Jun 29, 2020

Hey @musalbas @GiannaWeb
I also faced the same issue when running runclientservice.sh: it always shows "Connecting to replica 0 at /ec2.private_ip 3001" and "Impossible to connect to 0", and the same for the other replicas.
Have you solved the problem yet? Or is there something that I should set up on AWS to make it work?

@musalbas
Collaborator

Did you boot up all the shards with start_core? You'll need to configure the shards first with config_core and config_me. I suggest having a look at the functions in instances.py to see the commands that can be executed.

@musalbas
Collaborator

musalbas commented Jun 29, 2020

Here are the steps you need to follow to set up the instances:

1. Get a new AWS network instance with n = ChainspaceNetwork(0)
2. Launch the instances with n.launch(96, 'your ssh key name'), which launches 96 instances
3. Install all the dependencies with n.install_deps()
4. Install core on all nodes with n.install_core()
5. Configure core with n.config_core(4, 4) for 4 shards with 4 nodes each, for example
6. Start core with n.start_core()
7. Configure your own client with n.config_me(). This assumes you have the chainspace repo in /home/admin/chainspace/

You should then be able to run tests.
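
The steps above can be sketched as one function that calls the methods in order. The method names and arguments are the ones listed in the steps. The setup_network wrapper, the import path in the usage comment, and the sanity check that shards times nodes per shard must not exceed the number of launched instances are assumptions added for illustration:

```python
def setup_network(network, num_instances, key_name, shards, nodes_per_shard):
    """Run setup steps 2-7 in order on a ChainspaceNetwork-like object.

    Hypothetical wrapper for illustration; only the method names come
    from the steps listed in this thread.
    """
    # Assumed sanity check: every shard node needs its own instance.
    needed = shards * nodes_per_shard
    if needed > num_instances:
        raise ValueError(
            "config_core(%d, %d) needs %d nodes but only %d instances are launched"
            % (shards, nodes_per_shard, needed, num_instances))

    network.launch(num_instances, key_name)       # step 2
    network.install_deps()                        # step 3
    network.install_core()                        # step 4
    network.config_core(shards, nodes_per_shard)  # step 5
    network.start_core()                          # step 6
    network.config_me()                           # step 7
    return network


# Usage (the import path is an assumption based on instances.py):
# from chainspacemeasurements.instances import ChainspaceNetwork
# n = ChainspaceNetwork(0)                               # step 1
# setup_network(n, 16, 'your ssh key name', 4, 4)
```

After this, runclientservice.sh still has to be started on the node that runs the measurement script, as mentioned earlier in the thread.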

I would also recommend running byzcuit, a version of chainspace with our upgraded cross-shard tx protocol. The instructions are the same. https://github.com/sheharbano/byzcuit

@wlo1999

wlo1999 commented Jun 29, 2020

Wow @musalbas, that is an extremely prompt response.
The steps you mentioned:
#1 check
#2 check ( I launched 8 instances )
#3 check
#4 check
#5 check ( I set n.config_core(2,4) which is 8 nodes in total)
I would like to ask a question here: In your instructions above, you launch 96 instances and set the config_core(4,4) which is only 16 nodes in total. Shouldn't it be the same number as the instances you launch? say config_core(4,24) which equals 96 or did I misunderstand it?

#6 check
#7 check I set n.config_me(my/path/chainspace-prototype/chainspacecore/ChainSpaceClientConfig). Is this setting right?

I'd love to try byzcuit after running this repo successfully. Thanks for the recommendation.

@musalbas
Collaborator

> I would like to ask a question here: In your instructions above, you launch 96 instances and set the config_core(4,4) which is only 16 nodes in total. Shouldn't it be the same number as the instances you launch? say config_core(4,24) which equals 96 or did I misunderstand it?

Yes, those parameters are just examples.

If you followed these steps and you're still unable to connect to the nodes, you may need to configure the security/firewall settings on your AWS instances to allow incoming connections. I believe that by default they block all incoming connections except on port 22.

@wlo1999

wlo1999 commented Jun 29, 2020

Hi @musalbas, I set "All TCP" in the AWS security groups, with port range 0-65535, for both inbound and outbound rules. However, I seem to get the same results as in @GiannaWeb's comment above:

POST http://127.0.0.1:5000/api/1.0/transaction/process HTTP/1.1
{"transaction": {"inputIDs": [], "methodID": "init", "parameters": [], "outputs": ["o"], "returns": [], "dependencies": [], "referenceInputIDs": [], "contractID": "simulator"}, "store": {}}
HTTP/1.1 502 Bad Gateway
{u'outcome': u'SUBMIT_T_SYSTEM_ERROR', u'success': u'False'}
inputs: ('o',)
reference_inputs: ()
POTENTIAL ERROR: 'create' method has no checker.

May I ask how you configured the Security Groups? I don't know whether I set those rules correctly.

@wlo1999

wlo1999 commented Jun 30, 2020

@musalbas
I was running tester.py on my own computer and it didn't work, so I figured that maybe the clients also need to run on EC2. If so, how many clients should I launch?

@musalbas
Collaborator

musalbas commented Jun 30, 2020 via email

@wlo1999

wlo1999 commented Jul 1, 2020

@musalbas Thanks for your quick reply. I moved to run byzcuit last night.
However, it still failed at r = requests.get(endpoint), where the endpoint is http://127.0.0.1:5000/api/1.0/load_objects_from_file:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/1.0/load_objects_from_file (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fd6384a3050>: Failed to establish a new connection: [Errno 111] Connection refused',))

This is how I deploy:
8 client nodes and 8 shard nodes; then I SSH into one client and run tester.py. Is this how you run tester.py on EC2?

I couldn't find out how to fix this issue, sorry to bother you again

@wlo1999

wlo1999 commented Jul 2, 2020

Do I have to change the hostsconfig in ChainSpaceClientConfig/shards on the instance where I am going to run tester.py? The service launches at 127.0.0.1:5000/api/1.0, so the other instances can't reach it, right?
I'm sorry if this is a dumb question; I'm a beginner at all of this.
I am really looking forward to hearing from you. Thanks

@musalbas
Collaborator

musalbas commented Jul 3, 2020

Did you run the client service on the EC2 instance and run tester.py on the same instance?

@wlo1999

wlo1999 commented Jul 3, 2020

@musalbas Thanks for the response
Yes, I did run the client service and tester.py on the same instance.
I launched a total of 16 nodes (8 shard nodes and 8 client nodes) in the same zone. After that, I SSHed into one of the 8 client nodes to run "tester.py clientlatency 2 2 5 20 200 20 1 outfile".

For the AWS ec2 security groups settings:
I set inbound rule: "TCP" "all port" "from everywhere"

However it always shows me the error:

[52.31.179.192] Executing command: python -c 'from chainspaceapi import ChainspaceClient; client = ChainspaceClient(); client.load_objects_from_file()'
[52.31.179.192] Traceback (most recent call last):
[52.31.179.192] File "<string>", line 1, in <module>
[52.31.179.192] File "/usr/local/lib/python2.7/dist-packages/chainspaceapi/api.py", line 19, in load_objects_from_file
[52.31.179.192] r = requests.get(endpoint)
[52.31.179.192] File "/usr/lib/python2.7/dist-packages/requests/api.py", line 70, in get
[52.31.179.192] return request('get', url, params=params, **kwargs)
[52.31.179.192] File "/usr/lib/python2.7/dist-packages/requests/api.py", line 56, in request
[52.31.179.192] return session.request(method=method, url=url, **kwargs)
[52.31.179.192] File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
[52.31.179.192] resp = self.send(prep, **send_kwargs)
[52.31.179.192] File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
[52.31.179.192] r = adapter.send(request, **kwargs)
[52.31.179.192] File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
[52.31.179.192] raise ConnectionError(e, request=request)
[52.31.179.192] requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/1.0/load_objects_from_file (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f754a0c2890>: Failed to establish a new connection: [Errno 111] Connection refused',))

I literally can't wrap my mind around this problem.
I have no idea which part went wrong.

@wlo1999

wlo1999 commented Jul 5, 2020

@musalbas I found out that when I run runclientservice on my Mac, it successfully launches the service at 0.0.0.0:5000. When I ran it on the EC2 instances, it showed the same messages:

Starting Chainspace...

Node service is running on port 5000


Jul 05, 2020 3:52:30 PM org.eclipse.jetty.util.log.Log initialized
INFO: Logging initialized @50325ms to org.eclipse.jetty.util.log.Slf4jLog
Jul 05, 2020 3:52:30 PM spark.embeddedserver.jetty.EmbeddedJettyServer ignite
INFO: == Spark has ignited ...
Jul 05, 2020 3:52:30 PM spark.embeddedserver.jetty.EmbeddedJettyServer ignite
INFO: >> Listening on 0.0.0.0:5000

This is exactly the same as the output on my Mac. However, when I ran netstat -an | grep :5000 on one of the EC2 client nodes to check whether the service had launched properly, it showed nothing: nothing is listening on port 5000. Does that mean the service did not launch successfully?

Any ideas on what could possibly cause this issue?

@musalbas
Collaborator

musalbas commented Jul 9, 2020

Hmm, what Linux distro did you run the client on? I ran it on Debian 8 Jessie, if I remember correctly.

@wlo1999

wlo1999 commented Jul 11, 2020

@musalbas I'm using Debian 9 Stretch.
I may have found the solution to the problem from my comment above: it seems screen is not working on my instances, so I used nohup instead, e.g. nohup ./runclientservice.sh &>nohup_client.out &


requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/1.0/load_objects_from_file (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f754a0c2890>: Failed to establish a new connection: [Errno 111] Connection refused',))

This error no longer shows up. However, the output file from "tester.py clientlatency 2 2 5 20 200 20 1 outfile" was empty, so I checked the running log, which shows:


[s0n0] 1594371649081 [thread-19] SUBMIT_T (DRIVER)Target shards for transaction ID 361are as follows:
1;
(20/07/10 09:00:49 - qtp1833072180-19) Asynchronously sending request to [0, 1, 2, 3]
(20/07/10 09:00:49 - qtp1833072180-19) Storing request context for 2
(20/07/10 09:00:49 - qtp1833072180-19) Sending request from 1483956198 with sequence number 2 to [0, 1, 2, 3]
(20/07/10 09:00:49 - qtp1833072180-19) Channel to 3 is not connected
(20/07/10 09:00:49 - qtp1833072180-19) Channel to 2 is not connected
(20/07/10 09:00:49 - qtp1833072180-19) Channel to 1 is not connected
(20/07/10 09:00:49 - qtp1833072180-19) Channel to 0 is not connected
[s0n0] 1594371649083 [thread-19] PREPARE_T (DRIVER)Transaction ID 361 experienced Exception Impossible to connect to servers!
1594371649083 [thread-19] sendTransactionsFromFile: Read this line from thefile: 360 710;712 1000001753;1000001755;1000001757;1000001759;1000001761
1594371649083 [thread-19] sendTransactionsFromFile: Transaction ID is: 360
1594371649083 [thread-19] sendTransactionsFromFile: Input is: : 710
1594371649083 [thread-19] sendTransactionsFromFile: Input is: : 712
1594371649083 [thread-19] sendTransactionsFromFile: Output is: : 1000001753
1594371649083 [thread-19] sendTransactionsFromFile: Output is: : 1000001755
1594371649083 [thread-19] sendTransactionsFromFile: Output is: : 1000001757
1594371649083 [thread-19] sendTransactionsFromFile: Output is: : 1000001759
1594371649083 [thread-19] sendTransactionsFromFile: Output is: : 1000001761
Object 710 mapped to shard 0
Object 712 mapped to shard 0

It seems the client can't connect to the shard nodes, because it says the channels to 3, 2, 1, and 0 are not connected.
I can ping the shard nodes from the client nodes.
Do you have any idea how to correct this error?
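
A successful ping only shows that ICMP gets through. It does not show that the replicas accept TCP connections on port 3001. Below is a minimal sketch for testing that from the client node, assuming the replica addresses are taken from the "Processes:" part of the "view of client" log line posted earlier in this thread; parse_replicas and tcp_reachable are hypothetical helpers, not part of chainspace:

```python
import re
import socket

# Matches entries such as "0(/172.31.43.152:3001)" in the client view line.
REPLICA_RE = re.compile(r"(\d+)\(/([\d.]+):(\d+)\)")


def parse_replicas(view_line):
    """Extract (replica_id, host, port) tuples from a 'Processes:' log line."""
    return [(int(rid), host, int(port))
            for rid, host, port in REPLICA_RE.findall(view_line)]


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except socket.error:
        return False


def report(view_line):
    """Print one reachability line per replica found in `view_line`."""
    for rid, host, port in parse_replicas(view_line):
        status = "reachable" if tcp_reachable(host, port) else "NOT reachable"
        print("replica %d at %s:%d is %s" % (rid, host, port, status))
```

Calling report() with the view line from the client log, on the client node, would show which replicas refuse or drop TCP connections. If none are reachable, either the shard processes are not listening on port 3001 or a firewall rule is still blocking that port.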

@musalbas
Collaborator

musalbas commented Jul 13, 2020 via email
