Troubleshooting
We all love AppScale, but like all software, it occasionally has problems. This post outlines what to do when you run into a problem with AppScale: how to debug it and how to fix it. Of course, you can always ask us for help on IRC (#appscale on freenode.net). Let's start with some common problems we've seen people run into and how to get past them, and then look at what to do when the going gets tough.
If you ran "appscale up" to start AppScale and it didn't start, it could have failed for any of the following reasons:
- (VirtualBox) AppScale hung at "Please wait for AppScale to start your machines."
- (EC2) You're using Spot Instances but AppScale is hung at "Waiting for machines to become available."
- (Eucalyptus) AppScale hung at "Waiting for machines to become available."
Let's look at each of these individually.
When running AppScale on VirtualBox, we've seen problems when VirtualBox 4.1.X is used. Specifically, the AppScale Tools will start up the AppController on port 17443 and then hang at "Please wait for AppScale to start your machines." In this case, the VM does have port 17443 open, but the AppScale Tools can't reach the VM, so they keep waiting for a port they never see open. Upgrade to VirtualBox 4.2 or newer and that should fix the problem.
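To confirm that this is the symptom you're seeing, you can test from the machine running the AppScale Tools whether port 17443 on the VM is reachable at all. The following is a minimal sketch using only the Python standard library; the function name `port_is_reachable` and the timeout value are illustrative choices, not part of AppScale.

```python
import socket


def port_is_reachable(host, port=17443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_reachable("192.168.1.2")` (with your VM's actual IP in place of that placeholder) should return False if the tools cannot reach the VM, even though the AppController is listening inside it.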
If you're using Spot Instances (you've set "use_spot_instances : True" in your AppScalefile), there is a possibility that Amazon won't have any spare machines available at the price and instance type you requested. Typically it takes us about 5 minutes to get a Spot Instance, so if it takes you substantially longer than that (say, 10 minutes), then you can log into the AWS Dashboard, click on EC2, and then click on Spot Instances. There, you can see why your machines aren't available. You can cancel your Spot Instance Request and try again with a higher price or a different instance type, depending on the message the dashboard reports.
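If you'd rather check from a script than from the dashboard, the same status information is exposed through the EC2 API. The sketch below assumes you have the boto3 library installed and AWS credentials configured, neither of which AppScale requires; the helper names are made up for illustration. It prints the state and status message of each Spot Instance Request.

```python
def summarize_spot_requests(response):
    """Turn a describe_spot_instance_requests response into readable lines."""
    lines = []
    for req in response.get("SpotInstanceRequests", []):
        status = req.get("Status", {})
        lines.append("{0}: state={1}, status={2} ({3})".format(
            req.get("SpotInstanceRequestId", "?"),
            req.get("State", "?"),
            status.get("Code", "?"),
            status.get("Message", ""),
        ))
    return lines


def fetch_spot_requests(region):
    """Query EC2 for Spot Instance Requests (assumes boto3 and credentials)."""
    # Imported here so the summarizer above works without boto3 installed.
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_spot_instance_requests()
```

For example, `print("\n".join(summarize_spot_requests(fetch_spot_requests("us-east-1"))))` lists every request in that region; the status message is the same text the dashboard shows.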
When running on Eucalyptus, if there aren't enough virtual machines available, AppScale won't be able to start up. For example, if you tell AppScale to run over 8 machines, and you only have 6 available, then that won't work! In this case, you'll see a message from the tools saying "Spawning 7 virtual machines" (since we spawn one machine and delegate the responsibility of starting up the other 7 to it), and the tools will eventually crash, since the AppController won't be able to get the remaining 7 machines. The solution is simple: make sure you have enough virtual machines available before you start AppScale! In Eucalyptus, an administrator can find out how many virtual machines are free by running "euca-describe-availability-zones verbose".
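The arithmetic in that example can be written down as a quick pre-flight check. This is only a sketch: `check_capacity` is a hypothetical helper, and the number of free virtual machines is something you read off the "euca-describe-availability-zones verbose" output yourself and pass in.

```python
def check_capacity(requested, free):
    """Compare the requested machine count against the free VM slots."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return {
        # The tools start one machine themselves...
        "spawned_by_tools": 1,
        # ...and that machine's AppController starts the rest.
        "spawned_by_appcontroller": requested - 1,
        # How many more free VMs you would need before starting.
        "shortfall": max(0, requested - free),
        "ok": free >= requested,
    }
```

With the numbers from the example, `check_capacity(8, 6)` reports 7 machines delegated to the AppController, a shortfall of 2, and `ok` set to False.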
If, for some reason, running "appscale down" isn't able to terminate your AppScale deployment, you can bring your VMs back to a pristine state by logging into each of your VMs and running:
ruby /root/appscale/AppController/terminate.rb
This script forcefully kills all of the AppScale-related processes, and you can ignore the output it produces. You can double-check that AppScale has been stopped by verifying that there are no Python, Java, or Ruby processes running (for example, by running "ps ax | grep python", and likewise for java and ruby). Since the script doesn't clean up local state on the machine where you ran the AppScale Tools, the tools may complain that you need to set the "force" flag to continue. You can do this by setting "force : True" in your AppScalefile.
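If you have many VMs to check, the process check can be scripted. The sketch below filters "ps ax" output for all three runtimes at once; the function names are illustrative, and it assumes the usual five-column "ps ax" layout (PID, TTY, STAT, TIME, COMMAND).

```python
import subprocess

RUNTIMES = ("python", "java", "ruby")


def leftover_processes(ps_output, runtimes=RUNTIMES):
    """Return the `ps ax` lines whose command mentions one of the runtimes."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(parts) < 5:
            continue
        command = parts[4].lower()
        if any(name in command for name in runtimes):
            matches.append(line.strip())
    return matches


def check_local_machine():
    """Run `ps ax` on this machine and return any matching processes."""
    output = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
    return leftover_processes(output)
```

Note that if you run `check_local_machine()` from a Python interpreter, that interpreter will show up in its own results, so expect at least one match from the check itself.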
So you've run into a problem we don't normally run into. How do you find out what's going on? For this case, we have a special command you can run. On the machine that you've got the AppScale Tools installed on, run "appscale logs ~/Desktop/baz" and this will copy over all of the logs from each machine in your AppScale deployment to ~/Desktop/baz (of course, change that path if you want your logs copied somewhere else). If this doesn't work for some reason, you can always use "scp" to copy over the contents of the "/var/log/appscale" directory on each machine.
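For the "scp" fallback, a small script saves typing the command once per machine. This is a sketch under a few assumptions: the host list is something you supply, logging in as root is a guess based on the /root/appscale path used earlier, and the helper names are made up. Each machine's logs go into their own subdirectory so files with the same name don't overwrite each other.

```python
import os
import subprocess


def build_scp_commands(hosts, dest_dir, user="root",
                       remote_dir="/var/log/appscale"):
    """Build one recursive scp command per host, without running anything."""
    commands = []
    for host in hosts:
        local = os.path.join(dest_dir, host)  # one subdirectory per machine
        remote = "{0}@{1}:{2}".format(user, host, remote_dir)
        commands.append(["scp", "-r", remote, local])
    return commands


def copy_logs(hosts, dest_dir):
    """Run the scp commands; raises CalledProcessError if a copy fails."""
    os.makedirs(dest_dir, exist_ok=True)
    for command in build_scp_commands(hosts, dest_dir):
        subprocess.run(command, check=True)
```

Printing the result of `build_scp_commands(...)` first is a cheap way to review the commands before `copy_logs(...)` actually runs them.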
Logs you will find interesting include:
- controller-17443.log: The most interesting log! This log belongs to the AppController, our provisioning daemon. Since it sets up every other service in AppScale, exceptions show up in this log if Cassandra couldn't be started, if the autoscaling algorithm ran into problems, and so on. This is the first place you want to look if you're having problems with AppScale. You'll find one of these on each machine in an AppScale deployment, since this service runs on all machines.
- app___*.log: These logs correspond to Google App Engine apps that AppScale is hosting. You'll want to check these out if you're running into problems with your App Engine apps, for example if you're trying to include special libraries that App Engine doesn't normally support or debugging your application under high load. You'll find one of these for each App Server process that runs on each machine running the "App Engine" role (see which machines are running this service by running "appscale status").
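Once the logs are copied over, a first pass through them can be automated. The sketch below walks a directory of collected logs, picks out the two kinds of files listed above, and reports lines containing a few keywords. The keywords ("Traceback", "Exception", "ERROR") are a guess at what a problem looks like in these logs, not a documented AppScale log format, so adjust them to what you actually see.

```python
import fnmatch
import os

LOG_PATTERNS = ("controller-17443.log", "app___*.log")
KEYWORDS = ("Traceback", "Exception", "ERROR")  # assumed markers, adjust freely


def find_suspicious_lines(log_root, patterns=LOG_PATTERNS, keywords=KEYWORDS):
    """Return (path, line number, text) for each keyword hit in matching logs."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in sorted(filenames):
            if not any(fnmatch.fnmatch(name, p) for p in patterns):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps going if a log contains odd bytes.
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if any(word in line for word in keywords):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

Pointing it at the directory you passed to "appscale logs" (for example, `find_suspicious_lines(os.path.expanduser("~/Desktop/baz"))`) gives a short list of places to start reading, beginning with the AppController log.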