Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LabCA not starting properly #60

Closed
radicalrabbit opened this issue Oct 2, 2022 · 14 comments
Closed

LabCA not starting properly #60

radicalrabbit opened this issue Oct 2, 2022 · 14 comments

Comments

@radicalrabbit
Copy link

radicalrabbit commented Oct 2, 2022

Hello @hakwerk,

first of all: Thank you for providing this amazing project and thank you for working on it. I would be very happy to get some help with the following issue. In my case the installation of LabCA finished and all containers are up and running. I was able to set username, password and create the root certificates.
Nevertheless LabCA is trapped in some kind of endless loop saying "Restart - Almost there. Now we will request a certificate for this website and restart one more time...".

1

I checked all the logs as you suggested in https://github.com/hakwerk/labca#troubleshooting. The acme_tiny.log is missing somehow:

2

Boulder control is continously trying to fetch some packages:

3

And boulder is complaining about an error starting a publisher service:

4

Any help and/or advice is highly appreciated!

Cheers
Alex

@tvontheinternet
Copy link

tvontheinternet commented Oct 2, 2022

Hello, I too am experiencing the same behavior on a new install on a debian 11 VM.

I also do not see an acme_tiny.log anywhere.

My boulder and boulder-control logs are a bit different, attaching.

boulderLog.log
control.log

Additionally, the boulder-control-1 container is continually restarting.

boulder

I have tried rerunning the install script, which completes without error but results in the same behavior.

@tvontheinternet
Copy link

I do see a lot of this in boulder-labca-1 logs:

boulder-labca-1 | 2022/10/03 03:21:06 errorHandler: err=dial tcp: lookup control on 127.0.0.11:53: no such host

hakwerk added a commit that referenced this issue Oct 5, 2022
Possibly also is the solution for #58 ?
@hakwerk
Copy link
Owner

hakwerk commented Oct 5, 2022

I think this issue is now resolved in the latest v22.10 release. It looks like something changed when installing the docker-ce package.

By the way, the reason that the acme_tiny.log file does not exist yet is because the setup is unable to reach to point where that file gets created...

@radicalrabbit
Copy link
Author

radicalrabbit commented Oct 7, 2022

Hi @hakwerk,

I did both a reinstall on the existing installation and a fresh install on a Debian 11. Still not able to use the web application. After logging in it says "Almost there! Now we will request a certificate for this website and restart one more time...". Containers are in state running. But the bmysql log logs some connection errors:

1
2
3

@hakwerk
Copy link
Owner

hakwerk commented Oct 7, 2022

I'm afraid there is not much I can tell from the first bmysql log, these aborted connection messages sometimes occur when the boulder container is restarting, but normally not every minute. Please have a look at the boulder-boulder-1 logs - it looks like that container is actually crashing over and over so hopefully there is a clue in that log. Do a docker ps -a to see if the STATUS and CREATED values match - if the status is very recent then docker-compose just restarted the container.

A quite common issue is that the boulder container cannot resolve the IP for the FQDN of your LabCA server.

The slow_log errors might be caused by out of memory, disk space etc...

@tvontheinternet
Copy link

I created a fresh Debian 11 VM, and ran the installer after the release of v22.10, but am seeing the same behavior.

I then rolled that new VM back to a snapshot taken after debian install, before installing labca and tried again, same behavior.

The VM OS is able to resolve its own FQDN.

The web page behavior is the same, "almost there, now we will request a certificate for this website and restart one more time"

The boulder-control-1 container running control.sh is constantly restarting

image

From the logging there, that container is having trouble with dpkg.

I did play around and set the containers restart policy to no, started it with docker-compose up -d connected to it with docker exec -it bash, and ran apt update and install commands without issue.

Logs for all 5 containers attached.

boulder.log
control.log
labca.log
mysql.log
nginx.log

If the control container that is constantly restarting is the right place to look. "boulder-control-1 | ./control.sh: line 97: docker: command not found" is logged, but I think thats normal since

docker ps >/dev/null won't redirect stderr (docker ps >/dev/null 2&>1 would), but then the install_docker runs apt-update and dpkg complains.

Could be the totally wrong tree there though. And I did get that container started forcefully, connected to it and installed docker without error. (on previous attempt, leaving this one as is)

@mwstamant
Copy link

@radicalrabbit I can confirm I am experiencing the same symptoms on a Debian 11 base image on Azure.

@tntounis
Copy link

I managed to move forward with the setup. I added in the /home/labca/boulder/docker-compose.yml file one volume in control image section. - /etc/resolv.conf:/etc/resolv.conf
This mapped the DNS servers from the host to the docker container. I restarted using docker-compose up -d. After a while, I receive an error message 'curl: (6) Could not resolve host: boulder'. I removed the entry from docker-compose.yml file. Restarted again the containers and the installation finished.

@tvontheinternet
Copy link

tvontheinternet commented Oct 11, 2022

I managed to move forward with the setup. I added in the /home/labca/boulder/docker-compose.yml file one volume in control image section. - /etc/resolv.conf:/etc/resolv.conf This mapped the DNS servers from the host to the docker container. I restarted using docker-compose up -d. After a while, I receive an error message 'curl: (6) Could not resolve host: boulder'. I removed the entry from docker-compose.yml file. Restarted again the containers and the installation finished.

That is interesting, boulder-control is supposed to be connected to the network boulder-bluenet:

control: image: *boulder_image networks: - bluenet

As are all other containers (I think control is not listed cause its perpetually restarting on my system):

docker network inspect boulder_bluenet | grep Name "Name": "boulder_bluenet", "Name": "boulder-nginx-1", "Name": "boulder-boulder-1", "Name": "boulder-labca-1", "Name": "boulder-bmysql-1",

and the other containers on the network have internal dns, connected to boulder-labca, installed ping, and was able to ping the local host by FQDN. boulder-labca is not connected to rednet and bluenet either, just bluenet.

/etc/resolv.conf on boulder-labca:

root@ba37f499208b:/go/src/labca# cat /etc/resolv.conf
search localdomain tvent.local
nameserver 127.0.0.11
options ndots:0

I can try to replicate your fix later on, I just don't get why boulder-control would have dns trouble while the other containers don't.

@tvontheinternet
Copy link

tvontheinternet commented Oct 12, 2022

I did not replicate @tntounis ' steps exactly but I think I was able to get up and running and all I did of consequence I believe was force the control container to stay started.

I ran docker update --restart no (control container)
docker-compse up -d

The UI eventually threw a generic something went wrong, here are some error logs to look at
I connected to the container with docker exec -it bash and was messing around installing ping/checking DNS and what not, and eventually just kind of noticed the labca page had updated to look as if the install was complete

I had to add the root CA to firefox for firefox to stop complaining about unknown issuer for the labca apps site, after it successfully grabbed a certificate this time looks like. Chrome and Edge did not need anything after adding the Root CA to the local computers trusted root certificate store.

@tntounis
Copy link

@tvontheinternet I don't understand why this worked as a workaround. But the control container was rebooting all the time. After the change, I got the error message that it couldn't find bounder. So reverting the entry worked for me. The control container was not crashing anymore, and I noticed that it restarted bounder container after a while. I refreshed the page, I could see the dashboard and the new certificate. Moreover, I noticed in the logs that the ACME request happened 1-2 minutes before it finished the setup. I believe the setup process didn't request a certificate.

@hakwerk
Copy link
Owner

hakwerk commented Oct 16, 2022

I think the issue is now resolved in the latest version (v22.10.1), it appears that my installer was restarting the control container while it was still building. For me it only happens in 10% or so of fresh installations, so it is very hard to confirm if the problem is indeed gone now.

In general, if people still have issues like this, it is safe to remove the hanging/restarting container and rebuild it:

cd /home/labca/boulder
docker-compose stop control
docker-compose rm -f control
docker-compose up -d

Give it a few minutes and the container should be up and the web GUI should work.

@hakwerk
Copy link
Owner

hakwerk commented Oct 28, 2022

The latest release (v22.10.2) should further improve any DNS related issues in my experience

@hakwerk hakwerk closed this as completed Oct 28, 2022
@shizzgar
Copy link

shizzgar commented May 7, 2023

I found out what was the problem.
I am running my own dns server in docker, so this solution resolved (lol) dns conflicts...
pi-hole/docker-pi-hole#1166 (comment)

By the way, your work is awesome, thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants