fix: run bacalhau in a separate container #438

Merged
merged 7 commits into from
Nov 20, 2024
1 change: 1 addition & 0 deletions .github/workflows/test.yml
@@ -45,6 +45,7 @@ jobs:
- name: Run stack
env:
DISABLE_TELEMETRY: true
API_HOST: ""
run: ./stack compose-up -d

- name: Run tests
4 changes: 2 additions & 2 deletions .local.dev
@@ -8,8 +8,8 @@

API_HOST=http://localhost:8002/
WEB3_RPC_URL=ws://localhost:8548
SERVER_PORT=8080
SERVER_URL=http://localhost:8080
SERVER_PORT=8081
SERVER_URL=http://localhost:8081
DIRECTORY_ADDRESS=0x976EA74026E726554dB657fA54763abd0C3a0aa9
JOB_CREATOR_ADDRESS=0x9965507D1a55bcC2695C58ba16FB37d819B0A4dc
JOB_CREATOR_PRIVATE_KEY=0x5de4111afa1a4b94908f83103eb1f1706367c2e68ca870fc3fb9a804cdab365a
3 changes: 3 additions & 0 deletions docker/bacalhau/Dockerfile
@@ -8,4 +8,7 @@ ADD https://github.com/bacalhau-project/bacalhau/releases/download/v1.3.2/bacalh
RUN tar xfv bacalhau_v1.3.2_linux_amd64.tar.gz
RUN mv bacalhau /usr/local/bin

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
CMD wget http://localhost:1234/api/v1/agent/alive -q || exit 1

ENTRYPOINT [ "bacalhau" ]
52 changes: 43 additions & 9 deletions docker/docker-compose.dev.yml
@@ -37,13 +37,42 @@ services:
timeout: 10s
retries: 5
ipfs:
image: ipfs/kubo:v0.30.0
image: ipfs/kubo:v0.32.1
container_name: ipfs
restart: unless-stopped
ports:
- 5001:5001
volumes:
- ipfs-data:/data/ipfs
bacalhau:
image: ghcr.io/lilypad-tech/bacalhau
container_name: bacalhau
restart: unless-stopped
depends_on:
ipfs:
condition: service_healthy
build:
context: ..
dockerfile: ./docker/bacalhau/Dockerfile
extra_hosts:
- "localhost:host-gateway"
environment:
- BACALHAU_ENVIRONMENT=local
command:
[
"serve",
"--node-type",
"compute,requester",
"--peer",
"none",
"--private-internal-ipfs=false",
"--job-selection-accept-networked",
"--ipfs-connect",
"/dns4/ipfs/tcp/5001",
]
volumes:
- bacalhau-data:/root/.bacalhau
- /var/run/docker.sock:/var/run/docker.sock
solver:
image: ghcr.io/lilypad-tech/solver
container_name: solver
@@ -64,13 +93,12 @@ services:
- WEB3_RPC_URL=${WEB3_RPC_URL}
- DISABLE_TELEMETRY=${DISABLE_TELEMETRY}
ports:
- 8080:8080
- 8081:8081
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/"]
test: ["CMD", "curl", "-f", "http://localhost:8081/api/v1/job_offers"]
interval: 30s
timeout: 10s
retries: 5

job-creator:
image: ghcr.io/lilypad-tech/job-creator
container_name: job-creator
@@ -93,9 +121,12 @@ services:
container_name: resource-provider
restart: unless-stopped
depends_on:
- chain
- solver
- ipfs
ipfs:
condition: service_healthy
solver:
condition: service_healthy
bacalhau:
condition: service_healthy
build:
context: ..
dockerfile: ./docker/resource-provider/Dockerfile
@@ -107,14 +138,17 @@ services:
extra_hosts:
- "localhost:host-gateway"
volumes:
- bacalhau-data:/tmp/lilypad/data
- /var/run/docker.sock:/var/run/docker.sock
- lilypad-data:/tmp/lilypad/data
environment:
- WEB3_PRIVATE_KEY=${RESOURCE_PROVIDER_PRIVATE_KEY}
- IPFS_CONNECT=/dns4/ipfs/tcp/5001
- LOG_LEVEL=debug
- DISABLE_TELEMETRY=${DISABLE_TELEMETRY}
- BACALHAU_API_HOST=bacalhau
- BACALHAU_NODE_CLIENTAPI_HOST=bacalhau
- BACALHAU_NODE_CLIENTAPI_PORT=1234
volumes:
chain-data:
ipfs-data:
bacalhau-data:
lilypad-data:
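The resource-provider service above is pointed at the separate container through `BACALHAU_NODE_CLIENTAPI_HOST` and `BACALHAU_NODE_CLIENTAPI_PORT`. As a rough illustration of how a Go client might turn those two variables into an endpoint URL, here is a minimal sketch. It is not code from this PR: `bacalhauAliveURL` is a hypothetical helper, the `localhost` fallback is an assumption, and the `1234` default and `/api/v1/agent/alive` path are taken from the healthcheck in this diff.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
)

// bacalhauAliveURL (hypothetical helper) builds the liveness URL from the
// environment. getenv is injected so the function can be exercised without
// touching the real process environment.
func bacalhauAliveURL(getenv func(string) string) string {
	host := getenv("BACALHAU_NODE_CLIENTAPI_HOST")
	if host == "" {
		host = "localhost" // assumed fallback, not from the PR
	}
	port := getenv("BACALHAU_NODE_CLIENTAPI_PORT")
	if port == "" {
		port = "1234" // port used by the healthcheck in this diff
	}
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(host, port),
		Path:   "/api/v1/agent/alive",
	}
	return u.String()
}

func main() {
	fmt.Println(bacalhauAliveURL(os.Getenv))
}
```

Inside the Compose network the host resolves to the `bacalhau` service name, so the resulting URL would be `http://bacalhau:1234/api/v1/agent/alive`.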
33 changes: 30 additions & 3 deletions docker/docker-compose.yml
@@ -1,7 +1,7 @@
# This is a docker-compose file for use by Resource Providers
services:
ipfs:
image: ipfs/kubo:v0.30.0
image: ipfs/kubo:v0.32.1
container_name: ipfs
restart: unless-stopped
ports:
@@ -10,6 +10,30 @@ services:
- 127.0.0.1:8080:8080
volumes:
- ipfs-data:/data/ipfs
bacalhau:
image: ghcr.io/lilypad-tech/bacalhau
container_name: bacalhau
restart: unless-stopped
depends_on:
ipfs:
condition: service_healthy
environment:
- BACALHAU_ENVIRONMENT=local
command:
[
"serve",
"--node-type",
"compute,requester",
"--peer",
"none",
"--private-internal-ipfs=false",
"--job-selection-accept-networked",
"--ipfs-connect",
"/dns4/ipfs/tcp/5001",
]
volumes:
- bacalhau-data:/root/.bacalhau
- /var/run/docker.sock:/var/run/docker.sock
resource-provider:
image: ghcr.io/lilypad-tech/resource-provider:latest
container_name: resource-provider
@@ -22,12 +46,14 @@ services:
args:
- COMPUTE_MODE=gpu
volumes:
- bacalhau-data:/tmp/lilypad/data
- /var/run/docker.sock:/var/run/docker.sock
- lilypad-data:/tmp/lilypad/data
environment:
- WEB3_PRIVATE_KEY
- WEB3_RPC_URL
- IPFS_CONNECT=/dns4/ipfs/tcp/5001
- BACALHAU_API_HOST=bacalhau
- BACALHAU_NODE_CLIENTAPI_HOST=bacalhau
- BACALHAU_NODE_CLIENTAPI_PORT=1234
watchtower:
image: containrrr/watchtower
container_name: watchtower
@@ -37,3 +63,4 @@ services:
volumes:
ipfs-data:
bacalhau-data:
lilypad-data:
7 changes: 3 additions & 4 deletions docker/resource-provider/Dockerfile
@@ -46,11 +46,10 @@ ENV PATH="/usr/local/bin:${PATH}"
# Create a startup script to run both services simultaneously
RUN touch run
RUN echo "#!/bin/bash" >> run
# Launch Bacalhau
RUN echo "/usr/local/bin/bacalhau serve --node-type compute,requester --peer none --private-internal-ipfs=false --job-selection-accept-networked &" >> run

# Wait for Bacalhau to be ready by checking the correct API endpoint
RUN echo "while ! curl -s http://0.0.0.0:1234/api/v1/agent/alive | grep '\"Status\": \"OK\"'; do echo 'Waiting for Bacalhau...'; sleep 2; done" >> run
# Ensure bacalhau is initialized
RUN echo "export BACALHAU_ENVIRONMENT=local" >> run
RUN echo "bacalhau id" >> run

# Launch Lilypad
RUN echo "/usr/local/bin/lilypad resource-provider --network ${NETWORK} --disable-pow=${DISABLE_POW} --disable-telemetry=${DISABLE_TELEMETRY} &" >> run
2 changes: 1 addition & 1 deletion pkg/executor/bacalhau/bacalhau.go
@@ -208,7 +208,7 @@ func (executor *BacalhauExecutor) getJobID(
outputError := strings.Join(strings.Fields(strings.Join(splitOutputs[1:], " ")), " ")

if outputError != "" {
return "", fmt.Errorf("error running command %s -> %s, %s", deal.ID, outputError, runOutput)
log.Error().Msgf("error parsing output %s -> %s, %s", deal.ID, outputError, runOutput)
Contributor:

Suggested change
log.Error().Msgf("error parsing output %s -> %s, %s", deal.ID, outputError, runOutput)
log.Error().Msgf("error found while parsing output %s -> %s, %s", deal.ID, outputError, runOutput)

Possible edit. It's not a parsing error, but an error we found while parsing.

Collaborator Author:

Do you have thoughts about it being ERR vs WARN?

Contributor:

Tough call. On the one hand, the possible error we know about (around the version check) is more of a warning. On the other hand, could there be other errors that aren't a warning?

I'd say let's go with the common case we know about and use WARN. If we start seeing other errors, we can revisit, but this may not matter once we have upgraded to the latest Bacalhau.

Collaborator Author:

Yeah, once we get fully over to the Bacalhau HTTP API, a lot of this code becomes moot. 👍

}

id := strings.TrimSpace(string(runOutput))
1 change: 1 addition & 0 deletions stack
@@ -312,6 +312,7 @@ function resource-provider-docker-run() {

function bacalhau-node(){
export BACALHAU_SERVE_IPFS_PATH=/tmp/lilypad/data/ipfs
export BACALHAU_ENVIRONMENT=local
export LOG_LEVEL=debug
bacalhau serve --node-type compute,requester --peer none --private-internal-ipfs=false --job-selection-accept-networked --ipfs-connect "/ip4/127.0.0.1/tcp/5001"
}
2 changes: 1 addition & 1 deletion test/ratelimit_test.go
@@ -58,7 +58,7 @@ func makeCalls(t *testing.T, path string, ch chan rateResult) {

// Make 100 requests
for range 100 {
requestURL := fmt.Sprintf("http://localhost:%d%s", 8080, path)
requestURL := fmt.Sprintf("http://localhost:%d%s", 8081, path)
res, err := http.Get(requestURL)

if err != nil {