Commit

chore: fix references
vthiery committed Jan 9, 2025
1 parent 7897e2f commit 9f60c83
Showing 68 changed files with 127 additions and 127 deletions.
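
The change set is a mechanical substitution of the repository owner in URLs and workflow conditions. A minimal sketch of how such a rewrite could be scripted follows; it is not necessarily how the commit was produced, and the helper name `rewrite_references` and the `REPLACEMENTS` mapping are hypothetical. It assumes only references to the `zeebe-chaos` repository and its GitHub Pages site should move, since links to other `zeebe-io` repositories are left untouched in this commit.

```python
# Hypothetical helper for illustration; names and mapping are assumptions,
# not taken from the commit itself.
REPLACEMENTS = {
    # Repository path used in github.com URLs and workflow `if:` conditions.
    "zeebe-io/zeebe-chaos": "camunda/zeebe-chaos",
    # Host and path of the GitHub Pages site.
    "zeebe-io.github.io/zeebe-chaos": "camunda.github.io/zeebe-chaos",
}


def rewrite_references(text: str) -> str:
    """Return text with the old zeebe-chaos references replaced by the new ones."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text
```

Matching the full `owner/repo` string rather than the owner alone keeps references such as `zeebe-io/zeebe` and `zeebe-io/zeebe-cluster-testbench` unchanged.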
2 changes: 1 addition & 1 deletion .github/workflows/buildChaosBlog.yml
@@ -31,7 +31,7 @@ jobs:
name: Auto-merge dependency PRs
runs-on: ubuntu-latest
needs: [ build ]
if: github.repository == 'zeebe-io/zeebe-chaos' && (github.actor == 'dependabot[bot]' || github.actor == 'renovate[bot]')
if: github.repository == 'camunda/zeebe-chaos' && (github.actor == 'dependabot[bot]' || github.actor == 'renovate[bot]')
permissions:
checks: read
pull-requests: write
2 changes: 1 addition & 1 deletion .github/workflows/go-ci.yml
@@ -44,7 +44,7 @@ jobs:
name: Auto-merge dependabot PRs
runs-on: ubuntu-latest
needs: [ go-ci ]
if: github.repository == 'zeebe-io/zeebe-chaos' && (github.actor == 'dependabot[bot]' || github.actor == 'renovate[bot]')
if: github.repository == 'camunda/zeebe-chaos' && (github.actor == 'dependabot[bot]' || github.actor == 'renovate[bot]')
permissions:
checks: read
pull-requests: write
2 changes: 1 addition & 1 deletion README.md
@@ -34,6 +34,6 @@ makes it easy to create chaos experiments and also automate them later on.
All our current experiments are located under `chaos-days/blog/`, for more
details please have a look at the [README](chaos-days/blog/README.md).

Alternatively all our chaos-days experiments can be found [here](https://zeebe-io.github.io/zeebe-chaos/) in blog
Alternatively all our chaos-days experiments can be found [here](https://camunda.github.io/zeebe-chaos/) in blog
format.

2 changes: 1 addition & 1 deletion chaos-days/blog/2020-06-04-first-chaos-day/index.md
@@ -14,7 +14,7 @@ authors: zell

* Documented failure cases for exporter (already some exist, it seemed) gave me a new idea for ZEP
* Introduced Peter to our Chaos Repository, discussed a bit about the hypothesis backlog, reopened the Chaos Trello board where we will organize ourselves
* Run a chaos experiment, where we put high CPU load on the Leader [#6](https://github.com/zeebe-io/zeebe-chaos/issues/6)
* Run a chaos experiment, where we put high CPU load on the Leader [#6](https://github.com/camunda/zeebe-chaos/issues/6)

<!--truncate-->

2 changes: 1 addition & 1 deletion chaos-days/blog/2020-06-11-high-cpu-gateway/index.md
@@ -13,7 +13,7 @@ authors: zell
* Updated failure cases documentation for exporter based on review
* Documented failure cases for ZeebeDB
* Wrote an chaostoolkit experiment based on the last manual Chaos experiment
* Run a chaos experiment with @Deepthi, where we put high CPU load on the standalone gateway https://github.com/zeebe-io/zeebe-chaos/issues/28
* Run a chaos experiment with @Deepthi, where we put high CPU load on the standalone gateway https://github.com/camunda/zeebe-chaos/issues/28

<!--truncate-->

4 changes: 2 additions & 2 deletions chaos-days/blog/2020-07-09-timer-and-huge-variables/index.md
@@ -21,12 +21,12 @@ authors: zell

### A Lot of Timers

Based on the Hypothesis written here: [#31](https://github.com/zeebe-io/zeebe-chaos/issues/31) we run an experiment with a stable load of 10 simple workflow instances per second (only start and end event) and 10 workflow instances with
Based on the Hypothesis written here: [#31](https://github.com/camunda/zeebe-chaos/issues/31) we run an experiment with a stable load of 10 simple workflow instances per second (only start and end event) and 10 workflow instances with
multiple timers. We wanted to explore what happens when we have a lot of timers running and especially what happens when the are triggered at once. We created the following workflow model, where timers are exponentially created.

![timerProcess](timerProcess.png)

The experiments is based on the hypotheses we wrote here [#31](https://github.com/zeebe-io/zeebe-chaos/issues/31).
The experiments is based on the hypotheses we wrote here [#31](https://github.com/camunda/zeebe-chaos/issues/31).

#### Expectations

4 changes: 2 additions & 2 deletions chaos-days/blog/2020-07-16-big-multi-instance/index.md
@@ -10,15 +10,15 @@ authors: zell

# Chaos Day Summary

* investigate and fix automated chaos experiments - works again with [88c404f](https://github.com/zeebe-io/zeebe-chaos/commit/88c404f97514d4a7a511ce9751085acdd1720cd9) and [cd8d685](https://github.com/zeebe-io/zeebe-chaos/commit/cd8d685b83eaa1ac9050ad3d16868389e1c0c36d)
* investigate and fix automated chaos experiments - works again with [88c404f](https://github.com/camunda/zeebe-chaos/commit/88c404f97514d4a7a511ce9751085acdd1720cd9) and [cd8d685](https://github.com/camunda/zeebe-chaos/commit/cd8d685b83eaa1ac9050ad3d16868389e1c0c36d)
* Closed some issues in the backlog
* Run a chaos experiment with bigger multi instance to reach `maxMessageSize`

<!--truncate-->

## Chaos Experiment

We wanted to run a chaos experiment, which covers [#33](https://github.com/zeebe-io/zeebe-chaos/issues/33).
We wanted to run a chaos experiment, which covers [#33](https://github.com/camunda/zeebe-chaos/issues/33).

### Expected

@@ -16,7 +16,7 @@ authors: zell

## Chaos Experiment

We wanted to run a chaos experiment, which covers [#20](https://github.com/zeebe-io/zeebe-chaos/issues/20).
We wanted to run a chaos experiment, which covers [#20](https://github.com/camunda/zeebe-chaos/issues/20).
Furthermore, it was recently asked in the forum whether it makes a difference performance wise to run a broker without exporters, see [here](https://forum.zeebe.io/t/zeebe-low-performance/1356/17)

### Expected
@@ -32,7 +32,7 @@ authors: zell
* only with metrics exporter
* without any exporter

These benchmarks run overnight without bigger issues. This means all of three where able to take snapshots and compact the log. This satisfy our hypothesis of https://github.com/zeebe-io/zeebe-chaos/issues/20 .
These benchmarks run overnight without bigger issues. This means all of three where able to take snapshots and compact the log. This satisfy our hypothesis of https://github.com/camunda/zeebe-chaos/issues/20 .

| Default | Without exporters |
|---|---|
2 changes: 1 addition & 1 deletion chaos-days/blog/2020-10-06-toxi-proxy/index.md
@@ -174,7 +174,7 @@ Actually I would expect here an error instead of just returning null.
Peter volunteered for automating a new chaos experiment, where we put high load on a broker and expect that we have no leader change. This was previous an issue, since the leaders were not able to send heartbeats in time. Related issue #7.

### Time reset
I wanted to work on the clock reset [#3](https://github.com/zeebe-io/zeebe-chaos/issues/3).
I wanted to work on the clock reset [#3](https://github.com/camunda/zeebe-chaos/issues/3).
This seems to be not easily possible in kubernetes or at least with our current images, since we need for that root privilges.

```sh
6 changes: 3 additions & 3 deletions chaos-days/blog/2020-10-13-multiple-leader-changes/index.md
@@ -12,7 +12,7 @@ authors: zell

Today I wanted to add new chaostoolkit experiment, which we can automate.
We already have experiments like restarting followers and leaders for a partition, but in the past what also caused issues was multiple restarts/leader changes
in a short period of time. This is the reason why I created [#39](https://github.com/zeebe-io/zeebe-chaos/issues/39).
in a short period of time. This is the reason why I created [#39](https://github.com/camunda/zeebe-chaos/issues/39).

<!--truncate-->

@@ -35,7 +35,7 @@ We requesting the Topology, determine the leader for partition one restart that

### Result

The corresponding experiment was added via this [commit](https://github.com/zeebe-io/zeebe-chaos/commit/11c3a96fc87991f649fb1559363ba335b2bf42a1).
The corresponding experiment was added via this [commit](https://github.com/camunda/zeebe-chaos/commit/11c3a96fc87991f649fb1559363ba335b2bf42a1).
We were able to prove that our hypothesis is true. we are able to handle multiple leader changes even in a short period of time.

#### Metrics
@@ -101,7 +101,7 @@ Put high load on the cluster for several minutes, via creating workflow instance

### Result

@pihme create a new PR to add the experiment [#41](https://github.com/zeebe-io/zeebe-chaos/pull/41)
@pihme create a new PR to add the experiment [#41](https://github.com/camunda/zeebe-chaos/pull/41)


#### Metrics
8 changes: 4 additions & 4 deletions chaos-days/blog/2020-10-20-non-graceful-shutdown/index.md
@@ -20,14 +20,14 @@ I did that on Wednesday (21-10-2020).
## PR Merge

I tried again the new chaos experiment with a Production M cluster, before merging. It worked quite smooth.
PR is merged [#41](https://github.com/zeebe-io/zeebe-chaos/pull/41) :tada:
PR is merged [#41](https://github.com/camunda/zeebe-chaos/pull/41) :tada:

## Non-graceful shutdown

Currently in our experiments we do a normal `kubectl delete pod`, which does an graceful shutdown. The application has time to stop it's services etc. It would be interesting how Zeebe handles non-graceful shutdowns. In order to achieve that we can use the option `--grace-period=0`. For more information you can read for example [this](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#force-deletion)

I added additional experiments to our normal follower and leader restarts experiments, such that we have both graceful and non-graceful restarts.
Both seem to work without any issues. I was also able to fix some bash script error with the help of [shellcheck](https://github.com/koalaman/shellcheck). Related issue https://github.com/zeebe-io/zeebe-chaos/issues/42.
Both seem to work without any issues. I was also able to fix some bash script error with the help of [shellcheck](https://github.com/koalaman/shellcheck). Related issue https://github.com/camunda/zeebe-chaos/issues/42.


Example output:
@@ -56,8 +56,8 @@ Example output:

Related commits:

* [Restart leader non-gracefully](https://github.com/zeebe-io/zeebe-chaos/commit/e6260cb8612a983c8ed74fd2a37a249987ad3d3d)
* [Restart follower non-gracefully](https://github.com/zeebe-io/zeebe-chaos/commit/63c481c0c7dd7026f03be4e51d61a918613b0140)
* [Restart leader non-gracefully](https://github.com/camunda/zeebe-chaos/commit/e6260cb8612a983c8ed74fd2a37a249987ad3d3d)
* [Restart follower non-gracefully](https://github.com/camunda/zeebe-chaos/commit/63c481c0c7dd7026f03be4e51d61a918613b0140)

## Participants

4 changes: 2 additions & 2 deletions chaos-days/blog/2020-11-03-investigate-failing-tests/index.md
@@ -40,7 +40,7 @@ To run all experiments in a loop I used in the `chaos-experiments/kubernetes` fo
while [ $? -eq 0 ]; do for ex in */experiment.json; do chaos run $ex; done; done
```
During running the experiments I found a bug in our chaos experiments, where it seems that some experiments are not executed correctly, see [#43](https://github.com/zeebe-io/zeebe-chaos/issues/43).
During running the experiments I found a bug in our chaos experiments, where it seems that some experiments are not executed correctly, see [#43](https://github.com/camunda/zeebe-chaos/issues/43).


It took a while, but at some point the experiments start to fail. Interesting is that if you look at the pods all seem to be ready, but in the metrics we can see that one partition is unhealthy (Partition one this time).
@@ -88,7 +88,7 @@ tar -xvf broker-2-data.tar.gz

## New Issues

* Gateway experiments are not executed [#43](https://github.com/zeebe-io/zeebe-chaos/issues/43)
* Gateway experiments are not executed [#43](https://github.com/camunda/zeebe-chaos/issues/43)
* Deployment Reprocessing inconsistencies [#5753](https://github.com/zeebe-io/zeebe/issues/5753)

## Participants
@@ -13,7 +13,7 @@ authors: zell

# Chaos Day Summary

Today I wanted to finally implement an experiment which I postponed for long time, see [#24](https://github.com/zeebe-io/zeebe-chaos/issues/24).
Today I wanted to finally implement an experiment which I postponed for long time, see [#24](https://github.com/camunda/zeebe-chaos/issues/24).
The problem was that previous we were not able to determine on which partition the message was published, so we were not able to assert that it was published on the correct partition. With this [#4794](https://github.com/zeebe-io/zeebe/issues/4794) it is now possible, which was btw an community contribution. :tada:

<!--truncate-->
@@ -72,8 +72,8 @@ $ chaos run production-m/msg-correlation/experiment.json
```

Experiment added to all cluster plans:
* https://github.com/zeebe-io/zeebe-chaos/commit/adeab53915e12b4a76fd4d49bb359684619b117f
* https://github.com/zeebe-io/zeebe-chaos/commit/93daf11864fdd851267dae67fdfc31e0ea78b407
* https://github.com/camunda/zeebe-chaos/commit/adeab53915e12b4a76fd4d49bb359684619b117f
* https://github.com/camunda/zeebe-chaos/commit/93daf11864fdd851267dae67fdfc31e0ea78b407


## New Issues
@@ -15,7 +15,7 @@ authors: zell

Happy new year everyone :tada:

This time I wanted to verify the following hypothesis `Disconnecting Leader and one Follower should not make cluster disruptive` ([#45](https://github.com/zeebe-io/zeebe-chaos/issues/45)).
This time I wanted to verify the following hypothesis `Disconnecting Leader and one Follower should not make cluster disruptive` ([#45](https://github.com/camunda/zeebe-chaos/issues/45)).
But in order to do that we need to extract the Leader and Follower node for a partition from the Topology. Luckily in December we got an [external contribution](https://github.com/zeebe-io/zeebe/pull/5943) which allows us to print `zbctl status` as json.
This gives us now more possibilities, since we can extract values much better out of it.

@@ -224,7 +224,7 @@ function getIndexOfPodForPartitionInState()
The previous function worked only with homogeneous clusters, which means where the partitions are equally distributed. This caused issues on experiments on Production L clusters, where partitions are heterogeneous distributed, see related issue [zeebe-io/zeebe-cluster-testbench#154](https://github.com/zeebe-io/zeebe-cluster-testbench/issues/154). With this new utility we can create some new experiments also for Production - L clusters.
I wrote a new script based on the [older disconnect/connect gateway scripts](https://github.com/zeebe-io/zeebe-chaos/blob/master/chaos-experiments/scripts/disconnect-standalone-gateway.sh), where we disconnect the gateway with the brokers. The new one disconnects an leader for an partition with the follower and vice-versa.
I wrote a new script based on the [older disconnect/connect gateway scripts](https://github.com/camunda/zeebe-chaos/blob/master/chaos-experiments/scripts/disconnect-standalone-gateway.sh), where we disconnect the gateway with the brokers. The new one disconnects an leader for an partition with the follower and vice-versa.
Disconnect Leader-Follower:
@@ -24,10 +24,10 @@ We were able to enhance the deployment distribution experiment and run it in the

## Chaos Experiment

We already had a [prepared chaos experiment](https://github.com/zeebe-io/zeebe-chaos/blob/master/chaos-experiments/helm/deployment-distribution/experiment.json), but we needed to enhance that. Deepthi was so kind to create [PR](https://github.com/zeebe-io/zeebe-chaos/pull/50) for that.
We already had a [prepared chaos experiment](https://github.com/camunda/zeebe-chaos/blob/master/chaos-experiments/helm/deployment-distribution/experiment.json), but we needed to enhance that. Deepthi was so kind to create [PR](https://github.com/camunda/zeebe-chaos/pull/50) for that.

### Enhancement
The changes contain a new step before creating the network partition on the deployment distribution experiment, see [here](https://github.com/zeebe-io/zeebe-chaos/blob/master/chaos-experiments/camunda-cloud/production-l/deployment-distribution/experiment.json#L25-L35).
The changes contain a new step before creating the network partition on the deployment distribution experiment, see [here](https://github.com/camunda/zeebe-chaos/blob/master/chaos-experiments/camunda-cloud/production-l/deployment-distribution/experiment.json#L25-L35).

```json
{
@@ -185,7 +185,7 @@ Thanks for participating [Deepthi](https://github.com/deepthidevaki).

##### Re-connecting might fail

We realized during testing the experiment that the re-connecting might fail, because the pod can be rescheduled and then a ip route can't be delete since it no longer exist. [This is now fixed](https://github.com/zeebe-io/zeebe-chaos/blob/master/chaos-experiments/scripts/connect-leaders.sh#L45-L48). We check for existence of the command `ip`, if this doesn't exist we know the pod was restarted and we ignore it.
We realized during testing the experiment that the re-connecting might fail, because the pod can be rescheduled and then a ip route can't be delete since it no longer exist. [This is now fixed](https://github.com/camunda/zeebe-chaos/blob/master/chaos-experiments/scripts/connect-leaders.sh#L45-L48). We check for existence of the command `ip`, if this doesn't exist we know the pod was restarted and we ignore it.


*Before:*
2 changes: 1 addition & 1 deletion chaos-days/blog/2021-03-30-set-file-immutable/index.md
@@ -17,7 +17,7 @@ Unfortunately I found out that our test chaos cluster was in a way broken, that
Because of these circumstances I thought about different things to experiment with, and I remembered that in the [last chaos day](/2021-03-23-camunda-cloud-network-partition/index.md) we worked with patching running deployments, in order to add more capabilities.
This allowed us to create ip routes and experiment with the zeebe deployment distribution. During this I have read the [capabilities list of linux](https://man7.org/linux/man-pages/man7/capabilities.7.html), and found out that we can mark files as immutable, which might be interesting for a chaos experiment.

In this chaos day I planned to find out how marking a file immutable affects our brokers and I made the hypothesis that: *If a leader has a write error, which is not recoverable, it will step down and another leader should take over.* I put this in our hypothesis backlog ([zeebe-chaos#52](https://github.com/zeebe-io/zeebe-chaos/issues/52)).
In this chaos day I planned to find out how marking a file immutable affects our brokers and I made the hypothesis that: *If a leader has a write error, which is not recoverable, it will step down and another leader should take over.* I put this in our hypothesis backlog ([zeebe-chaos#52](https://github.com/camunda/zeebe-chaos/issues/52)).

In order to really run this kind of experiment I need to find out whether marking a file immutable will cause any problems and if not how I can cause write errors such that affects the broker.
Unfortunately it turned out that immutable files will not cause issues on already opened file channels, but I found some other bugs/issues, which you can read below.