Support slow log tailing sidecar for tidb instance #290

Merged
merged 10 commits into from
Mar 4, 2019
4 changes: 4 additions & 0 deletions charts/tidb-cluster/templates/config/_tidb-config.tpl
@@ -59,7 +59,11 @@ format = "text"
disable-timestamp = false

# Stores slow query log into separated files.
{{- if .Values.tidb.separateSlowLog }}
slow-query-file = "/tmp/log/tidb/slowlog"
Member

This also needs to be changed to /var/log/tidb/slowlog.

Contributor Author

Good catch.
BTW, this is kind of a code smell, but I can't figure out a better solution. Should I document this caveat, or are there any better ideas?

Contributor Author

I mean that a user may override this configuration by setting `tidb.config`.

Member

I think we can specify the `-log-slow-query` command-line argument of tidb-server; it has higher priority than the configuration file.
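
The precedence described in this comment can be sketched in Go as follows. This is only an illustration of "an explicitly passed flag overrides the file value", not tidb-server's actual code: the `config` struct and the `applyFlags` helper are hypothetical, and the flag name is taken from the comment above.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical stand-in for values parsed from the config file.
type config struct {
	SlowQueryFile string
}

// applyFlags overrides config values only for flags that were passed
// explicitly, so a flag left unset never clobbers the file value.
func applyFlags(cfg *config, args []string) error {
	fs := flag.NewFlagSet("tidb-server", flag.ContinueOnError)
	slowQuery := fs.String("log-slow-query", "", "slow query log file path")
	if err := fs.Parse(args); err != nil {
		return err
	}
	// Visit calls the function only for flags that were actually set.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "log-slow-query" {
			cfg.SlowQueryFile = *slowQuery
		}
	})
	return nil
}

func main() {
	// Pretend the user overrode the file value through tidb.config.
	cfg := &config{SlowQueryFile: "/tmp/custom/slowlog"}
	args := []string{"-log-slow-query", "/var/log/tidb/slowlog"}
	if err := applyFlags(cfg, args); err != nil {
		panic(err)
	}
	fmt.Println(cfg.SlowQueryFile)
}
```

With this approach the sidecar can rely on a fixed path even if the user supplies their own configuration file.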

{{- else }}
slow-query-file = ""
{{- end }}

# Queries with execution time greater than this value will be logged. (Milliseconds)
slow-threshold = 300
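
To check what the conditional in this hunk renders to, here is a minimal sketch using Go's standard `text/template`, the engine Helm templates are built on. The nested maps are a hypothetical stand-in for Helm's `.Values`, and the path is the `/var/log/tidb/slowlog` value requested in the review thread, not the one shown in the diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// slowLogTmpl mirrors the conditional from _tidb-config.tpl, with the
// path changed to /var/log/tidb/slowlog as requested in review.
const slowLogTmpl = `# Stores slow query log into separated files.
{{- if .Values.tidb.separateSlowLog }}
slow-query-file = "/var/log/tidb/slowlog"
{{- else }}
slow-query-file = ""
{{- end }}
`

// renderSlowLogConfig renders the fragment for one separateSlowLog value.
// The nested maps only imitate Helm's .Values; this is not Helm itself.
func renderSlowLogConfig(separate bool) (string, error) {
	tmpl, err := template.New("tidb-config").Parse(slowLogTmpl)
	if err != nil {
		return "", err
	}
	data := map[string]interface{}{
		"Values": map[string]interface{}{
			"tidb": map[string]interface{}{"separateSlowLog": separate},
		},
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, separate := range []bool{true, false} {
		out, err := renderSlowLogConfig(separate)
		if err != nil {
			panic(err)
		}
		fmt.Printf("separateSlowLog=%v:\n%s\n", separate, out)
	}
}
```

The `{{-` markers trim the preceding newline, so each branch emits exactly one `slow-query-file` line.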
7 changes: 7 additions & 0 deletions charts/tidb-cluster/templates/tidb-cluster.yaml
@@ -76,3 +76,10 @@ spec:
{{- end }}
binlogEnabled: {{ .Values.binlog.pump.create | default false }}
maxFailoverCount: {{ .Values.tidb.maxFailoverCount | default 3 }}
separateSlowLog: {{ .Values.tidb.separateSlowLog | default false }}
slowLogTailer:
image: {{ .Values.tidb.slowLogTailer.image }}
imagePullPolicy: {{ .Values.tidb.slowLogTailer.imagePullPolicy | default "IfNotPresent" }}
{{- if .Values.tidb.slowLogTailer.resources }}
{{ toYaml .Values.tidb.slowLogTailer.resources | indent 6 }}
{{- end }}
10 changes: 10 additions & 0 deletions charts/tidb-cluster/values.yaml
@@ -175,6 +175,16 @@ tidb:
exposeStatus: true
# annotations:
# cloud.google.com/load-balancer-type: Internal
# separateSlowLog: true
slowLogTailer:
image: busybox:1.26.2
resources:
limits: {}
# cpu: 50m
# memory: 10Mi
requests: {}
# cpu: 50m
Contributor

We should specify a request. Very few resources should be needed for this.

Contributor Author

Is 20m CPU and 5MB memory a good default for both limit and request? (I tested locally, and 10m CPU with 1MB memory seems to be enough.)

Member

OK, LGTM.

Contributor

I would not use a limit, or would at least make the limit a bit higher, until we have observed it in production.

Member

Agreed, maybe a 50Mi memory limit is enough. This is pretty small compared to TiDB itself.

# memory: 10Mi

# mysqlClient is used to set password for TiDB
mysqlClient:
27 changes: 27 additions & 0 deletions docs/operation-guide.md
@@ -131,6 +131,33 @@ Then open your browser at http://localhost:3000 The default username and passwor

The Grafana service is exposed as `NodePort` by default. You can change it to `LoadBalancer` if the underlying Kubernetes has load balancer support, and then view the dashboard via the load balancer endpoint.

### View TiDB Slow Query Log

In the default setup, TiDB is configured to write the slow query log to STDOUT along with the normal server logs. You can obtain the slow query log by running `grep` for the keyword `SLOW_QUERY`:

```shell
$ kubectl logs -n ${namespace} ${tidbPodName} | grep SLOW_QUERY
```

Optionally, you can output the slow query log through a separate sidecar container by enabling `separateSlowLog`:

```yaml
# Uncomment the following line to enable separate output of the slow query log
# separateSlowLog: true
```

Run `helm upgrade` to apply the change. You can then obtain the slow query log from the sidecar container named `slowlog`:

```shell
$ kubectl logs -n ${namespace} ${tidbPodName} -c slowlog
```

To retrieve logs from multiple pods, [`stern`](https://github.com/wercker/stern) is recommended.

```shell
$ stern -n ${namespace} tidb -c slowlog
```

## Backup

Currently, TiDB Operator supports two kinds of backup: incremental backup via binlog and full backup (scheduled or ad-hoc) via [Mydumper](https://github.com/maxbube/mydumper).
1 change: 1 addition & 0 deletions images/tidb-operator-e2e/tidb-cluster-values.yaml
@@ -143,6 +143,7 @@ tidb:
exposeStatus: true
# annotations:
# cloud.google.com/load-balancer-type: Internal
separateSlowLog: true

# mysqlClient is used to set password for TiDB
mysqlClient:
25 changes: 17 additions & 8 deletions pkg/apis/pingcap.com/v1alpha1/types.go
@@ -42,8 +42,10 @@ const (
TiDBMemberType MemberType = "tidb"
// TiKVMemberType is tikv container type
TiKVMemberType MemberType = "tikv"
- //PushGatewayMemberType is pushgateway container type
+ // PushGatewayMemberType is pushgateway container type
PushGatewayMemberType MemberType = "pushgateway"
+ // SlowLogTailerMemberType is tidb log tailer container type
+ SlowLogTailerMemberType MemberType = "slowlog"
// UnknownMemberType is unknown container type
UnknownMemberType MemberType = "unknown"
)
@@ -117,13 +119,20 @@ type PDSpec struct {
// TiDBSpec contains details of TiDB member
type TiDBSpec struct {
ContainerSpec
- Replicas int32 `json:"replicas"`
- NodeSelector map[string]string `json:"nodeSelector,omitempty"`
- NodeSelectorRequired bool `json:"nodeSelectorRequired,omitempty"`
- StorageClassName string `json:"storageClassName,omitempty"`
- Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
- BinlogEnabled bool `json:"binlogEnabled,omitempty"`
- MaxFailoverCount int32 `json:"maxFailoverCount,omitempty"`
+ Replicas int32 `json:"replicas"`
+ NodeSelector map[string]string `json:"nodeSelector,omitempty"`
+ NodeSelectorRequired bool `json:"nodeSelectorRequired,omitempty"`
+ StorageClassName string `json:"storageClassName,omitempty"`
+ Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
+ BinlogEnabled bool `json:"binlogEnabled,omitempty"`
+ MaxFailoverCount int32 `json:"maxFailoverCount,omitempty"`
+ SeparateSlowLog bool `json:"separateSlowLog,omitempty"`
+ SlowLogTailer TiDBSlowLogTailerSpec `json:"slowLogTailer,omitempty"`
}

+ // TiDBSlowLogTailerSpec represents an optional log tailer sidecar with TiDB
+ type TiDBSlowLogTailerSpec struct {
+ ContainerSpec
+ }

// TiKVSpec contains details of TiKV member
18 changes: 18 additions & 0 deletions pkg/apis/pingcap.com/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions pkg/controller/controller_utils.go
@@ -36,6 +36,8 @@ var (
const (
// defaultPushgatewayImage is default image of pushgateway
defaultPushgatewayImage = "prom/pushgateway:v0.3.1"
// defaultTiDBLogTailerImage is default image of tidb log tailer
defaultTiDBLogTailerImage = "busybox:1.26.2"
)

// RequeueError is used to requeue the item, this error type shouldn't be considered as a real error
@@ -129,6 +131,13 @@ func GetPushgatewayImage(cluster *v1alpha1.TidbCluster) string {
return defaultPushgatewayImage
}

func GetSlowLogTailerImage(cluster *v1alpha1.TidbCluster) string {
if img := cluster.Spec.TiDB.SlowLogTailer.Image; img != "" {
return img
}
return defaultTiDBLogTailerImage
}

// PDMemberName returns pd member name
func PDMemberName(clusterName string) string {
return fmt.Sprintf("%s-pd", clusterName)
9 changes: 9 additions & 0 deletions pkg/controller/controller_utils_test.go
@@ -157,6 +157,15 @@ func TestGetPushgatewayImage(t *testing.T) {
g.Expect(GetPushgatewayImage(tc)).To(Equal("image-1"))
}

func TestGetSlowLogTailerImage(t *testing.T) {
g := NewGomegaWithT(t)

tc := &v1alpha1.TidbCluster{}
g.Expect(GetSlowLogTailerImage(tc)).To(Equal(defaultTiDBLogTailerImage))
tc.Spec.TiDB.SlowLogTailer.Image = "image-1"
g.Expect(GetSlowLogTailerImage(tc)).To(Equal("image-1"))
}

func TestPDMemberName(t *testing.T) {
g := NewGomegaWithT(t)
g.Expect(PDMemberName("demo")).To(Equal("demo-pd"))
123 changes: 78 additions & 45 deletions pkg/manager/member/tidb_member_manager.go
@@ -14,6 +14,7 @@
package member

import (
"fmt"
"strconv"

"github.com/pingcap/tidb-operator/pkg/apis/pingcap.com/v1alpha1"
@@ -30,6 +31,12 @@ import (
corelisters "k8s.io/client-go/listers/core/v1"
)

const (
slowQueryLogVolumeName = "slowlog"
slowQueryLogDir = "/var/log/tidb"
slowQueryLogFile = slowQueryLogDir + "/slowlog"
)

type tidbMemberManager struct {
setControl controller.StatefulSetControlInterface
svcControl controller.ServiceControlInterface
@@ -240,6 +247,76 @@ func (tmm *tidbMemberManager) getNewTiDBSetForTidbCluster(tc *v1alpha1.TidbClust
},
}

var containers []corev1.Container
if tc.Spec.TiDB.SeparateSlowLog {
// mount a shared volume and tail the slow log to STDOUT using a sidecar.
vols = append(vols, corev1.Volume{
Name: slowQueryLogVolumeName,
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{},
},
})
volMounts = append(volMounts, corev1.VolumeMount{Name: slowQueryLogVolumeName, MountPath: slowQueryLogDir})
containers = append(containers, corev1.Container{
Name: v1alpha1.SlowLogTailerMemberType.String(),
Image: controller.GetSlowLogTailerImage(tc),
ImagePullPolicy: tc.Spec.TiDB.SlowLogTailer.ImagePullPolicy,
Resources: util.ResourceRequirement(tc.Spec.TiDB.SlowLogTailer.ContainerSpec),
VolumeMounts: []corev1.VolumeMount{
{Name: slowQueryLogVolumeName, MountPath: slowQueryLogDir},
},
Command: []string{
"sh",
"-c",
fmt.Sprintf("touch %s; tail -n0 -f %s;", slowQueryLogFile, slowQueryLogFile),
},
})
}

containers = append(containers, corev1.Container{
Name: v1alpha1.TiDBMemberType.String(),
Image: tc.Spec.TiDB.Image,
Command: []string{"/bin/sh", "/usr/local/bin/tidb_start_script.sh"},
ImagePullPolicy: tc.Spec.TiDB.ImagePullPolicy,
Ports: []corev1.ContainerPort{
{
Name: "server",
ContainerPort: int32(4000),
Protocol: corev1.ProtocolTCP,
},
{
Name: "status", // pprof, status, metrics
ContainerPort: int32(10080),
Protocol: corev1.ProtocolTCP,
},
},
VolumeMounts: volMounts,
Resources: util.ResourceRequirement(tc.Spec.TiDB.ContainerSpec),
Env: []corev1.EnvVar{
{
Name: "CLUSTER_NAME",
Value: tc.GetName(),
},
{
Name: "TZ",
Value: tc.Spec.Timezone,
},
{
Name: "BINLOG_ENABLED",
Value: strconv.FormatBool(tc.Spec.TiDB.BinlogEnabled),
},
},
ReadinessProbe: &corev1.Probe{
Handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: "/status",
Port: intstr.FromInt(10080),
},
},
InitialDelaySeconds: int32(10),
},
})

tidbLabel := label.New().Instance(instanceName).TiDB()
tidbSet := &apps.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
@@ -264,51 +341,7 @@ func (tmm *tidbMemberManager) getNewTiDBSetForTidbCluster(tc *v1alpha1.TidbClust
label.New().Instance(instanceName).TiDB(),
tc.Spec.TiDB.NodeSelector,
),
- Containers: []corev1.Container{
- {
- Name: v1alpha1.TiDBMemberType.String(),
- Image: tc.Spec.TiDB.Image,
- Command: []string{"/bin/sh", "/usr/local/bin/tidb_start_script.sh"},
- ImagePullPolicy: tc.Spec.TiDB.ImagePullPolicy,
- Ports: []corev1.ContainerPort{
- {
- Name: "server",
- ContainerPort: int32(4000),
- Protocol: corev1.ProtocolTCP,
- },
- {
- Name: "status", // pprof, status, metrics
- ContainerPort: int32(10080),
- Protocol: corev1.ProtocolTCP,
- },
- },
- VolumeMounts: volMounts,
- Resources: util.ResourceRequirement(tc.Spec.TiDB.ContainerSpec),
- Env: []corev1.EnvVar{
- {
- Name: "CLUSTER_NAME",
- Value: tc.GetName(),
- },
- {
- Name: "TZ",
- Value: tc.Spec.Timezone,
- },
- {
- Name: "BINLOG_ENABLED",
- Value: strconv.FormatBool(tc.Spec.TiDB.BinlogEnabled),
- },
- },
- ReadinessProbe: &corev1.Probe{
- Handler: corev1.Handler{
- HTTPGet: &corev1.HTTPGetAction{
- Path: "/status",
- Port: intstr.FromInt(10080),
- },
- },
- InitialDelaySeconds: int32(10),
- },
- },
- },
+ Containers: containers,
RestartPolicy: corev1.RestartPolicyAlways,
Tolerations: tc.Spec.TiDB.Tolerations,
Volumes: vols,
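
As a rough illustration of the sidecar command built in this file, the sketch below reproduces only the `fmt.Sprintf` step using the standard library. The `tailCommand` helper is hypothetical and exists only for this example; the path is the value of the `slowQueryLogFile` constant from the diff.

```go
package main

import "fmt"

// tailCommand is a hypothetical helper that builds the same argv the
// sidecar container receives in getNewTiDBSetForTidbCluster.
func tailCommand(logFile string) []string {
	return []string{
		"sh",
		"-c",
		// touch creates the file if it does not exist yet, so tail has
		// something to open; -n0 skips content already in the file and
		// -f keeps following new lines as they are appended.
		fmt.Sprintf("touch %s; tail -n0 -f %s;", logFile, logFile),
	}
}

func main() {
	cmd := tailCommand("/var/log/tidb/slowlog")
	fmt.Printf("%q\n", cmd)
}
```

Because both containers mount the same `emptyDir` volume at `/var/log/tidb`, whatever tidb-server appends to the file is echoed to the sidecar's STDOUT, which is what `kubectl logs -c slowlog` reads.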
12 changes: 12 additions & 0 deletions pkg/manager/member/tidb_member_manager_test.go
@@ -209,6 +209,18 @@ func TestTiDBMemberManagerSyncUpdate(t *testing.T) {
g.Expect(err).NotTo(HaveOccurred())
},
},
{
name: "enable separate slowlog on the fly",
modify: func(tc *v1alpha1.TidbCluster) {
tc.Spec.TiDB.SeparateSlowLog = true
},
errWhenUpdateStatefulSet: false,
err: false,
expectStatefulSetFn: func(g *GomegaWithT, set *apps.StatefulSet, err error) {
g.Expect(err).NotTo(HaveOccurred())
g.Expect(set.Spec.Template.Spec.Containers).To(HaveLen(2))
},
},
}

for i := range tests {