Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Stop server startup if an unrecognized option is found in a config file #9855

Merged
merged 17 commits into from
Apr 8, 2019
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 14 additions & 1 deletion config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ package config
import (
"crypto/tls"
"crypto/x509"
"fmt"
"io/ioutil"
"time"

Expand Down Expand Up @@ -371,10 +372,22 @@ func GetGlobalConfig() *Config {

// Load loads config options from a toml file.
func (c *Config) Load(confFile string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When upgrade from old version tidb to a newer version, some old config item maybe is not used in the newer version tidb. The rolling upgrade may meet error in this case, then the rolling upgrade failed.

How about only check the config items when -config-check is provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this suggestion. I think the behavior should be that tidb-server simply refuses to start if there are unknown configuration options, for the reasons I provided above.

A couple points:

  1. Now that there's a --config-check option, it can be a part of the recommended procedure for a rolling upgrade, so that the only people running into the issue you describe are those not following the recommended procedure.

  2. The benefit of a rolling upgrade is that you never have to take the entire cluster offline. When upgrading the first node, it will refuse to start with this error. That's a fine time to address the problem, apply the change to the config files, get that first node upgraded, and then proceed with the rolling upgrade.

@morgo, your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because our own CI broke, we may need to ease this change in slowly. Here would be my suggestion:

  1. Keep --config-check as is, it is independently very useful.
  2. Change from refuse to start to print a WARN about each incorrect config item.
  3. Add a new flag for --config-strict, which defaults to FALSE. We can document it that enabling it is a best practice, and we may change the default to TRUE in the future. This will result in errors for invalid options.

It is not bulletproof of course, because the default log level is INFO it is possible that the warnings may not be seen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the very fact our own CI broke is evidence of the value of this behavior change. Even we weren't keeping track of whether our config files were well-formed!

  1. Your suggestion above is difficult, because the logging system isn't set up yet when the config file is parsed. Either we have to keep around the list of undecoded items and check them later after logging is possible, or we have to pass some other string/structure/flag around, or we have to simply print something to stderr, or ...? @coocood Morgan mentioned that you may have some thoughts on this?

  2. I dislike this. Adding this new flag means that people may start relying on it, and then we have to keep it around as a no-op if we later make that behavior the default.

Copy link
Contributor

@winkyao winkyao Mar 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree with Morgan's first and second suggestions. Our users may use old version ansible to upgrade, that doesn't pre-check the config using tidb-server --config-check, we need to keep the cluster stable when rolling upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, it sounds like there's a consensus. Can someone offer a suggestion of the best way to technically implement a "warning" before the logging system is set up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolbe I will create a pull request to your branch later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winkyao I added some comments on your PR. I re-wrote my own interpretation of how I see this working and pushed it to my branch of my fork, please take a look.

Basically I added a kind of unsettling amount of additional stuff to create a custom error in config.go, then use that in the case of failed config validation. That enables checking for the specific error type in main.go, and if configStrict is not enabled, it'll get the string from the warning and keep that until ater logging has been set up.

This is kind of a lot of extra work and adds some ugly things, but the whole idea behind strict config checking (and the --config-strict option) is that this is a temporary situation until we make --config-strict behavior the default (and hopefully only!) behavior of TiDB Server in the future. I think at that point we'd simply remove all this extra stuff with the custom error and strange handling in the server.

Thoughts?

_, err := toml.DecodeFile(confFile, c)
metaData, err := toml.DecodeFile(confFile, c)
if c.TokenLimit <= 0 {
c.TokenLimit = 1000
}

// If any items in confFile file are not mapped into the Config struct, issue
// an error and stop the server from starting.
undecoded := metaData.Undecoded()
if len(undecoded) > 0 && err == nil {
undecodedItems := ""
for _, item := range undecoded {
undecodedItems += fmt.Sprintf(" %s \n", item.String())
kolbe marked this conversation as resolved.
Show resolved Hide resolved
}
err = errors.Errorf("config file %s contained unknown configuration options:\n%s", confFile, undecodedItems)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error generated by errors.Errorf() is attached with a call stack, we don't need to wrap a errors.Trace() in line 391.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me, but the trace is not my code (it was there before, so that a non-existent config file for example would be traced for some reason).

}

return errors.Trace(err)
}

Expand Down
13 changes: 13 additions & 0 deletions config/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,19 @@ func (s *testConfigSuite) TestConfig(c *C) {

f, err := os.Create(configFile)
c.Assert(err, IsNil)

// Make sure the server refuses to start if there's an unrecognized configuration option
_, err = f.WriteString(`
unrecognized-option-test = true
`)
c.Assert(err, IsNil)
c.Assert(f.Sync(), IsNil)

c.Assert(conf.Load(configFile), ErrorMatches, "(?:.|\n)*unknown configuration option(?:.|\n)*")

f.Truncate(0)
f.Seek(0, 0)

_, err = f.WriteString(`[performance]
[tikv-client]
commit-timeout="41s"
Expand Down
10 changes: 8 additions & 2 deletions tidb-server/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ import (
const (
nmVersion = "V"
nmConfig = "config"
nmConfigCheck = "config-check"
nmStore = "store"
nmStorePath = "path"
nmHost = "host"
Expand All @@ -89,8 +90,9 @@ const (
)

var (
version = flagBoolean(nmVersion, false, "print version information and exit")
configPath = flag.String(nmConfig, "", "config file path")
version = flagBoolean(nmVersion, false, "print version information and exit")
configPath = flag.String(nmConfig, "", "config file path")
configCheck = flagBoolean(nmConfigCheck, false, "check config file validity and exit")

// Base
store = flag.String(nmStore, "mocktikv", "registered store name, [tikv, mocktikv]")
Expand Down Expand Up @@ -142,6 +144,10 @@ func main() {
loadConfig()
overrideConfig()
validateConfig()
if *configCheck {
kolbe marked this conversation as resolved.
Show resolved Hide resolved
fmt.Println("config check successful")
os.Exit(0)
}
setGlobalVars()
setupLog()
setupTracing() // Should before createServer and after setup config.
Expand Down