Fresh single-host install: "etcdserver: publish error: etcdserver: request timed out" and "Error: context deadline exceeded" #9670
Comments
Does this still happen without LXC?
Hm. Unfortunately, I'm not sure right now, because all the test hosts I have access to in the lab are LXCs. I'll dig up some spare hardware and see what I can figure out.
Okay. Looks like it doesn't happen without LXC. Using the same config file (save for adjusting hostnames and data-dir path appropriately for the different host), the log looks like:
|
Okay, that gave me an idea and I realized I've been barking up the wrong tree. It's not the LXC that does it; it's the location of the data directory. /mnt/nas is a CIFS filesystem mounted from our big fileserver. Whether in-LXC or not, putting the data directory on the network filesystem triggers the error, and putting it locally does not.
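Editor's sketch: one way to confirm which filesystem backs a candidate data directory is to ask the kernel via statfs(2). This is a minimal, Linux-only sketch, not something from the thread. The helper names `fsName` and `fsTypeOf` are hypothetical, and the magic numbers are the values listed in the Linux kernel's `linux/magic.h`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Filesystem magic numbers as listed in Linux's include/uapi/linux/magic.h.
const (
	cifsMagic = 0xFF534D42
	smb2Magic = 0xFE534D42
	nfsMagic  = 0x6969
)

// fsName maps a statfs f_type value to a short label (hypothetical helper).
func fsName(magic uint32) string {
	switch magic {
	case cifsMagic:
		return "cifs"
	case smb2Magic:
		return "smb2"
	case nfsMagic:
		return "nfs"
	default:
		return fmt.Sprintf("other (0x%x)", magic)
	}
}

// fsTypeOf reports the filesystem type backing path, using statfs(2).
func fsTypeOf(path string) (string, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return "", err
	}
	// The width of Type differs between architectures; narrow it to 32 bits.
	return fsName(uint32(st.Type)), nil
}

func main() {
	path := "."
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	name, err := fsTypeOf(path)
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println(path, "is on:", name)
}
```

Running it with the data directory as its only argument (for example the /mnt/nas path mentioned above) should print `cifs` or `smb2` for a CIFS mount, which makes it easy to tell a network-backed directory from a local one before starting etcd.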
What error do you see from etcd? All etcd needs is write access to the filesystem, whether it's mounted locally or over the network. I don't think this is an etcd issue. If you can provide an easy way to reproduce, we can take a look. Please reopen or create a new one if you still have issues. Thanks.
@gyuho The error in question is the Also, it appears that etcd does have write access to the filesystem, as the data directory with its subdirectories (member, with subdirectories in turn snap and wal) are created successfully as are the files in them, appearing identical to those in a local data directory. File permissions back this up, and while etcd's running, these files show as opened in As such, and given that I haven't had other issues reading or writing the mounted filesystem so far, I'm fairly confident there's something etcd-specific going on. All I have to do to reproduce this is: 1. Install etcd (latest 3.3.4 release) on a freshly installed and updated Ubuntu Xenial.
...hm. Okay, on the Ubuntu test host, when it's creating the data directory on first run, it produces a different error (which did not occur in the LXC):
Running it a second time immediately (or subsequently) reverts to the former error as seen above. (This might suggest this is related to #6984 ?) I've attached I believe the error on first-run is relating to the known issue that the CIFS filesystem doesn't support |
(Evidently can't reopen this myself; I'll not create a new one for now to avoid repo spam.)
@cerebrate Try this on your mounted disk?

package main

import (
	"fmt"
	"os"
)

func main() {
	p := "aaa.txt"
	f, err := os.OpenFile(p, os.O_WRONLY|os.O_CREATE, 0600)
	if err != nil {
		panic(err)
	}
	fmt.Println("sync:", f.Sync())
}

And see if it returns the same error message?
@gyuho No, doesn't seem so:
|
To verify this, let me add more detailed error logging around fsync code paths.
@cerebrate Could you try again with current master branch? To build, you need Go 1.10+. You can just run Also need to enable new logger for more detailed error message:
Thanks! |
Sure, can do. First run:
Second run:
|
https://github.com/coreos/etcd/blob/2ad0acdea8bb6cc820256e8c2f6f87547303f28b/wal/wal.go#L200-L210 This verifies that:
Not sure how etcd would work around this... |
As far as I can tell from my digging, other projects that have had this issue (with CIFS and other network file systems) work around it by logging a warning telling you to set the That'd certainly work for me, seeing as the NAS in question has all the battery backup it needs to prevent data loss at its end, but I can understand if you don't want to do that. On the other hand, my other option is to use NFS, which on Linux supports but dummies out fsync() on directories anyway (per https://github.com/torvalds/linux/blob/master/fs/nfs/dir.c#L928-L931 ), so it's not like using the network file system that supposedly supports fsync() would gain anyone anything. |
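Editor's sketch: the earlier test program only calls fsync on a regular file, while the discussion here is about fsync() on directories. A minimal probe for the directory case, assuming a Unix-like host, might look like the following. The helper name `syncDir` is hypothetical and this is not etcd's own code.

```go
package main

import (
	"fmt"
	"os"
)

// syncDir opens dir read-only and calls fsync(2) on the directory's
// file descriptor, returning whatever error the filesystem reports.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	// A nil result means the filesystem accepted fsync on the directory.
	fmt.Println("dir sync:", syncDir(dir))
}
```

Running it once against a local directory and once against a directory on the network mount gives a direct comparison: a filesystem that rejects directory fsync should return a non-nil error here even though the file-level test succeeds.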
Given that we will keep |
Sure, I can do that.
@cerebrate and @gyuho Because when I tried to bring up just etcd server (in above mentioned version) on cifs without cache, I get the below error. |
I've just started playing with etcd, so I've attempted to install the latest release (at this time, 3.3.4) on a single host (actually, an LXC container) to experiment with. Trouble is, I'm getting this error ("etcdserver: publish error: etcdserver: request timed out") showing up repeatedly in the log once the server is started, and attempting to use etcd with, for example,
etcdctl put foo bar
produces the error "Error: context deadline exceeded".
The log is as follows:
...and the configuration file I am using is as follows:
Curiously,
netstat -a
does not seem to show any listening sockets on TCP ports 2379 and 2380 for IPv4, only for IPv6:
Any ideas as to what might be going on?
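Editor's sketch: on the netstat observation, a Go program that listens on a wildcard address with network "tcp" normally gets a single dual-stack socket on Linux. netstat lists that socket under tcp6, yet it still accepts IPv4 connections. Whether that is what happened here is an assumption, since the configuration file is not shown. The sketch below demonstrates the general behavior only; the function name `dialIPv4Wildcard` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// dialIPv4Wildcard listens on a wildcard address using an ephemeral port,
// then connects to it over IPv4 loopback. It returns the listener's
// reported address and any error from listening or dialing.
func dialIPv4Wildcard() (string, error) {
	ln, err := net.Listen("tcp", ":0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	addr := ln.Addr().String()
	port := ln.Addr().(*net.TCPAddr).Port

	// Force an IPv4 connection to the same port.
	conn, err := net.Dial("tcp4", fmt.Sprintf("127.0.0.1:%d", port))
	if err != nil {
		return addr, err
	}
	conn.Close()
	return addr, nil
}

func main() {
	addr, err := dialIPv4Wildcard()
	fmt.Println("listener address:", addr)
	fmt.Println("IPv4 dial error:", err)
}
```

If the IPv4 dial succeeds while the listener reports an IPv6-style wildcard address, the missing IPv4 rows in netstat are likely cosmetic and unrelated to the timeouts.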