node discovery failed due to large "findme" request packet #5177

Closed
immarvin opened this issue May 8, 2018 · 10 comments

Comments

@immarvin
Contributor

immarvin commented May 8, 2018

This is an issue reported by a customer:

These nodes have many attached disks, several hundred of them. It seems the xCAT daemon cannot handle such a large XML line. Essentially, when all these disks are put into one XML statement, XML handling somehow breaks in the daemon and it simply ignores the request. I would be surprised if anyone has tested a configuration with so many attached disks.

The maximum discovery packet size xcatd can handle is 1500 B, because the receive loop reads each incoming datagram with $socket->recv($data, 1500):

```
 851                             xCAT::MsgUtils->message("S", "INFO xcatd: fail to notify $clientip that its 'findme' request is been processing");
 852                         }
 853
 854                     } else {    # for *now*, we'll do a tiny YAML subset
 855                         if ($data =~ /^resourcerequest: xcatd$/) {
 856                             $socket->send("ackresourcerequest\n", 0, $packets{$pkey}->[0]);
 857                             $tcclients->{$pkey} = { sockaddr => $packets{$pkey}->[0], timestamp => int(time()) }
 858                         }
 859                     }    # JSON maybe one day if important
 860                     if ($quit) { last; }
 861                     while (@hdls = $select->can_read(0)) { # grab any incoming requests during run
 862                         foreach my $hdl (@hdls) {
 863                             if ($hdl == $socket) {
 864                                 $part = $socket->recv($data, 1500);
 865                                 $packets{$part} = [ $part, $data ];
 866
 867                                 #} elsif ($hdl == $sslctl) {
 868                                 #       update_udpcontext_from_sslctl(udpcontext=>$udpcontext,select=>$select);
 869                             }
 870                         }
 871                     }
```

We need to look at the dodiscovery script to see whether the packet size can be reduced by dropping unneeded information.
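The 1500 B limit follows from how recv() behaves on a UDP (SOCK_DGRAM) socket: each call returns at most the requested number of bytes from a single datagram, and on Linux the rest of that datagram is silently dropped rather than kept for the next read. A minimal Python sketch over loopback shows the effect; the helper name recv_datagram is hypothetical and not part of xCAT.

```python
import socket


def recv_datagram(payload: bytes, bufsize: int) -> bytes:
    """Send one UDP datagram over loopback and read it back with recv(bufsize).

    Hypothetical helper for illustration only; it is not part of xCAT.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))  # let the OS choose a free port
        rx.settimeout(2.0)         # fail instead of hanging if nothing arrives
        tx.sendto(payload, rx.getsockname())
        # On a SOCK_DGRAM socket, recv() returns at most bufsize bytes of ONE
        # datagram; on Linux the remainder of that datagram is discarded.
        return rx.recv(bufsize)
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    small = recv_datagram(b"x" * 1400, 1500)
    big = recv_datagram(b"x" * 2038, 1500)
    print(len(small), len(big))  # 1400 1500 on Linux
```

So a compressed "findme" packet larger than the recv() size is not rejected with an error; the daemon just sees the first 1500 bytes.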

@zet809

zet809 commented May 8, 2018

Hi @cxhong, could you please first check with the customer how the jumbo "findme" request is created?

@cxhong
Contributor

cxhong commented May 15, 2018

Waiting for Wesley or his team to recreate this issue.

@zet809
Copy link

zet809 commented May 21, 2018

Hi @cxhong, any update on this issue? Thanks!

@cxhong
Contributor

cxhong commented May 21, 2018

I haven't recreated it yet. Do you think we can recreate it by adding more disks to the dodiscovery packet and then sending it to xCAT manually? Anyway, I will try this today.

@cxhong
Contributor

cxhong commented May 21, 2018

The created discopacket is larger:

-rw-r--r-- 1 root root 15322 May 21 11:55 discopacket

Manually running the dodiscover command had no issue:

 cat discopacket.gz | /opt/xcat/share/xcat/netboot/genesis/ppc64/fs/bin/udpcat.awk 172.20.253.31 3001

I don't think size matters here.

SOCK_STREAM: It doesn't really matter too much. If your protocol is a transactional / interactive one just pick a size that can hold the largest individual message / command you would reasonably expect (3000 is likely fine). If your protocol is transferring bulk data, then larger buffers can be more efficient - a good rule of thumb is around the same as the kernel receive buffer size of the socket (often something around 256kB).

@immarvin
Contributor Author

Hi @cxhong, the data transferred during discovery is a compressed file. Please run gzip -9 /tmp/discopacket against the larger packet and check whether the size of discopacket.gz exceeds 1500 B. Thanks!
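The same check can be scripted. This is a minimal sketch assuming the packet is gzip-compressed before sending and that the relevant limit is the 1500 B recv() size quoted earlier; RECV_LIMIT, compressed_size and fits are hypothetical names, not xCAT code.

```python
import gzip

# Assumed limit, taken from the recv($data, 1500) call quoted earlier.
RECV_LIMIT = 1500


def compressed_size(raw: bytes) -> int:
    """Size of raw after gzip at level 9, roughly what `gzip -9` produces.

    The command-line tool also stores the original file name in the gzip
    header, so its output can be a few bytes larger than this value.
    """
    return len(gzip.compress(raw, compresslevel=9))


def fits(raw: bytes, limit: int = RECV_LIMIT) -> bool:
    """True if the compressed packet would fit in a single recv(limit) call."""
    return compressed_size(raw) <= limit
```

For example, fits(open("/tmp/discopacket", "rb").read()) answers the question above directly, without creating a .gz file on disk.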

@cxhong
Contributor

cxhong commented May 22, 2018

Here is the gz file:

# gzip -9 discopacket
# ls -ltr
-rw-r--r-- 1 root root 1841 May 21 11:55 discopacket.gz

@zet809

zet809 commented May 23, 2018

Hi @cxhong, is the size of discopacket.gz similar to what the customer encountered? And did it work without problems in our environment? If so, there must be some other cause.
Also, @immarvin, line 812 of xcatd.pm reads 2000 B from the socket, but line 864 reads only 1500 B when more packets are pending. We should keep these consistent, right?
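One way to keep the two reads consistent, and to stop oversized requests from disappearing silently, is to share a single size constant and check the kernel's truncation flag. The sketch below is in Python rather than the daemon's Perl; MAX_REQUEST_BYTES, read_request and demo are hypothetical names. It relies on recvmsg(), which is Unix-only, reporting MSG_TRUNC when a datagram was longer than the buffer.

```python
import socket

# Hypothetical single constant for every UDP read path, so the 2000 B and
# 1500 B reads mentioned above cannot drift apart.
MAX_REQUEST_BYTES = 2000


def read_request(sock: socket.socket) -> tuple:
    """Read one datagram; return (data, sender address, truncated flag).

    MSG_TRUNC in the returned flags means the datagram was larger than
    MAX_REQUEST_BYTES and the excess bytes were discarded by the kernel.
    """
    data, _ancdata, flags, addr = sock.recvmsg(MAX_REQUEST_BYTES)
    return data, addr, bool(flags & socket.MSG_TRUNC)


def demo(size: int) -> bool:
    """Send `size` bytes over loopback and return whether they were truncated."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2.0)
        tx.sendto(b"\0" * size, rx.getsockname())
        _data, _addr, truncated = read_request(rx)
        return truncated
    finally:
        rx.close()
        tx.close()
```

With the flag available, a daemon could log the sender address of an oversized "findme" request instead of ignoring it without a trace.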

@cxhong
Contributor

cxhong commented May 23, 2018

Their discopacket was 12906 B at that time. I don't know the size of the gz file, but it should exceed 1500 B.

@cxhong
Contributor

cxhong commented May 24, 2018

Thanks @zet809, I think the issue is recreated:

  1. xcatd didn't receive the "findme" request when discopacket.gz was larger than 2000 B:
-rw-r--r-- 1 root root 2038 May 24 09:29 discopacket.gz
  2. xcatd received the "findme" request when discopacket.gz was smaller than 2000 B:
-rw-r--r-- 1 root root 1993 May 24 09:32 discopacket.gz
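The thread does not show how xcatd decodes the packet, so the following is only a plausible explanation consistent with the 2000 B threshold: the datagram is cut at the recv() size, and the truncated gzip stream then cannot be decompressed, so the request is dropped. A Python sketch of that hypothesis, with hypothetical helper names:

```python
import gzip
import zlib


def decodes(blob: bytes) -> bool:
    """True if blob is a complete gzip stream that decompresses cleanly."""
    try:
        gzip.decompress(blob)
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ends before its end-of-stream marker (truncation);
        # OSError covers gzip.BadGzipFile; zlib.error covers corrupt deflate data.
        return False


def truncated_copy(blob: bytes, bufsize: int) -> bytes:
    """Mimic recv(bufsize) on one datagram: keep only the first bufsize bytes."""
    return blob[:bufsize]
```

Under this reading, the two remedies raised in the thread both apply: a consistent, large enough receive buffer, or a smaller packet from the dodiscovery script.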
