
Test Design of `goconserver` Support

Gᴏɴɢ Jie edited this page Nov 2, 2017 · 15 revisions

Introduction

A new console server facility, goconserver, developed by CHENG Long, has been introduced. goconserver is intended to replace conserver, the console server that has been part of xCAT for a decade.

The main purpose of this replacement is to overcome the functional problems and performance issues of the existing conserver.

Existing Problems of conserver

  • Consumes around 4 MiB of memory per compute node, so 2,000 compute nodes need roughly 8 GiB of memory.
  • Whenever a configuration change is made, the daemon has to be restarted, which interrupts the console connections of all compute nodes. See issue #4043.
  • When a compute node is disconnected, conserver retries every 3 seconds and generates a large amount of log output.
    • Around 150 bytes/node every 3 seconds, roughly 4 MiB/node/day. With many disconnected nodes, this can fill up the /var file system very quickly.
  • When no SSH authentication key is configured on the OpenBMC side, there is no way to authenticate over SSH with a password, so in this situation the console server does not work at all. See issue #4124.
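The memory and log figures above can be checked with a small Go calculation. This is a sketch that only reproduces the arithmetic from the list; the function name logBytesPerDay is hypothetical, and the inputs (4 MiB per node, 150 bytes every 3 seconds) are the rough numbers quoted above, not new measurements.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// logBytesPerDay returns the log volume that one disconnected node produces
// per day when every retry writes bytesPerRetry bytes and retries happen
// every retrySec seconds.
func logBytesPerDay(bytesPerRetry, retrySec float64) float64 {
	return bytesPerRetry / retrySec * secondsPerDay
}

func main() {
	const mib = 1 << 20

	// Memory: about 4 MiB per node, 2,000 nodes.
	memMiB := 4.0 * 2000
	fmt.Printf("memory: %.0f MiB = %.2f GiB\n", memMiB, memMiB/1024)

	// Logs: about 150 bytes every 3 seconds per disconnected node.
	perNode := logBytesPerDay(150, 3)
	fmt.Printf("logs: %.0f bytes/node/day = %.2f MiB/node/day\n", perNode, perNode/mib)
}
```

Strictly, 2,000 × 4 MiB is 8,000 MiB, or about 7.8 GiB, which the list rounds to 8 GiB.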

Test Strategy

Function Verification Test

Scenario 1 - Normal console functionality

Verify that the console works normally in each of the following configurations:

  • Console against an OpenPOWER machine via OpenBMC
  • Console against an OpenPOWER machine via IPMI
  • Console against a KVM guest via SSH to the KVM host
  • Console against an x86-64 machine via IPMI
  • Console against an IBM PowerVM LPAR via HMC

Scenario 2 - Recovery

  • Restart a compute node
  • Restart the OpenBMC on an OpenPOWER machine with OpenBMC
  • Restart the BMC on an OpenPOWER machine with IPMI
  • Disconnect the network between the console server and the OpenBMC/BMC

Scenario 3 - Multiplex

  • Multiple users connect to the console of the same compute node at the same time

Scenario 4 - Stability

  • Leave a compute node producing no console output for a long period of time, for example 10 days

Performance Test

Scenario 1 - Memory Cost

  • Measure the memory consumption of goconserver with
    • 1 compute node,
    • 2 compute nodes,
    • 5 compute nodes,
    • 10 compute nodes,
    • 100 compute nodes.
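On Linux, one simple way to take these memory measurements is to read the resident set size (the VmRSS field, in kB) from /proc/<pid>/status of the goconserver process after all consoles are connected. The sketch below assumes a Linux host; the helper names parseVmRSS and rssKB are hypothetical, and finding the daemon's PID (for example with pidof) is left to the tester.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in kB) from the contents of a
// Linux /proc/<pid>/status file.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// A matching line looks like "VmRSS:     12345 kB".
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS not found")
}

// rssKB reads the resident set size of process pid in kB.
func rssKB(pid int) (int64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	return parseVmRSS(string(data))
}

func main() {
	// Assumption: replace with the PID of the goconserver daemon.
	pid := os.Getpid()
	kb, err := rssKB(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pid %d VmRSS: %d kB\n", pid, kb)
}
```

Repeating the reading for 1, 2, 5, 10 and 100 nodes gives the per-node growth, which can be compared directly with the roughly 4 MiB per node of conserver.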

Scenario 2 - Number of Open File Handles

  • Measure the number of open file handles goconserver uses with
    • 1 compute node,
    • 2 compute nodes,
    • 5 compute nodes,
    • 10 compute nodes,
    • 100 compute nodes.

Stress and Volume Test

Scenario 1 - Hard disk drive throughput and IOPS may be a bottleneck

For a regular 115,200 baud serial console port with the common 8N1 settings (1 start bit, 8 data bits, no parity, and 1 stop bit), each byte takes 10 bits on the wire, so one port can generate up to 11,520 bytes per second. For 2,000 compute nodes, this amounts to around

11,520 bytes/s × 2,000 = 23,040,000 bytes/s ≈ 21.97 MiB/s
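The same arithmetic as a short Go sketch. The function name bytesPerSecond is hypothetical; it simply divides the baud rate by the number of bits each byte occupies on the wire, which gives the maximum rate of a fully saturated port.

```go
package main

import "fmt"

// bytesPerSecond returns the maximum payload rate of an asynchronous serial
// line: every byte on the wire costs start + data + parity + stop bits.
func bytesPerSecond(baud, startBits, dataBits, parityBits, stopBits int) int {
	return baud / (startBits + dataBits + parityBits + stopBits)
}

func main() {
	const nodes = 2000
	perPort := bytesPerSecond(115200, 1, 8, 0, 1) // 8N1 framing: 10 bits per byte
	total := perPort * nodes
	fmt.Printf("per port: %d bytes/s\n", perPort)
	fmt.Printf("%d nodes: %d bytes/s = %.2f MiB/s\n", nodes, total, float64(total)/(1<<20))
}
```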

A typical enterprise-class mechanical hard disk drive sustains around 20 MiB/s of 4K random writes, so its throughput may not be sufficient.

The IOPS of such a drive is around 175, so writing 2,000 console log files concurrently may also be a challenge.
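To see why 175 IOPS is tight, the sketch below estimates the write operations per second needed if each of the 2,000 log files is written once per flush interval. This is a deliberately simplified model: it assumes one random disk operation per file write with no coalescing by the page cache or I/O scheduler, and the flush intervals and the name requiredIOPS are hypothetical, not goconserver settings.

```go
package main

import "fmt"

// requiredIOPS estimates the random write operations per second needed when
// each of files log files is written once every flushSec seconds, assuming
// one disk operation per write and no coalescing.
func requiredIOPS(files int, flushSec float64) float64 {
	return float64(files) / flushSec
}

func main() {
	const files = 2000
	const diskIOPS = 175.0 // typical enterprise HDD, from the text above

	for _, interval := range []float64{1, 5, 15} { // hypothetical flush intervals
		need := requiredIOPS(files, interval)
		fmt.Printf("flush every %2.0f s: %6.1f IOPS needed, fits=%v\n",
			interval, need, need <= diskIOPS)
	}
	fmt.Printf("minimum flush interval for %.0f IOPS: %.1f s\n", diskIOPS, files/diskIOPS)
}
```

Under this model, flushing every file once per second needs about 2,000 IOPS, more than ten times what the drive offers, so buffering behavior is worth observing during the test.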

  • Test against 2,000 compute nodes, each generating 11,520 bytes of console output per second.
    • Measure the memory consumption
    • Measure the CPU usage
    • Count the number of child processes the daemon spawns
    • Check whether the daemon runs stably
    • Check whether the I/O throughput is sufficient

Observe what happens when the I/O load exceeds the capacity of the hard disk drive.

Scenario 2 - Limit of Open Files

The default open files limit (ulimit -n) is 1024. For 2,000 nodes, the console server daemon may need more than 1,024 file handles.

  • Test against 2,000 compute nodes and observe what happens when the number of files opened by the daemon reaches the limit. Does the error handling work correctly?
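Before this test, it helps to know the limits actually in effect. The sketch below reads RLIMIT_NOFILE with syscall.Getrlimit and checks whether a given node count fits under the soft limit. The per-node and overhead descriptor counts are hypothetical placeholders, to be replaced with the values found in the open file handles performance test; fitsNoFile is likewise a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fitsNoFile reports whether nodes consoles, each assumed to need perNode
// descriptors, plus a fixed daemon overhead fit under the soft limit.
func fitsNoFile(nodes, perNode, overhead int, soft uint64) bool {
	return uint64(nodes*perNode+overhead) <= soft
}

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Note: since Go 1.19, programs importing os raise their own soft
	// RLIMIT_NOFILE to the hard limit at startup, so this soft value can be
	// higher than `ulimit -n` in the launching shell.
	fmt.Printf("RLIMIT_NOFILE soft=%d hard=%d\n", rl.Cur, rl.Max)

	// Hypothetical: 1 descriptor per node and 64 descriptors of overhead.
	fmt.Println("2,000 nodes fit:", fitsNoFile(2000, 1, 64, rl.Cur))
}
```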

Scenario 3 - Out of Memory

See what happens when a memory allocation (malloc()) fails.

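  • Set a lower maximum memory size with ulimit -m. Note that Linux kernels since 2.4.30 do not enforce this limit (RLIMIT_RSS), so ulimit -v, which limits the virtual address space, may be needed instead.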

Environment Requirements
