Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic buffer calc] Support dynamic buffer calculation #6194

Merged
merged 10 commits into from
Dec 13, 2020

Conversation

stephenxs
Copy link
Collaborator

@stephenxs stephenxs commented Dec 11, 2020

- Why I did it
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules

- How I did it

  1. Introduce field buffer_model in DEVICE_METADATA|localhost to represent which buffer model is running in the system currently:
    • dynamic for the dynamic buffer calculation model
    • traditional for the traditional model in which the pg_profile_lookup.ini is used
  2. Add the tables required for the feature:
    • ASIC_TABLE in platform/<vendor>/asic_table.j2
    • PERIPHERAL_TABLE in platform/<vendor>/peripheral_table.j2
    • PORT_PERIPHERAL_TABLE on a per-platform basis in device/<vendor>/<platform>/port_peripheral_config.j2 for each platform with gearbox installed.
    • DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
    • Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
  3. Copy the newly introduced j2 files into the image and rendering them when the system starts
  4. Update the CLI options for buffermgrd so that it can start with dynamic mode
  5. Fetches the ASIC vendor name in orchagent:
    • fetch the vendor name when creates the docker and pass it as a docker environment variable
    • buffermgrd can use this passed-in variable
  6. Clear buffer related tables from STATE_DB when swss docker starts
  7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
  8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
    Update the buffer settings for dynamic buffer calculation

Signed-off-by: Stephen Sun stephens@nvidia.com

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

1. add tables required for the feature:
 - ASIC_TABLE in platform/mellanox/asic_table.j2 (Mellanox specific)
 - PERIPHERAL_TABLE in platform/mellanox/peripheral_table.j2 (Mellanox specific)
 - PORT_PERIPHERAL_TABLE on a per-platform basis in device/mellanox/x86_64-mlnx_msn3800-r0/port_peripheral_config.j2 (Mellanox specific)
 - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
 - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
2. copy the newly introduced j2 files into image and rendering them when system starts
3. update the CLI options for buffermgrd so that it can start with dynamic mode
4. Adjust the order in which swss daemons start, making buffermgrd start before orchagent.
   This is to make sure buffermgrd can get everything ready before orchagent starts especially during warm reboot.
5. Optimize the way in which orchagent fetches the asic vendor name:
 - fetch the vendor name when creates the docker and pass it as a docker environment variable
 - orchagent and buffermgrd can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
   Update the buffer settings for dynamic buffer calculation

Signed-off-by: Stephen Sun <stephens@nvidia.com>
1. If all the buffer configuration aligns the default, use dynamic
buffer calculation mode. Otherwise, use the traditional mode
2. Dynamic mode is adopted in switches newly installed from scratch and
traditional mode in switches installed from ninigraph
This is done by:
- introducing a wrapper to buffermgrd, which enables it to test which
mode is adopted and feed buffermgrd with corresponding CLI arguments
- preparing default buffer configuration templates for both dynamic and
traditional mode for each SKU
- introduce "buffer_model" in DEVICE_METADATA|localhost for mellanox platform

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- For Mellanox, dynamic by default
- For other vendors, traditional mode.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
…able

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
fi
if [ ! -f /etc/sonic/peripheral_table.json ] && [ -f /usr/share/sonic/device/$PLATFORM/port_peripheral_config.j2 ]; then
sonic-cfggen -d -t /usr/share/sonic/device/$PLATFORM/port_peripheral_config.j2 > /etc/sonic/peripheral_table.json
fi
Copy link
Collaborator

@qiluo-msft qiluo-msft Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this block into inside swss container? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the asic table and peripheral table won't be changed on-the-fly, I'd like to have them called once only when the docker is created instead of called each time the swss docker starts.
Let me know if there are other concerns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems here is called when the docker swss is started, not created.
You already check the files existing, so not too much time wasted after file created.


In reply to: 541481038 [](ancestors = 541481038)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.
Let's move to buffermgrd.sh. Put the code generating asic_table.json and peripheral_table.json just ahead of the place where it is used.

Copy link
Collaborator Author

@stephenxs stephenxs Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just investigated and realized it’s will be very difficult to move this part into swss docker because it requires asic_table.j2 and peripheral_table.j2 to be copied in docker swss. However, it’s defined on a per vendor basis and in platform/<vendor> dir, which makes it difficult to copy into docker in the Dockerfile.j2.
So we have to have this part in docker_image_control.
Meanwhile, I also noticed that the PLATFORM isn't defined in this function. This can be achieved by moving this part into start() function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding "it’s defined on a per vendor basis and in platform/ dir, which makes it difficult to copy into docker in the Dockerfile.j2"
I checked docker inspect swss and found /usr/share/sonic/device/$PLATFORM/ is already mapped into swss container.

            {
                "Type": "bind",
                "Source": "/usr/share/sonic/device/x86_64-mlnx_msn2700-r0",
                "Destination": "/usr/share/sonic/platform",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },

Mounting is happened after running a container, so I did not propose modify Dockerfile.j2, instead, modify dockers/docker-orchagent/docker-init.sh.

I think you can treat asic_table.j2 the same way as port_peripheral_config.j2. Why they are copied into different folders?


In reply to: 541513109 [](ancestors = 541513109,541481038)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The port peripheral config is defined on a per platform basis because each platform has its unique configuration. But the asic_table.j2 is shared among all platforms of the same vendor. It should not be copied to each of the platform folders.

@@ -243,6 +250,11 @@ start() {
echo "Creating new ${DOCKERNAME} container with HWSKU $HWSKU"
{%- endif %}

{%- if docker_container_name == "swss" %}
# Obtain the vendor name
ASIC_VENDOR=`$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`
Copy link
Collaborator

@qiluo-msft qiluo-msft Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this into inside swss container? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as the previous one. I'd like call this once during docker creation instead of many times.

@@ -1,6 +1,7 @@
{
"DEVICE_METADATA": {
"localhost": {
"buffer_model": {% if default_buffer_model == "dynamic" %}"dynamic"{% else %}"traditional"{% endif %},
Copy link
Collaborator

@qiluo-msft qiluo-msft Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traditional [](start = 90, length = 11)

dynamic vs static? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dynamic / traditional is the term we already agreed on, isn't it?

@@ -392,6 +392,19 @@ sudo cp $BUILD_TEMPLATES/buffers_config.j2 $FILESYSTEM_ROOT_USR_SHARE_SONIC_TEMP
# Copy the qos configuration template
sudo cp $BUILD_TEMPLATES/qos_config.j2 $FILESYSTEM_ROOT_USR_SHARE_SONIC_TEMPLATES/

# Copy the templates for dynamically buffer calculation
{% if sonic_asic_platform == "mellanox" or sonic_asic_platform == "vs" %}
Copy link
Collaborator

@qiluo-msft qiluo-msft Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sonic_asic_platform [](start = 43, length = 19)

What is the future expectation on other platform?
If nothing, why need it for vs? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe other vendors will support the dynamic model in the future.
Meanwhile, we need this to verify this feature on vs platform.
By default, the dynamic model is disabled on vs platform and is enabled when the test cases of this feature are executed.

Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs requested a review from qiluo-msft December 12, 2020 04:57
- remove vs related stuff from Mellanox specific asic_table.json and
peripheral_table.json
- fix minor errors

Signed-off-by: Stephen Sun <stephens@nvidia.com>
{%- if DEVICE_METADATA is defined and DEVICE_METADATA['localhost']['platform'] is defined %}
{%- set platform = DEVICE_METADATA['localhost']['platform'] %}
{%- else -%}
{%- set platform = "vs-platform" %}
Copy link
Collaborator

@qiluo-msft qiluo-msft Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue, move vs related part into platform/vs/peripheral_table.j2. Please don't mix platforms into mellanox folder. #Closed

Remove vs stuff from Mellanox related files and fix indent

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Collaborator Author

retest vs please

@qiluo-msft qiluo-msft merged commit e010d83 into sonic-net:master Dec 13, 2020
@stephenxs stephenxs deleted the dynamic-buffer-calculation branch December 13, 2020 22:21
@@ -31,3 +31,5 @@ $(SYNCD)_RDEPENDS += $(MLNX_SAI)

# Inject mlnx sdk libs to platform monitor
$(DOCKER_PLATFORM_MONITOR)_DEPENDS += $(APPLIBS) $(SX_COMPLIB) $(SXD_LIBS) $(SX_GEN_UTILS) $(PYTHON_SDK_API) $(APPLIBS_DEV) $(SX_COMPLIB_DEV) $(SXD_LIBS_DEV) $(SX_GEN_UTILS_DEV)

export SONIC_BUFFER_MODEL=dynamic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not change build time config in component makefile, if you need, change the pipeline command line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Qi,
Yes, it works to change the CLI option but what's the benefit to change a working code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants