Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verification components make GHDL crash when too large objects are allocated #494

Open
umarcor opened this issue May 18, 2019 · 8 comments

Comments

@umarcor
Copy link
Member

umarcor commented May 18, 2019

There are some issues with GHDL and the allocation of large arrays: ghdl/ghdl#611, ghdl/ghdl#812, ghdl/ghdl#822, #342...

Usually, the user can work around them by carefully designing the testbench. However, sometimes VUnit tries to grow some internal variable above the limit, which makes GHDL crash. This has been reported in ghdl/ghdl#752, where @tgingold pointed out that the offending line is new_ptr := new string'(1 to length => character'low);.

The line is located in string_prt_pkg, which is used to create and handle general purpose arrays of bytes. It is used for creating/reallocating/resizing the arrays:

I found that, almost all the time, arrays are created with size/length 0. Therefore, the error is likely to be produced during resizing. This can be checked by adding a report to resize that prints the required size.

As commented, the fact that verification components can request some of their internal variables to grow with 'no limit', can make GHDL crash. This can be easily reproduced with a simple modification to example array_axis_vcs:

  constant block_length : natural := 256*128*4;
--    for y in 0 to m_I.height-1 loop
--      for x in 0 to m_I.width-1 loop
--        wait until rising_edge(clk);
--        if x = m_I.width-1 then last := '1'; else last := '0'; end if;
--        push_axi_stream(net, master_axi_stream, std_logic_vector(to_signed(m_I.get(x,y), data_width)) , tlast => last);
--      end loop;
--    end loop;

    for y in 0 to block_length-1 loop
      push_axi_stream(net, master_axi_stream, std_logic_vector(to_signed(0, data_width)) , tlast => last);
    end loop;
--    for y in 0 to m_O.height-1 loop
--      for x in 0 to m_O.width-1 loop
--        pop_axi_stream(net, slave_axi_stream, tdata => o, tlast => last);
--        if (x = m_O.width-1) and (last='0') then
--          error("Something went wrong. Last misaligned!");
--        end if;
--        m_O.set(x,y,to_integer(signed(o)));
--      end loop;
--    end loop;

    for y in 0 to block_length-1 loop
      pop_axi_stream(net, slave_axi_stream, tdata => o, tlast => last);
    end loop;

The testbench has two main processes: one for writing data to an axi stream master verification component, and another one for reading data from an axi stream slave verification component. The design just connects both verification components through a FIFO. Since the 'reading' process is delayed 50 clock cycles, and because push_axi_stream is not constrained, VUnit grows the model inside the master verification component up to 256*128*4-size_of_fifo bytes.

A possible workaround is to add a delay of 2 clock cycles between successive calls to push_axi_stream. This will ensure that the master verification component won't need to contain more than 50-size_of_fio bytes. However, in more complex setups it might not be as easy to spot where to apply the fix.

I wonder if there is some mechanism to set a limit to the size of the arrays that a verification component can allocate. In case the limit is reached, I'd expect calls to push_axi_stream to be blocking/blocked until enough data is read. I.e., I thought that verification components would behave as software FIFOs.

@umarcor
Copy link
Member Author

umarcor commented May 20, 2019

Upon further inspection, the example above works with v4.0.8 but it fails after 0500e20 (#420).

@LarsAsplund
Copy link
Collaborator

I have yet to recreate this and look into the details but are you aware that you can specify your own actor when creating a new VC? For that actor you can specify a limited inbox size that would create backpressure when full. Would that solve your issue?

@eschmidscs
Copy link
Contributor

We have just solved some GHDL related memory issues by increasing the stack size with ulimit - as proposed in at least one of the bug reports IIRC.
This can be tested with a simple loop allocating vunit memory in chunks. It will crash at some point with ghdl. This limit scales directly with ulimit.

While this does probably not solve the reported issue here, it might help to avoid similar problems.

@umarcor
Copy link
Member Author

umarcor commented May 20, 2019

I have yet to recreate this and look into the details but are you aware that you can specify your own actor when creating a new VC? For that actor you can specify a limited inbox size that would create backpressure when full. Would that solve your issue?

No, I was not aware. And yes, it might solve my issue. I will try it.

Nevertheless, I could work around it by reverting all the modifications to vunit/vhdl/verification_components/src/axi_stream_master.vhd since v4.0.8 (i.e. just before #420):

git checkout -b tmp origin/master
git checkout v4.0.8 -- vunit/vhdl/verification_components/src/axi_stream_master.vhd

Therefore, I think that some 'memory leak' might have been introduced in #420 which I have not been able to find. But it is strange to me that GHDL does not crash because it runs out of memory, instead it seems to fail when trying to allocate a queue (string) of size larger or equal to 2**23 (8MB).

We have just solved some GHDL related memory issues by increasing the stack size with ulimit - as proposed in at least one of the bug reports IIRC.

The issue here is that VUnit v4.0.8 works without any issue, but later versions (master) make GHDL crash. Even if it is a bug in GHDL, I think it is worth knowing which of the changes in VUnit triggers the crash.

Anyway, thanks a lot for the hint. Although I was aware of ulimit, I had not realized earlier that 8MB is precisely the stack limit (I read the wrong line and I thought that it was unlimited). I will try increasing it.

@umarcor
Copy link
Member Author

umarcor commented May 20, 2019

I can confirm that ulimit -s unlimited avoids the crash. Furthermore, the comment by @eschmidscs applies:

This can be tested with a simple loop allocating vunit memory in chunks. It will crash at some point with ghdl. This limit scales directly with ulimit.

With the default ulimit, the example above requires up to 48.7 MB of memory. When unlimited, it uses up to 56.7 MB. This is coherent with the fact that VUnit increments the allocated length for a given string by duplicating the length. If the buffer length is 4 times larger, up to 218.7 MB are allocated.

For that actor you can specify a limited inbox size that would create backpressure when full

I tried:

  constant actor: actor_t := new_actor(inbox_size => 64, outbox_size => 64);
  constant master_axi_stream : axi_stream_master_t := new_axi_stream_master(data_length => data_width, actor => actor);
  constant slave_axi_stream : axi_stream_slave_t := new_axi_stream_slave(data_length => data_width, actor => actor);

But it still crashes/fails.

@LarsAsplund
Copy link
Collaborator

@umarcor You need separate actors for the two VCs or they will share inbox and compete over trying to serve both push and pop commands.

@umarcor
Copy link
Member Author

umarcor commented May 20, 2019

@umarcor You need separate actors for the two VCs or they will share inbox and compete over trying to serve both push and pop commands.

I tried this now:

  constant master_axi_stream : axi_stream_master_t := new_axi_stream_master (data_length => data_width, actor => new_actor(inbox_size => 64, outbox_size => 64));
  constant slave_axi_stream  : axi_stream_slave_t  := new_axi_stream_slave  (data_length => data_width, actor => new_actor(inbox_size => 64, outbox_size => 64));
Starting lib.tb_axis_loop.test
Output file: vunit_out/test_output/lib.tb_axis_loop.test_260bb5c8c675e898eca5dc9024a4420ede12c0bc/output.txt
  DEBUG - Started process with pid=16571: '/usr/local/bin/ghdl --elab-run --std=08 --work=lib --workdir=/src/examples/vhdl/array_axis_vcs/vunit_out/ghdl/libraries/lib -P/src/examples/vhdl/array_axis_vcs/vunit_out/ghdl/libraries/vunit_lib -P/src/examples/vhdl/array_axis_vcs/vunit_out/ghdl/libraries/osvvm -P/src/examples/vhdl/array_axis_vcs/vunit_out/ghdl/libraries/lib -o /src/examples/vhdl/array_axis_vcs/vunit_out/test_output/lib.tb_axis_loop.test_260bb5c8c675e898eca5dc9024a4420ede12c0bc/ghdl/tb_axis_loop-tb tb_axis_loop tb -gtb_path=/src/examples/vhdl/array_axis_vcs/src/test/ -grunner_cfg=active python runner : true,enabled_test_cases : test,output path : /src/examples/vhdl/array_axis_vcs/vunit_out/test_output/lib.tb_axis_loop.test_260bb5c8c675e898eca5dc9024a4420ede12c0bc/,tb path : /src/examples/vhdl/array_axis_vcs/src/test/,use_color : true --assert-level=error'
  DEBUG - Process with pid=16571 terminated with code=255
fail (P=0 S=0 F=1 T=1) lib.tb_axis_loop.test (15.5 seconds)

  DEBUG - TestRunner: Leaving
==== Summary =================================
fail lib.tb_axis_loop.test (15.5 seconds)
==============================================
pass 0 of 1
fail 1 of 1
==============================================
Total time was 15.5 seconds
Elapsed time was 15.5 seconds
==============================================
Some failed!

I pushed the modifications to branch bug-stream-stack. You can test it in docker with:

mcode seems to work:

$(command -v winpty) docker run --rm -it ghdl/ext:vunit bash -c "\
apt update -y && apt install -y git && \
git clone --recurse-submodule -b bug-stream-stack https://github.com/dbhi/vunit.git && \
cd vunit/examples/vhdl/array_axis_vcs/ && \
python3 run.py -v --log-level debug"

LLVM fails:

$(command -v winpty) docker run --rm -it ghdl/ghdl:ubuntu18-llvm-5.0 bash -c "\
apt update -y && apt install -y git python3 python3-pip && \
pip3 install colorama && \
git clone --recurse-submodule -b bug-stream-stack https://github.com/dbhi/vunit.git && \
cd vunit && \
export PYTHONPATH=\$(pwd) && \
cd examples/vhdl/array_axis_vcs/ && \
python3 run.py -v --log-level debug"

@umarcor
Copy link
Member Author

umarcor commented Nov 30, 2019

Now both mcode and LLVM seem to fail:

@LarsAsplund, would you mind having a look at dbhi@1d12db7? According to the comments above, it should limit boxes to 64 elements; hence, either of the loops, the UUT or two of them should be blocked before GHDL crashing due to stack overflow. However, it is still crashing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants