Potential performance issue with 2D-array randomization #2

Open
KasperHesse opened this issue Nov 7, 2023 · 2 comments

Comments

@KasperHesse

I've been trying to compare PyVSC and constrainedrandom's performance when it comes to randomization of 2D-arrays. The test case is a "checkerboard randomization", where each field can be 0 or 1, but cannot have the same value as its neighbours.
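The rule can be stated as a check on a finished board. The sketch below is plain Python with a hypothetical helper name (`is_checkerboard` is not part of PyVSC or constrainedrandom). It tests every horizontally and vertically adjacent pair, so it could be run on the output of either implementation to confirm that the constraints cover the whole board:

```python
def is_checkerboard(board):
    """Return True if board is a square grid of 0/1 values in which no cell
    equals its horizontal or vertical neighbour."""
    n = len(board)
    for i in range(n):
        if len(board[i]) != n:
            return False
        for j in range(n):
            if board[i][j] not in (0, 1):
                return False
            # Compare with the neighbour to the right and the one below;
            # together these cover every adjacent pair exactly once.
            if j + 1 < n and board[i][j] == board[i][j + 1]:
                return False
            if i + 1 < n and board[i][j] == board[i + 1][j]:
                return False
    return True


print(is_checkerboard([[0, 1], [1, 0]]))  # True
print(is_checkerboard([[0, 1], [0, 1]]))  # False: column 0 repeats
```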

Consider the following implementations:

PyVSC
import vsc
import timeit

@vsc.randobj
class Checkers():
  def __init__(self, N):
    self.N = N
    self.board = [vsc.rand_list_t(vsc.rand_uint8_t(), self.N) for _ in range(self.N)]

  @vsc.constraint
  def con_values(self):
    for row in self.board:
      with vsc.foreach(row) as c:
        0 <= c and c <= 1

  @vsc.constraint
  def constraints(self):
    for i in range(1, self.N):
      with vsc.foreach(self.board[i], idx=True) as j:
        if j == 0:
          pass
        (self.board[i][j] != self.board[i-1][j]) and (self.board[i][j] != self.board[i][j-1])

def benchmark():
  numRounds = 10
  for N in (4, 8, 16, 32, 64, 128, 256):
    t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
    print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
constrainedrandom
from constrainedrandom import RandObj
import timeit

class Checkers(RandObj):
  def __init__(self, N):
    super().__init__(max_iterations=1000)
    self.N = N
    for i in range(self.N):
      self.add_rand_var(f"list{i}", domain=range(2), length=self.N, disable_naive_list_solver=True)

    for i in range(1, self.N):
      for j in range(1, self.N):
        self.add_constraint((lambda row1, row2: row1[j] != row2[j]), (f"list{i-1}", f"list{i}"))
        self.add_constraint((lambda row: row[j-1] != row[j]), (f"list{i}", ))

def benchmark():
  numRounds = 10
  for N in (4, 8, 16, 32, 64, 128, 256):
    t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
    print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()

The average runtimes (in seconds) on my machine are listed below. Entries marked "DNF" (did not finish) took so long that I didn't wait for them to complete.

N     PyVSC (s)   constrainedrandom (s)
4     0.008       0.001
8     0.029       1.265
16    0.112       45.786
32    0.305       DNF
64    1.208       DNF
128   5.371       DNF
256   22.421      DNF

As you can see, PyVSC's performance seems vastly superior to constrainedrandom's. However, I am unsure whether my constrainedrandom implementation is optimal; I haven't found a better way of implementing 2D arrays and randomizing them.

Is there a better way of constraining this problem that would lead to better performance with constrainedrandom?

@KasperHesse
Author

I noticed that I had set max_iterations=1000 because of issues with a previous constraint attempt. Removing that parameter seems to speed up the constraint solver, probably because it stops using the naive solver sooner, but it is still markedly slower than PyVSC:

N=   4: Runtime: 0.004006109999863838
N=   8: Runtime: 0.8024601899998742
N=  16: Runtime: 1.7624544599999354
N=  32: Runtime: 7.645777930000077
N=  64: Runtime: 73.60141107

@will-keen
Contributor

Hi Kasper,

Thanks very much for raising this. 2D arrays (or lists of lists in Python terms) aren't a use case we've needed yet when using constrainedrandom internally at Imagination, so it's helpful to get your feedback.

You've raised a valid concern about performance here. In general, constrainedrandom isn't very good at cases where a small number of choices determine the only correct result for a large number of other variables. vsc does do better in these sorts of cases, because it considers the problem in a more interconnected way than constrainedrandom does (which is exactly the thing that slows it down in other cases).

I've tried to investigate this particular case and have written up what I found. Hopefully it's helpful, please let me know your thoughts.

Your checkerboard example

The example you've given is a bit of a corner case: the first choice you make for any cell in the checkerboard determines the values of all the other cells. If I put a 1 or a 0 anywhere, every other cell is fixed immediately. vsc therefore performs better; constrainedrandom always struggles with this kind of case because it relies on randomizing and checking.
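To make the "first choice determines everything" point concrete, here is a brute-force count for tiny boards. It is plain Python, not tied to either library, and the function name is hypothetical. Out of 2^(N·N) candidate boards only two satisfy the rule, which is why a randomize-and-check strategy has so little to work with:

```python
from itertools import product


def count_checkerboards(n):
    """Brute-force count of n x n 0/1 boards where no cell equals a
    horizontal or vertical neighbour. Only feasible for tiny n."""
    count = 0
    for cells in product((0, 1), repeat=n * n):
        board = [cells[i * n:(i + 1) * n] for i in range(n)]
        ok = all(
            (j + 1 >= n or board[i][j] != board[i][j + 1])
            and (i + 1 >= n or board[i][j] != board[i + 1][j])
            for i in range(n)
            for j in range(n)
        )
        count += ok  # True counts as 1
    return count


# Prints n, the number of candidate boards, and the number of valid ones.
for n in (2, 3, 4):
    print(n, 2 ** (n * n), count_checkerboards(n))
```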

I was able to reproduce the behaviour you showed in your original example with the code you provided.

My results from directly running your code:

vsc results
N=   4: Runtime: 0.003800742000021273
N=   8: Runtime: 0.01277666030000546
N=  16: Runtime: 0.04529272580039105
N=  32: Runtime: 0.18716902040032438
N=  64: Runtime: 1.0156315014006396
N= 128: Runtime: 4.4076643840002365
N= 256: Runtime: 24.720914476300095
constrainedrandom results
N=   4: Runtime: 0.00049386870014132
N=   8: Runtime: 1.050551228599943
N=  16: Runtime: 7.706638661499892
N=  32: Runtime: 47.63703318400003

(DNF for N=64, 128, 256)

Making constrainedrandom faster

By removing the max_iterations=1000 and the disable_naive_list_solver=True, I got better results:

constrainedrandom but faster code
from constrainedrandom import RandObj
import timeit

class Checkers(RandObj):
    def __init__(self, N):
        super().__init__()
        self.N = N
        for i in range(self.N):
            self.add_rand_var(f"list{i}", domain=range(2), length=self.N)

        for i in range(1, self.N):
            for j in range(1, self.N):
                self.add_constraint((lambda row1, row2: row1[j] != row2[j]), (f"list{i-1}", f"list{i}"))
                self.add_constraint((lambda row: row[j-1] != row[j]), (f"list{i}", ))


def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
constrainedrandom but faster results
N=   4: Runtime: 0.0007822489002137445
N=   8: Runtime: 0.2898493679000239
N=  16: Runtime: 0.1423995040000591
N=  32: Runtime: 0.4734258158998273
N=  64: Runtime: 2.1201752419001423
N= 128: Runtime: 9.272784179399604
N= 256: Runtime: 44.2909998391995

This is still about 2x slower than vsc for N >= 64, but I think you'd agree it's quite a bit better than the "did not finish" behaviour in the original example.
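One detail worth checking in the constrainedrandom versions of this benchmark (here, in the original post, and in the 0 to 4 variant further down): the constraint lambdas use the loop variable j, and Python closures look variables up when the function is called, not when it is defined. By the time randomize() runs, the loop has finished, so every one of those lambdas sees j == N - 1 and they all constrain the same last column. The timings in this thread come from code with this pattern, so they may not reflect the full set of intended constraints. The snippet below demonstrates the effect in plain Python, plus one common fix: a small factory function (hypothetical name) that binds the current value of j.

```python
# Plain-Python illustration (no constrainedrandom needed) of late binding:
# a lambda looks up j when it is *called*, so after the loop every lambda
# sees the final value of j.
late = []
for j in range(1, 4):
    late.append(lambda row: row[j - 1] != row[j])

row = [0, 1, 1, 0]  # only the pair at indices 1 and 2 breaks the rule
print([f(row) for f in late])  # [True, True, True]: all three check j == 3


def make_horizontal(j):
    # Hypothetical factory: j is a parameter here, so each returned lambda
    # keeps the value of j it was created with.
    return lambda row: row[j - 1] != row[j]


fixed = [make_horizontal(j) for j in range(1, 4)]
print([f(row) for f in fixed])  # [True, False, True]
```

In the Checkers class this would mean passing something like make_horizontal(j) to add_constraint instead of the inline lambda. That change has not been benchmarked here; it gives constrainedrandom the full problem to solve, so its timings could change substantially.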

Please can you let me know whether you get comparable results running this on your machine? I'm just running it locally on my laptop.

Personally, I'd put this down as a case where vsc wins on performance. Based on current user experience, I'm happy to trade slowness in this kind of case for speed in other cases.

There are ways to make this go much faster by manually optimizing the problem, e.g. by constraining the input space for each row, but at that point you might as well just write it procedurally...
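As a rough illustration of what constraining the input space for each row means for a randomize-and-check approach, here is a plain-Python sketch. It is not constrainedrandom's API, and the function name is hypothetical. Instead of 2^N candidate rows, each row is drawn from the only two rows that satisfy the horizontal rule, so the vertical check passes with probability 1/2 per attempt. It also shows why, at that point, the code is effectively procedural.

```python
import random


def randomize_rows(n, rng=random):
    # Reduced input space: the only two rows with no equal horizontal
    # neighbours are the alternating patterns 0101... and 1010...
    patterns = [[(j + k) % 2 for j in range(n)] for k in (0, 1)]
    board = []
    for i in range(n):
        while True:
            row = rng.choice(patterns)
            # Randomize-and-check: accept the row only if it differs from the
            # row above in every column (always true for the first row).
            if i == 0 or all(a != b for a, b in zip(board[-1], row)):
                board.append(list(row))
                break
    return board


for row in randomize_rows(4, random.Random(0)):
    print(row)
```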

Procedural solution

To be 100% honest with you, the original problem you stated doesn't really sound like a problem well-suited to either vsc or constrainedrandom - there's really only one element to randomize, and it's quite sub-optimal to treat the whole thing as a randomization problem.

If I had that problem in a real software project, I would write something procedural, like this:

Procedural solution code
import random
import timeit

class Checkers():
    def __init__(self, N):
        self.N = N
        self.board = []

    def randomize(self):
        self.board = []
        top_left = random.getrandbits(1)
        for i in range(self.N):
            self.board.append([None for _ in range(self.N)])
            for j in range(self.N):
                if i == 0 and j == 0:
                    self.board[i][j] = top_left
                else:
                    if j == 0:
                        self.board[i][j] = 0 if self.board[i-1][j] else 1
                    else:
                        self.board[i][j] = 0 if self.board[i][j-1] else 1


def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
Procedural results
N=   4: Runtime: 3.370100603206083e-06
N=   8: Runtime: 8.603600144851953e-06
N=  16: Runtime: 2.861399989342317e-05
N=  32: Runtime: 9.815220037125982e-05
N=  64: Runtime: 0.00036479769987636247
N= 128: Runtime: 0.0014662143003079109
N= 256: Runtime: 0.005748119200143264

This is orders of magnitude faster than using a constrained random approach for this problem.
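Because the top-left bit fixes every other cell, the loop above can also be collapsed into a single expression: cell (i, j) equals the top-left bit XOR the parity of i + j. Here is a minimal sketch (the function name is hypothetical) that produces the same boards as the procedural Checkers.randomize():

```python
import random


def random_checkerboard(n, rng=random):
    # Each step right or down flips the value, so cell (i, j) is the
    # top-left bit flipped (i + j) times, i.e. XOR with the parity of i + j.
    top_left = rng.getrandbits(1)
    return [[top_left ^ ((i + j) & 1) for j in range(n)] for i in range(n)]


for row in random_checkerboard(4):
    print(row)
```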

A different two-dimensional array problem

I want to engage with the original point about two-dimensional array performance, so let me propose a slightly different case where I think writing it as a constrained randomization problem is much more user-friendly.

Let's take the checkerboard example, but say each element can have a value from 0 to 4, and the sum of each element and any one of its adjacent neighbours must be less than or equal to 4.
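The constraints in the code that follows compare each cell with its upper and left neighbours one pair at a time. A checker for the rule as those constraints encode it (each cell plus any single adjacent neighbour is at most 4) might look like this. It is a hypothetical helper, independent of both libraries:

```python
def neighbour_sums_ok(board, limit=4, max_value=4):
    """Every cell is in 0..max_value and every horizontally or vertically
    adjacent pair sums to at most `limit`."""
    n = len(board)
    for i in range(n):
        for j in range(n):
            v = board[i][j]
            if not 0 <= v <= max_value:
                return False
            # Right and lower neighbours cover every adjacent pair once.
            if j + 1 < n and v + board[i][j + 1] > limit:
                return False
            if i + 1 < n and v + board[i + 1][j] > limit:
                return False
    return True


print(neighbour_sums_ok([[4, 0], [0, 4]]))  # True
print(neighbour_sums_ok([[3, 2], [0, 1]]))  # False: 3 + 2 > 4
```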

I coded this in vsc and in constrainedrandom and got similar results:

vsc code
import vsc
import timeit


@vsc.randobj
class Checkers():
    def __init__(self, N):
        self.N = N
        self.board = [vsc.rand_list_t(vsc.rand_uint8_t(), self.N) for _ in range(self.N)]

    @vsc.constraint
    def con_values(self):
        for row in self.board:
            with vsc.foreach(row) as c:
                0 <= c and c <= 4

    @vsc.constraint
    def constraints(self):
        for i in range(1, self.N):
            with vsc.foreach(self.board[i], idx=True) as j:
                if j == 0:
                    pass
                (self.board[i][j] + self.board[i-1][j] < 5) and (self.board[i][j] + self.board[i][j-1] < 5)


def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()

constrainedrandom code
from constrainedrandom import RandObj
import timeit


class Checkers(RandObj):
    def __init__(self, N):
        super().__init__()
        self.N = N
        for i in range(self.N):
            self.add_rand_var(f"list{i}", domain=range(5), length=self.N)

        for i in range(1, self.N):
            for j in range(1, self.N):
                self.add_constraint((lambda row1, row2: row1[j] + row2[j] < 5), (f"list{i-1}", f"list{i}"))
                self.add_constraint((lambda row: row[j-1] + row[j] < 5), (f"list{i}", ))


def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()

vsc results
N=   4: Runtime: 0.005554529099754291
N=   8: Runtime: 0.022379124100552872
N=  16: Runtime: 0.06850886350002838
N=  32: Runtime: 0.30377212050007074
N=  64: Runtime: 1.354998340599559
N= 128: Runtime: 5.976018746299815
N= 256: Runtime: 31.340037856700654
constrainedrandom results
N=   4: Runtime: 0.005820018600206822
N=   8: Runtime: 0.000641144199471455
N=  16: Runtime: 0.006594730899814749
N=  32: Runtime: 0.16832221820004634
N=  64: Runtime: 1.5169856319000246
N= 128: Runtime: 6.13921610949983
N= 256: Runtime: 27.28949799520051

As you can see, the results are pretty similar between the two. It's a narrow victory for constrainedrandom overall, but not by much, so it could be in the noise.
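When two results are this close, repeating each measurement and keeping the minimum (as the Python timeit documentation recommends) helps separate real differences from noise. Here is a minimal sketch using timeit.repeat; the helper name bench is hypothetical:

```python
import timeit


def bench(stmt, setup="pass", number=10, repeat=5, namespace=None):
    # Run the timing `repeat` times and keep the fastest run. Per the timeit
    # docs, slower runs are usually caused by interference from other
    # processes rather than by the code being measured.
    times = timeit.repeat(stmt=stmt, setup=setup, number=number,
                          repeat=repeat, globals=namespace)
    return min(times) / number  # seconds per call


# Trivial stand-in statement; for the benchmarks in this thread it would be
# bench('c.randomize()', f'c = Checkers({N})', namespace=globals())
print(bench("sorted(data)", "data = list(range(1000))[::-1]"))
```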

The reason to include this is to show that the class of problem matters a lot, not just the list dimensionality.

Other cases where constrainedrandom performs badly

constrainedrandom performs badly when a constraint leaves only a single legal value.

Consider a 32-bit integer, which we want to have the value 5. If we write this as a constrained randomization problem, vsc will beat constrainedrandom every time. In fact, constrainedrandom will fail to produce a result.
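The reason is easy to quantify if constrainedrandom is pictured as drawing uniform 32-bit values and checking the constraint. This is a simplification: it models only a naive randomize-and-check phase, not any fallback solvers the library may use. The helper below is hypothetical, and the budget of 1000 draws simply mirrors the max_iterations=1000 seen earlier in this thread:

```python
import random


def randomize_and_check(predicate, bits, max_iterations, rng=random):
    # Hypothetical naive solver: draw uniform values and test each one.
    # Returns a satisfying value, or None if the budget runs out.
    for _ in range(max_iterations):
        x = rng.getrandbits(bits)
        if predicate(x):
            return x
    return None


bits, budget = 32, 1000
# Chance that at least one of `budget` uniform draws hits the single legal value.
p_success = 1 - (1 - 2 ** -bits) ** budget
print(f"P(success) ~= {p_success:.2e}")  # roughly 2.3e-07
print(randomize_and_check(lambda a: a == 5, bits, budget))  # almost always None
```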

vsc code
import vsc

@vsc.randobj
class RandInt:

    def __init__(self):
        self.x = vsc.rand_int32_t()

    @vsc.constraint
    def val_c(self):
        self.x == 5

r = RandInt()
r.randomize()
constrainedrandom code
from constrainedrandom import RandObj

class RandInt(RandObj):

    def __init__(self):
        super().__init__()
        self.add_rand_var('x', bits=32)
        self.add_constraint(lambda a: a == 5, ('x',))

r = RandInt()
r.randomize()

If we provide the concrete value to constrainedrandom's randomize() method using randomize(with_values={'x': 5}) instead of adding a constraint, then it will just assign the variable and be done.

But you might also think, why not just create a variable and assign it the value 5? Why do I bring up this case which you wouldn't code as a randomization problem? I guess the point is just to say that constrainedrandom often loses to vsc in these kinds of cases where a variable is effectively assigned a value in a constraint.

I haven't looked to fix this behaviour, mainly because I'd just advise someone not to use randomization in this case.

You might say this is vsc being better at "constraint-based programming", which it is, but isn't really what constrainedrandom intends to be good at.

Summary

There are definitely cases where vsc will always beat constrainedrandom, and if you want a tool that performs "constraint-based programming", then vsc is definitely more complete than constrainedrandom. It's worth noting that vsc is much more user-friendly in handling lists of lists, too. Imagine if you had to add another dimension of lists in constrainedrandom: defining all the variable names would be very painful!

However, I have optimized it for the cases that we generally use it for internally, which are problems where we want a randomized result. In benchmarking, it's always performed very favourably compared to vsc in the sorts of cases we use it for (e.g. see the instruction generation case in the benchmarks/ directory). (This does remind me I need to overhaul how those benchmarks are defined, as it's very clunky at the moment.)

Maybe it does need to support lists of lists better, though we haven't required this yet. Let me know if you have suggestions for this. For what it's worth, this is only deployed in one area internally and certainly doesn't replace doing constrained random stuff in SystemVerilog.

I hope that helps - please let me know any further thoughts you have, or any other performance cases you find where constrainedrandom is not adequate.
