Multithreading in Processing

Hello, I am working on a Processing-based editor for Conway’s Game of Life, basically a Golly-lite with a few basic features like pattern loading, grid editing, speeding up/slowing down, etc.

Full disclosure, I have used ChatGPT for my code.

(For those interested in this question but who don’t know what CGOL / cellular automata are: you essentially take a grid of cells which can each be either “alive” or “dead”. With a pair of rules, you determine how many living neighbors a dead cell needs to come alive on the next frame, and how many living neighbors a living cell needs to stay alive on the next frame. In my code, these cells are represented in a 2D boolean array.)

The issue at hand is that it isn’t so much a Golly-lite as a Golly-magnum in terms of how incredibly unperformant it is. That’s mostly because I am not very good at programming, but the point is: I’ve now dabbled in multithreading in an attempt to optimise my code. I figured the biggest performance bottleneck was the fact that every single cell of the grid is checked every frame in one for loop, so what if I could distribute that workload across multiple threads, depending on the number of cores in the user’s CPU? Except this has had no positive effect on performance. Average frametimes are the same as with the old code, if not worse in some cases. So I do wonder why.

This is my old code. It counts the neighbors of every single cell and then writes its new status into a new boolean array, which is then used for the next frame.

for (int x = 0; x < cols; x++) {
  for (int y = 0; y < rows; y++) {
    int neighbors = countLiveNeighbors(x, y);
    if (grid[x][y]) {
      nextGrid[x][y] = inArray(neighbors, currentRule[1]); // survival rule for living cells
    } else {
      nextGrid[x][y] = inArray(neighbors, currentRule[0]); // birth rule for dead cells
    }
  }
}

boolean[][] temp = grid;

cache.addLast(cloneGrid(grid)); // This is for a rewind feature
if (cache.size() > maxCache) {
  cache.removeFirst();
}

// Swap buffers: the freshly computed nextGrid becomes the current grid,
// and the old grid array is reused as next frame's scratch buffer.
grid = nextGrid;
nextGrid = temp;
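
For context, countLiveNeighbors and inArray are just small helper functions; a simplified sketch of their general shape (not the exact code from my sketch) would be:

// Hypothetical, simplified sketch of the two helpers used above.
// countLiveNeighbors scans the eight surrounding cells with a bounds check.
int countLiveNeighbors(int x, int y) {
  int count = 0;
  for (int dx = -1; dx <= 1; dx++) {
    for (int dy = -1; dy <= 1; dy++) {
      if (dx == 0 && dy == 0) continue; // skip the cell itself
      int nx = x + dx;
      int ny = y + dy;
      if (nx >= 0 && nx < cols && ny >= 0 && ny < rows && grid[nx][ny]) {
        count++;
      }
    }
  }
  return count;
}

// inArray checks whether a neighbor count appears in a rule's list of allowed counts.
boolean inArray(int value, int[] rule) {
  for (int i = 0; i < rule.length; i++) {
    if (rule[i] == value) return true;
  }
  return false;
}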

There are several reasons why my old code is slow, and I still need to figure out ways to deal with those, but one idea I had was multithreading, something I haven’t really used in a program before.

futures.clear();

// The grid is sliced into horizontal chunks, and each worker thread is supposed to handle one of these chunks.
for (int i = 0; i < cores; i++) {
  final int startY = i * sliceHeight;
  final int endY = (i == cores - 1) ? rows : startY + sliceHeight;

  print("Core " + i + " - startY: " + startY + " endY: " + endY + "\n"); // Just a debug line to see if the multithreading was even working. Didn't impact performance.

  // Same loop as before, but each task only processes its own slice of rows.
  Future<?> f = executor.submit(() -> {
    for (int y = startY; y < endY; y++) {
      for (int x = 0; x < cols; x++) {
        int neighbors = countLiveNeighbors(x, y);
        if (grid[x][y]) {
          nextGrid[x][y] = inArray(neighbors, currentRule[1]);
        } else {
          nextGrid[x][y] = inArray(neighbors, currentRule[0]);
        }
      }
    }
  });

  futures.add(f);
}

// Block until every slice has finished before swapping buffers.
for (Future<?> f : futures) {
  try {
    f.get(); // Maybe this is the pain point? But waiting for all of the threads should still be faster than loading it all on one, right?
  }
  catch (Exception e) {
    e.printStackTrace();
  }
}

boolean[][] temp = grid;

cache.addLast(cloneGrid(grid)); // This is for a rewind feature
if (cache.size() > maxCache) {
  cache.removeFirst();
}

// Same buffer swap as in the single-threaded version.
grid = nextGrid;
nextGrid = temp;
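
A variation on the same idea would be to drop the executor and let a Java 8 parallel stream split the loop across the available cores, roughly like this (untested sketch, assuming the same grid, nextGrid, cols, rows and helper functions as above):

import java.util.stream.IntStream;

// Untested sketch: the common fork/join pool splits the column range across the available cores.
// Each column only writes its own part of nextGrid, so the writes don't overlap.
void stepParallel() {
  IntStream.range(0, cols).parallel().forEach(x -> {
    for (int y = 0; y < rows; y++) {
      int neighbors = countLiveNeighbors(x, y);
      if (grid[x][y]) {
        nextGrid[x][y] = inArray(neighbors, currentRule[1]);
      } else {
        nextGrid[x][y] = inArray(neighbors, currentRule[0]);
      }
    }
  });
}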

Do you need to know the dead/alive status of every cell at once?
Or would it be OK to check one cell and update its status, then later another cell, etc.?

“Later” is relative; it depends on how fast everything works. But if you have to check a matrix of 1000x1000 cells and it takes 0.01 milliseconds per check (as an example, I have no idea), it still takes 10 seconds to test them all.

What are the numbers of rows and columns? I can try to figure something out.

Note:
No experience with multithreading so can’t advise on that.

  1. Technically yes, since you can set custom rules that may include something like “a dead cell needs 0 living neighbors to come alive”, so just because of that, knowing the state of every cell is important. In practice, with actually sensible rules (like the standard one), a dead cell with no living neighbors will definitely stay dead on the next frame as well, so only keeping track of living cells and their neighbors would actually be more efficient, yeah.
  2. Well, the number of rows and columns can be customised as well, but my recommended standard is around 200 x 120 (so 24k cells in total). With my old code, the frametime for that size was around 15 ms. The maximum I allow is 480 x 480 (230k cells), which took around 170 ms. Makes sense.
  3. I actually found out what the big performance issue is, and the computing part of the equation is actually not an issue at all, it’s the DRAWING. For some reason, I made it so that every cell is drawn individually, no matter if living or dead. I changed the code to only draw living cells (roughly sketched below), which resulted in a massive performance boost. That being said, drawing every living cell on every frame is probably still pretty dumb. Idk
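
For reference, the draw loop now boils down to something like this (a simplified sketch; cellSize stands in for whatever cell size is configured, and the colours are placeholders):

// Simplified sketch of drawing only the living cells.
// Dead cells are just the background colour, so they cost nothing to draw.
void drawGrid() {
  background(0);   // clear to the "dead" colour
  noStroke();
  fill(255);       // "alive" colour
  for (int x = 0; x < cols; x++) {
    for (int y = 0; y < rows; y++) {
      if (grid[x][y]) {
        rect(x * cellSize, y * cellSize, cellSize, cellSize);
      }
    }
  }
}

If drawing ever becomes the bottleneck again, writing cell colours directly into a PImage (or the sketch’s pixels[] array) and drawing that as one image is a common next step, since it avoids one rect() call per living cell.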

thank you for the help, though!

Maybe I caused some confusion. Let me try to clarify.

Is it acceptable that

  1. You first check row 0, column 0 for alive neighbours and then update the frame.
  2. Next, check cell row 0, column 1 for alive neighbours and then update the frame.
  3. Next, check cell row 0, column 2 for alive neighbours and then update the frame.
    And so on.
    Once you have checked all cells in row 0 that way, move to row 1, then to row 2, etc. Once you have done all rows, start at row 0 again.

Can you post a basic example; it does not have to refresh the frame. At this moment I do not know what countLiveNeighbors and inArray do and how much time they take.

I have always been interested in Conway’s Game of Life since I first became aware of it and my website home page has battles between randomly selected spaceship patterns.

I have created many programs to animate GOL and my early attempts were very slow due to badly designed algorithms. Over the years I have improved the algorithms significantly and in 2021 I created a Processing Sketch to run large grids using the power of Java 8 to perform parallel (multi-core) processing and this video shows it in action.

The sketch allows the user to switch between multi-core and single-core processing live at runtime and shows the average time to perform a single generation calculation.

On my computer I got ~1.05 ms for multi-core and ~2.7 ms for single-core processing.

Since the life grid has 1 million cells, 2.7 ms for a standard Processing sketch is pretty awesome, even if I do say so myself :innocent:

In fact it is probably fast enough except for the largest life grids. I did try the sketch with the breeder pattern on a grid with 6 million cells and got ~7.4 ms (multi-core) and ~22 ms (single-core), which is OK using multi-core but too slow with single-core.

The point is that quite large grids can be processed quickly enough without having to resort to multi-threading or multi-core processing if we use the right algorithms.
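
One classic example of that kind of improvement, written here against the rule check from the first post (just an illustration of the idea, not a description of my own sketch): give the grid a one-cell border of permanently dead cells so the neighbour count needs no bounds checks at all.

// Illustrative sketch: grid and nextGrid are (cols + 2) x (rows + 2) with a dead border,
// so the live area is indices 1..cols and 1..rows and the inner loop never checks bounds.
void step() {
  for (int x = 1; x <= cols; x++) {
    for (int y = 1; y <= rows; y++) {
      int n = (grid[x-1][y-1] ? 1 : 0) + (grid[x][y-1] ? 1 : 0) + (grid[x+1][y-1] ? 1 : 0)
            + (grid[x-1][y]   ? 1 : 0)                          + (grid[x+1][y]   ? 1 : 0)
            + (grid[x-1][y+1] ? 1 : 0) + (grid[x][y+1] ? 1 : 0) + (grid[x+1][y+1] ? 1 : 0);
      if (grid[x][y]) {
        nextGrid[x][y] = inArray(n, currentRule[1]);
      } else {
        nextGrid[x][y] = inArray(n, currentRule[0]);
      }
    }
  }
}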

If you are like me you will probably want to have the satisfaction of creating your own sketch, in which case I can describe the algorithms I used. They are no harder to implement than anything you have done already, simply different.

Alternatively I can zip up the sketch and post a link for you and anyone else to download it.
