Simulation of large and dense crowds on the GPU using OpenCL

This post provides videos related to my master thesis Simulation of large and dense crowds on the GPU using OpenCL.

The document contains detailed information about the implementation. It is based on the Continuum Crowds paper. Similar to March of the Froblins it runs on the graphics card, but uses OpenCL instead of shader programs. It also uses the original cost function of Continuum Crowds and expresses lane formation on the gradient field level.

It supports walls and tight environments. No velocity obstacle technique has been implemented (like the one in Froblins), but that would be possible to improve the movement. A basic binning algorithm is used for collision resolution.

The agent movement in the videos below shows very nervous agents. This can be tweaked/improved with the cost weights. But you might lose other properties doing so. Unfortunately I did not have the time to experiment much with different weights. The document gives an explanation of how extreme weights visually influence the movement.

The source for the application and the thesis document is available at

The following videos contain several scenes starting at 4096 agents on a 256×256 grid to more than 1 million (1048576) agents on a 1024×1024 grid.

Here is a short overview of the GUI interface for a scene with 4096 agents on a 256×256 grid.

  • 0:00 startup from the console
  • 0:10 simulation play, pause, single-stepping
  • 0:25 control panel (visualization, computation)
  • 0:32 view area, zooming, panning
  • 0:44 log window (scene information, system information)
  • 0:47 profiler window (detailed OpenCL kernel profiling)
  • 1:00 memory window (VRAM usage)
  • 1:09 counters for how many agents are active for each of the 4 agent groups, how many are parked, total agent count, field value under pointer, position, zoom

Visualization of the fields used for computing the navigation data (again the same scene with 4096 agents on a 256×256 grid).

  • 0:05 map overlay and discomfort field (infinite walls in white, low discomfort in red)
  • 0:15 density field (low density in red, high density in blue)
  • 0:22 average velocity direction. Color and arrows indicate direction.
  • 0:37 anisotropic speed field (values for each of the four directions north, east, south, west)
  • 1:02 anisotropic cost field
  • 1:17 potential field (zero potential at goal areas, potential increasing outwards over domain)
  • 1:43 gradient direction field (color coded, and later arrow overlay, nearest neighbor and bilinear filtering (as used by agents during movement update))
    The agents move against these gradients to their destinations.

Additional visualizations, including the tile update mechanism (again 4096 agents on a 256×256 grid).

  • 0:09 agent sprite rendering
  • 0:14 splat area visualization (areas that the agents contribute density and velocity to)
  • 0:24 tiles used during potential computation
  • 0:34 cells
  • 0:52 tile updates during potential computation
  • 1:53 outer iteration step influence on potential quality (step count has to be found empirically)

16384 agents on a 256×256 grid in a scene with different discomfort. Agents leave the area through their group exits in the upper left corner. They respawn on the right and bottom edge. In the lower right corner is a zone with discomfort forming a hill. Large discomfort separates the scene in the shape of a river. Later, a discomfort spot (brush) is set. It can be moved with the mouse to interact with the agents. The agents immediately start to evade the area of high discomfort.

  • 0:12 discomfort field (white are infinite walls, river of high discomfort, hills with varying discomfort)
  • 0:30 placing the discomfort brush that adds additional discomfort into the scene and allows interacting with agents
  • 0:35 agents evade the area of additional discomfort

65536 agents on a 256×256 grid starting to form a circle after some time. I accelerated part of the video to 4 x speed. Please excuse the stuttering/hickups, I could not prevent that from occurring during video processing. The video also does not show the complete sequence. The desktop recording application always aborted recording half way through, so I had to record the ending separately, and cut the video together.

  • 0:06 the 4 agent groups each have their goal area in a diagonal. Upon reaching the goal, an agent switches to the next group and goal area.
  • 0:18 nice lane formations
  • 0:25 video accelerated to 4x speed (please excuse the hickups originating from video encoding)
  • 1:25 after a few minutes a circle forms and you can clearly see the agents of the different groups
  • 1:40 detail view of agents changing group at goal area

More than 1 million (1048576) agents on a 1024×1024 grid and an otherwise empty scene. Agents are heading to the lower and left edges. Running at about 2.5 updates per second. Adding a discomfort brush that the agents evade. Showing the CPU usage, the computation runs primarily on the graphics card.


  • In Figure 21 the line leading into the Costs buffer should start at DiscomfortSum not at the DensitySum in the Mixed Buffer
  • Page 56: “The reason is that each Stream Core can load 32 consecutive  floats during a cycle, as mentioned in Section 3.2.2.” should be “The reason is that each Compute Unit can load 32 consecutive floats during a cycle, as mentioned in Section 3.2.2.”
  • Page 59: “IsConverged – Same as for Sleep, but with α = IsConverged in all
    state transitions.” is wrong, Figure 32 shows the correct state transitions.
  • Page 62: “To be more accurate, it would be necessary to also check the cost in the case on the right, because the cost also decides from which side the wavefront reached the central grid cell.” should be “To be more accurate, it would be necessary to also check the cost, because the cost also decides from which side the wavefront reached the central grid cell.”


  • Should’ve pointed out more clearly that March of the Froblins uses local navigation to get lane formation as an emergent phenomenon. In Continuum Crowds and this thesis the lane formation is based on the gradient field. Currently I don’t know all the implications of this difference.

3 thoughts on “Simulation of large and dense crowds on the GPU using OpenCL

  1. Hi, in Chapter 5 of your thesis, you mention on pg 47 that you are using the same difference scheme as in Continuum Crowds. If so, equation 5.7, followed by the expansion to the quadratic form would be incorrect. Is that what you intended to do?

    • Hey,

      on p.47 the part “A difference scheme based on [55] turns out to be a working discretization technique … This thesis uses a modified variant that includes the cost in the decision process, as mentioned in Continuum Crowds.” is probably misleading. What I meant is that the thesis uses a modified variant of the scheme in [55], but not that this modified variant is also the variant mentioned in the Continuum Crowds paper.

      When I started, the first thing I did was trying to get the variant in the Continuum Crowds paper to work, with the additional help of the information provided by March of the Froblins. But I ran into problems, please see the footnote on p. 50 about these. While looking for a solution I found [55], which uses a different cost function. By merging the approach of [55] with the way the directional cost is found in Continuum Crowds, I managed to get around these problems. This merger is equation 5.7.

      As I am no native English speaker, probably more of my explanations are confusing. Additionally, I learned so much during this thesis, that structuring and explaining it all became very difficult for me. Still I wanted to write down all the details so that it makes sense, especially for others like me who approach this topic with limited knowledge about the eikonal equation. Hence I tried to not leave out any steps, unlike some papers that expect a lot of previous knowledge. Still, my style of writing might not make it a whole lot easier to understand 😀

      There are also corrections at the bottom of the blog post, for mistakes that somehow slipped into the paper. Make sure to check on those, if something doesn’t make sense in the paper.

      Please feel free to ask again if any additional questions arise or point out any errors you may find.



      EDIT: replied to wrong post

      • Thanks for the reply. I found your thesis quite easy to understand, but a few clarifications do no harm. The details of your thesis really helps for newcomers, as you are right, a lot of papers expect a lot of previous knowledge. I’ve been having similar problems with the NaN’s, and was comparing your thesis with my implementation, therefore, this clarification helps. I’m going to try and look at using the Continuum Crowds approach, with eq.5.10 that you use as an alternative, as I’m not currently trying to use the GPU.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s