Concurrent Rendering Queue

    • Concurrent Rendering Queue

      Hey Guys,

      I have been working on a rendering engine lately, both for use with my game engine and to learn graphics programming more thoroughly. My render system has a concurrent render loop that simply processes a queue of 'Render Operations'. I have been mulling over how I implemented the operation side of things, however, and would like some feedback. I can think of two ways to do the actual operations.

      1. Pass an operation class that contains high-level structures. For instance, I have an operation for setting up and clearing a viewport. I really don't like this, but perhaps it is common to do it this way? Essentially, the viewport has functions on it that activate its render target, set up the viewport and scissor area, set up the clear colours, and then clear the viewport. I think I am passing too much information to the render thread.

      Source Code

      -- Main Thread
      --   Queue a Setup Viewport operation, passing the viewport in question
      --   Signal the render thread that things are ready to go
      -- Render Thread
      --   Wait for signal
      --   Grab the current queue, and begin processing it
      --   For each operation, call its 'apply' method
      --     Setup Viewport operation: call the viewport's 'clear' method


      2. Pass granular operations to the render queue, built up on the main thread. These would be the individual operations that the first method performs internally. I think this may also make it easier to sort operations and remove redundant ones.

      Source Code

      -- Main Thread
      --   Queue the following operations for a viewport
      --     Activate render target
      --     Enable Scissor Test
      --     Set Viewport, passing it the dimensions of the viewport
      --     Set Scissor, passing it the dimensions of the viewport
      --     Set Clear Colour, passing it the colours to use
      --     Clear, passing the buffer bits
      --   Signal the render thread that things are ready to go
      -- Render Thread
      --   Wait for signal
      --   Grab the current queue, and begin processing it
      --   For each operation, call its 'apply' method
      --     Set Viewport
      --     Set Scissor
      --     Set Clear Colour
      --     Clear
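
      As a rough illustration of the second approach, here is a minimal C++ sketch. Everything in it is an assumption made for the example: `FakeContext`, `RenderOp`, `Rect`, `logCall`, `buildViewportOps` and `processQueue` are hypothetical names, and the OpenGL calls are only recorded as strings so the sketch runs without a GL context.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the GL context: records calls instead of issuing them.
struct FakeContext {
    std::vector<std::string> calls;
};

// One granular operation: a name (useful for sorting or de-duplicating)
// plus the work to run on the render thread.
struct RenderOp {
    std::string name;
    std::function<void(FakeContext&)> apply;
};

struct Rect { int x, y, w, h; };

// Builds an 'apply' body that just logs the GL call it stands for.
std::function<void(FakeContext&)> logCall(std::string text) {
    return [text](FakeContext& ctx) { ctx.calls.push_back(text); };
}

// Main-thread side: expand one "set up viewport" request into granular ops.
std::vector<RenderOp> buildViewportOps(const Rect& r) {
    const std::string args = "(" + std::to_string(r.x) + "," + std::to_string(r.y) + "," +
                             std::to_string(r.w) + "," + std::to_string(r.h) + ")";
    std::vector<RenderOp> ops;
    ops.push_back({"ActivateRenderTarget", logCall("glBindFramebuffer")});
    ops.push_back({"EnableScissorTest", logCall("glEnable(GL_SCISSOR_TEST)")});
    ops.push_back({"SetViewport", logCall("glViewport" + args)});
    ops.push_back({"SetScissor", logCall("glScissor" + args)});
    ops.push_back({"SetClearColour", logCall("glClearColor")});
    ops.push_back({"Clear", logCall("glClear")});
    return ops;
}

// Render-thread side: walk the queue and call each operation's 'apply'.
void processQueue(const std::vector<RenderOp>& queue, FakeContext& ctx) {
    for (const RenderOp& op : queue) {
        assert(static_cast<bool>(op.apply));  // every op must carry work
        op.apply(ctx);
    }
}
```

      Because each operation is a small named value, a pass over the queue before processing could drop back-to-back duplicates (for example, two identical Set Viewport operations). That pass is not shown here.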


      What do you guys think, and which is more common? Rez, I know I heard you talking about having a multithreaded render queue :D.
      PC - Custom Built
      CPU: 3rd Gen. Intel i7 3770 3.4GHz
      GPU: ATI Radeon HD 7959 3GB
      RAM: 16GB

      Laptop - Alienware M17x
      CPU: 3rd Gen. Intel i7 - Ivy Bridge
      GPU: NVIDIA GeForce GTX 660M - 2GB GDDR5
      RAM: 8GB Dual Channel DDR3 @ 1600MHz
    • RE: Concurrent Rendering Queue

      I'm a little unclear as to what exactly is happening in your different threads. For example, "Set Scissor" appears as an operation in both.

      But, I'd recommend a few things to think about:
      1. Consider how your system handles rendering in different passes, and what kind of processing needs to happen when: for example, when to do rough object culling and occlusion, which objects are processed and sorted into a transparent rendering pass, and when to apply post-processing effects like motion blur.
      2. How does your system manage and optimize drawing with different shaders, so as to minimize CPU processing while also minimizing state changes on the GPU?
      3. Carefully consider the benefits and risks associated with using threading, or co-operative tasking, in your render pipeline. I'm not the best person to ask about this, but I expect the cost of locking/unlocking shared data could really slow down your system. Besides, the GPU runs independently of the CPU anyway, so if it were me I'd concentrate my efforts on a single-threaded CPU solution that very carefully metes out calls to the GPU.
      4. Are you using OpenGL or DirectX, and which version? That will have a huge impact on your design.

      (In typical fashion I totally recognize I just brought up more questions, instead of answering any...sorry about that!)

      :)
      Mr.Mike
      Author, Programmer, Brewer, Patriot
    • No, I am using OpenGL.
    • OpenGL is outside my experience, such as it is...maybe someone else on the forum has experience with it that can help you.
      Mr.Mike
    • My version is closer to the second one. Rendering is done via RenderCommand objects that are passed from the main thread to the rendering thread. There are two queues, one that the main thread pushes RenderCommand objects to and the other that the render thread is actively processing.

      The main thread does all the game logic. Anything that needs to render this frame builds up a RenderCommand object and pushes it to the queue. At the end of the frame, the logic thread waits for a sync event.

      The render thread processes its current queue, actually rendering all of the appropriate objects. Once it's done, it waits for the sync event.

      When both threads have received the sync event, the game enters into the sync phase and all atomic operations that need to occur can happen. The most important is swapping the queues (literally just swapping an index variable). Then the sync phase ends and both threads are allowed to run again. The main thread processes another frame while the render thread renders all the objects built up from the previous frame.
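
      The two-queue arrangement with an index swap can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual engine code: `RenderCommand`, `CommandQueues` and `simulateFrames` are hypothetical names, and the logic and render steps are run one after the other on a single thread purely to keep the example self-contained. In a real engine they would run concurrently on two threads between sync points, with `swap()` called only during the sync phase.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload: a real command would carry everything needed to draw.
struct RenderCommand {
    std::string what;
};

class CommandQueues {
public:
    // Main (logic) thread: queue a command for the NEXT rendered frame.
    void push(RenderCommand cmd) { queues_[write_].push_back(std::move(cmd)); }

    // Render thread: the queue filled during the previous frame.
    const std::vector<RenderCommand>& renderQueue() const { return queues_[1 - write_]; }

    // Sync phase only (both threads idle): flip the index, then empty the
    // queue the main thread is about to refill.
    void swap() {
        write_ = 1 - write_;
        queues_[write_].clear();
    }

private:
    std::array<std::vector<RenderCommand>, 2> queues_;
    std::size_t write_ = 0;  // index of the queue the main thread writes to
};

// Simulates a few frames and returns what the render side processed, in order.
std::vector<std::string> simulateFrames(int frames) {
    assert(frames >= 0);
    CommandQueues queues;
    std::vector<std::string> drawn;
    for (int frame = 0; frame < frames; ++frame) {
        // Render step: touches only the read queue.
        for (const RenderCommand& cmd : queues.renderQueue()) {
            drawn.push_back(cmd.what);
        }
        // Logic step: touches only the write queue.
        queues.push({"frame " + std::to_string(frame)});
        // Sync phase: both steps are finished, so the swap is safe.
        queues.swap();
    }
    return drawn;
}
```

      Running `simulateFrames(3)` yields the commands for frames 0 and 1 only: the render side is always one frame behind the logic side, and the commands for the last frame are still waiting in the queue.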

      There are several advantages to this method. For example, I'm guaranteed to never lock the render thread more than once per frame. There are no critical sections or mutexes here, just the sync phase at the end of each frame. Forcing this sync event also keeps me from having to deal with runaway threads, which can happen if you have one thread processing much faster than the other, causing one to get several frames ahead (this was the source of some interesting bugs on The Sims Medieval). In my system, the render thread is always exactly one frame behind.

      One disadvantage is that I do waste a little time. The length of a single frame is equal to the time taken by the slowest thread, plus the sync. It's basically:

      Source Code

      max(renderThreadTime, mainThreadTime) + syncTime


      This is still better than:

      Source Code

      renderThreadTime + mainThreadTime


      It's not as fast as possible and the faster thread will go to sleep until the slower thread finishes, but it greatly simplifies the architecture and is fast enough for my needs.
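
      To put numbers on the two formulas above, here is a small sketch. The function names and the timings used in the note below are made up for illustration, not measurements from any real game.

```cpp
#include <algorithm>
#include <cassert>

// Frame length when logic and rendering overlap and meet at a sync point.
double pipelinedFrameMs(double mainMs, double renderMs, double syncMs) {
    assert(mainMs >= 0 && renderMs >= 0 && syncMs >= 0);
    return std::max(renderMs, mainMs) + syncMs;
}

// Frame length when logic and rendering run back to back on one thread.
double serialFrameMs(double mainMs, double renderMs) {
    assert(mainMs >= 0 && renderMs >= 0);
    return renderMs + mainMs;
}
```

      With a hypothetical 6 ms of game logic, 10 ms of rendering and 0.5 ms of sync, the pipelined frame takes 10.5 ms against 16 ms for the serial version. The logic thread sleeps for about 4 ms of each frame, which is the wasted time mentioned above.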

      Hope that helps!

      -Rez
    • Thanks, I actually implemented what you just explained to a 'T' haha, it seemed like the best way and I think you talked about it before.
    • rezination
      How do you handle the case where a submitted render command uses an object that the main thread deletes just before the render thread gets to it? shared_ptr?

      That is:
      The main thread pushes MeshA, MeshB, and MeshC, along with materials and so on, to the rendering queue. When the signal happens, the rendering thread starts rendering those meshes. Meanwhile, the main thread decides that MeshC is no longer needed and must be deleted. Theoretically, this may happen before the rendering thread has rendered MeshC.

      A couple of ways I see are: use shared_ptr in the rendering queue, or only delete objects between frames during sync.
      Looking for a job!
      My LinkedIn Profile
      Yes and no. I'm not using a smart pointer system; I have a complex resource caching system that handles all resources. The first time someone attempts to acquire a resource, it will be loaded in a background thread. As long as there's a lock on the resource, it will stick around. Any further attempts to acquire the resource will increment a reference count. Whenever a resource is released, it decrements the reference count. When the reference count reaches 0, the resource is destroyed.

      Conceptually, this is very much like a smart pointer except that the resource is managed in a central location. I can directly manipulate it if I want.

      So, to answer your question, when the render command is built up and a resource is assigned to the command, it explicitly locks the resource (which just increments the ref count). This happens on the main thread when the render command is being populated with data. That way, even if the object is destroyed, it will still get rendered that frame. The resource is then unlocked in the render command's destructor, which is called when the render commands are destroyed by the main thread at the start of the next frame.
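
      The lock-in-constructor, unlock-in-destructor idea can be sketched like this. It is a simplified illustration, not the real caching system: `ResourceCache`, its `acquire`/`release`/`isLoaded` methods and this `RenderCommand` are hypothetical, background loading is left out, and the counts are plain integers on the assumption (taken from the description above) that both locking and unlocking happen on the main thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical central cache: one reference count per resource name.
class ResourceCache {
public:
    void acquire(const std::string& name) { ++refCounts_[name]; }

    void release(const std::string& name) {
        auto it = refCounts_.find(name);
        assert(it != refCounts_.end() && it->second > 0);
        if (--it->second == 0) {
            refCounts_.erase(it);  // count hit 0: the resource is destroyed
        }
    }

    bool isLoaded(const std::string& name) const { return refCounts_.count(name) != 0; }

private:
    std::unordered_map<std::string, int> refCounts_;
};

// Locks its resource when built and unlocks it when destroyed.
class RenderCommand {
public:
    RenderCommand(ResourceCache& cache, std::string mesh)
        : cache_(cache), mesh_(std::move(mesh)) {
        cache_.acquire(mesh_);
    }
    ~RenderCommand() { cache_.release(mesh_); }

    RenderCommand(const RenderCommand&) = delete;
    RenderCommand& operator=(const RenderCommand&) = delete;

private:
    ResourceCache& cache_;
    std::string mesh_;
};

// Walks through the MeshC scenario; returns true if the mesh outlives the
// game object for as long as a queued command still refers to it.
bool meshSurvivesUntilCommandsDestroyed() {
    ResourceCache cache;
    cache.acquire("MeshC");  // the game object holds the mesh

    std::vector<std::unique_ptr<RenderCommand>> queue;
    queue.push_back(std::unique_ptr<RenderCommand>(new RenderCommand(cache, "MeshC")));

    cache.release("MeshC");  // game object deleted mid-frame
    const bool aliveWhileQueued = cache.isLoaded("MeshC");

    queue.clear();           // commands destroyed at the start of the next frame
    return aliveWhileQueued && !cache.isLoaded("MeshC");
}
```

      The command's own reference is what keeps MeshC alive after the game object lets go of it. Once the queue is cleared, the count drops to 0 and the mesh goes away.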

      The idea behind render commands is that they contain all the data necessary to render the object. They never refer back to the game object or any components; they exist on a lower layer. In fact, render commands are used by everything that needs to render. This includes game objects, UI, tiles, debug stuff, text, etc.

      -Rez