Bug in concurrent_queue::wait_and_pop()?


    • Bug in concurrent_queue::wait_and_pop()?

      The original code for the wait_and_pop method was

      Source Code

      void wait_and_pop(Data& popped_value)
      {
          boost::mutex::scoped_lock lock(the_mutex);
          while(the_queue.empty())
          {
              the_condition_variable.wait(lock);
          }
          popped_value=the_queue.front();
          the_queue.pop();
      }


      When the boost condition variable waits, it takes the lock object as a parameter so that it can unlock the mutex until the condition variable is notified.

      The code in GCC4, however, is:

      Source Code

      void wait_and_pop(Data& popped_value)
      {
          ScopedCriticalSection locker(m_cs);
          while(the_queue.empty())
          {
              WaitForSingleObject(m_dataPushed);
          }
          popped_value=the_queue.front();
          the_queue.pop();
      }


      I'm not extremely familiar with Windows-specific threading, but it doesn't look like there's any way for WaitForSingleObject() to unlock the critical section... So won't a call to wait_and_pop() on an empty queue end up blocking the queue by holding the critical section locked indefinitely?
    • I'm not sure I understand what you're saying, so I'll explain what I expect would happen with this code.

      There are two concepts at work here. The first is the critical section, which keeps two threads from entering the same block of code at the same time. In this case, the thread enters the critical section at the top of the function and exits it at the bottom. The second is the WaitForSingleObject() call, which puts the thread to sleep until the object becomes signaled (i.e., another thread wakes it up).

      Both of these concepts should play nicely together as long as you don't call wait_and_pop() in a thread that can't be blocked (you'll notice it's not called anywhere in GCC).

      The typical use case for this type of function is a loading queue. You have a loader thread whose whole job is to load stuff, so you make a concurrent_queue object and have the loader thread's main loop call wait_and_pop() to grab the next item and load it. When it runs out of stuff to load, the thread goes to sleep and takes zero CPU time. In the main thread, you call push() to add stuff to the queue, and when you're ready, you signal the loader thread and have it process in the background.

      You never, ever call wait_and_pop() from the same thread that pushes data to it, or the deadlock issue you're talking about would occur: you'd have two threads locked because one is waiting for data and the other is waiting to enter the wait_and_pop() function.

      Incidentally, this would happen in the boost version too. The functions are basically identical.

      -Rez
    • OK, let's say we have a reader thread and a writer thread. The reader thread starts first:

      Source Code

      void wait_and_pop(Data& popped_value)
      {
          ScopedCriticalSection locker(m_cs); // m_cs is locked here
          while(the_queue.empty()) // yep, the queue is empty (writer hasn't done anything yet)
          {
              WaitForSingleObject(m_dataPushed); // reader thread is now waiting here, but wait, m_cs is still locked....
          }
          // more code...
      }


      So now the writer thread kicks off, and tries to push something:

      Source Code

      void push(Data const& data)
      {
          {
              ScopedCriticalSection locker(m_cs); // oops, m_cs is still locked by the reader thread, so we're stuck waiting in EnterCriticalSection...
              the_queue.push(data);
          }
          PulseEvent(m_dataPushed);
      }


      Basically, if wait_and_pop is ever called on an empty queue, it locks that queue up forever. The boost version doesn't have this problem because, as the documentation for boost::condition_variable::wait() tells us,

      Notice that the lock is passed to wait: wait will atomically add the thread to the set of threads waiting on the condition variable, and unlock the mutex. When the thread is woken, the mutex will be locked again before the call to wait returns. This allows other threads to acquire the mutex in order to update the shared data, and ensures that the data associated with the condition is correctly synchronized.
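      To make the contrast concrete, here is a minimal sketch of the same queue written against standard C++11 primitives instead of boost or Win32 (the class and member names are illustrative, not from GCC4); std::condition_variable::wait(lock) provides exactly the atomic unlock-sleep-relock guarantee quoted above:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Illustrative standard-library version of the queue. The key point is
// that wait(lock) releases the mutex while the consumer sleeps, so a
// producer calling push() can always acquire it.
template <typename Data>
class cv_queue
{
public:
    void push(Data const& data)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(data);
        } // unlock before notifying so the woken thread isn't blocked on the mutex
        m_cond.notify_one();
    }

    void wait_and_pop(Data& popped_value)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_queue.empty())
            m_cond.wait(lock); // atomically unlocks, sleeps, relocks on wake
        popped_value = m_queue.front();
        m_queue.pop();
    }

private:
    std::queue<Data> m_queue;
    std::mutex m_mutex;
    std::condition_variable m_cond;
};
```

      Because wait() releases the mutex while the consumer sleeps, a producer's push() can always get the lock, which is precisely what the Win32 version above cannot do.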


      Here's a full program demonstrating the problem:

      Source Code

      #define WIN32_LEAN_AND_MEAN
      #include <iostream>
      #include <cstdlib>
      #include <windows.h>
      #include "CriticalSection.h"

      concurrent_queue<int> g_queue;

      DWORD WINAPI Reader(LPVOID param)
      {
          while (1)
          {
              int val;
              g_queue.wait_and_pop(val);
              std::cout << "Read " << val << std::endl;
          }
          return TRUE;
      }

      DWORD WINAPI Writer(LPVOID param)
      {
          while (1)
          {
              int val = rand();
              g_queue.push(val);
              Sleep(val % 500);
          }
          return TRUE;
      }

      int main(int argc, char** argv)
      {
          HANDLE reader = CreateThread(0, 0, Reader, 0, 0, 0);
          HANDLE writer = CreateThread(0, 0, Writer, 0, 0, 0);
          Sleep(15000);
          TerminateThread(writer, 0);
          TerminateThread(reader, 0);
          return 0;
      }


      There doesn't seem to be an easy solution that exactly duplicates the functionality of the boost code, so it looks like it would be simplest to just manually control the critical section. I'd propose the following changes to wait_and_pop:

      Source Code

      void wait_and_pop(Data& popped_value)
      {
          m_cs.Lock();
          while(the_queue.empty())
          {
              // unlock to allow other threads to push data
              m_cs.Unlock();
              WaitForSingleObject(m_dataPushed);
              // relock to test emptiness and pop data
              m_cs.Lock();
          }
          // critical section is already locked before exiting the while loop
          popped_value=the_queue.front();
          the_queue.pop();
          m_cs.Unlock();
      }


    • Ahh yes, I missed the fact that wait_and_pop() was also using the same critical section. You're correct, they'll deadlock pretty much every time.

      As I think about it more, I feel like the actual fix is to delete the wait_and_pop() function. I don't think it's the best way of handling a loading queue. One of the rules of thumb for multi-threaded programming is to enter a mutex or critical section as rarely as possible.

      My own engine has a loader thread that handles all of my loading. When it gets a message telling it that stuff is available, it wakes up, locks the queue (entering a critical section), and copies it to an internal queue. Then it clears the queue and unlocks it. This allows the main thread to continue to add things to the queue as it processes while the loader thread loads what it has. At the end, it checks to see if there's more in the main thread's queue. If so, it does the process again. If not, it goes to sleep until it gets another message.

      If I used a wait_and_pop() type method, I would be locking the queue many times to do the read and potentially stall the main thread multiple times. This double-buffered approach has worked really well for me. I also use it for my render thread, which never directly enters a critical section. The disadvantage to this method is the slight memory overhead of copying the queue, though it's cleared immediately after. There's also the CPU cost of performing the copy, but it's much smaller than if I end up locking the main thread for multiple frames (reading from the hard drive takes a long time compared to processing a frame).
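      The double-buffered pattern described above can be sketched in portable C++ (the names and structure here are an illustration of the idea, not the engine's actual code; the wake-up signaling is omitted for brevity):

```cpp
#include <mutex>
#include <queue>
#include <string>

// Illustrative sketch of a double-buffered loading queue: the loader
// enters the critical section once, takes the entire shared queue, and
// then processes its private copy with no lock held.
class LoadQueue
{
public:
    // Called by the main thread to queue an asset for loading.
    void Push(std::string const& asset)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_shared.push(asset);
    }

    // Called by the loader thread after it is signaled: one short lock,
    // grab everything, leave the shared queue empty.
    std::queue<std::string> DrainAll()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::queue<std::string> batch;
        batch.swap(m_shared); // shared queue is now empty
        return batch;
    }

private:
    std::queue<std::string> m_shared;
    std::mutex m_mutex;
};
```

      The loader thread loops over the batch returned by DrainAll() and loads each item without touching the lock; when the batch is exhausted it drains again, and if that comes back empty it goes back to sleep until the next message.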

      Your fix seems fine if you want to use the wait_and_pop() method, but I honestly don't think it's the right way to go. I'm planning on deleting it from the concurrent_queue class since it's never used and doesn't seem worth fixing.

      -Rez