What's the purpose of a compiler barrier?

The following is excerpted from Concurrent Programming on Windows, Chapter 10, pages 528-529: a C++ template implementation of double-checked locking.

T getValue() {
    if (!m_pValue) {
        EnterCriticalSection(&m_crst);
        if (!m_pValue) {
            T pValue = m_pFactory();
            _WriteBarrier();
            m_pValue = pValue;
        }
        LeaveCriticalSection(&m_crst);
    }
    _ReadBarrier();
    return m_pValue;
}

As the author states:

A _WriteBarrier is found after instantiating the object, but before writing a pointer to it in the m_pValue field. That's required to ensure that writes in the initialization of the object never get delayed past the write to m_pValue itself.

Since _WriteBarrier is a compiler barrier, I don't think it is useful if the compiler knows the semantics of LeaveCriticalSection. The compiler might omit the write to pValue, but it would never move the assignment before the factory call, since that would violate the program's semantics. I believe LeaveCriticalSection implies a hardware fence, and hence any write before the assignment to m_pValue will be synchronized.

On the other hand, if the compiler doesn't know the semantics of LeaveCriticalSection, the _WriteBarrier would be needed on all platforms to prevent the compiler from moving the assignment out of the critical section.

And for _ReadBarrier, the author said

Similarly, we need a _ReadBarrier just before returning m_pValue so that loads after the call to getValue are not reordered to occur before the call.

First, if this function is included in a library and no source code is available, how does the compiler know whether there is a compiler barrier or not?

Second, even if it is needed, I think it is placed in the wrong location; we would need to place it right after EnterCriticalSection to express an acquire fence. As above, it depends on whether the compiler understands EnterCriticalSection's semantics or not.

And the author also said:

However, I will also point out that neither fence is required on X86, Intel64, and AMD64 processors. It's unfortunate that weak processors like IA64 have muddied the waters

As I analyzed above, if we need those barriers on certain platforms, then we need them on all platforms, because they are compiler barriers: they only ensure that the compiler performs correct optimizations in case it doesn't understand the semantics of certain functions.

Please correct me if I am wrong.

Another question: is there any reference for MSVC and GCC stating which functions' synchronization semantics they understand?

Update 1: According to the answer (m_pValue will be accessed outside the critical section), and after running the sample code from here, I think:

  1. What the author means here is a hardware fence rather than a compiler barrier; see the following quote from MSDN.
  2. I believe a hardware fence also implies a compiler barrier (it disables compiler reordering across it), but not vice versa (see here: using a CPU fence, no reordering is observed, but not vice versa).

A Barrier is not a fence. It should be noted that a Barrier affects everything in cache. A fence affects a single cache line.

You should not be adding barriers unless absolutely necessary. To use a fence, you can select one of the _Interlocked intrinsic functions.

As the author wrote, "neither fence is required on X86, Intel64, and AMD64 processors"; this is because those platforms only allow store-load reordering.

One question remains: does the compiler understand the semantics of calls to Enter/LeaveCriticalSection? If it doesn't, it may perform optimizations like those in the following answer, which would cause incorrect behavior.

Thanks

Answers


tl;dr: The factory call could well consist of several steps that may be moved after the assignment to m_pValue. The expression !m_pValue would then return false before the factory call is complete, so the second thread would return a pointer to an incompletely initialized object.

Explanation:

The compiler might omit the write to pValue, but it would never move the assignment before the factory call, since that would violate the program's semantics.

Not necessarily. Consider T to be int*, where the factory method creates a new int and initializes it to 42.

int* pValue = new int(42);
m_pValue = pValue;
// m_pValue now points to a newly allocated int with value 42.

For the compiler, the new expression consists of several steps that could be reordered with respect to one another. Its semantics are allocation, initialization, and then assignment of the address to pValue:

int* pTmp = new int;
*pTmp = 42;
int* pValue = pTmp;

In a sequential program, the semantics would not change if some of these commands were moved after others. In particular, an assignment can be moved freely anywhere between the memory allocation and the first access (the first dereference of one of the pointers), including after the pointer assignments that follow the new expression:

int* pTmp = new int;
int* pValue = pTmp;
m_pValue = pValue;
*pTmp = 42;
// m_pValue now points to a newly allocated int with value 42.

The compiler will probably do that to optimize most of the temporary pointers away:

m_pValue = new int;  
*m_pValue = 42;
//m_pValue now points to a newly allocated int with value 42.

This is correct semantics for a sequential program.

I believe LeaveCriticalSection implies a hardware fence. And hence any write before the assignment to m_pValue will be synchronized.

No. The fence is after the assignment to m_pValue, but the compiler can still move the integer assignment between that and the fence:

m_pValue = new int;  
*m_pValue = 42;
LeaveCriticalSection();

And that's too late, because Thread 2 does not need to enter the critical section:

Thread 1:                | Thread 2:
                         |
m_pValue = new int;      | 
                         | if (!m_pValue){     //already false
                         | }
                         | return m_pValue;
                         | /*use *m_pValue */
*m_pValue = 42;          |
LeaveCriticalSection();  |

_ReadBarrier and _WriteBarrier

Joe Duffy believes that the _ReadBarrier and _WriteBarrier compiler intrinsics are both compiler- and processor-level fences. In Concurrent Programming on Windows, page 515, he writes:

A set of compiler intrinsics forces both compiler and processor level fences in VC++: _ReadWriteBarrier emits a full fence, _ReadBarrier emits a read-only fence, and _WriteBarrier emits a write-only fence.

The author relies on _ReadBarrier and _WriteBarrier compiler intrinsics to prevent both compiler and hardware re-ordering.

The MSDN documentation for the _ReadWriteBarrier compiler intrinsic does not support the assumption that these intrinsics affect the hardware level. The MSDN documentation for Visual Studio 2010 and Visual Studio 2008 clearly denies that the compiler intrinsics apply to the hardware level:

The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent only compiler re-ordering. To prevent the CPU from re-ordering read and write operations, use the MemoryBarrier macro.

The MSDN documentation for Visual Studio 2005 and Visual Studio .NET 2003 does not have such a note; it says nothing about whether the intrinsics apply to the hardware level or not.

If _ReadBarrier and _WriteBarrier really do not enforce hardware fences, the code is incorrect.

About the term "fence"

In his book, Joe Duffy uses the terms "fence" and "barrier" interchangeably. On page 511, he writes:

It's common for fences to be called barriers also. Intel seems to prefer the "fence" terminology, while AMD prefers "barrier." I also prefer "fence," so that's what I use in this book.

Hardware fences

I believe a hardware fence also implies a compiler barrier (it disables compiler reordering across it).

The Synchronization and Multiprocessor Issues article confirms that hardware barriers affect the compiler as well:

These instructions (memory barriers) also ensure that the compiler disables any optimizations that could reorder memory operations across the barriers.

However, MSDN documentation for MemoryBarrier macro suggests that compiler reorderings are not always prevented:

Creates a hardware memory barrier (fence) that prevents the CPU from re-ordering read and write operations. It may also prevent the compiler from re-ordering read and write operations.

Actually, I do not understand how a hardware fence could be used safely if the compiler were allowed to reorder memory operations around it; we could not be sure the fence ends up in the right place.

