Do C++ compilers optimize return-by-value member variables?

I have a pretty good grasp on C++ return value optimizations when the variable being returned goes out of scope with the function, but what about returning member variables? Consider the following code:

#include <iostream>
#include <string>

class NamedObject {
 public:
  NamedObject(const char* name) : _name(name) {}
  std::string name() const {return _name;}

 private:
  std::string _name;
};

int main(int argc, char** argv) {
  NamedObject obj("name");
  std::cout << "name length before clear: " << obj.name().length() << std::endl;
  obj.name().clear();
  std::cout << "name length after clear: " << obj.name().length() << std::endl;
  return 0;
}

Which outputs:

name length before clear: 4
name length after clear: 4

Clearly, the obj.name().clear() acts on a temporary copy, but what about the calls to obj.name().length()? std::string::length() is a const member function and therefore guaranteed not to modify the string's state. It seems reasonable, then, that compilers should be allowed to not copy the member variable and just use it directly for calls to const member functions. Do modern C++ compilers make this optimization? Is there any reason why it shouldn't or can't be made?

EDIT:

To clarify, I'm not asking if the standard return value optimization works here; I understood why it doesn't at the time I originally asked the question. RVO the way it's usually defined doesn't work here simply because the value being returned doesn't go out of scope with the function.

What I'm asking is: if the compiler can determine at the time of the call that the call will have no side effects, is it allowed to skip the copy? I.e., can it act as if

obj.name().length()

were

obj._name.length()

Answers


Short answer:

Unless the compiler sees into the implementation of the copy constructor and the length() method at the time of compiling main through inlining or some compiler specific magic, it will not be able to optimize away that copy.

Long answer:

The C++ standard typically never directly prescribes which optimizations should or should not be performed. In fact, an optimization is, almost by definition, something that does not alter the behavior of a well formed program.

If the compiler is able to prove that a particular invocation of obj.name results in a copy whose existence is unprovable by the observer, it is free to elide the copy. That could be your case as well with just a bit of inlining, so this copy elision is theoretically allowed here, because you don't print out or use its effects in any way.

Now, upon a closer look, Clause 12.8 of the standard does list four additional circumstances (related to exception handling, callee's return value such as the inside of name in your case, and binding a temporary to a reference). I list them in this post for easy reference, but none of them matches your case where the temporary is received from a call and is used to invoke a const method.

Therefore, these explicit "exceptions" do not allow optimizing away the copy simply by inspection of main and noticing the const qualifier of length().

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization. This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

— in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value

— in a throw-expression, when the operand is the name of a non-volatile automatic object (other than a function or catch-clause parameter) whose scope does not extend beyond the end of the innermost enclosing try-block (if there is one), the copy/move operation from the operand to the exception object (15.1) can be omitted by constructing the automatic object directly into the exception object

— when a temporary class object that has not been bound to a reference (12.2) would be copied/moved to a class object with the same cv-unqualified type, the copy/move operation can be omitted by constructing the temporary object directly into the target of the omitted copy/move

— when the exception-declaration of an exception handler (Clause 15) declares an object of the same type (except for cv-qualification) as the exception object (15.1), the copy/move operation can be omitted by treating the exception-declaration as an alias for the exception object if the meaning of the program will be unchanged except for the execution of constructors and destructors for the object declared by the exception-declaration.


The name() function returns by value, which means that all operations are performed on temporaries.

It seems reasonable, then, that compilers should be allowed to not copy the member variable and just use it directly for calls to const member functions.

This assumption is incorrect on many grounds. When a function is declared as const you are telling the compiler that you will not modify the state of the object so that the compiler can help you verify that. The return type is part of the checks that the compiler can do for you. If you, for example, change the return type to be:

std::string& name() const { return _name; }

The compiler will complain: you promised that name() would not modify the state, but you are providing a reference through which others can do it. Additionally, the semantics of that function is that it provides a copy that the caller can modify. If the copy was elided (it is impossible to elide it, but for the sake of argument), then calling code could modify what seems like a local copy and actually modify the state of your object.

In general, when providing const accessors, you should return const references to the members rather than copies.

I have a pretty good grasp on C++ return value optimizations for temporaries, [...] Do modern C++ compilers make this optimization? Is there any reason why it shouldn't or can't be made?

I have the feeling that you don't really have a good grasp on what the return value optimizations are, or else you would not ask the second question. Let's work through an example. When user code has:

std::string foo() {
   std::string result;
   result = "Hi";
   return result;
}
std::string x = foo();

In the code above there are potentially three strings: result inside foo, the return value (let's call it __ret) and x, and two possible optimizations that can be applied: NRVO and generic copy elision. NRVO is an optimization performed by the compiler when processing the function foo, and it consists of merging result and __ret by placing them at the same location and creating a single object. The second part of the optimization must be done on the caller side, and again it merges the locations of the two objects x and __ret.

As for the actual implementation, I will start with the second one. The caller (in most calling conventions) is responsible for allocating the memory for the returned object. Without optimizations (and in a kind of pseudocode) this is what goes on in the caller:

[uninitialized] std::string __ret;
foo( [hidden arg] &__ret );          // Initializes __ret
std::string x = __ret;

Now, because the compiler knows that the temporary __ret will only live to initialize x, it transforms the code into:

[uninitialized] std::string x;
foo( [hidden arg] &x );             // Initializes x

And the copy in the caller is elided. The copy inside foo is elided in a similar way. The transformed (to comply with the calling convention) function is:

void foo( [hidden uninitialized] std::string* __ret ) {
   std::string result;
   result = "Hi";
   new (__ret) std::string( result );   // placement new: construct in place
   return;
}

Now the optimization in this case is exactly the same. Since result lives only to be able to initialize the returned object, it can reuse the same space, rather than creating a new object:

void foo( [hidden uninitialized] std::string* __ret ) {
   new (__ret) std::string();
   (*__ret) = "Hi";
   return;
}

Now going back to your original problem: because the member variable exists before the member function is called, this optimization cannot be applied. The compiler cannot place the return value at the same location as the member, since that variable is already alive at a known location that is not the address of __ret (provided by the caller).

I wrote about NRVO and copy elision in the past. You might be interested in reading those articles.


is const member function and therefore guaranteed not to modify the string's state

That's not true. std::string may have a mutable data member, and any function may cast the const off this or any member of it.


The best way to learn what optimizations your compiler makes is to look at the assembly it generates and see exactly what the compiler actually does. It's very hard to predict what kind of optimizations a given compiler may or may not make under every situation, and most people are usually either too pessimistic or way too optimistic.

On the other hand, by just examining the compiler's output, you can see exactly what it does without needing any guesswork.

In Visual Studio, you can get a helpful output of assembly interleaved with the source code through setting project properties -> C/C++ -> Output Files -> Assembler Output -> "Assembly with source code", or just supplying /Fas to the command line. You can tell GCC to output assembly with -S, but that won't correlate the lines of assembly with lines of source; for that you have to use objdump or maybe the -fverbose-asm commandline option if it happens to work in your version.

For example, one of the blocks from your code (compiled under full release in MSVC) is:

; 23   :    obj.name().clear();

    lea ecx, DWORD PTR _obj$[esp+92]
    push    ecx
    lea esi, DWORD PTR $T23719[esp+96]
    call    ?name@NamedObject@@QBE?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ ; NamedObject::name
    mov DWORD PTR [eax+16], ebx
    cmp DWORD PTR [eax+20], edi
    jb  SHORT $LN70@main
    mov eax, DWORD PTR [eax]
$LN70@main:
    mov BYTE PTR [eax], bl
    mov ebx, DWORD PTR __imp_??3@YAXPAX@Z
    cmp DWORD PTR $T23719[esp+112], edi
    jb  SHORT $LN84@main
    mov edx, DWORD PTR $T23719[esp+92]
    push    edx
    call    ebx
    add esp, 4
$LN84@main:

; 24   :    std::cout << "name length after clear: " << obj.name().length() << std::endl;

    lea eax, DWORD PTR _obj$[esp+92]
    push    eax
    lea esi, DWORD PTR $T23720[esp+96]
    call    ?name@NamedObject@@QBE?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ ; NamedObject::name
    mov BYTE PTR __$EHRec$[esp+100], 2
    mov ecx, DWORD PTR __imp_?endl@std@@YAAAV?$basic_ostream@DU?$char_traits@D@std@@@1@AAV21@@Z
    mov eax, DWORD PTR [eax+16]
    mov edx, DWORD PTR __imp_?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A
    push    ecx
    push    eax
    push    OFFSET ??_C@_0BK@PFKLDML@name?5length?5after?5clear?3?5?$AA@
    push    edx
    call    ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU?$char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> >
    add esp, 8
    mov ecx, eax
    call    DWORD PTR __imp_??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@I@Z
    mov ecx, eax
    call    DWORD PTR __imp_??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@P6AAAV01@AAV01@@Z@Z
    cmp DWORD PTR $T23720[esp+112], edi
    jb  SHORT $LN108@main
    mov eax, DWORD PTR $T23720[esp+92]
    push    eax
    call    ebx
    add esp, 4

(You can undecorate MSVC symbol names with undname.exe) As you can see, in this case it calls the NamedObject::name() function both before .clear() and again before .length().


The return value optimizations are about eliminating the implicit copy in the return statement by eliminating the temporary or object with scope local to the function and using the object being eliminated as an alias for the return object.

Obviously, this only applies if the function is constructing the object being used in the return statement. If the object being returned already exists, there is no extra object being created so the object being returned must be copied to the return object. There is no other object construction in the function that can be eliminated.

Notwithstanding all of the above, the compiler can make any optimizations it sees fit so long as no difference in behaviour can be observed by a conforming program, so anything (non-observable) is possible.

