Undefined/Unspecified/Implementation-defined behaviour warnings?

Can't a compiler warn (even better if it throws errors) when it notices a statement with undefined/unspecified/implementation-defined behaviour?

Probably to flag a statement as error, the standard should say so, but it can warn the coder at least. Is there any technical difficulties in implementing such an option? Or is it merely impossible?

Reason I got this question is, in statements like a[i] = ++i; won't it be knowing that the code is trying to reference a variable and modifying it in the same statement, before a sequence point is reached.

Answers


It all boils down to

  • Quality of Implementation: the more accurate and useful the warnings are, the better it is. A compiler that always printed: "This program may or may not invoke undefined behavior" for every program, and then compiled it, is pretty useless, but is standards-compliant. Thankfully, no one writes compilers such as these :-).

  • Ease of determination: a compiler may not be easily able to determine undefined behavior, unspecified behavior, or implementation-defined behavior. Let's say you have a call stack that's 5 levels deep, with a const char * argument being passed from the top-level, to the last function in the chain, and the last function calls printf() with that const char * as the first argument. Do you want the compiler to check that const char * to make sure it is correct? (Assuming that the first function uses a literal string for that value.) How about when the const char * is read from a file, but you know that the file will always contain valid format specifier for the values being printed?

  • Success rate: A compiler may be able to detect many constructs that may or may not be undefined, unspecified, etc.; but with a very low "success rate". In that case, the user doesn't want to see a lot of "may be undefined" messages—too many spurious warning messages may hide real warning messages, or prompt a user to compile at "low-warning" setting. That is bad.

For your particular example, gcc gives a warning about "may be undefined". It even warns for printf() format mismatch.

But if your hope is for a compiler that issues a diagnostic for all undefined/unspecified cases, it is not clear if that should/can work.

Let's say you have the following:

#include <stdio.h>
void add_to(int *a, int *b)
{
    *a = ++*b;
}

int main(void)
{
    int i = 42;
    add_to(&i, &i); /* bad */
    printf("%d\n", i);
    return 0;
}

Should the compiler warn you about *a = ++*b; line?

As gf says in the comments, a compiler cannot check across translation units for undefined behavior. Classic example is declaring a variable as a pointer in one file, and defining it as an array in another, see comp.lang.c FAQ 6.1.


gcc does warn in that situation (at least with -Wall):

#include <stdio.h>

int main(int argc, char *argv[])
{
  int a[5];
  int i = 0;

  a[i] = ++i;

  printf("%d\n", a[0]);

  return 0;
}

Gives:

$ make
gcc -Wall main.c -o app
main.c: In function ‘main’:
main.c:8: warning: operation on ‘i’ may be undefined

Edit:

A quick read of the man page shows that -Wsequence-point will do it, if you don't want -Wall for some reason.


Different compilers trap different conditions; most compilers have warning level options, GCC specifically has many, but -Wall -Werror will switch on most of the useful ones, and coerce them to errors. Use \W4 \WX for similar protection in VC++.

In GCC You could use -ansi -pedantic, but pedantic is what it says, and will throw up many irrelevant issues and make it hard to use much third party code.

Either way, because compilers catch different errors, or produce different messages for the same error, it is therefore useful to use multiple compilers, not necessarily for deployment, but as a poor-man's static analysis. Another approach for C code is to attempt to compile it as C++; the stronger type checking of C++ generally results in better C code; but be sure that if you want C compilation to work, don't use the C++ compilation exclusively; you are likely to introduce C++ specific features. Again this need not be deployed as C++, but just used as an additional check.

Finally, compilers are generally built with a balance of performance and error checking; to check exhaustively would take time that many developers would not accept. For this reason static analysers exist, for C there is the traditional lint, and the open-source splint. C++ is more complex to statically analyse, and tools are often very expensive. One of the best I have used is QAC++ from Programming Research. I am not aware of any free or open source C++ analysers of any repute.


Contrarily, compilers are not required to make any sort of diagnosis for undefined behavior:

§1.4.1: The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”

Emphasis mine. While I agree it may be nice, the compiler's have enough problem trying to be standards compliant, let alone teach the programmer how to program.


GCC warns as much as it can when you do something out of the norms of the language while still being syntactically correct, but beyond the certain point one must be informed enough.

You can call GCC with the -Wall flag to see more of that.


If your compiler won't warn of this, you can try a Linter.

Splint is free, but only checks C http://www.splint.org/

Gimpel Lint supports C++ but costs US $389 - maybe your company c an be persuaded to buy a copy? http://www.gimpel.com/


Need Your Help

Getting Allen Bauer's TMulticastEvent<T> working

delphi generics events delphi-2009

I've been mucking around with Allen Bauer's code for a generic multicast event dispatcher (see his blog posts about it here).

Need input on a potentially NP-hard maximal edge-weighted multi-cycle graph

algorithm graph cycle np-hard

Sorry for the complex name - it's kind of well-deserved. Let me present the problem.