Disabling NUL-termination of strings in GCC

Is it possible to globally disable NUL-terminated strings in GCC?

I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.

However, if I wanted to append 10 strings, 10 bytes would be allocated on the stack unnecessarily. With wide strings it is even worse: on x86, 40 bytes are wasted, and on x86_64, 80 bytes!

I defined a macro to add those stack-allocated strings to my struct:

#define AppendString(ppDest, pSource) \
  AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)

Using sizeof(...) - 1 works quite well, but could I get rid of NUL termination altogether in order to save a few bytes?

Answers


This is pretty awful, but in C (not C++) you can explicitly specify the length of every character array constant. If the array has no room for the terminating NUL, it is simply dropped:

char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);

wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
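To avoid counting characters by hand, the array length can be derived from the literal itself. The following is a minimal sketch, not part of the original answer: the macro name UNTERMINATED and the variable greeting are hypothetical, and the trick relies on C's rule that a string literal may initialize an array that is exactly one byte too short for the NUL (C++ rejects this, and some compilers can warn about the dropped terminator).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: declares a char array sized to the literal
   minus its terminating NUL. Valid C, not valid C++. */
#define UNTERMINATED(name, literal) \
    static const char name[sizeof(literal) - 1] = literal

UNTERMINATED(greeting, "foobar");

/* Compile-time check that no NUL byte is stored. */
static_assert(sizeof greeting == 6, "greeting holds exactly 6 bytes");

/* Returns the number of bytes actually occupied by the array. */
size_t greeting_size(void)
{
    return sizeof greeting;
}
```

Because the size comes from sizeof(literal) - 1, changing the text never leaves a stale hard-coded length behind.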

I understand you're only dealing with strings declared in your program:

 ....
 char str1[10];
 char str2[12];
 ....

and not with text buffers you allocate with malloc() and friends; otherwise sizeof is not going to help you.
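The difference between the two cases can be sketched as follows. The function names array_bytes and pointer_bytes are made up for illustration; the point is only that sizeof sees the full array for a declared array but just the pointer for a malloc() buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* For an array type, sizeof yields the whole array. */
static_assert(sizeof(char[10]) == 10, "array size is known at compile time");

/* sizeof applied to a declared array: the number of bytes in the array. */
size_t array_bytes(void)
{
    char str1[10];
    return sizeof str1;
}

/* sizeof applied to a malloc() result: the size of the pointer itself,
   no matter how many bytes were requested. */
size_t pointer_bytes(void)
{
    char *buf = malloc(10);
    size_t n = sizeof buf;
    free(buf);
    return n;
}
```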

Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.

Unless you are going to rewrite every single string function for your library (sprintf, for example), are you sure you want to do it?


I can't remember the details, but when I do

char my_constant[5]

the compiler may reserve 8 bytes anyway because of alignment: some machines can't address the middle of a word.

It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
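How padding can swallow the saved byte is easiest to see inside a struct. This is only an illustrative sketch with a hypothetical struct Padded; the exact numbers depend on the target ABI, and stack slots for local arrays are padded at the compiler's discretion in a similar way.

```c
#include <assert.h>
#include <stddef.h>

/* The array itself is exactly 5 bytes... */
static_assert(sizeof(char[5]) == 5, "no hidden bytes inside the array");

/* ...but a following member with stricter alignment forces padding. */
struct Padded {
    char text[5];
    int  number;
};

/* Size of the bare array type. */
size_t array_size(void)
{
    return sizeof(char[5]);
}

/* Size of the struct, including any alignment padding. */
size_t struct_size(void)
{
    return sizeof(struct Padded);
}
```

On ABIs where int is 4 bytes with 4-byte alignment, struct_size() is 12 rather than 9, so trimming text from 6 bytes to 5 would save nothing there.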


If you're not using any of the Standard Library functions that deal with strings, you can forget about the NUL terminating byte.

No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...

Declare your "not quite C strings" with just the needed space:

struct NotQuiteCString { char *data; /* ... */ };

struct NotQuiteCString variable;
variable.data = malloc(5); /* exactly 5 bytes: no room for a NUL */
variable.data[0] = 'h'; /* ... */ variable.data[4] = 'o'; /* "hello" */
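A slightly fuller sketch of such a counted string is shown below. The struct CountedString and its helper functions are assumptions made for illustration, not the asker's library. One detail worth knowing: the printf family can still be used if an explicit precision is given, because "%.*s" reads at most that many bytes and does not require a terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length lives in the struct, not in a NUL. */
struct CountedString {
    char  *data;
    size_t len;
};

/* Copies len bytes from src without storing a NUL.
   Returns 0 on success, -1 if allocation fails. */
int counted_init(struct CountedString *s, const char *src, size_t len)
{
    assert(s != NULL && src != NULL);
    s->data = malloc(len ? len : 1);   /* avoid a zero-size request */
    if (s->data == NULL)
        return -1;
    memcpy(s->data, src, len);
    s->len = len;
    return 0;
}

/* Formats the string into buf; the precision bounds how many bytes are read. */
int counted_format(const struct CountedString *s, char *buf, size_t bufsize)
{
    return snprintf(buf, bufsize, "%.*s", (int)s->len, s->data);
}

/* Releases the storage and resets the struct. */
void counted_free(struct CountedString *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```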

Indeed, this is only worthwhile if you are really low on memory; otherwise I don't recommend it.

It seems the most proper way to do what you are describing is:

  • Prepare a minimal 'listing' file of the form:
    string1_constant_name "str1"
    string2_constant_name "str2"
    ...
  • Write a utility that processes this file and generates declarations such as
    const char string1_constant[4] = "str1";

Of course, I would not recommend writing these declarations by hand, because you will get into trouble whenever a string changes.

So now you have non-terminated strings, thanks to the fixed-size auto-generated arrays, and sizeof() works for every variable. This solution seems acceptable.
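The core of such a generator could look roughly like the sketch below. The function name generate_declaration is hypothetical, it handles one listing line at a time, and it deliberately ignores escape sequences inside the quoted text, which a real tool would have to count correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Turns one listing line of the form
 *     name "text"
 * into a declaration with an exact array size and no room for a NUL.
 * Returns 0 on success, -1 on a malformed line or too small a buffer.
 * Assumption: the text contains no escape sequences or embedded quotes. */
int generate_declaration(const char *line, char *out, size_t outsize)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '"');    /* opening quote */
    if (open == NULL)
        return -1;
    const char *close = strrchr(line, '"');  /* closing quote */
    if (close == open)
        return -1;

    /* The name runs from the start of the line to the first blank. */
    size_t name_len = strcspn(line, " \t");
    if (name_len == 0 || line + name_len > open)
        return -1;

    size_t text_len = (size_t)(close - open - 1);

    int n = snprintf(out, outsize, "const char %.*s[%zu] = \"%.*s\";",
                     (int)name_len, line, text_len,
                     (int)text_len, open + 1);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

Feeding it the line string1_constant "str1" yields const char string1_constant[4] = "str1"; so the generated header can be regenerated whenever the listing changes.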

The benefits are easy localization, the possibility of adding some checks to lower the risk of this solution, and savings in the read-only data segment.

The drawback is that all of these string constants must be included in every module (as an include file, so that sizeof() stays known). So this only makes sense if your linker merges such symbols (some don't).


Aren't these similar to Pascal-style strings, or Hollerith strings? I think this is only useful if you actually want the string data to preserve NULs, in which case you're really pushing around arbitrary memory, not "strings" per se.


The question rests on a false assumption: it assumes that storing the length (e.g. implicitly, by passing it as a number to a function) incurs no overhead, but that's not true.

While one might save space by not storing the terminating zero byte (or wide character), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function, which almost certainly takes more space in code. If the same string is used multiple times, the overhead is per use, not per string.

Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.
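A minimal sketch of such a wrapper follows. The struct String and both functions are hypothetical stand-ins for the asker's library, and __attribute__((noinline)) is the GCC/Clang way of keeping the wrapper out of line. Each call site then passes only a pointer, and the length is computed in one shared place; whether this actually shrinks the binary would need to be measured.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the asker's length-carrying string. */
struct String {
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
    char  *data;  /* not NUL-terminated */
};

/* Appends len bytes, growing the buffer as needed.
   Returns 0 on success, -1 if allocation fails. */
int AppendBytes(struct String *dest, const char *src, size_t len)
{
    assert(dest != NULL && src != NULL);
    if (len == 0)
        return 0;
    if (dest->len + len > dest->cap) {
        size_t new_cap = (dest->len + len) * 2;
        char *p = realloc(dest->data, new_cap);
        if (p == NULL)
            return -1;
        dest->data = p;
        dest->cap = new_cap;
    }
    memcpy(dest->data + dest->len, src, len);
    dest->len += len;
    return 0;
}

/* Out-of-line wrapper: callers pass no length constant;
   strlen() runs here, once per call. */
__attribute__((noinline))
int AppendCString(struct String *dest, const char *src)
{
    return AppendBytes(dest, src, strlen(src));
}
```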

