Correct way to read a text file into a buffer in C?



Answers


char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}

There are quite a few things wrong with this code:

  1. It is very slow (you are reading the file one character at a time).
  2. If the file is larger than sizeof(source), the buffer overflows.
  3. Really, when you look at it more closely, this code should not work at all. As stated in the man pages:

The strcat() function appends a copy of the null-terminated string s2 to the end of the null-terminated string s1, then add a terminating `\0'.

You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().

The following are two alternatives to consider using instead.

If you know the maximum buffer size ahead of time:

#include <stdio.h>
#define MAXBUFLEN 1000000

char source[MAXBUFLEN + 1];
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
    if ( ferror( fp ) != 0 ) {
        fputs("Error reading file", stderr);
    } else {
        source[newLen++] = '\0'; /* Just to be safe. */
    }

    fclose(fp);
}

Or, if you do not:

#include <stdio.h>
#include <stdlib.h>

char *source = NULL;
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    /* Go to the end of the file. */
    if (fseek(fp, 0L, SEEK_END) == 0) {
        /* Get the size of the file. */
        long bufsize = ftell(fp);
        /* Allocate the buffer, go back to the start of the file, and read
           the entire file into memory, stopping at the first failure. */
        if (bufsize == -1) {
            fputs("Error getting file size", stderr);
        } else if ((source = malloc(sizeof(char) * (bufsize + 1))) == NULL) {
            fputs("Error allocating buffer", stderr);
        } else if (fseek(fp, 0L, SEEK_SET) != 0) {
            fputs("Error seeking in file", stderr);
        } else {
            size_t newLen = fread(source, sizeof(char), bufsize, fp);
            if (ferror(fp) != 0) {
                fputs("Error reading file", stderr);
            } else {
                source[newLen] = '\0'; /* NUL-terminate the buffer. */
            }
        }
    }
    fclose(fp);
}

free(source); /* Don't forget to call free() later! */

Yes - you would probably be arrested for your terrible abuse of strcat!

Take a look at fgets(). It reads the data a line at a time and, importantly, it limits the number of characters read, so you don't overflow the buffer. (POSIX getline() also reads a line at a time, but it grows its own buffer instead of honoring a fixed limit.)

strcat() is relatively slow because it has to scan the entire string for the end on every insertion. You would normally keep a pointer to the current end of the stored text and pass that to fgets() as the position to read the next line into.
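A minimal sketch of the "keep a pointer to the end" idea, using fgets() from the C standard library. The helper name read_lines and its parameters are made up for illustration, and the sketch assumes a text file with no embedded NUL bytes (strlen() would miscount otherwise).

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads fp line by line into buf (capacity cap bytes,
   cap must be at least 1) and returns the number of characters stored,
   not counting the terminating '\0'. Input beyond cap - 1 is left unread. */
size_t read_lines(FILE *fp, char *buf, size_t cap)
{
    char *end = buf;    /* current end of the stored text */
    size_t left = cap;  /* bytes still free, including room for '\0' */

    buf[0] = '\0';
    /* fgets() stores at most left - 1 characters plus a '\0', so it cannot
       overflow. Passing end appends in place, with no rescan of buf. */
    while (left > 1 &&
           fgets(end, (int)(left > INT_MAX ? INT_MAX : left), fp) != NULL) {
        size_t n = strlen(end);  /* length of the piece just read only */
        end += n;
        left -= n;
    }
    *end = '\0';  /* keep the result terminated even after a read error */
    return (size_t)(end - buf);
}
```

Called as read_lines(fp, source, sizeof source), this fills the fixed array from the question without ever scanning the text already stored.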


See this article from JoelOnSoftware for why you don't want to use strcat.

Look at fread for an alternative. Use it with 1 for the size when you're reading bytes or characters.
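For what it's worth, here is one way the fread suggestion might look when the file size is not known up front (for example, when reading from a pipe). The helper name slurp_stream and the CHUNK size are assumptions made for this sketch, not anything from the answers above.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096 /* arbitrary chunk size chosen for this sketch */

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of everything
   left in fp, or NULL on failure. *len_out receives the byte count. */
char *slurp_stream(FILE *fp, size_t *len_out)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;

    for (;;) {
        /* Make sure there is room for one more chunk plus the final '\0'. */
        if (cap - len < CHUNK + 1) {
            size_t new_cap = cap ? cap * 2 : CHUNK + 1;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        /* Element size 1, so the return value is a byte count. */
        size_t n = fread(buf + len, 1, CHUNK, fp);
        len += n;
        if (n < CHUNK)
            break;  /* short read: end of file or error */
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

The caller owns the returned buffer and must free() it. Doubling the capacity keeps the number of realloc() calls small for large inputs.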


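Why don't you just use the array of chars you have? This ought to do it:

   int c;
   for (i = 0; i < sizeof(source) - 1 && (c = getc(fp)) != EOF; i++)
       source[i] = (char)c;
   source[i] = '\0';   /* i is a size_t declared earlier */

Not tested, but should work. And yes, it could be better implemented with fread; I'll leave that as an exercise for the reader.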

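#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_SIZE 100
#define STEP_SIZE 100

/* fp is assumed to be a FILE * that is already open for reading. */
size_t buffer_sz = DEFAULT_SIZE;
size_t i = 0;
int c; /* int, not char, so that EOF can be told apart from real data */
char *buffer = malloc(buffer_sz);
if (buffer == NULL) { exit(1); }
while ((c = fgetc(fp)) != EOF) {
  buffer[i] = (char)c;
  i++;
  if (i >= buffer_sz) {
    buffer_sz += STEP_SIZE;
    char *tmp = realloc(buffer, buffer_sz);
    if (tmp == NULL) { free(buffer); exit(1); } /* ensure we don't have a memory leak */
    buffer = tmp;
  }
}
buffer[i] = 0;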

If you're on a Linux (or other POSIX) system, once you have the file descriptor you can get a lot of information about the file using fstat()

http://linux.die.net/man/2/stat

so you might have

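#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    int fd = open("foo.txt", O_RDONLY); /* get file descriptor */
    if (fd == -1 || fstat(fd, &st) == -1)
        return 1;
    /* the size of the file is now in st.st_size */
    close(fd);
    return 0;
}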

This avoids seeking to the beginning and end of the file.
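As a rough sketch of how the size from fstat() could then be used to read the whole file in one buffer: the helper name read_file_fstat is hypothetical, the code is POSIX-only, and it simply treats any read() error (including an interrupted call) as failure.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): returns a malloc'd, NUL-terminated copy
   of the regular file at path, or NULL on failure. */
char *read_file_fstat(const char *path, size_t *len_out)
{
    struct stat st;
    char *buf = NULL;
    size_t len = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return NULL;
    /* st_size is only meaningful here for regular files. */
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
        (buf = malloc((size_t)st.st_size + 1)) != NULL) {
        /* read() may return fewer bytes than requested, so loop. */
        while (len < (size_t)st.st_size) {
            ssize_t n = read(fd, buf + len, (size_t)st.st_size - len);
            if (n < 0) {
                free(buf);
                buf = NULL;
                break;
            }
            if (n == 0)
                break;  /* file ended earlier than st_size suggested */
            len += (size_t)n;
        }
    }
    close(fd);
    if (buf != NULL) {
        buf[len] = '\0';
        *len_out = len;
    }
    return buf;
}
```

The caller is responsible for calling free() on the result.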


Methinks you want fread:

http://www.cplusplus.com/reference/clibrary/cstdio/fread/


Have you considered mmap()? You can read from the file directly as if it were already in memory.

http://beej.us/guide/bgipc/output/html/multipage/mmap.html
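A small sketch of what that might look like on a POSIX system. The helper name count_newlines_mmap is invented for illustration; it maps the file read-only and scans it like an ordinary array. Note that the mapping is not NUL-terminated, so string functions such as strlen() should not be used on it directly.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): counts newline characters in the file at
   path by scanning a read-only mapping. Returns -1 on failure. */
long count_newlines_mmap(const char *path)
{
    struct stat st;
    long count = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* The file contents can now be indexed as if already in memory. */
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            count++;

    munmap(data, (size_t)st.st_size);
    return count;
}
```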

