What's the most efficient way of implementing ReadLine() on a binary stream?

Please feel free to correct me if I am wrong at any point...

I am trying to read a CSV (comma-separated values) file using the .NET file I/O classes. The problem is that some fields in this CSV file may contain soft carriage returns (i.e. a solitary \r or \n rather than the \r\n pair that ends a line in a standard text file). The text-mode I/O class StreamReader does not make this distinction: its ReadLine() treats a solitary \r or \n as a hard line break, which splits such records and compromises the integrity of the CSV data.
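
The behaviour described above is easy to reproduce by feeding StreamReader a record whose quoted field contains a solitary \r. This is a small sketch; the StreamReaderDemo class and its ReadAllLines method are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamReaderDemo
{
    // Collects every line that StreamReader.ReadLine() returns for the given text.
    // ReadLine() ends a line at "\n", "\r" or "\r\n", so a solitary '\r' inside a field splits the record.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(text))))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

For the input "f1,\"x\ry\",f3\r\nnext" this yields three lines ("f1,\"x", "y\",f3" and "next") instead of the two records the text actually contains.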

Using the BinaryReader class seems to be the only option left, but BinaryReader does not have a ReadLine() method, hence the need to implement ReadLine() on my own.

My current approach reads one character at a time from the stream and appends it to a StringBuilder until \r\n is encountered (treating every other character, including a solitary \r or \n, as ordinary data), and then returns the contents of the StringBuilder via ToString().

But I wonder: is this the most efficient way of implementing ReadLine()? Please enlighten me.

Answers


It probably is. In terms of complexity, it goes through each character only once, so it is O(n) in the length of the stream, which is not a problem. For reading a single character at a time, a BinaryReader is your best bet.

What I would do is make a class

using System;
using System.IO;
using System.Text;

public class LineReader : IDisposable
{
    private readonly BinaryReader reader;

    public LineReader(Stream stream) { reader = new BinaryReader(stream); }

    // Returns the next line terminated by "\r\n" (terminator not included).
    // A solitary '\r' or '\n' is kept as part of the line.
    public string ReadLine()
    {
        StringBuilder result = new StringBuilder();
        char lastChar = reader.ReadChar();
        // an EndOfStreamException here would propagate to the caller

        try
        {
            // Keep reading until the "\r\n" pair is seen
            while (true)
            {
                char newChar = reader.ReadChar();
                if (lastChar == '\r' && newChar == '\n')
                    return result.ToString();

                result.Append(lastChar);
                lastChar = newChar;
            }
        }
        catch (EndOfStreamException)
        {
            // Last line of the stream, not terminated by "\r\n"
            result.Append(lastChar);
            return result.ToString();
        }
    }

    public void Dispose()
    {
        reader.Close();
    }
}

Or something like that.

(WARNING: the code has not been tested and is provided AS IS without warranty of any kind, expressed or implied. Should this program prove defective or destroy the planet, you assume the cost of all necessary servicing, repair or correction.)


You might want to look at using an ODBC/OLE DB connection to do this. If you point the data source of an OLE DB connection at a directory containing CSV files, you can then query it as if each CSV file were a table. Check connectionstrings.com (http://www.connectionstrings.com/?carrier=textfile) for the correct connection string.


Here is an extension method for the BinaryReader class:

using System.IO;
using System.Text;

public static class BinaryReaderExtension
{
    // Returns the next line, or null at end of stream.
    // A line ends at '\n' and every '\r' is dropped, so both "\r\n"
    // and a lone '\n' act as line terminators.
    public static string ReadLine(this BinaryReader reader)
    {
        if (reader.IsEndOfStream())
            return null;

        StringBuilder result = new StringBuilder();
        char character;
        while (!reader.IsEndOfStream() && (character = reader.ReadChar()) != '\n')
            if (character != '\r')
                result.Append(character);

        return result.ToString();
    }

    // Requires a seekable stream, since it compares Position with Length.
    public static bool IsEndOfStream(this BinaryReader reader)
    {
        return reader.BaseStream.Position == reader.BaseStream.Length;
    }
}

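I didn't test it under all conditions, but this code worked for me. Note that it ends a line at every \n and discards every \r, so on its own it does not preserve a solitary \r or \n inside a CSV field.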
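
If only \r\n should end a line, as the question requires, the same extension-method idea can be adapted. This is a minimal sketch assuming a seekable stream (BinaryReader.PeekChar returns -1 when the underlying stream cannot seek); BinaryReaderCrLfExtension and ReadLineCrLf are hypothetical names:

```csharp
using System.IO;
using System.Text;

public static class BinaryReaderCrLfExtension
{
    // Hypothetical variant: only "\r\n" ends a line; a solitary '\r' or '\n'
    // stays in the returned string. Returns null at end of stream.
    // Assumes a seekable stream, because PeekChar returns -1 otherwise.
    public static string ReadLineCrLf(this BinaryReader reader)
    {
        if (reader.PeekChar() == -1)
            return null;

        var result = new StringBuilder();
        while (reader.PeekChar() != -1)
        {
            char c = reader.ReadChar();
            if (c == '\r' && reader.PeekChar() == '\n')
            {
                reader.ReadChar(); // consume the '\n' of the "\r\n" pair
                break;
            }
            result.Append(c);
        }
        return result.ToString();
    }
}
```

For the input "a\nb\rc\r\nd" it returns "a\nb\rc", then "d", then null.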


How about simply preprocessing the file?

Replace the soft carriage returns with something unique.

For the record, CSV files with line feeds inside the data are bad design.
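
The preprocessing idea can be sketched with a regular expression that matches only the soft breaks, i.e. a \n not preceded by \r, or a \r not followed by \n. The class name and the placeholder are assumptions; pick a placeholder that cannot occur in your data. This version works on the whole file contents in memory:

```csharp
using System.Text.RegularExpressions;

public static class SoftBreakPreprocessor
{
    // Matches a '\n' not preceded by '\r', or a '\r' not followed by '\n':
    // the "soft" breaks inside fields. A "\r\n" pair is never matched.
    private static readonly Regex SoftBreak = new Regex(@"(?<!\r)\n|\r(?!\n)");

    // Replaces every soft break with the given (hypothetical) placeholder.
    public static string ReplaceSoftBreaks(string text, string placeholder)
    {
        return SoftBreak.Replace(text, placeholder);
    }
}
```

After the replacement, the remaining \r\n pairs are the only line breaks, so StreamReader.ReadLine() or a split on "\r\n" gives one record per line. For example, ReplaceSoftBreaks("a\nb\rc\r\nd", "<BR>") returns "a<BR>b<BR>c\r\nd".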


You could read a bigger chunk at a time, decode it to a string (Encoding.GetString works for whole buffers; a Decoder obtained from Encoding.GetDecoder() also copes with a multi-byte character split across two chunks), and then split it into lines with string.Split(new[] { "\r\n" }, StringSplitOptions.None), or pick out the head of the string with s.Substring(0, s.IndexOf("\r\n")) and leave the rest for processing the next line. Remember to prepend whatever is left over from the previous read to the text of the next read.
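
A minimal sketch of the chunked approach, using a Decoder so that a multi-byte character split across two reads is decoded correctly. ChunkedLineReader is a hypothetical name and the 4096-byte default chunk size is an arbitrary assumption; keeping the leftover text in a plain string is simple but copies data repeatedly for very long lines:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the stream in chunks, decodes them with a Decoder
// (which carries partial multi-byte sequences across chunks) and splits on "\r\n" only.
public sealed class ChunkedLineReader
{
    private readonly Stream _stream;
    private readonly Decoder _decoder;
    private readonly byte[] _bytes;
    private readonly char[] _chars;
    private string _pending = "";   // decoded text not yet returned as a line
    private bool _eof;

    public ChunkedLineReader(Stream stream, Encoding encoding, int chunkSize = 4096)
    {
        _stream = stream;
        _decoder = encoding.GetDecoder();
        _bytes = new byte[chunkSize];
        _chars = new char[encoding.GetMaxCharCount(chunkSize)];
    }

    // Returns the next "\r\n"-terminated line (terminator not included), or null at end.
    public string ReadLine()
    {
        while (true)
        {
            // Ordinal search: no culture-specific matching of control characters
            int idx = _pending.IndexOf("\r\n", StringComparison.Ordinal);
            if (idx >= 0)
            {
                string line = _pending.Substring(0, idx);
                _pending = _pending.Substring(idx + 2);
                return line;
            }
            if (_eof)
            {
                if (_pending.Length == 0) return null;
                string last = _pending;   // final line without a terminator
                _pending = "";
                return last;
            }
            int n = _stream.Read(_bytes, 0, _bytes.Length);
            // flush: true on the final call so leftover bytes are not silently kept
            int c = _decoder.GetChars(_bytes, 0, n, _chars, 0, n == 0);
            if (n == 0) _eof = true;
            _pending += new string(_chars, 0, c);
        }
    }
}
```

A "\r" at the end of one chunk and the "\n" at the start of the next are still recognised as a pair, because the search runs over the combined leftover text.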


Your approach sounds fine. One way to improve the efficiency of your method might be to build each line in a regular string (i.e. not a StringBuilder) and then append the entire line string to your StringBuilder. See this article for a further explanation: StringBuilder is not automatically the best choice here.

It will probably matter little, though.


Here's a faster alternative with encoding support. It extends BinaryReader, so you can use it both to read binary chunks and to perform a StreamReader-like ReadLine directly on a binary stream.

using System;
using System.IO;
using System.Text;

public class LineReader : BinaryReader
{
    // Short lines are collected in this fixed buffer; longer lines spill over
    // into the StringBuilder, which is reused between calls.
    private readonly char[] _lineBuffer;
    private readonly StringBuilder _longLine = new StringBuilder();
    private int _pos;

    public LineReader(Stream stream, int bufferSize, Encoding encoding)
        : base(stream, encoding)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));
        _lineBuffer = new char[bufferSize];
    }

    // Returns the next line terminated by "\r\n" (terminator not included),
    // or null at end of stream. A solitary '\r' or '\n' is kept as data.
    public string ReadLine()
    {
        _pos = 0;
        _longLine.Clear();
        bool readAnything = false;
        bool pendingCR = false;

        int c;
        while ((c = base.Read()) != -1)
        {
            readAnything = true;
            if (pendingCR)
            {
                pendingCR = false;
                if (c == '\n')
                    return TakeLine();      // "\r\n" ends the line
                Append('\r');               // the previous '\r' was a solitary one
            }

            if (c == '\r')
                pendingCR = true;           // decide once the next char is known
            else
                Append((char)c);
        }

        if (pendingCR)
            Append('\r');                   // stream ended right after a '\r'
        return readAnything ? TakeLine() : null;
    }

    private void Append(char ch)
    {
        if (_pos == _lineBuffer.Length)
        {
            // Buffer full: move its contents to the StringBuilder and continue
            _longLine.Append(_lineBuffer, 0, _pos);
            _pos = 0;
        }
        _lineBuffer[_pos++] = ch;
    }

    private string TakeLine()
    {
        if (_longLine.Length == 0)
            return new string(_lineBuffer, 0, _pos);
        _longLine.Append(_lineBuffer, 0, _pos);
        return _longLine.ToString();
    }
}
