Pandas equivalent of Python's readlines function
With Python's readlines() function I can retrieve a list of all the lines in a file:
with open('dat.csv', 'r') as dat:
    lines = dat.readlines()
I am working on a problem involving a very large file and this method is producing a memory error. Is there a pandas equivalent to Python's readlines() function? The pd.read_csv() option chunksize seems to append numbers to my lines, which is far from ideal.
In : lines = []
In : for df in pd.read_csv('s.csv', chunksize = 100):
...:     lines.append(df)
In : lines
Out:
[   hello here is a line
0  here is another line
1  here is my last line]
In : with open('s.csv', 'r') as dat:
...:     lines = dat.readlines()
...:
In : lines
Out: ['hello here is a line\n', 'here is another line\n', 'here is my last line\n']
In : cat s.csv
hello here is a line
here is another line
here is my last line
You should try to use the chunksize option of pd.read_csv(), as mentioned in some of the comments.
This forces pd.read_csv() to read a fixed number of lines at a time instead of trying to read the entire file in one go. It would look like this:
>>> df = pd.read_csv(filepath, chunksize=1, header=None, encoding='utf-8')
In the above example the file will be read line by line.
Note that, according to the documentation of pandas.read_csv, passing chunksize does not return a pandas.DataFrame object but a TextFileReader object instead:
- chunksize : int, default None
Return TextFileReader object for iteration. See IO Tools docs for more information on iterator and chunksize.
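To make the quoted documentation concrete, here is a minimal sketch showing that the object returned with chunksize is not a DataFrame, while each item it yields is one. The sample text, the temporary file, and the variable names are made up for illustration; the file is assumed to contain no commas, so every line lands in a single column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical three-line sample file (no commas, so pandas sees one column).
sample_text = (
    "This is a new line\n"
    "This is another line of text\n"
    "And this is the last line of text in this file\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

# With chunksize set, read_csv returns a reader object, not a DataFrame.
reader = pd.read_csv(sample_path, chunksize=2, header=None, encoding="utf-8")
reader_is_dataframe = isinstance(reader, pd.DataFrame)

# Iterating over the reader yields DataFrames of at most `chunksize` rows.
chunk_shapes = []
for chunk in reader:
    assert isinstance(chunk, pd.DataFrame)
    chunk_shapes.append(chunk.shape)

reader.close()
os.remove(sample_path)

print(reader_is_dataframe)
print(chunk_shapes)
```

Because only one chunk is held in memory at a time, peak memory depends on chunksize rather than on the size of the file.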
Therefore, in order to complete the exercise, you would need to put this in a loop like this:
In : cat data_sample.tsv
This is a new line
This is another line of text
And this is the last line of text in this file
In : lines = []
In : for line in pd.read_csv('./data_sample.tsv', encoding='utf-8', header=None, chunksize=1):
.....:     lines.append(line.iloc[0,0])
.....:
In : print(lines)
['This is a new line', 'This is another line of text', 'And this is the last line of text in this file']
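Reading one row per chunk can be slow on a very large file, so a variation is to use a larger chunksize and yield the lines from each chunk as you go. The following is only a sketch under some assumptions: iter_lines is a hypothetical helper name, the lines are assumed to contain no commas (otherwise read_csv would split them into several columns and only the first would be returned), and dtype=str with keep_default_na=False is used so that values such as numbers or "NA" come back as plain strings. Blank lines are skipped by read_csv's defaults.

```python
import os
import tempfile

import pandas as pd


def iter_lines(path, chunksize=1000):
    """Yield the first column of each row as a string, one chunk at a time.

    Hypothetical helper: assumes the lines contain no commas.
    """
    reader = pd.read_csv(
        path,
        header=None,            # do not treat the first line as column names
        encoding="utf-8",
        chunksize=chunksize,    # number of rows held in memory at once
        dtype=str,              # keep every value as text
        keep_default_na=False,  # do not turn strings like "NA" into NaN
    )
    for chunk in reader:
        for value in chunk.iloc[:, 0]:
            yield value


# Small demonstration with a made-up temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(
        "This is a new line\n"
        "This is another line of text\n"
        "And this is the last line of text in this file\n"
    )
    demo_path = tmp.name

demo_lines = list(iter_lines(demo_path, chunksize=2))
os.remove(demo_path)

print(demo_lines)
```

Collecting everything into a list, as the demonstration does, only makes sense for small files. For a file that does not fit in memory, process each line inside the loop instead of storing it.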
I hope this helps!