Using Pandas to read CSV containing some missing values

I use Python 2.7 with Anaconda. I have a .csv file:

  action_type                action_detail secs_elapsed
0        data             similar_listings        255.0
1        data             similar_listings        183.0
2       click  change_trip_characteristics     175570.0
3         NaN                          NaN         86.0
4        data      wishlist_content_update       1535.0

The file contains some missing values, and the columns do not all share the same data type. I used Pandas to load this .csv file:

import pandas as pd

for chunk in pd.read_csv('the_file_name.csv', chunksize=1000,
                         dtype={'action_type': str, 'action_detail': str,
                                'secs_elapsed': str}):
    pass  # process each chunk here

For each chunk, I found that the data type of some values does not match what I specified in pd.read_csv. For example:

chunk.ix[3, 'action_type']
Out[1]: nan
type(chunk.ix[3, 'action_type'])
Out[2]: float

My questions are

  1. I want every value to have the data type I specified. How can I do that?
  2. I also want to replace these missing values. I have used fillna() but it has no effect, which I think is due to the data type. Could you please give any hints for this?

Thank you

Answers


Use converters instead of dtype:

for chunk in pd.read_csv('the_file_name.csv', chunksize=1000, delim_whitespace=True,
    converters={'action_type': str, 'action_detail': str, 'secs_elapsed': str}):
    pass  # process each chunk here

>>> type(chunk.ix[3, 'action_type'])
str

Also, for the sample you posted you need to set delim_whitespace=True, unless the real file is comma-separated.
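
For reference, here is a minimal sketch of both approaches on a small in-memory sample. It assumes the real file is comma-separated and that the missing cells are simply empty; CSV_TEXT, read_as_strings, read_and_fill and the "unknown" placeholder are hypothetical names made up for this illustration. It is written for a recent pandas, so it uses .loc rather than the older .ix. With converters, an empty cell should come back as an empty string rather than NaN. With dtype=str, the cell stays NaN, and fillna only appears to work if you keep the DataFrame it returns. Forgetting to assign that result is one possible reason fillna seemed to have no effect, though that is a guess.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample standing in for the real file.
CSV_TEXT = (
    "action_type,action_detail,secs_elapsed\n"
    "data,similar_listings,255.0\n"
    "data,similar_listings,183.0\n"
    "click,change_trip_characteristics,175570.0\n"
    ",,86.0\n"
    "data,wishlist_content_update,1535.0\n"
)

COLUMNS = ["action_type", "action_detail", "secs_elapsed"]


def read_as_strings(text, chunksize=2):
    """Read in chunks, passing every raw cell through str via converters."""
    reader = pd.read_csv(
        io.StringIO(text),
        chunksize=chunksize,
        converters={name: str for name in COLUMNS},
    )
    return pd.concat(reader)


def read_and_fill(text, placeholder="unknown", chunksize=2):
    """Read in chunks with dtype=str, then replace missing text values."""
    reader = pd.read_csv(
        io.StringIO(text),
        chunksize=chunksize,
        dtype={name: str for name in COLUMNS},
    )
    fill_values = {"action_type": placeholder, "action_detail": placeholder}
    # fillna returns a new DataFrame, so the result has to be kept.
    return pd.concat(chunk.fillna(fill_values) for chunk in reader)


if __name__ == "__main__":
    as_strings = read_as_strings(CSV_TEXT)
    print(type(as_strings.loc[3, "action_type"]))

    filled = read_and_fill(CSV_TEXT)
    print(filled)
```

Which one to use depends on what you want a missing value to look like afterwards: converters give you an empty string, while dtype plus fillna lets you choose the replacement.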

