Custom delimiter CSV reader in Spark

I would like to read in a file with the following structure with Apache Spark.

628344092\t20070220\t200702\t2007\t2007.1370

The delimiter is \t. How can I specify this delimiter when using spark.read.csv()?

The CSV is much too big for pandas; it takes ages to read the file. Is there something that works similarly to

pandas.read_csv(file, sep = '\t')

Thanks a lot!

Answers


Use spark.read.option("delimiter", "\t").csv(file). The option sep works the same way as delimiter.

If the file literally contains the two characters \t rather than an actual tab character, escape the backslash: spark.read.option("delimiter", "\\t").csv(file)

