Custom delimiter CSV reader in Spark
I would like to read a file with the following structure using Apache Spark.
The delimiter is \t. How can I do this with spark.read.csv()?
The CSV is far too big for pandas, which takes ages to read the file. Is there something that works similarly to
pandas.read_csv(file, sep='\t')
Thanks a lot!
Use spark.read.option("delimiter", "\t").csv(file); the option name sep is accepted as an alias for delimiter.
If the file's delimiter is the literal two-character sequence \t rather than an actual tab character, escape the backslash: spark.read.option("delimiter", "\\t").csv(file)
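The distinction between the two delimiter strings comes down to Python string escaping; a minimal sketch (variable names here are illustrative):

```python
# "\t" is the single tab character; "\\t" is a backslash followed by the letter t.
tab_delimiter = "\t"
literal_backslash_t = "\\t"

print(len(tab_delimiter))        # length 1: one tab character
print(len(literal_backslash_t))  # length 2: backslash + "t"
```

Pass the one-character string when the columns are separated by real tabs, and the two-character string only in the unusual case where the file literally contains backslash-t between fields.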