Pointing a file to the Hadoop cluster
I have a file stored on a server. I want Spark to be able to read that file when it runs against the Hadoop cluster. I can point the Spark context at the Hadoop cluster, but once I do, Spark can no longer access the data. The data is stored locally, so to access it I have to point Spark at the local machine instead. However, this causes a lot of memory errors. What I want is to run Spark on the cluster while still accessing the data I have stored locally. How can I do this?
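Spark running on a Hadoop cluster cannot read a file that exists only on one machine's local disk. Remember that Spark is a distributed system whose tasks run on multiple machines, so each task would look for the path on its own node and would not find it. A local path only works when Spark itself runs in local mode on the machine that holds the file.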
You should put the file on HDFS (for example with hadoop fs -put <local filepath> <hdfs filepath>) and have Spark read it from there.
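As a rough illustration of that workflow, here is a minimal PySpark sketch. It assumes the hadoop command is on the PATH, that pyspark is installed, and that the cluster's default filesystem is HDFS. The helper names (hdfs_put_command, hdfs_target_path, upload_and_count) and the example paths are hypothetical and only serve the illustration.

```python
import subprocess


def hdfs_put_command(local_path, hdfs_dir):
    """Build the `hadoop fs -put` command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def hdfs_target_path(local_path, hdfs_dir):
    """Return the path the file will have inside HDFS after the copy."""
    filename = local_path.rstrip("/").split("/")[-1]
    return hdfs_dir.rstrip("/") + "/" + filename


def upload_and_count(local_path, hdfs_dir):
    """Copy the file to HDFS, then count its lines with Spark."""
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    # Fails with CalledProcessError if the copy does not succeed
    # (for example, if the file already exists in HDFS).
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)

    spark = SparkSession.builder.appName("read-from-hdfs").getOrCreate()
    # Assumption: a path without a scheme resolves against the cluster's
    # default filesystem, which is HDFS on a typical Hadoop setup.
    lines = spark.read.text(hdfs_target_path(local_path, hdfs_dir))
    return lines.count()


# Example call with hypothetical paths (needs a running cluster):
# upload_and_count("/home/me/data.csv", "/user/me/input")
```

Because the file now lives in HDFS, every executor can reach it, and the job no longer depends on the memory of a single machine.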
To get at the file locally again, use hadoop fs -get <hdfs filepath>, which copies it from HDFS to the local filesystem, or hadoop fs -cat <hdfs filepath>, which prints its contents to standard output.