What is the difference between destroy() and unpersist()?
Spark is shipped with Broadcast variables, wich allows us to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.
Of course, when the "broadcasted variable" isn't going to be used anymore, it is natural to delete this variable. But from the documentation, it seems that there's two way of deleting broadcasted variables wich are:
unpersist() //Destroy all data and metadata related to this broadcast variable. destroy() //Asynchronously delete cached copies of this broadcast on the executors.
I am not sure to properly undestand everything, does unpersist() does the same as delete() but synchronously? This is unclear for me.
- destroy is blocking (awaits for confirmation) while unpersist is by default non-blocking
- destroy removes persisted blocks from the driver while unpersist doesn't
Otherwise these use the same logic of the BlockMangerMaster.