What is the difference between destroy() and unpersist()?

Spark is shipped with Broadcast variables, wich allows us to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Of course, when the "broadcasted variable" isn't going to be used anymore, it is natural to delete this variable. But from the documentation, it seems that there's two way of deleting broadcasted variables wich are:

unpersist() //Destroy all data and metadata related to this broadcast variable.
destroy() //Asynchronously delete cached copies of this broadcast on the executors.

I am not sure to properly undestand everything, does unpersist() does the same as delete() but synchronously? This is unclear for me.

Answers


As it is implemented in the two currently available concrete implementations (HttpBroadcast and TorrentBroadacst) there are two differences:

  • destroy is blocking (awaits for confirmation) while unpersist is by default non-blocking
  • destroy removes persisted blocks from the driver while unpersist doesn't

Otherwise these use the same logic of the BlockMangerMaster.


Need Your Help

LMS API account creation resulting in an error

php api curl canvas canvas-lms

I am working on Canvas LMS and have access token. I need to create an user account using web service in PHP. I have tried to do it using CURL (post method) but getting an error in response. However...

Reusing a PreparedStatement multiple times

java jdbc prepared-statement

in the case of using PreparedStatement with a single common connection without any pool, can I recreate an instance for every dml/sql operation mantaining the power of prepared statements?