How do I delete duplicate rows from these web logs?

I am analyzing some Apache web logs. Some rows are near-duplicates: they are not exact duplicates, because the datetime values can be a few seconds apart, as you can see in the image below. I mostly use SQL within Spark. I want to keep only one row from each group of duplicates.

See Image here


You can use the dropDuplicates method to remove the duplicates, instead of a group by within the query.

weblogs_filter_bekijk = sqlContext.sql("select endpoint from basetable5 where ip_address = ''").dropDuplicates()

This should help. The Spark documentation for dropDuplicates explains the method in detail.

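You can also use a group by clause in the SQL query. Every selected column must either appear in the group by or be wrapped in an aggregate function, for example:

select x_column from table where x = y group by x_column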
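
For the near-duplicates in the question, one way to apply group by is to group on the columns that identify a request and take the earliest timestamp with MIN. The sketch below runs the query with Python's built-in sqlite3 module purely so that it is self-contained. That is a stand-in for Spark, not Spark itself; the query uses only standard SQL, so the same text should be usable with sqlContext.sql. The column name date_time and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basetable5 (ip_address TEXT, endpoint TEXT, date_time TEXT)")

# Hypothetical sample rows: the first two differ only by a few seconds.
rows = [
    ("10.0.0.1", "/index.html", "2016-03-01 10:00:01"),
    ("10.0.0.1", "/index.html", "2016-03-01 10:00:04"),
    ("10.0.0.2", "/about.html", "2016-03-01 10:00:05"),
]
conn.executemany("INSERT INTO basetable5 VALUES (?, ?, ?)", rows)

# Group on the identifying columns and keep the earliest timestamp per group.
query = """
SELECT ip_address, endpoint, MIN(date_time) AS first_seen
FROM basetable5
GROUP BY ip_address, endpoint
ORDER BY ip_address, endpoint
"""
deduped = conn.execute(query).fetchall()
conn.close()
```

Note that this treats every repeat of the same ip_address and endpoint as a duplicate, however far apart in time. If only repeats within a few seconds should be merged, the grouping would need an extra time-bucket expression.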

