How do I obfuscate a column in a Hive view?

I have created a view for a table as:

CREATE VIEW anonymous_table
AS SELECT id, value FROM sensitive_table

and would like the id field of sensitive table to be obfuscated somehow, like an MD5 hash or something similar so that people querying the view can't see the actual id. What is a good way to do this in Hive?

Answers


Some options:

  • Don't include ID in your view at all:

    CREATE VIEW something AS SELECT "HIDDEN ID", value from sensitive_table;

  • If you still need there to be a distinct key available for each record, you could write a UDF to do whatever transformation you like:

    ADD JAR mycode.jar; CREATE TEMPORARY FUNCTION hash as 'com.example.MyUDF'; CREATE VIEW something as SELECT hash(id), value from sensitive_table;

  • BONUS: Seeing as your users can just look at the sensitive table anyway, you could hash the IDs before they arrives in hive? This is probably the best option honestly.

Either way, if you're processing the ID's, having a stable hashing function would be what you need if people still need to rely on the ID's for joining / aggregation, etc.

Here is the link to how to create a UDF


Need Your Help

Navbar Fixed To top without content under

jquery css twitter-bootstrap sticky-footer

I'm trying to setup a sticky navbar on a site for a school project. Here: http://www.divisionforty.com/aloevera/

Read Geo Coordinates from EXIF data of Photos

java android gps exif android-location

I am a .net developer trying to learn Android. I am trying to develop a mobile application using any Android simulator from a Windows desktop.