Coverting Date Format in Hive

I have downloaded twitter data in hdfs and have successfully created hive table on it. Twitter gives the date format 'Thu Dec 10 06:21:00 +0000 2015' in the json file and i want to convert this format into 'yyy-MM-dd' i have tried built in functions provided by hive but they are not accepting the format provided by twitter. Can anyone help me convert this format.

Answers


Maybe an UDF is overkill and all you want is a quick fix, e.g. some RegExp massaging of that ugly format (caveat: it's your job to test it thoroughly against all kinds of Twitter dates, and adapt the massaging accordingly)

select TWITTER_VOMIT
 , regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 3) as YYYY
 , regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 1) as MON
 , regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 2) as DD
 , concat(regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 3)
         ,case regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 1)
            when 'Jan' then '-01-'
            when 'Feb' then '-02-'
            when 'Mar' then '-03-'
            when 'Apr' then '-04-'
            when 'May' then '-05-'
            when 'Jun' then '-06-'
            when 'Jul' then '-07-'
            when 'Aug' then '-08-'
            when 'Sep' then '-09-'
            when 'Oct' then '-10-'
            when 'Nov' then '-11-'
            when 'Dec' then '-12-'
            else '-**-'
          end
         ,regexp_extract(TWITTER_VOMIT, '^[A-Z][a-z]* ([A-Z][a-z]*) ([0-9][0-9]*) .* ([0-9]*)$', 2)
         ) as YYYY_MM_DD
from TWITTER_FEED

For your example

twitter_vomit                   yyyy  mon  dd  yyyy_mm_dd
Thu Dec 10 06:21:00 +0000 2015  2015  Dec  10  2015-12-10

You need to develop java based UDF similar to this: Once it is developed, you need to compile to jar ship it to cluster, add the jar file and create temporary function.

import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class DateTranslate extends UDF {
    public String evaluate(Text str) {
        SimpleDateFormat formatter = new SimpleDateFormat("<SOURCE_DATE_FORMAT>");
        String transactionDate = null;
        try {
            transactionDate = (new SimpleDateFormat("yyyy-MM-dd") //2013-01-03
                    .format(formatter.parse(str.toString())))
                    .toString();
        } catch (ParseException e) {
            e.printStackTrace();
        }

        return transactionDate;

    }

}

Create a custom UDF:

package org.apache.hadoop.hive.contrib.dateparser;

import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class dateparserHive extends UDF {
    public String evaluate(Text str) {
        SimpleDateFormat formatter = new SimpleDateFormat("EEE MMM dd HH:mm:ss ZZZZZ yyyy");
        String twitterDate = null;
        try {
            twitterDate = (new SimpleDateFormat("yyyy-MM-dd")
                    .format(formatter.parse(str.toString())))
                    .toString();
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return twitterDate;

    }

}

Step 1: Create a jar file out of this. Copy this jar file in local file system

Step 2: Open Hive Terminal:

Step 3: ADD JAR /home/oxigen/dateparser.jar;

Step 4: create temporary function dateparsertwitter as 'org.apache.hadoop.hive.contrib.dateparser.dateparserHive';

Step 5: select dateparsertwitter('Wed Mar 02 09:56:41 +0000 2016') from twitter limit 1;

This should work fine!


Need Your Help

Why doesn't my file search work?

vba file-io

I'm doing a check to make sure that my code is able to see my file before I move to the next step of my program. This is my code, but it always displays as the path not existing. Did I do something...

C++ template typename iterator

c++ templates iterator typename

Consider the following header file: