Adding column headers to a Hive result set

I am using a Hive script on Amazon EMR to analyze some data.

I am transferring the output to an Amazon S3 bucket, but the results of the Hive script do not contain column headers.

I have also tried using this:

 set hive.cli.print.header=true;

But it does not help. Can you help me out?

Answers


What exactly does your Hive script look like?

Does the output from your Hive script have the header data in it? Is it then being lost when you copy the output to your S3 bucket?

If you could provide some more details about exactly what you are doing, that would be helpful.

Without knowing those details, here is something that you could try.

Create your Hive script as follows:

USE dbase_name;
SET hive.cli.print.header=true;
SELECT some_columns FROM some_table WHERE some_condition;

Then run your script:

$ hive -f hive_script.hql > hive_output

Then copy your output to your S3 bucket:

$ aws s3 cp ./hive_output s3://some_bucket_name/foo/hive_output
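
To tell whether the header is already missing from the local Hive output or only disappears after the copy, one could check the first line of the file before uploading. The following is a minimal sketch, not a definitive check: the function name header_has_columns is hypothetical, and it assumes the output is tab-separated and that some Hive versions may prefix header fields with the table name (for example some_table.id).

```shell
#!/bin/sh
# header_has_columns FILE NAME...
# Succeeds if every NAME appears as a field in the first line of FILE.
# Assumption: fields are tab-separated; a leading "table." prefix on a
# header field is stripped before comparing.
header_has_columns() {
  file=$1
  shift
  head -n 1 "$file" | awk -F '\t' -v want="$*" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        f = $i
        sub(/^.*\./, "", f)   # drop an optional "table." prefix
        seen[f] = 1
      }
    }
    END {
      n = split(want, w, " ")
      for (j = 1; j <= n; j++)
        if (!(w[j] in seen)) exit 1   # a wanted column is not in line 1
    }'
}
```

For instance, running header_has_columns hive_output followed by the column names you selected, just before the aws s3 cp step, would show whether the header made it into the local file at all.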

I guess a direct way is still not possible (see "Hive: writing column headers to local file?"). One workaround would be to export the result of DESCRIBE table_name to a file:

$ hive -e 'DESCRIBE table_name' > file

Then write a script that adds the column names to your data file. Good luck!
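
Such a script could look roughly like the sketch below. The function names build_header and prepend_header are hypothetical. It assumes that each line of the saved DESCRIBE output starts with the column name, that the column listing ends at the first blank line or line starting with # (partitioned tables append a partition section), and that the data file is tab-separated.

```shell
#!/bin/sh
# build_header DESCRIBE_FILE
# Prints the column names from saved DESCRIBE output as one
# tab-separated line. Assumption: the column name is the first
# whitespace-delimited field on each line.
build_header() {
  awk '
    /^[[:space:]]*$/ || /^#/ { exit }   # stop before any partition section
    { names = (names == "" ? $1 : names "\t" $1) }
    END { print names }
  ' "$1"
}

# prepend_header DESCRIBE_FILE DATA_FILE
# Writes the header line followed by the data to standard output.
prepend_header() {
  build_header "$1"
  cat "$2"
}
```

With the DESCRIBE output saved in file and the query result in hive_output, something like prepend_header file hive_output > hive_output_with_header would produce a file that starts with the column names. If hive.cli.print.header is enabled when DESCRIBE runs, its own header line would need to be removed first.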


I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column to each part, set to 0 for the header and 1 for the data, so I could sort by that field and ensure the header row came out on top.

create table new_table as
select 
  field1,
  field2,
  field3
from
(
  select
    0 as sort_col,  --header row gets lowest number
    'field1_name' as field1,
    'field2_name' as field2,
    'field3_name' as field3
  from
    some_small_table  --table needs at least 1 row
  limit 1  --only need 1 header row
  union all
  select
    1 as sort_col,  --original query goes here
    field1,
    field2,
    field3
  from
    main_table
) a
order by 
  sort_col  --make sure header row is first

It's a little bulky, but at least you can get what you need with a single query.

Hope this helps!


It might be just a typo (or a version-dependent change), but the following works for me:

set hive.cli.print.headers=true;

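It's "headers" instead of "header". Note that the Hive configuration documentation lists the property as hive.cli.print.header (singular), so check which spelling your version accepts.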

