Adding column headers to hive result set
I am using a Hive script on Amazon EMR to analyze some data, and I am transferring the output to an Amazon S3 bucket. The results of the Hive script do not contain column headers.
I have also tried using this:
But it does not help. Can you help me out?
Exactly what does your hive script look like?
Does the output from your hive script have the header data in it? Is it then being lost when you copy the output to your s3 bucket?
If you could provide some more details about exactly what you are doing that would be helpful.
Without knowing those details, here is something that you could try.
Create your hive script as follows:
USE dbase_name;
SET hive.cli.print.header=true;
SELECT some_columns FROM some_table WHERE some_condition;
Then run your script:
$ hive -f hive_script.hql > hive_output
Then copy your output to your S3 bucket:
$ aws s3 cp ./hive_output s3://some_bucket_name/foo/hive_output
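The two commands above can be wrapped in one small shell function so that an empty result is never uploaded. This is only a sketch: the function name export_with_header is made up for illustration, and it assumes the Hive script itself contains the SET hive.cli.print.header=true; line.

```shell
# Hypothetical helper: run a Hive script, save its output, copy it to S3.
# Assumes the script passed in sets hive.cli.print.header=true itself.
export_with_header() {
  script=$1   # path to the .hql file
  out=$2      # local file that receives the query output
  dest=$3     # S3 destination, e.g. s3://some_bucket_name/foo/hive_output

  # Only stdout is redirected, so Hive's log messages on stderr stay visible.
  hive -f "$script" > "$out" || return 1

  # Refuse to upload if the query produced no output at all.
  [ -s "$out" ] || { echo "no output from $script" >&2; return 1; }

  aws s3 cp "$out" "$dest"
}
```

It would be called as: export_with_header hive_script.hql ./hive_output s3://some_bucket_name/foo/hive_output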
I guess a direct way is still not possible (see "Hive: writing column headers to local file?"). One workaround is to export the result of DESCRIBE table_name to a file:
$ hive -e 'DESCRIBE table_name' > file
Then write a script that adds the column names to your data file. Good luck!
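One possible shape for such a script, as a rough sketch: it assumes the saved DESCRIBE output has the column name as the first whitespace-separated field on each line, that the column list ends at the first blank line or line starting with # (where partition information usually begins), and that the data file is tab-separated. The function names make_header and add_header are made up for illustration.

```shell
# Build one tab-separated header line from saved DESCRIBE output.
# Assumption: column name is the first field; stop at a blank or '#' line.
make_header() {
  awk '
    NF == 0 || $1 ~ /^#/ { exit }               # end of the column list
    { printf "%s%s", (n++ ? "\t" : ""), $1 }    # join names with tabs
    END { if (n) printf "\n" }
  ' "$1"
}

# Print the header line followed by the unchanged data file.
add_header() {
  describe_file=$1
  data_file=$2
  make_header "$describe_file"
  cat "$data_file"
}
```

For example: add_header file data_file > data_with_header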
I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column to each section, set to 0 for the header and 1 for the data, so I could sort by that field and ensure the header row came out on top.
create table new_table as
select field1, field2, field3
from (
  select
    0 as sort_col, --header row gets lowest number
    'field1_name' as field1,
    'field2_name' as field2,
    'field3_name' as field3
  from some_small_table --table needs at least 1 row
  limit 1 --only need 1 header row
  union all
  select
    1 as sort_col, --original query goes here
    field1,
    field2,
    field3
  from main_table
) a
order by sort_col --make sure header row is first
It's a little bulky, but at least you can get what you need with a single query.
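If there are many columns, the header branch of that query can be generated rather than typed by hand. The sketch below is only an illustration: header_select is a made-up helper, and it uses each column name as its own header label instead of a separate 'field1_name' value. Since the header values are strings, the data branch may need its non-string columns cast to string for the UNION ALL to line up.

```shell
# Hypothetical helper: print the header-row SELECT for the UNION ALL approach.
# Arguments: the table to read the dummy row from, then the column names.
header_select() {
  table=$1
  shift
  printf 'select 0 as sort_col'
  for col in "$@"; do
    # Each column becomes a string literal aliased to the same name.
    printf ", '%s' as %s" "$col" "$col"
  done
  printf ' from %s limit 1\n' "$table"
}
```

For example, header_select some_small_table field1 field2 field3 prints a single select statement that can be pasted in as the first branch of the union.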
Hope this helps!
It might be just a typo (or a version-dependent change), but the following works for me:
It's "headers" instead of "header"