normalian blog

Let's talk about Microsoft Azure, ASP.NET and Java!

How to create Hive tables via Ambari on Microsoft Azure HDInsight

As you know, HDInsight is powerful service to analyze, manage and process BigData on Microsoft Azure. You can create Hadoop, Storm, Spark and other clusters pretty easily! In this article, I will introduce how to create Hive tables via Ambari with cvs files stored in Azure Storage.
At first, you have to create your HDInsight cluster associated an Azure Storage account. In this article, I create a Spark 2.1.x cluster as HDInsight cluster.

Store CSV files into your Azure Storage

Upload your CSV files into the Azure Storage account. In this article, I upload Nikkei Average CSV file like below.

DATE,CLOSE,START,HIGH,LOW
2012/1/5,8488.71,8515.66,8519.16,8481.83
2012/1/6,8390.35,8488.98,8488.98,8349.33
2012/1/10,8422.26,8422.99,8450.59,8405.18
2012/1/11,8447.88,8440.96,8463.72,8426.03
2012/1/12,8385.59,8423.1,8426.83,8360.33
2012/1/13,8500.02,8471.1,8509.76,8458.68
2012/1/16,8378.36,8409.79,8409.79,8352.23
2012/1/17,8466.4,8420.12,8475.66,8413.22
2012/1/18,8550.58,8458.29,8595.78,8446.09
2012/1/19,8639.68,8596.68,8668.94,8596.68
2012/1/20,8766.36,8751.18,8791.39,8725.32
2012/1/23,8765.9,8753.91,8795.27,8744.54
2012/1/24,8785.33,8815.36,8825.09,8768.51
2012/1/25,8883.69,8842.01,8911.62,8816.09
2012/1/26,8849.47,8890.49,8894.6,8834.93
2012/1/27,8841.22,8851.02,8886.02,8810.89
2012/1/30,8793.05,8803.79,8832.48,8774.23
2012/1/31,8802.51,8781.44,8836.68,8776.65

It's stored into Azure Storage account associated with HDInsight cluster, and its path is described as "https://"storage-account-name"."spark-container-name".blob.core.windows.net/financedata/nikkei_stock_average_daily_en.csv". You can specify the path in Hive query as "wasb://"spark-container-name"@"storage-account-name".blob.core.windows.net/financedata/nikkei_stock_average_daily_en.csv".

Create Hive tables from your CSV files

Open https://portal.azure.com/ and choose your HDInsight cluster. You can open Ambari portal to click a link of "https://'your-cluster-name'.azurehdinsight.net" in "Overview" page. Next, click "Hive View 2.0" button like below.
f:id:waritohutsu:20171002124722p:plain

Now, you can execute Hive query using below portal site.
f:id:waritohutsu:20171002124752p:plain
Copy below query and execute it into the site.

CREATE EXTERNAL TABLE DEFAULT.NIKKEIAVERAGE_TABLE(
  `DATE` STRING,
  `CLOSE` STRING,
  `START` STRING,
  `HIGH` STRING,
  `LOW` STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb://"spark-container-name"@"storage-account-name".blob.core.windows.net/financedata/nikkei_stock_average_daily_en.csv';

LOAD DATA INPATH 'wasb://"spark-container-name"@"storage-account-name".blob.core.windows.net/financedata/nikkei_stock_average_daily_en.csv' INTO TABLE DEFAULT.NIKKEIAVERAGE_TABLE;

Check your query result

After executing your query, run below query to check the data in Ambari.

SELECT * FROM DEFAULT.NIKKEIAVERAGE_TABLE;

So, you can get below result.
f:id:waritohutsu:20171002150636p:plain