Wednesday, 31 July 2019

Hive Commands

Creation and Dropping of a Database in Hive:
Create Database:
To create a database in the Hive shell, we use the command shown in the syntax below:
Syntax:
CREATE DATABASE <DatabaseName>;
Example: CREATE DATABASE MANMA;
Displaying existing databases using the "show" command:
Example: SHOW DATABASES;
Drop Database:
To drop a database in the Hive shell, we use the "drop" command as shown in the syntax below:
Syntax:
DROP DATABASE <DatabaseName>;
Example:
DROP DATABASE MANMA;
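Note that, by default, Hive refuses to drop a database that still contains tables. A short sketch of the commonly used options:

```sql
-- DROP DATABASE fails if the database still contains tables.
-- CASCADE drops the contained tables first; IF EXISTS suppresses
-- the error when the database does not exist.
DROP DATABASE IF EXISTS MANMA CASCADE;
```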
Table operations such as creating, altering, and dropping tables in Hive are covered in this tutorial.
In the steps below, we create a table with columns and then rename the table.
1. Creating table manma_sample with two columns, "empid" and "empname":
Example: CREATE TABLE manma_sample(empid INT, empname STRING);
2. Displaying the tables in the database:
Example: SHOW TABLES;
3. Renaming table "manma_sample" to "manma_sampleNew":
Example: ALTER TABLE manma_sample RENAME TO manma_sampleNew;
4. Dropping table manma_sampleNew:
Example: DROP TABLE manma_sampleNew;

Table Types and Their Usage:
Tables in Hive work much like those in traditional relational databases, and operations such as filtering and joins can be performed on them.
Hive supports two table structures, internal and external tables, depending on how the data is loaded and how the schema is designed.
Internal tables
  • An internal (managed) table is tightly coupled to its data. With this type of table, we first create the table and then load the data into it.
  • This approach can be described as "data on schema".
  • Dropping the table removes both the data and the schema.
  • By default, the table's data is stored under /user/hive/warehouse.
When to Choose an Internal Table:
  • If the data to be processed is available in the local file system
  • If we want Hive to manage the complete lifecycle of the data, including deletion
Sample code snippet for an internal table:
1. To create the internal table:
CREATE TABLE manmahive_internaltable (id INT, Name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
2. Load the data into the internal table:
LOAD DATA INPATH '/user/manma99hive/data.txt' INTO TABLE manmahive_internaltable;
3. Display the content of the table:
SELECT * FROM manmahive_internaltable;
4. To drop the internal table:
DROP TABLE manmahive_internaltable;
If you drop manmahive_internaltable, both its metadata and its data are deleted from Hive.
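To confirm whether a table is managed (internal) and where its data lives, you can inspect its metadata:

```sql
-- For an internal table, the output includes a line such as
-- "Table Type: MANAGED_TABLE" along with the warehouse location.
DESCRIBE FORMATTED manmahive_internaltable;
```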
External tables
  • An external table is loosely coupled to its data. The data lives in HDFS, and the table is created on top of that HDFS data.
  • In other words, this is "schema on data".
  • Dropping the table removes only the schema; the data remains in HDFS as before.
  • External tables make it possible to define multiple schemas over the same data in HDFS, instead of deleting and reloading the data every time the schema changes.
When to Choose an External Table:
  • If the data to be processed is already available in HDFS
  • If the files are also being used by tools outside of Hive
Sample code snippet for an external table:
1. Create the external table:
CREATE EXTERNAL TABLE manmahive_external (id INT, Name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/manma99hive/manmahive_external';
2. If no location is specified at table creation time, the data can be loaded manually:
LOAD DATA INPATH '/user/manma99hive/data.txt' INTO TABLE manmahive_external;
3. Display the content of the table:
SELECT * FROM manmahive_external;
4. To drop the external table:
DROP TABLE manmahive_external;
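Because dropping an external table removes only its schema, the files are still in HDFS afterwards, and a new table can be defined over the same location. A small sketch (the name manmahive_external2 is illustrative):

```sql
-- The files under /user/manma99hive/manmahive_external survive the
-- DROP TABLE above, so a new schema over the same location makes the
-- same data queryable again.
CREATE EXTERNAL TABLE manmahive_external2 (id INT, Name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/manma99hive/manmahive_external';

SELECT * FROM manmahive_external2;
```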

Difference between Internal vs External Tables

Feature            | Internal                         | External
-------------------|----------------------------------|------------------------------
Schema             | Data on schema                   | Schema on data
Storage location   | /user/hive/warehouse             | User-specified HDFS location
Dropping the table | Removes both data and schema     | Removes only the schema
What are Partitions?
Hive partitioning organizes a table by dividing it into parts based on the values of one or more partition keys.
Partitioning is helpful when the table has one or more partition keys; these keys determine how the data is physically stored in the table.
For example:
A client has e-commerce data for its India operations, with all 29 states recorded in a single table. If we take the state column as the partition key and partition the India data on it, we get 29 partitions, one for each state, so that each state's data can be viewed separately in the partitioned table.
Sample code snippet for partitions:
1. Creation of table allstates:
CREATE TABLE allstates(state STRING, District STRING, Enrolments STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
2. Loading data into table allstates:
LOAD DATA LOCAL INPATH '/home/hduser/Desktop/AllStates.csv' INTO TABLE allstates;
3. Creation of the partitioned table:
CREATE TABLE state_part(District STRING, Enrolments STRING) PARTITIONED BY (state STRING);
4. For dynamic partitioning we have to set this property:
set hive.exec.dynamic.partition.mode=nonstrict;
5. Loading data into the partitioned table:
INSERT OVERWRITE TABLE state_part PARTITION(state) SELECT district, enrolments, state FROM allstates;
The above code does the following:
1. Creates table allstates with three columns: state, district, and enrolments
2. Loads data into table allstates
3. Creates the partitioned table with state as the partition key
4. Sets the partition mode to non-strict (this enables dynamic partitioning)
5. Loads data into the partitioned table state_part
6. Hive then processes the data and forms the partitions based on state as the partition key
7. This produces 29 partition directories in HDFS, one named after each state
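The partitions can then be verified and queried from the Hive shell; the state name below is only an example value:

```sql
-- Lists one entry per partition, e.g. state=Karnataka, state=Kerala, ...
SHOW PARTITIONS state_part;

-- Filtering on the partition key lets Hive read only that
-- partition's directory instead of scanning the whole table.
SELECT district, enrolments FROM state_part WHERE state = 'Karnataka';
```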

What are Buckets?
Bucketing in Hive segregates table data into multiple files or directories and is used for efficient querying.
  • The data present in a partition can be divided further into buckets
  • The division is performed based on a hash of a particular column that we select in the table
  • Hive applies a hashing algorithm to each record's bucketing column to decide which bucket the record is placed into
  • In Hive, we have to enable bucketing with set hive.enforce.bucketing=true;
Step 1) Create the bucketed table:
CREATE TABLE samplebucket (first_name STRING, job_id INT, department STRING, salary STRING, country STRING)
CLUSTERED BY (country) INTO 4 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Step 2) Loading data into table samplebucket:
Assuming that the "employees" table has already been created in Hive, in this step we load data from the employees table into samplebucket.
Before moving the employees data into buckets, make sure the table has the columns first_name, job_id, department, salary, and country.
Here we load data into samplebucket from the employees table:
FROM employees
INSERT OVERWRITE TABLE samplebucket
SELECT first_name, job_id, department, salary, country;
Once the insert operation completes, the data is written into the 4 buckets that we specified at table creation time.
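One benefit of bucketing is efficient sampling: TABLESAMPLE can read just the requested bucket files instead of scanning the whole table. A sketch against the samplebucket table:

```sql
-- Reads only bucket 1 of the 4 country buckets, i.e. roughly a
-- quarter of the table's files.
SELECT first_name, country
FROM samplebucket
TABLESAMPLE(BUCKET 1 OUT OF 4 ON country);
```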

What is a View?
Views are similar to tables and are generated based on requirements.

  • We can save any result set as a view in Hive
  • Usage is similar to views in SQL
  • In Hive, views are purely logical: they can be queried like tables, but data cannot be inserted into or updated through them

Creation of a View:
Syntax:
CREATE VIEW <ViewName> AS SELECT <query>;
Example:
hive> CREATE VIEW Sample_View AS SELECT * FROM employees WHERE salary > 25000;
In this example, we create the view Sample_View, which displays all rows whose salary field is greater than 25000.
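A view is then used like any other table, and Hive expands its definition into the enclosing query at execution time:

```sql
-- Queried exactly like a table.
SELECT * FROM Sample_View;

-- Views are removed with DROP VIEW, not DROP TABLE.
DROP VIEW Sample_View;
```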

What is an Index?
An index is a pointer to a particular column of a table.
The user has to define the index manually.
Wherever we create an index, we are creating a pointer to a particular column of the table.
Any changes made to the indexed column are tracked through the index values created on that column.
Syntax:
CREATE INDEX <IndexName> ON TABLE <TableName>(<ColumnName>) AS '<index.handler>';
Example:
CREATE INDEX sample_Index ON TABLE manmahive_internaltable(id) AS 'COMPACT' WITH DEFERRED REBUILD;
Here we create an index named sample_Index on table manmahive_internaltable for the column id, using the built-in compact index handler.
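When an index is created WITH DEFERRED REBUILD it starts out empty and must be built explicitly. The maintenance commands are sketched below; note that Hive's built-in indexing feature was removed in Hive 3.0, so these apply to older versions:

```sql
-- Populate (or refresh) the index after the base table changes.
ALTER INDEX sample_Index ON manmahive_internaltable REBUILD;

-- List the indexes defined on the table.
SHOW INDEX ON manmahive_internaltable;

-- Remove the index.
DROP INDEX sample_Index ON manmahive_internaltable;
```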
