Creation and dropping of Database in Hive:
reate Database:
For creating database in Hive shell, we have to use the command as shown in syntax below:-
Syntax:
Create database <DatabaseName>
Example: -Create database "MANMA"
Displaying existing databases using "show" command
Example:- Show databases;
Drop Database:
For Dropping database in Hive shell, we have to use the "drop" command as shown in syntax below:-
Syntax:
Drop database <DatabaseName>
Example:-
Drop database MANMA
Table Operations such as Creation, Altering, and Dropping tables in Hive can be observed in this tutorial.
In the Below screenshot, we are creating a table with columns and altering the table name.
1. Creating table manma_sample with two column names such as "empid" and "empname"
Example : create table manma_sample(empid int, empname string);
2. Displaying tables in databases
Example: show tables;
3. Altering table "manma_sample" as "manma_sampleNew"
Example : ALTER TABLE manma_sample to manma_sampleNew;
4. Dropping table manma_sampleNew.
Example : drop table manma_sampleNew
Table types and its Usage:
Coming to Tables it's just like the way that we create in traditional Relational Databases. The functionalities such as filtering, joins can be performed on the tables.
Hive deals with two types of table structures like Internal and External tables depending on the loading and design of schema in Hive.
Internal tables
CREATE TABLE manmahive_internaltable (id INT,Name STRING)
Row format delimited
Fields terminated by '\t';
2. Load the data into internal table
What is a View?
Views are similar to tables, which are generated based on the requirements.
Creation of View:
Syntax:
Create VIEW < VIEWNAME> AS SELECT
Example:
Hive>Create VIEW Sample_View AS SELECT * FROM employees WHERE salary>25000
In this example, we are creating view Sample_View where it will display all the row values with salary field greater than 25000.
reate Database:
For creating database in Hive shell, we have to use the command as shown in syntax below:-
Syntax:
Create database <DatabaseName>
Example: -Create database "MANMA"
Displaying existing databases using "show" command
Example:- Show databases;
Drop Database:
For Dropping database in Hive shell, we have to use the "drop" command as shown in syntax below:-
Syntax:
Drop database <DatabaseName>
Example:-
Drop database MANMA
Table Operations such as Creation, Altering, and Dropping tables in Hive can be observed in this tutorial.
In the Below screenshot, we are creating a table with columns and altering the table name.
1. Creating table manma_sample with two column names such as "empid" and "empname"
Example : create table manma_sample(empid int, empname string);
2. Displaying tables in databases
Example: show tables;
3. Altering table "manma_sample" as "manma_sampleNew"
Example : ALTER TABLE manma_sample to manma_sampleNew;
4. Dropping table manma_sampleNew.
Example : drop table manma_sampleNew
Table types and its Usage:
Coming to Tables it's just like the way that we create in traditional Relational Databases. The functionalities such as filtering, joins can be performed on the tables.
Hive deals with two types of table structures like Internal and External tables depending on the loading and design of schema in Hive.
Internal tables
- Internal Table is tightly coupled in nature.In this type of table, first we have to create table and load the data.
- We can call this one as data on schema.
- By dropping this table, both data and schema will be removed.
- The stored location of this table will be at /user/hive/warehouse.
- If the processing data available in local file system
- If we want Hive to manage the complete lifecycle of data including the deletion
- Sample code Snippet for Internal Table
CREATE TABLE manmahive_internaltable (id INT,Name STRING)
Row format delimited
Fields terminated by '\t';
2. Load the data into internal table
LOAD DATA INPATH '/user/manma99hive/data.txt' INTO table manmahive_internaltable
3. Display the content of the table
select * from manmahive_internaltable;
4. To drop the internal table
DROP TABLE manmahive_internaltable;
If you dropped the manmahive_internaltable, including its metadata and its data will be deleted from Hive.
External tables
- External Table is loosely coupled in nature. Data will be available in HDFS.The table is going to create on HDFS data.
- In other way, we can say like its creating schema on data.
- At the time of dropping the table it drops only schema, the data will be still available in HDFS as before.
- External tables provide an option to create multiple schemas for the data stored in HDFS instead of deleting the data every time whenever schema updates
When to Choose External Table:
- If processing data available in HDFS
- Useful when the files are being used outside of Hive
Sample code Snippet for External Table
1. Create External table
CREATE EXTERNAL TABLE manmahive_external(id INT,Name STRING)
Row format delimited
Fields terminated by '\t'
LOCATION '/user/manma99hive/manmahive_external;
2. If we are not specifying the location at the time of table creation, we can load the data manually
LOAD DATA INPATH '/user/manma99hive/data.txt' INTO TABLE manmahive_external;
3. Display the content of the table
select * from manmahive_external;
4. To drop the internal table
DROP TABLE manmahive_external;
Difference between Internal Vs External tables
Feature
|
Internal
|
External
|
Schema
|
Data on Schema
|
Schema on Data
|
Storage Location
|
/usr/hive/warehouse
|
HDFS location
|
Data availability
|
Within local file system
|
Within HDFS
|
What is Partitions?
Hive Partitions is a way to organizes tables into partitions by dividing tables into different parts based on partition keys.
Partition is helpful when the table has one or more Partition keys. Partition keys are basic elements for determining how the data is stored in the table.
For Example: -
Client having some E –commerce data which belongs to India operations in which each state (29 states) operations mentioned in as a whole. If we take state column as partition key and perform partitions on that India data as a whole, we can able to get Number of partitions (29 partitions) which is equal to number of states (29) present in India. Such that each state data can be viewed separately in partitions tables.
Sample Code Snippet for partitions
1. Creation of Table all states
create table all states(state string, District string,Enrollment string)
row format delimited
fields terminated by ',';
2. Loading data into created table all states
Load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;
3. Creation of partition table
create table state_part(District string,Enrolments string) PARTITIONED BY(state string);
4. For partition we have to set this property
sethive.exec.dynamic.partition.mode=nonstrict
5. Loading data into partition table
INSERT OVERWRITE TABLE state_partPARTITION(state) SELECT district, enrolments, state from allstates;
From the above code, we do following things
1. Creation of table all states with 3 column names such as state, district, and enrollment
2. Loading data into table all states
3. Creation of partition table with state as partition key
4. In this step Setting partition mode as non-strict( This mode will activate dynamic partition mode)
5. Loading data into partition table state_part
6. Actual processing and formation of partition tables based on state as partition key
7. There is going to 29 partition outputs in HDFS storage with the file name as state name. We will check this in this step.
In This step, we seeing the 29 partition outputs in HDFS
What is Buckets?
Buckets in hive is used in segregating of hive table-data into multiple files or directories. it is used for efficient querying.
- The data i.e. present in that partitions can be divided further into Buckets
- The division is performed based on Hash of particular columns that we selected in the table.
- Buckets use some form of Hashing algorithm at back end to read each record and place it into buckets
- In Hive, we have to enable buckets by using the set.hive.enforce.bucketing=true;
Step 1) Create Bucket
create table samplebucket {first_name string, job_id int, department string, salary string, country string}
clustered by (country) into 4 buckets
row format delimited
fields terminated by ',';
Step 2) Loading Data into table sample bucket
Assuming that "Employees table" already created in Hive system. In this step, we will see the loading of Data from employees table into table sample bucket.
Before we start moving employees data into buckets, make sure that it consist of column names such as first_name, job_id, department, salary and country.
Here we are loading data into sample bucket from employees table.
Assuming that"Employees table" already created in Hive system. In this step, we will see the loading of Data from employees table into table sample bucket.
Before we start moving employees data into buckets, make sure that it consist of column names such as first_name, job_id, department, salary and country.
Here we are loading data into sample bucket from employees table.
from employees
insert overwrite table samplebucket
select first_name, job_id, department, salary, country;
Once insert operation completed you will see data into 4 buckets which we have created at the time of table creation..
Views are similar to tables, which are generated based on the requirements.
- We can save any result set data as a view in Hive
- Usage is similar to as views used in SQL
- All type of DML operations can be performed on a view
Creation of View:
Syntax:
Create VIEW < VIEWNAME> AS SELECT
Example:
Hive>Create VIEW Sample_View AS SELECT * FROM employees WHERE salary>25000
In this example, we are creating view Sample_View where it will display all the row values with salary field greater than 25000.
What is Index?
Indexes are pointers to particular column name of a table.
The user has to manually define the index
Wherever we are creating index, it means that we are creating pointer to particular column name of table
Any Changes made to the column present in tables are stored using the index value created on the column name.
Syntax:
Create INDEX < INDEX_NAME> ON TABLE < TABLE_NAME(column names)>
Example:
Create INDEX sample_Index ON TABLE manmahive_internaltable(id)
Here we are creating index on table manmahive_internaltable for column name id.

