Differences between External and Internal tables in Hive
There are two ways to create tables in the Hive context and this post we'll show the differences, advantages and disadvantages.
Internal tables are known as Managed tables and we'll understand the reason in the following. Now, let's create an internal table using SQL in the Hive context and see the advantages and disadvantages.
create table coffee_and_tips_table (name string, age int, address string) stored as textfile;
To be honest I wouldn't say that it's an advantage but Internal tables are managed by Hive
Internal tables can't access remote storage services for example in clouds like Amazon AWS, Microsoft Azure and Google Cloud.
Dropping Internal tables all the data including metadata and partitions will be lost.
External tables has some interesting features compared to Internal tables and it's a good and recommended approach when we need to create tables.
In the script below you can see the difference between Internal table creation and External table related to the last section. We just added the reserved word external in the script.
create external table coffee_and_tips_external (name string, age int, address string) stored as textfile;
The data and metadata won't be lost if drop table
External tables can be accessed and managed by external process
External tables allows access to remote storage service as a source location
Again, I wouldn't say that it's a disadvantage but if you need to change schema or dropping a table, probably you'll need to run a command to repair the table as shown below.
msck repair table <table_name>
Depending on the volume, this operation may take some time to complete.
To check out a table type, run the following command below and you'll see at the column table_type the result.
hive> describe formatted <table_name>
That's it, I hope you guys enjoy it!