Hive Metastore
The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables (such as their schema and location) and partitions in a relational database, and it gives clients access to this information through the metastore service API. The Hive metastore consists of two fundamental units:
- A service that provides metastore access to other Apache Hive services.
- Disk storage for the Hive metadata, which is separate from HDFS storage.
There are three modes of Hive metastore deployment:
- Embedded Metastore
- Local Metastore
- Remote Metastore
1. Embedded Metastore
By default, the metastore service and the Hive service run in the same JVM and use an embedded Derby database instance that stores its metadata on the local disk. This is called the embedded metastore configuration. In this mode, only one user can connect to the metastore database at a time; if you start a second instance of the Hive driver, you will get an error. This makes it suitable for unit testing but not for practical deployments.
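As a rough illustration of what the embedded mode amounts to in configuration terms, the sketch below renders the two JDBC properties Hive uses for its default Derby database as hive-site.xml text. The property names and the Derby URL are the stock Hive defaults as far as I know; the helper `render_hive_site` is hypothetical and exists only for this example.

```python
import xml.etree.ElementTree as ET

# Stock embedded-mode values (assumed defaults): Derby creates a
# ./metastore_db directory in the working directory of the Hive process.
EMBEDDED_PROPS = {
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true",
    "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
}


def render_hive_site(props):
    """Render a dict of properties as hive-site.xml text (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(render_hive_site(EMBEDDED_PROPS))
```

Because the Derby URL has no host part, the database files live next to whichever process opened them, which is why a second Hive instance started elsewhere cannot share them.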
2. Local Metastore
Hive is a data-warehousing framework, so a single session is rarely enough. The local metastore configuration was introduced to overcome this limitation of the embedded metastore. It allows multiple Hive sessions, i.e. multiple users can use the metastore database at the same time. This is achieved by using any JDBC-compliant database, such as MySQL, which runs in a separate JVM or on a different machine from the Hive service and the metastore service; those two still run together in the same JVM, as in the embedded configuration. In practice, the most popular choice is a MySQL server as the metastore database.
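A minimal sketch of the JDBC settings a local metastore with MySQL would typically need, assuming the standard `javax.jdo.option.*` property names from hive-site.xml. The helper `local_metastore_props`, the host `db.example.com`, the database name `metastore`, and the credentials are all placeholders invented for this example.

```python
def local_metastore_props(host, database, user, password, port=3306):
    """Build JDBC properties for a MySQL-backed metastore (hypothetical helper)."""
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{host}:{port}/{database}",
        # Driver class for MySQL Connector/J 8.x; older connector versions
        # use com.mysql.jdbc.Driver instead.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


# Placeholder values; substitute the details of your own MySQL server.
props = local_metastore_props("db.example.com", "metastore", "hive", "change-me")
for name, value in props.items():
    print(f"{name} = {value}")
```

Note that every Hive session needs these values, including the password, because each session opens its own JDBC connection to the database.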
3. Remote Metastore
In the remote metastore configuration, the metastore service runs in its own separate JVM, not in the Hive service JVM. Other processes that want to communicate with the metastore server can do so using the Thrift network API. We can also run more than one metastore server in this mode to provide higher availability. The main advantage of the remote metastore is that you do not need to share the JDBC login credentials with each Hive user to access the metastore database.
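On the client side, a remote metastore is usually selected by listing one or more Thrift endpoints in `hive.metastore.uris`. The sketch below builds that value and then classifies a set of properties into the three modes described in this article. Both helpers and the host names are hypothetical; the port 9083 is the commonly used default, and the classification rule is a heuristic for illustration, since Hive itself mainly checks whether `hive.metastore.uris` is set.

```python
DEFAULT_DERBY_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def remote_client_props(metastore_hosts, port=9083):
    """Build the client-side setting for one or more remote metastore servers."""
    uris = ",".join(f"thrift://{host}:{port}" for host in metastore_hosts)
    return {"hive.metastore.uris": uris}


def detect_mode(props):
    """Guess the deployment mode from a properties dict (heuristic, not Hive code)."""
    # A non-empty Thrift URI list means the client talks to a remote service.
    if props.get("hive.metastore.uris", "").strip():
        return "remote"
    url = props.get("javax.jdo.option.ConnectionURL", DEFAULT_DERBY_URL)
    # "jdbc:derby:" without "//" is the in-process Derby engine.
    if url.startswith("jdbc:derby:") and not url.startswith("jdbc:derby://"):
        return "embedded"
    # Any other JDBC URL points at an external database.
    return "local"


client = remote_client_props(["ms1.example.com", "ms2.example.com"])
print(client["hive.metastore.uris"])
print(detect_mode(client))
```

The client properties contain no JDBC URL or password at all; only the metastore servers hold the database credentials.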
Databases Supported by Hive
Hive supports five backend databases for the metastore:
- Derby
- MySQL
- MS SQL Server
- Oracle
- Postgres
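Each of the databases listed above corresponds to a `-dbType` value accepted by Hive's `schematool` utility, which creates the metastore schema in the chosen database. The sketch below only assembles the command line; it does not run it. The helper `init_schema_command` is hypothetical, and the mapping reflects the `-dbType` values as I understand them, so check them against your Hive version.

```python
# Database name as used in this article -> schematool -dbType value (assumed).
SCHEMATOOL_DB_TYPES = {
    "Derby": "derby",
    "MySQL": "mysql",
    "MS SQL Server": "mssql",
    "Oracle": "oracle",
    "Postgres": "postgres",
}


def init_schema_command(database):
    """Return the schematool argument list that initializes the metastore schema."""
    return ["schematool", "-dbType", SCHEMATOOL_DB_TYPES[database], "-initSchema"]


print(" ".join(init_schema_command("MySQL")))
# prints: schematool -dbType mysql -initSchema
```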