ZooKeeper in Hadoop is an open-source project developed by Apache. It provides a centralized infrastructure and related services that enable synchronization across a cluster.
ZooKeeper maintains common objects needed in large cluster environments and stores this data in a centralized, highly accessible location. It runs on a cluster of servers, known as an ensemble, that share the state of the data.
ZooKeeper comes with a command-line interface (CLI) client for interactive use. Its namespace is similar to that of a standard file system: a name is a sequence of path components separated by a slash (/).
ZooKeeper in Hadoop has a hierarchical namespace much like that of a distributed file system. The difference is that every node in the namespace can hold data as well as children, as if a file could also be a directory.
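To make the "file that is also a directory" idea concrete, here is a minimal in-memory sketch of such a namespace. It is only an illustration and not ZooKeeper's implementation: the `Znode` and `Namespace` classes and their methods are hypothetical names chosen for this example.

```python
class Znode:
    """Hypothetical stand-in for a znode: it holds data AND children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}  # child name -> Znode


class Namespace:
    """Toy hierarchical namespace addressed by slash-separated paths."""

    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        if not path.startswith("/"):
            raise ValueError("paths must be absolute, e.g. /app/config")
        node = self.root
        for part in (p for p in path.split("/") if p):
            node = node.children[part]  # KeyError if a component is missing
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        if not name:
            raise ValueError("cannot create the root node")
        parent = self._walk(parent_path or "/")
        if name in parent.children:
            raise KeyError(f"node already exists: {path}")
        parent.children[name] = Znode(data)

    def get(self, path):
        return self._walk(path).data

    def children(self, path):
        return sorted(self._walk(path).children)


ns = Namespace()
ns.create("/app", b"top-level data")
ns.create("/app/config", b"key=value")
print(ns.get("/app"), ns.children("/app"))
```

Here `/app` carries its own data and also has a child, which an ordinary file system would not allow for a single entry.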
ZooKeeper Uses in Apache Hadoop
- Apache Kafka uses ZooKeeper to manage configuration.
- ZooKeeper keeps access control lists (ACLs) for all topics maintained in ZooKeeper.
- ZooKeeper is used for maintaining centralized configuration information, naming, providing distributed synchronization, and providing group services.
Znodes in ZooKeeper
- Each node in ZooKeeper is called a Znode. The namespace consists of these data registers, and developers read and write them through ZooKeeper. Each Znode maintains a stat structure.
- Each Znode has version numbers and timestamps associated with it. Together they allow ZooKeeper to validate cached data and coordinate updates.
Features of Znodes
- Watches (one time triggers)
- Data Access
- Ephemeral Nodes
- Sequence Nodes (Unique Naming)
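Two of these features can be illustrated with a small sketch: a watch that fires only once, and the unique names given to sequence nodes. This is a simplified model, not the ZooKeeper client API. `WatchedNode` and `sequential_name` are hypothetical names, and the 10-digit zero-padded suffix follows the format ZooKeeper uses for sequential names.

```python
class WatchedNode:
    """Toy node whose watches are one-time triggers."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get(self, watch=None):
        # Reading with a watch registers a callback for the NEXT change only.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set(self, data):
        self.data = data
        # Clear the watch list before firing, so each watch triggers once.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback("changed")


def sequential_name(prefix, counter):
    """Append a 10-digit, zero-padded counter to make the name unique."""
    return f"{prefix}{counter:010d}"


events = []
node = WatchedNode(b"v1")
node.get(watch=events.append)
node.set(b"v2")  # fires the watch
node.set(b"v3")  # no watch is registered any more
print(events)
print(sequential_name("/locks/lock-", 7))
```

A client that wants to keep observing a node has to register a new watch after each notification.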
How ZooKeeper in Hadoop Tracks Time
Version number
Whenever a node changes, the corresponding version number is incremented. A Znode keeps three version numbers:
- version – number of changes made to the data of a Znode
- cversion – number of changes made to the children of a Znode
- aversion – number of changes made to the ACL of a Znode
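The data version counter is what makes conditional updates possible: a writer states the version it last read, and the update is rejected if the data has changed since. The sketch below models that check in memory. `VersionedNode` and `BadVersionError` are hypothetical names for this example, and treating `-1` as "match any version" mirrors ZooKeeper's convention.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    """Toy node that counts data changes, like the data version in stat."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def set(self, data, expected_version=-1):
        # -1 means "do not check"; any other value must match exactly.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


node = VersionedNode(b"a")
node.set(b"b", expected_version=0)      # succeeds, version becomes 1
try:
    node.set(b"c", expected_version=0)  # stale version, rejected
except BadVersionError as err:
    print("update rejected:", err)
```

This is a form of optimistic concurrency control: no lock is held between the read and the write, and conflicting writers simply retry.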
Zxid
Every change to ZooKeeper state receives a stamp in the form of a Zxid (ZooKeeper Transaction ID). Zxids expose the total ordering of all changes: each change has a unique Zxid, and if one Zxid is smaller than another, the first change happened before the second.
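As an illustration of how a single number can order changes, the sketch below packs a Zxid the way ZooKeeper's implementation does: a 64-bit value with the leader epoch in the high 32 bits and a per-epoch counter in the low 32 bits. That layout is an implementation detail not covered in this article, and the helper names `make_zxid` and `split_zxid` are hypothetical.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack (epoch, counter) into one integer; the epoch dominates ordering."""
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter must fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid):
    """Recover (epoch, counter) from a packed Zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


# Any change in a later epoch sorts after every change in an earlier one.
changes = [make_zxid(2, 0), make_zxid(1, 5), make_zxid(1, 4)]
print([split_zxid(z) for z in sorted(changes)])
```

Sorting the packed integers is enough to recover the order in which the changes were applied.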
Ticks
ZooKeeper servers use ticks to define the timing of events such as status uploads, session timeouts, and connection timeouts. If a client requests a session timeout shorter than the minimum, the server tells the client that the session timeout is actually the minimum session timeout.
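The negotiation described above amounts to clamping the requested value into an allowed range. The sketch below assumes a tick of 2000 ms, a minimum of 2 ticks, and a maximum of 20 ticks. These mirror commonly used ZooKeeper defaults but are configurable, so treat them as assumptions; the function name `negotiate_session_timeout` is hypothetical.

```python
def negotiate_session_timeout(requested_ms, tick_ms=2000, min_ticks=2, max_ticks=20):
    """Clamp a client's requested session timeout to the server's bounds.

    tick_ms, min_ticks and max_ticks are assumed values for illustration.
    """
    minimum = min_ticks * tick_ms
    maximum = max_ticks * tick_ms
    return max(minimum, min(requested_ms, maximum))


print(negotiate_session_timeout(1000))    # too short, raised to the minimum
print(negotiate_session_timeout(10000))   # within bounds, unchanged
print(negotiate_session_timeout(100000))  # too long, lowered to the maximum
```

The client should always use the timeout returned by the server rather than the one it asked for.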
ZooKeeper Data Storage
ZooKeeper in Hadoop is designed to store coordination data, or simply metadata, such as status, configuration, and location information. This kind of data is typically measured in kilobytes, which keeps storage use efficient. Only small pieces of data are stored in each Znode.
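A client can guard against oversized payloads before writing. The sketch below assumes a limit of 1 MiB per Znode, which approximates ZooKeeper's commonly cited default. The real limit depends on server configuration, and the names `MAX_ZNODE_BYTES` and `fits_in_znode` are hypothetical.

```python
import json

# Assumed limit for illustration; the actual value is configuration-dependent.
MAX_ZNODE_BYTES = 1024 * 1024


def fits_in_znode(payload, limit=MAX_ZNODE_BYTES):
    """Return True if the serialized payload is small enough to store."""
    return len(payload) <= limit


config = json.dumps({"host": "db1", "port": 5432}).encode("utf-8")
print(len(config), fits_in_znode(config))
```

Larger data is usually kept in a bulk store, with only a pointer to it saved in ZooKeeper.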