What is a file system?
A file system controls how data is stored and retrieved. There are many kinds of file systems, each with its own structure and logic and its own trade-offs in speed, flexibility, security, capacity, and more.
What is a distributed file system?
File systems that manage storage across a network of machines are called distributed file systems. A distributed file system is a client/server application that lets clients access and process data stored on a server as if it were on their own computers.
When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user's computer while the data is being processed and is then written back to the server. A distributed file system is designed to hold a large amount of data and to provide access to it for many clients spread across a network.
What is HDFS?
- Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem.
- It is one of the most reliable storage systems in wide use.
- HDFS is a distributed file system that provides Hadoop's storage layer in a distributed fashion.
- It is the filesystem of Hadoop, designed to store very large files on a cluster of commodity hardware (inexpensive, low-end machines of the kind used for everyday purposes).
- This file system is designed on the principle of storing a small number of large files rather than a huge number of small files.
- HDFS provides redundant storage space for huge files, in the range of terabytes and petabytes.
- Files are broken into blocks and distributed across the nodes of a cluster. Each block is then replicated, meaning copies of it are created on different machines. If a machine goes down or crashes, the data can still be retrieved from another machine. By default, three copies of each block are kept on different machines, which makes HDFS highly fault-tolerant.
- It stores data reliably even in the case of hardware failure.
- HDFS provides fast read and write mechanisms, because data is spread across the nodes of a cluster and users can access it from any machine. This has made HDFS a widely used platform for storing huge volumes and many varieties of data.
- It provides high-throughput access to application data by serving data in parallel.
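The ideas above can be sketched in plain Python. This is an illustrative simulation, not real Hadoop code: the block size, node names, and round-robin placement policy are invented for the demo, whereas real HDFS defaults to 128 MB blocks, a replication factor of 3, and rack-aware placement.

```python
# Illustrative sketch: split a file into blocks, replicate each block on
# three different machines, and read the blocks back in parallel.
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4      # bytes; tiny on purpose (real HDFS defaults to 128 MB)
REPLICATION = 3     # real HDFS default replication factor
NODES = ["node1", "node2", "node3", "node4", "node5"]  # made-up node names

def split_into_blocks(data: bytes):
    """Break a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(num_blocks: int):
    """Assign each block to REPLICATION distinct nodes, round-robin."""
    return {b: [NODES[(b + r) % len(NODES)] for r in range(REPLICATION)]
            for b in range(num_blocks)}

data = b"hello hdfs world!"
blocks = split_into_blocks(data)
placement = place_replicas(len(blocks))

# Fault tolerance: even if node1 dies, every block still has >= 2 replicas.
for replicas in placement.values():
    assert len([n for n in replicas if n != "node1"]) >= REPLICATION - 1

def fetch_block(block_id: int) -> bytes:
    """Stand-in for a network read from one of the block's datanodes."""
    return blocks[block_id]

# High throughput: blocks are fetched from different nodes in parallel,
# then reassembled in block order.
with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    parts = list(pool.map(fetch_block, range(len(blocks))))

reassembled = b"".join(parts)
assert reassembled == data
```

Splitting, replicating, and parallel fetching are exactly what let HDFS survive machine failures while serving many clients at once; the simulation only mimics the bookkeeping, since the real system moves the blocks over the network.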