Restricting Hadoop Slave node storage using the LVM concept

In this article, we will see how to contribute a limited amount of storage from a data node (slave node) to a Hadoop cluster using the Linux partitioning concept that underlies LVM.

Setup:

  • On AWS, launched 2 instances from a RedHat Linux image, named ‘Master’ and ‘Slave’, and installed the JDK and Hadoop software on both instances.
  • Attached an EBS volume of 1 GiB to the Slave node, created a partition in the attached volume, and mounted it to the data-node directory.
  • To log in remotely after the instances were launched, I used PuTTY.
  • Next, we create an EBS volume of 1 GiB by clicking on Volumes in the AWS Console. I named it slave-volume. Notice that the two 10 GiB volumes are the root volumes.
  • Now we attach slave-volume to the running data-node instance. This is just like inserting a pen drive into a local system, or creating a virtual hard disk and attaching it. Mainly, we have to provide the instance ID while attaching.
  • Attaching the volume is simple: select the volume, click the Actions button, and provide the instance ID.
  • After attaching the volume to the data node, we can see it in the in-use state.
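The console steps above can also be sketched with the AWS CLI. This is only an illustration: the availability zone, volume ID, and instance ID below are placeholders you would replace with your own values.

```shell
# Create a 1 GiB EBS volume (the availability zone must match the instance's).
aws ec2 create-volume --size 1 --availability-zone us-east-1a --volume-type gp2

# Attach it to the data-node instance; vol-xxxx and i-xxxx are placeholders.
aws ec2 attach-volume --volume-id vol-xxxx --instance-id i-xxxx --device /dev/sdf

# Confirm the volume now reports the "in-use" state.
aws ec2 describe-volumes --volume-ids vol-xxxx --query 'Volumes[0].State'
```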
  • Since we are already remotely logged in to the data-node instance, we see the familiar Linux terminal; to confirm that the EBS volume is attached, we can use the fdisk -l command.
  • Now the main concept of partitions comes into play. To share limited storage, we create a partition in the attached 1 GiB volume. Here we created a partition of 512M (512 MiB), and we can see the device /dev/xvdf1 created.
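The 512 MiB partition is normally created interactively with fdisk; a non-interactive sketch of the same keystrokes (assuming the attached volume shows up as /dev/xvdf, which may differ on your instance) looks like this:

```shell
# n = new partition, p = primary, 1 = partition number,
# blank line = default first sector, +512M = size, w = write and exit.
printf 'n\np\n1\n\n+512M\nw\n' | fdisk /dev/xvdf

# Confirm that /dev/xvdf1 now exists.
fdisk -l /dev/xvdf
```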
  • We share the data node’s storage with the name node to solve the storage problem we have in Big Data. Before storing any data in this partition, we first have to format it.
  • To format the partition we created, we run:
mkfs.ext4 /dev/xvdf1
  • Next, we mount it on the data-node directory we will be using in the Hadoop cluster.
  • To mount the partition on the desired directory, we have to run the following command:
mount /dev/xvdf1 /dn2
  • After mounting, we can confirm the size using the df -hT command.
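Putting the mount and verification together (the /dn2 path follows this article; the mkdir is only needed if the directory does not yet exist):

```shell
mkdir -p /dn2            # create the mount point if it does not already exist
mount /dev/xvdf1 /dn2    # mount the formatted partition
df -hT /dn2              # should report type ext4 and roughly 512M total size
```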
  • With this, all the partitioning and mounting steps are complete.
  • Now we have to configure the hdfs-site.xml and core-site.xml files on both the name node and the data node.
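A minimal sketch of the name node’s two files, assuming Hadoop 1.x property names and the /nn directory this article uses (the port is a common choice, not mandated):

```xml
<!-- hdfs-site.xml on the name node -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>

<!-- core-site.xml on the name node -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```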
  • Remember, on the name node we have to format the /nn directory we created, and after that start the NameNode service using hadoop-daemon.sh start namenode.
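On the name node, the format-then-start sequence (using Hadoop 1.x command names) is:

```shell
# WARNING: formatting erases any existing HDFS metadata under /nn.
hadoop namenode -format
hadoop-daemon.sh start namenode
jps    # confirm a NameNode process appears in the list
```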
  • We configure the same files on the data node in a similar way; just remember to give the name node’s IP in the core-site.xml file.
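The matching data-node sketch, pointing dfs.data.dir at the /dn2 mount from above; NAMENODE_IP is a placeholder for the Master’s IP:

```xml
<!-- hdfs-site.xml on the data node -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn2</value>
  </property>
</configuration>

<!-- core-site.xml on the data node; replace NAMENODE_IP with the Master's IP -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NAMENODE_IP:9001</value>
  </property>
</configuration>
```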
  • At last, start the DataNode service as well.
  • With that, all our configuration is complete.
  • Using the command hadoop dfsadmin -report, we can see the status of the Hadoop cluster. This command can be run from any node.
  • Finally, we have set a limit on the data-node storage size.

Thank you! keep learning! keep growing! keep sharing!

If you enjoyed this, give it a clap and follow me on Medium for more.
Let’s connect on LinkedIn
