If you read my previous post about useful Hadoop URLs, I promised to write about file system manipulation commands such as adding a new file/directory, renaming a file/directory, or deleting a file/directory. You can do this easily if you are familiar with Linux commands.
In Hadoop 2.5.0, all file system manipulation commands can be done using the 'hdfs' script that you can find in Hadoop's bin/ directory. The usage pattern of that command is as below:
$ hdfs dfs -<command>
Here are the commands that you need to know:
1. ls
"ls" command let you to show the content of your current directory. you can add -R option to show the content all of your directories recursively.
$ hdfs dfs -ls [-R] [-h] [-d]
$ hdfs dfs -ls -R
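For example, a listing will look roughly like this (the file names, sizes, and dates below are made up for illustration):
$ hdfs dfs -ls /user/username
Found 2 items
drwxr-xr-x   - username supergroup          0 2014-11-04 10:12 /user/username/new-directory
-rw-r--r--   3 username supergroup       1024 2014-11-04 10:15 /user/username/local-file.txt
The columns are the permissions, replication factor (- for directories), owner, group, size in bytes, modification date, and path.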
2. put
"put" command can be use to put or upload your local file/directory into HDFS. If you not specify the file, it will put all your directory content to the HDFS destination directory. Here's the example to use it:
$ hdfs dfs -put <localpath> <hdfs path>
$ hdfs dfs -put local-file.txt destination-file.txt
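The same pattern works for a whole directory; HDFS will copy it together with its contents (the directory names here are just an illustration):
$ hdfs dfs -put local-directory /user/username/destination-directory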
3. mkdir
You can create a directory in your HDFS by using the mkdir command.
$ hdfs dfs -mkdir <destinationpath>/<directory name>
$ hdfs dfs -mkdir /user/username/new-directory
$ hdfs dfs -mkdir new-directory
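If the parent directories do not exist yet, you can add the -p option so they are created along the way (the path here is just an example):
$ hdfs dfs -mkdir -p /user/username/parent/child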
4. mv
Just like the Linux command, you can use the "mv" command to move a file or directory from one location to another. You can also rename a file or directory using this command. Here are the examples:
$ hdfs dfs -mv <hdfs old location> <hdfs new location>
$ hdfs dfs -mv /user/username/something.txt /user/username/otherdirectory/
$ hdfs dfs -mv /user/username/onedirectory /user/username/otherdirectory/
$ hdfs dfs -mv <hdfs old name> <hdfs new name>
$ hdfs dfs -mv /user/username/something.txt /user/username/newthing.txt
$ hdfs dfs -mv /user/username/olddirectory /user/username/newdirectory
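As far as I know, you can also move several files at once by listing them before a destination directory (the file names here are illustrative):
$ hdfs dfs -mv /user/username/a.txt /user/username/b.txt /user/username/otherdirectory/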
5. rm
To delete files or directories you can use the "rm" command. You can add the -R option to delete a directory and its contents recursively.
$ hdfs dfs -rm [-R] <file/directory to be deleted>
$ hdfs dfs -rm somefile.txt
$ hdfs dfs -rm -R directory
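Note that if the trash feature is enabled in your configuration (the fs.trash.interval property), deleted files are moved to a trash directory first instead of being removed immediately. If you are sure you want to bypass the trash, there is a -skipTrash option; use it carefully:
$ hdfs dfs -rm -skipTrash somefile.txt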
I think those 5 commands will give you the "power" to manipulate HDFS files/directories :)
If you want a more complete list, you can refer to this documentation. You will find "cat", "touchz", "cp", and many other commands there.
If you find my post useful, please leave a comment below. Thanks for reading.
Tuesday, November 4, 2014
Hadoop Tips: Useful url in Hadoop system
For the last several weeks I have been installing and playing with the Hadoop system, and there are a lot of things I still need to learn about it. So I want to make this post so I don't forget what I have learned so far. For an installation tutorial you can follow this good tutorial (Hadoop 2.5.0).
There are some URLs that are useful for administering Hadoop 2.5.0 after you run the system using start-all.sh, located in the sbin directory. I want to write the list down below:
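As a quick reminder, starting everything looks roughly like this (the installation path is just an assumption; adjust it to wherever you extracted Hadoop):
$ cd /usr/local/hadoop-2.5.0
$ sbin/start-all.sh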
1. NameNode (NN) Web UI: localhost:50070
There are several tabs in this website. In the Overview tab you can see the NN status, how much storage you have in total, used space, free space, and other statistics about your system. In the Datanodes tab you can find information about all of your functioning datanodes and decommissioned datanodes. The Snapshot tab contains information about your created snapshots. You can see your startup progress in the Startup Progress tab. The last tab, Utilities, is also very useful: you can find links to the file system browser and the log browser in that tab.
You can access your Hadoop file system (HDFS) browser at http://localhost:50070/explorer.html. You can see your created directory structure from here, but you can't do things like deleting, renaming, or modifying your file system; it only lets you view your directories and files. If you want to edit your directories or files, you can read my other post later (I will write it for you :D). The last link is the log browser, which you can find at http://localhost:50070/logs/. There you can find all the logs created by the datanode, namenode, secondary namenode, resource manager, etc.
2. ResourceManager Web UI: localhost:8088
In this ResourceManager UI you can see a lot of information about your cluster, nodes, applications, scheduler, and many more.
--------------------------------------
I haven't explored all of Hadoop's features, but I hope you can find this post useful. Please leave a comment if there is any question or if you find my post useful. Cheers!