Hadoop CLI on Windows

Johnny Foulds edited this page Jul 12, 2020 · 3 revisions

This page shows how to deploy Hadoop to a development machine that will be used to interact with the Hadoop cluster.

Pre-Installed Software

Setup environment variables

Variable      Value
JAVA_HOME     C:\PROGRA~1\Java\jdk1.8.0_211
HADOOP_HOME   c:\data-analytics\hadoop

Add %JAVA_HOME%\bin, %HADOOP_HOME%\bin, and %HADOOP_HOME%\sbin to the Path environment variable.

Install Hadoop

Download the Binaries

PS C:\> mkdir c:\data-analytics
PS C:\> cd c:\data-analytics\
PS C:\> Invoke-WebRequest -Uri http://archive.apache.org/dist/hadoop/core/hadoop-3.1.2/hadoop-3.1.2.tar.gz -OutFile hadoop-3.1.2.tar.gz
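
Because the archive is fetched over plain HTTP, it may be worth checking its SHA-512 digest before unpacking it. The following is a minimal sketch for a WSL (bash) shell, not part of the original procedure. `verify_sha512` is a hypothetical helper, and the expected digest is assumed to be copied by hand from the checksum Apache publishes for the release.

```shell
# Hypothetical helper: succeed only if FILE has the expected SHA-512 digest.
# Usage: verify_sha512 FILE EXPECTED_HEX_DIGEST
verify_sha512() {
  local actual expected
  # sha512sum prints "<digest>  <file>"; keep only the digest column.
  actual=$(sha512sum "$1" | awk '{print $1}')
  # Normalize both digests to lowercase so the comparison ignores case.
  actual=$(printf '%s' "$actual" | tr '[:upper:]' '[:lower:]')
  expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ "$actual" = "$expected" ]
}
```

The function returns a non-zero status on a mismatch, so it can guard the extraction step, for example `verify_sha512 hadoop-3.1.2.tar.gz "$EXPECTED" && tar -xvzf hadoop-3.1.2.tar.gz`, where `EXPECTED` is a variable you set yourself.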

Install the Binaries

Run the following in a WSL (bash) shell, where the Windows C: drive is mounted at /mnt/c:

$ cd /mnt/c/data-analytics/
$ tar -xvzf hadoop-3.1.2.tar.gz

$ echo "hadoop-3.1.2" > hadoop-3.1.2/_version.txt
$ mv hadoop-3.1.2 hadoop

Patch Hadoop

The Apache binary distribution does not include the Windows native helpers winutils.exe and hadoop.dll. Download a community build (the repository below targets Hadoop 3.1.0) and move both files into hadoop/bin:

$ wget https://github.com/s911415/apache-hadoop-3.1.0-winutils/raw/master/bin/winutils.exe
$ wget https://github.com/s911415/apache-hadoop-3.1.0-winutils/raw/master/bin/hadoop.dll

$ mv winutils.exe hadoop/bin/
$ mv hadoop.dll hadoop/bin/
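
Before moving on to the tests, the result of the install and patch steps can be sanity-checked from the same WSL shell. This is only a sketch: `check_hadoop_layout` is a hypothetical helper, and the list of paths it checks is an assumption based on the steps above plus the core-site.xml file that ships in the tarball.

```shell
# Hypothetical helper: list expected files that are missing under a Hadoop
# directory. Returns 0 when everything is present, 1 otherwise.
# Usage: check_hadoop_layout HADOOP_DIR
check_hadoop_layout() {
  local home="$1" status=0 path
  for path in _version.txt bin/winutils.exe bin/hadoop.dll etc/hadoop/core-site.xml; do
    if [ ! -e "$home/$path" ]; then
      echo "missing: $path"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_hadoop_layout /mnt/c/data-analytics/hadoop` prints one line per missing file and prints nothing when the layout is complete.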

Test HDFS

PS C:\> hdfs -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

PS C:\> hadoop fs -ls hdfs://pshp111:9000/

Configure fs.defaultFS

Edit hadoop/etc/hadoop/core-site.xml and add the following property inside the <configuration> element. This sets the default file system, so the server address does not have to be typed with every command.

<property>
	<name>fs.defaultFS</name>
	<value>hdfs://pshp111:9000</value>
</property>
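
For reference, the complete file can also be generated from the WSL shell. The sketch below assumes that core-site.xml holds no other settings, because it overwrites the file. `write_core_site` is a hypothetical helper, not something Hadoop provides.

```shell
# Hypothetical helper: write a minimal core-site.xml into CONF_DIR that sets
# fs.defaultFS to DEFAULT_FS. This overwrites any existing core-site.xml.
# Usage: write_core_site CONF_DIR DEFAULT_FS
write_core_site() {
  cat > "$1/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}
```

For example, `write_core_site /mnt/c/data-analytics/hadoop/etc/hadoop hdfs://pshp111:9000` produces the configuration described above. Afterwards, `hdfs getconf -confKey fs.defaultFS` in PowerShell should print the configured address.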

Upload a sample file to confirm that the default file system is picked up:

PS C:\> hdfs dfs -put C:\Temp\IISLogs\W3SVC1291934293\u_ex190620.log /
PS C:\> hdfs dfs -ls /

Simple Test From Apache Zeppelin

val logFile = sc.textFile("hdfs://pshp111:9000/u_ex190620.log")
z.show(logFile.toDF)
