There are multiple ways you can install Hadoop on Windows, but most of them require installing a virtual machine or using Docker containers to run Cloudera or HDP images. Although these methods are effective, they require considerably high hardware configurations. In this post, we have laid out a detailed step-by-step guide to set up and configure Hadoop on a lightweight Windows machine, along with a small demonstration of putting a local file into HDFS. This post covers the steps to install Hadoop 2.9.1 on Windows 10 using its binaries. You can also install Hadoop on your local machine from its source code; that requires building the source using Apache Maven and the Windows SDK, and we'll cover that method in one of the future posts.
We are going to perform quite a few steps here, so I recommend setting aside some time and doing these steps patiently and carefully. There are many manual steps, and any miss can lead to a failure or a learning opportunity – depending upon whether you see the glass as half full or half empty. Brace yourself!

Download Hadoop 2.9.1 binaries

To download the binaries, go to the Apache Hadoop releases page and search for the Hadoop 2.9.1 binaries, or follow the direct link to the download page. You should get the hadoop-2.9.1.tar.gz file.
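If you prefer the command line, a rough sketch of the same step is shown below; the archive URL is my assumption based on the standard Apache archive layout, and curl only ships with recent Windows 10 builds, so downloading through the browser works just as well.

mkdir C:\BigData
rem fetch the 2.9.1 binary tarball into C:\BigData (URL assumed from the Apache archive layout)
curl -L -o C:\BigData\hadoop-2.9.1.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz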
For plenty of obvious reasons, you may want to organize your installations properly, so create a separate folder where you'll be unpacking the binaries. In this post, we'll create the C:\BigData\hadoop-2.9.1 folder and refer to it further on, but you can choose whatever makes sense for you. Don't put any spaces in the folder names; if there are spaces in the path, some of the variables will not expand properly. Unpack the tar.gz into the C:\BigData\hadoop-2.9.1 folder. If you don't have software to unpack a tar.gz, you can download 7-Zip to do so. Note that some standard unzip software may yield a 'Path too long' error. One way to get around those errors is to install Cygwin with a standard tar package and then run "tar -xvf" from a Windows/Cygwin command prompt. Once the binaries are unpacked, you should see the files and folders below.
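For example, assuming Cygwin is installed and the tarball was saved to C:\BigData (both assumptions on my part), the extraction looks roughly like this; extracting inside C:\BigData produces the C:\BigData\hadoop-2.9.1 folder used in the rest of this post.

cd /cygdrive/c/BigData
tar -xvf hadoop-2.9.1.tar.gz
# quick sanity check; you should see bin, etc, include, lib, libexec, sbin, share,
# plus LICENSE.txt, NOTICE.txt and README.txt
ls hadoop-2.9.1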
Now that we have set the environment variables, we need to validate them. Open a new Windows command prompt and run the echo command on each variable to confirm they are assigned the desired values. If the variables are not initialized yet, it is probably because you are testing them in an old session. Make sure you have opened a new command prompt to test them.
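For example (HADOOP_HOME and JAVA_HOME are assumed to have been defined in the earlier environment-variable step, which is not repeated here):

echo %HADOOP_HOME%
echo %JAVA_HOME%
echo %PATH%

Each command should print the value you assigned; an empty or literal, unexpanded value means the variable did not make it into the current session.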
Once the environment variables are set up, we need to configure Hadoop by editing the following configuration files.

First, let's configure the Hadoop environment file. Open C:\BigData\hadoop-2.9.1\etc\hadoop\hadoop-env.cmd and add the content below at the bottom:

set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set PATH=%PATH%;%HADOOP_PREFIX%\bin

Edit core-site.xml

Open C:\BigData\hadoop-2.9.1\etc\hadoop\core-site.xml and add the content below within the <configuration> tags.
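As a minimal single-node sketch of what goes between the <configuration> tags (the value, in particular the port, is my assumption; adjust it to your setup):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>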
After editing core-site.xml, you need to set the replication factor and the location of the namenode and datanodes. Open C:\BigData\hadoop-2.9.1\etc\hadoop\hdfs-site.xml and add the content below within the <configuration> tags.
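A minimal sketch for a single-node setup (the replication factor of 1 and the local folder paths are assumptions; create those folders or point the properties wherever you prefer):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///C:/BigData/hadoop-2.9.1/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///C:/BigData/hadoop-2.9.1/data/datanode</value>
</property>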
Finally, let's configure properties for the Map-Reduce framework. Open C:\BigData\hadoop-2.9.1\etc\hadoop\mapred-site.xml and add the content below within the <configuration> tags. If you don't see mapred-site.xml, rename the bundled mapred-site.xml.template file to mapred-site.xml.
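A minimal sketch of the Map-Reduce property, assuming you want MapReduce jobs to run on YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>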
Check if the C:\BigData\hadoop-2.9.1\etc\hadoop\slaves file is present; if it's not, create one, add localhost in it, and save it.

Format Name Node

Windows 10 offers an application to install a sub-operating-system, known as the Windows Subsystem for Linux (WSL). To install the Windows subsystem, you can follow the tutorial here. I am using Ubuntu 16.04 (highly recommended because it is the most stable version and I didn't find any compatibility issues), which comes with Python 3.5.2; you can check the version with the following command:

python3 -V

By default Spark comes with Python 2, but for distributed deep learning development I prefer Python 3.6.x (because of compatibility issues with other libraries). You can choose any Python version you want.
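If you also want Python 3.6.x on Ubuntu 16.04 under WSL, one common route (my assumption, not part of the original steps) is the deadsnakes PPA:

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6
# confirm both interpreters
python3 -V
python3.6 -V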