Today, I was working on the IBM Big Data University course Spark Fundamentals and found that there were some issues with the Data Scientist Workbench (DSWB) site: DSWB’s Jupyter Notebook link was not working. I worked around this by setting up Apache Spark in standalone mode on my home Windows 10 PC. This blog post summarizes the steps I performed.
To avoid the error below, download the winutils.exe binary and add it to the classpath:
Error: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Please refer to the Wikipedia Apache Spark page ( https://en.wikipedia.org/wiki/Apache_Spark ) to start learning about it.
Software Version details:
- OS: Microsoft Windows 10 [Version 10.0.14393] 64bit
- Java JDK Version 1.8.0_101
- Apache Spark version 2.0.2
- Scala Version 2.12.0
Install Java
Java is required for Apache Spark. The Spark overview page clearly states: “It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation”. I’ve used Java JDK Version 1.8.0_101 for my setup.
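As a quick check from a command prompt (a minimal sketch; the JDK path below is the default install location for this version and may differ on your machine):

```
:: Verify that Java is reachable on the PATH
java -version

:: Alternatively, point JAVA_HOME at the JDK install folder for the current session
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_101
```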
Scala
Apache Spark is written in the Scala programming language and needs it installed on the local PC. I downloaded the Scala 2.12.0 binaries MSI installer from http://www.scala-lang.org/download/, followed the standard installation prompts, and installed Scala in the default path ( C:\Program Files (x86)\scala ).
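To confirm the installation (assuming the MSI added Scala’s bin folder to the PATH):

```
:: Print the installed Scala version
scala -version
```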
winutils
I referred to various sources and found that Spark can run locally but needs winutils.exe, which is a component of Hadoop. So what exactly is winutils, and why is it required? On further investigation, I found that, among other things, Spark uses Hadoop, which calls UNIX commands such as chmod to create files and directories. winutils calls are also made to read and write files on Windows. In summary, it is required for running shell commands on Windows OS. I’m running 64bit Windows 10 and downloaded winutils.exe from this GitHub URL: https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin . I placed the winutils.exe file in a folder.
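To verify that the binary runs on your machine (the path below is only an example; use the bin subfolder of whichever folder you chose):

```
:: Running winutils.exe with no arguments prints its usage text
D:\winutils\bin\winutils.exe
```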
Spark
I downloaded the latest Spark release, 2.0.2 (Nov 14, 2016), from the official download site ( https://spark.apache.org/downloads.html ). The downloaded file is a compressed .tgz archive, which I extracted with 7-Zip to a folder on my D drive, as my C drive has limited space.
Environment Variables
The following environment variables were set to specify where the required components are installed:
- HADOOP_HOME: the folder whose bin subfolder contains the winutils.exe file
- JAVA_HOME: the folder path of my JDK
- SCALA_HOME: the bin folder of the Scala installation
- SPARK_HOME: the bin folder of the uncompressed Spark distribution
Following are the values for my Desktop:
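(The paths below are illustrative, reconstructed from the locations mentioned in this post; D:\winutils is an assumed home for winutils.exe, and your JDK folder may differ.)

```
JAVA_HOME   = C:\Program Files\Java\jdk1.8.0_101
SCALA_HOME  = C:\Program Files (x86)\scala\bin
HADOOP_HOME = D:\winutils
SPARK_HOME  = D:\spark-2.0.2-bin-hadoop2.7\bin
```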
I also added the “D:\spark-2.0.2-bin-hadoop2.7\bin” folder to the PATH environment variable. This gives me the flexibility to run Spark from anywhere in the command prompt.
Start Spark
I opened a command prompt and ran “spark-shell.cmd”. I immediately got some errors on the console regarding Hive directory write permissions. I googled and found that Spark looks for a “\tmp\hive” folder, and this folder is expected to have 777 permissions. Further results hinted that this permission should be granted using the following winutils command.
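A sketch of that command based on those hints (run it from the folder containing winutils.exe, or give its full path; it assumes \tmp\hive sits on the drive you launch spark-shell from):

```
:: Grant Hadoop-style 777 (rwxrwxrwx) permissions on \tmp\hive
winutils.exe chmod 777 \tmp\hive
```

Re-run spark-shell.cmd afterwards to confirm the permission errors are gone.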
Testing Spark
I followed the Scala examples in the Spark Quick Start guide for tests.
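As a smoke test along the lines of the Quick Start (run spark-shell from the extracted Spark folder so the relative README.md path resolves; in Spark 2.x the shell pre-creates the spark session for you):

```
// Read Spark's bundled README as a Dataset of lines
val textFile = spark.read.textFile("README.md")

// Basic actions: count the lines and fetch the first one
textFile.count()
textFile.first()

// A simple transformation: count lines that mention "Spark"
textFile.filter(line => line.contains("Spark")).count()
```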
Apache Spark
Spark is an in-memory cluster computing framework for processing and analyzing large amounts of data.
Steps to Install Spark on Windows.
Step 1: Install Java
You must have Java installed on your system. If you don't, download the appropriate Java version from this link.
Step 2: Downloading Winutils.exe and setting up Hadoop path
- To run Spark on Windows without any complications, we have to download winutils. Download winutils from this link.
- Now, create a folder in the C drive named winutils. Inside the winutils folder, create another folder named bin and place winutils.exe there.
- Now, open the environment variables dialog, click New, and add HADOOP_HOME pointing to C:\winutils (see the sketch after this list).
- Now, select the Path environment variable, which is already present in the list, and click Edit. In the Path variable, click New and add %HADOOP_HOME%\bin to it.
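Equivalently, from a command prompt (a sketch; setx writes user-level variables that take effect in new command prompt windows, and literal paths are used because a freshly set HADOOP_HOME would not expand in the same session):

```
:: Point HADOOP_HOME at the folder that contains bin\winutils.exe
setx HADOOP_HOME C:\winutils

:: Append the winutils bin folder to the user PATH
:: (the GUI editor described above is safer for long PATH values)
setx PATH "%PATH%;C:\winutils\bin"
```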
Step 3: Install Scala.
- If you don't have Scala installed on your system, download and install it from the official site.
Step 4: Downloading Apache Spark and Setting the path.
- Download the latest version of Apache Spark from the official site. At the time of writing, Spark 2.1.1 is the latest version.
- The downloaded file will be in .tgz form, so use software like 7-Zip to extract it.
- After extracting the files, create a new folder in the C drive named Spark (or any name you like) and copy the contents into that folder.
- Now, we have to set the SPARK_HOME environment variable. It is the same as setting the HADOOP_HOME path.
- Now, edit the Path variable to add %SPARK_HOME%\bin (see the sketch below).
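A command-line equivalent under the same assumptions (Spark extracted to C:\Spark; adjust if you chose a different folder name):

```
:: Point SPARK_HOME at the extracted Spark folder
setx SPARK_HOME C:\Spark

:: Add Spark's bin folder to the user PATH
setx PATH "%PATH%;C:\Spark\bin"
```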
That's it… :) Spark installation is done.
Running Spark
To run Spark, open a command prompt (CMD), type spark-shell, and hit Enter.
If everything is correct, the shell starts without errors, prints the Spark banner, and leaves you at a scala> prompt.
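As a quick sanity check at that prompt (spark is the SparkSession that spark-shell creates automatically):

```
// Should print the installed version, e.g. 2.1.1
spark.version
```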
To create a Spark project using SBT that works with Eclipse, check this link.
If you have any errors installing Spark, please post the problem in the comments and I will try to help solve it.