Monday, August 11, 2014

Build Hadoop on Windows



I ran into a lot of issues building Hadoop on Windows. I had to build Hadoop on Windows myself in order to generate some Windows native components that are not included in the Hadoop binary distribution.

Without these Windows native components, trying to run a Hadoop example on Windows fails with this error:


c:\hadoop-2.4.1>hadoop jar share\hadoop\mapreduce\hadoop-mapreduce-examples-2.4.1.jar pi 10 100
14/08/11 15:11:32 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
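Hadoop looks for winutils.exe in the bin folder of its home directory. As far as I know, Hadoop 2.x takes that directory from the hadoop.home.dir Java system property and otherwise from the HADOOP_HOME environment variable. The minimal Python sketch below (expected_winutils_path and winutils_present are hypothetical helper names) models only the HADOOP_HOME case. It reports where winutils.exe is expected and whether the file is actually there.

```python
import ntpath
import os
from typing import Mapping, Optional


def expected_winutils_path(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Windows path where winutils.exe is expected, or None.

    Assumption: models only the HADOOP_HOME lookup; the hadoop.home.dir
    Java system property, which Hadoop checks first, is not considered.
    """
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None  # HADOOP_HOME is not set
    # ntpath builds Windows-style paths even when this script runs elsewhere.
    return ntpath.join(hadoop_home, "bin", "winutils.exe")


def winutils_present(env: Mapping[str, str] = os.environ) -> bool:
    """True if HADOOP_HOME is set and bin\\winutils.exe exists under it."""
    path = expected_winutils_path(env)
    return path is not None and os.path.isfile(path)
```

On the machine above, with HADOOP_HOME set to C:\hadoop-2.4.1, the expected location is C:\hadoop-2.4.1\bin\winutils.exe. This is exactly the file the binary distribution is missing.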
 


The most detailed step-by-step guide I could find on Google is this blog post: http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os Yet even it fails to mention some essential steps that I had to take to succeed, so I wrote this post to fill in the gaps.

My Windows system is:

c:\>systeminfo | findstr /B /C:"OS Name" /C:"OS Version" /C:"System Type"
OS Name:                   Microsoft Windows 7 Enterprise
OS Version:                6.1.7601 Service Pack 1 Build 7601
System Type:               x64-based PC 


First of all, carefully read the “Building on Windows” section of hadoop-2.4.1-src\BUILDING.txt.

Hadoop

I downloaded the latest version (2.4.1) of both the binary distribution and the source distribution.
Unzip the source distribution to C:\hadoop. It is important to keep the source code under a short path; otherwise the build fails with a “too long path” error. In fact, C:\hadoop-2.4.1-src (the default name) is apparently already too long to build some classes.
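The error presumably comes from the Windows path length limit. By default, MAX_PATH is 260 characters, including the terminating null. Moving the tree from C:\hadoop-2.4.1-src (19 characters) to C:\hadoop (9 characters) shortens every path by 10 characters. The Python sketch below walks an unpacked source tree and lists the files whose path would reach the limit under a given root. The helper name paths_too_long is hypothetical. The sketch only looks at source files, and build output under the target directories is nested deeper, so an empty result is no guarantee.

```python
import ntpath
import os
from typing import List, Tuple

# Default Windows MAX_PATH: 260 characters including the terminating null,
# so a path string must stay below this length.
MAX_PATH = 260


def paths_too_long(src_root: str, target_root: str,
                   limit: int = MAX_PATH) -> List[Tuple[int, str]]:
    """List (length, path) for files under src_root whose Windows path
    would reach `limit` characters if the tree lived at target_root."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), src_root)
            # Rebuild the relative path with Windows separators under target_root.
            win_path = ntpath.join(target_root, *rel.split(os.sep))
            if len(win_path) >= limit:
                offenders.append((len(win_path), win_path))
    return sorted(offenders, reverse=True)  # longest first
```

For example, you can compare paths_too_long(src, "C:\\hadoop-2.4.1-src") with paths_too_long(src, "C:\\hadoop"), where src is wherever the source archive was unpacked.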

Cygwin

Add C:\cygwin\bin to the Windows system variable Path.
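To double-check the Path change, open a new command prompt, because prompts that are already running keep the old environment. Then you can run something like the Python sketch below. The helper name path_has_entry is hypothetical. It splits the semicolon-separated Path value and compares entries case-insensitively, ignoring quotes and a trailing backslash. It does not expand variables such as %SystemRoot% inside Path.

```python
import ntpath


def _normalize(entry: str) -> str:
    # Windows paths compare case-insensitively; also ignore surrounding
    # whitespace and quotes, forward slashes and a trailing backslash.
    entry = entry.strip().strip('"')
    return ntpath.normcase(entry).rstrip("\\")


def path_has_entry(path_value: str, entry: str) -> bool:
    """True if `entry` is one of the ;-separated directories in path_value."""
    wanted = _normalize(entry)
    return any(_normalize(part) == wanted
               for part in path_value.split(";") if part.strip())
```

Usage: path_has_entry(os.environ["PATH"], "C:\\cygwin\\bin") should return True once the change has taken effect.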

Microsoft Windows SDK v7.1

If installing the Microsoft Windows SDK fails with error 5100, check out this:

Maven

Add the MAVEN_HOME system variable, pointing to your Maven installation directory.
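A quick way to confirm that MAVEN_HOME points at a real Maven installation is to look for the launcher script in its bin folder. This Python sketch assumes that older Maven 3 releases ship bin\mvn.bat and newer ones ship bin\mvn.cmd, so it accepts either name. The name find_mvn_launcher is hypothetical.

```python
import os
from typing import Optional


def find_mvn_launcher(maven_home: str) -> Optional[str]:
    """Return the path of Maven's Windows launcher under maven_home, or None.

    Assumption: older Maven 3 releases ship bin\\mvn.bat and newer ones
    bin\\mvn.cmd, so both names are tried.
    """
    for name in ("mvn.cmd", "mvn.bat"):
        candidate = os.path.join(maven_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None  # MAVEN_HOME probably points at the wrong directory
```

Usage: find_mvn_launcher(os.environ["MAVEN_HOME"]) from a new command prompt.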

Protocol Buffers 2.5.0

Add the directory containing protoc to the Path system variable.
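Running protoc --version prints a line such as libprotoc 2.5.0. As far as I know, the Hadoop 2.4.x build checks that protoc reports exactly version 2.5.0. The Python sketch below runs the first protoc found on the Path and extracts its version. This is a quick way to catch a different protoc shadowing the one you installed. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional

REQUIRED_PROTOC = "2.5.0"  # the version named in this section's title


def parse_protoc_version(output: str) -> Optional[str]:
    """Extract the version from `protoc --version` output such as 'libprotoc 2.5.0'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_protoc_version() -> Optional[str]:
    """Run the first protoc found on PATH and return its version, or None."""
    exe = shutil.which("protoc")
    if exe is None:
        return None  # protoc is not on the Path at all
    result = subprocess.run([exe, "--version"],
                            capture_output=True, text=True, check=False)
    return parse_protoc_version(result.stdout)
```

Usage: installed_protoc_version() == REQUIRED_PROTOC should be True before you start the build.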

ZLib

This step is missing from the blog mentioned above.


I downloaded zlib 1.2.7 because hadoop-2.4.1-src\BUILDING.txt mentions that this is the version the build was tested with.
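To confirm which zlib version you actually unpacked, you can read the ZLIB_VERSION macro that zlib.h defines (for example #define ZLIB_VERSION "1.2.7"). A minimal Python sketch, with hypothetical helper names:

```python
import re
from typing import Optional


def zlib_header_version(header_text: str) -> Optional[str]:
    """Return the ZLIB_VERSION string defined in zlib.h, e.g. '1.2.7'."""
    match = re.search(r'#define\s+ZLIB_VERSION\s+"([^"]+)"', header_text)
    return match.group(1) if match else None


def zlib_version_of(header_path: str) -> Optional[str]:
    """Read a zlib.h file from disk and return its ZLIB_VERSION."""
    # latin-1 decodes any byte sequence, so odd characters in comments are harmless.
    with open(header_path, encoding="latin-1") as fh:
        return zlib_header_version(fh.read())
```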

Add the ZLIB_HOME system variable, pointing to your zlib directory.


The following is important: copy the two header files (zconf.h and zlib.h) from %ZLIB_HOME%\include to %ZLIB_HOME% itself.
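If you prefer to script the header copy, here is a minimal Python sketch. The name copy_zlib_headers is hypothetical. It assumes that ZLIB_HOME is set and that %ZLIB_HOME%\include holds both headers. It overwrites any copies already in %ZLIB_HOME%.

```python
import os
import shutil
from typing import List

ZLIB_HEADERS = ("zconf.h", "zlib.h")


def copy_zlib_headers(zlib_home: str) -> List[str]:
    """Copy zconf.h and zlib.h from <zlib_home>/include up into <zlib_home>.

    Returns the destination paths; raises FileNotFoundError if a header is missing.
    """
    copied = []
    for name in ZLIB_HEADERS:
        src = os.path.join(zlib_home, "include", name)
        dst = os.path.join(zlib_home, name)
        shutil.copy2(src, dst)  # copy2 also preserves the file timestamps
        copied.append(dst)
    return copied
```

Usage: copy_zlib_headers(os.environ["ZLIB_HOME"]) from a new command prompt, so that the ZLIB_HOME system variable is visible.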

Platform variable

You are almost set now. One last step:
set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)

To avoid typing this every time, you can set Platform up as a Windows system variable.
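The right value can be read off the systeminfo output shown earlier: "System Type: x64-based PC" means Platform=x64. The Python sketch below performs that mapping, and its helper names are hypothetical. Because systeminfo output is localized, the sketch assumes an English-language Windows, where a 32-bit system is assumed to report "X86-based PC".

```python
def system_type_from_systeminfo(text: str) -> str:
    """Pull the value of the 'System Type:' line out of systeminfo output."""
    for line in text.splitlines():
        if line.startswith("System Type:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'System Type:' line found (non-English Windows?)")


def platform_for_system_type(system_type: str) -> str:
    """Map a System Type such as 'x64-based PC' to the Platform build variable."""
    kind = system_type.strip().lower()
    if kind.startswith("x64"):
        return "x64"    # 64-bit system
    if kind.startswith("x86"):
        return "Win32"  # 32-bit system (assumed to report 'X86-based PC')
    raise ValueError(f"unrecognised System Type: {system_type!r}")
```

Feeding it the output of systeminfo from the machine above yields "x64", which matches the value the post uses for a 64-bit system.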


Build

Start the command prompt that comes with the Windows SDK, so that the SDK build environment is set up.

Run the command:
mvn package -Pdist,native-win -DskipTests -Dtar
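Commands copied from web pages sometimes contain a typographic dash, such as the en dash (–), in place of the ASCII hyphen. When that happens, Maven does not recognize the argument as an option. This small Python sketch flags such arguments before you run the command. The name suspicious_args is hypothetical.

```python
from typing import List

# Characters that look like '-' but are not: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash and minus sign.
LOOKALIKE_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"


def suspicious_args(command: str) -> List[str]:
    """Return the arguments of `command` that start with a look-alike dash."""
    # str.split() never yields empty strings, so arg[0] is always defined.
    return [arg for arg in command.split() if arg[0] in LOOKALIKE_DASHES]
```

For a clean command such as mvn package -Pdist,native-win -DskipTests -Dtar, the result is an empty list.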

If the build is successful, you will find the generated Windows native files, including winutils.exe, under C:\hadoop\hadoop-common-project\hadoop-common\target\bin.


Now that you have the Windows native files, copy them into the bin folder of the Hadoop binary distribution (hadoop-2.4.1\bin).
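The copy can be scripted too. This Python sketch copies every regular file from the build's target\bin folder into the distribution's bin folder, overwriting any files with the same name. The name copy_native_files is hypothetical. The sketch assumes everything in target\bin belongs in the distribution's bin folder, so filter the list if you only want specific files.

```python
import os
import shutil
from typing import List


def copy_native_files(build_bin: str, dist_bin: str) -> List[str]:
    """Copy every regular file in build_bin into dist_bin (created if missing).

    Returns the names of the copied files; subdirectories are skipped.
    """
    os.makedirs(dist_bin, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(build_bin)):
        src = os.path.join(build_bin, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dist_bin, name))  # overwrites existing files
            copied.append(name)
    return copied
```

With the paths used in this post, the call would be copy_native_files("C:\\hadoop\\hadoop-common-project\\hadoop-common\\target\\bin", "C:\\hadoop-2.4.1\\bin").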