I ran into a lot of issues building Hadoop on Windows. I had to build it myself in order to generate some Windows native components that are not included in the Hadoop binary distribution.
Without these Windows native components, running a Hadoop example on Windows fails with this error:
c:\hadoop-2.4.1>hadoop jar share\hadoop\mapreduce\hadoop-mapreduce-examples-2.4.1.jar pi 10 100
14/08/11 15:11:32 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
This blog post, http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os, is the most detailed step-by-step guide I could find, and yet even it fails to mention some essential steps that I had to take to succeed. So I wrote this post to fill in the gaps.
My Windows system is:
c:\>systeminfo | findstr /B /C:"OS Name" /C:"OS Version" /C:"System Type"
OS Name: Microsoft Windows 7 Enterprise
OS Version: 6.1.7601 Service Pack 1 Build 7601
System Type: x64-based PC
First of all, read carefully the “Building on Windows” section of hadoop-2.4.1-src\BUILDING.txt.
I am using the latest version (2.4.1) of both the binary distribution and the source distribution.
Unzip the source distribution to C:\hadoop. It is important that you use a short path to hold the source code, otherwise you will run into “too long path” error when building. In fact, C:\hadoop-2.4.1-src (the default name) is apparently too long to build some classes.
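To see how much headroom a source location leaves before starting a long build, a small script can report the longest absolute path under the source root. This is only a sketch in Python, not part of the Hadoop build: the function name is made up for illustration, and the 260-character figure is the classic Windows MAX_PATH limit, which I am assuming is what triggers the error. The build also creates deeper paths under its output directories, so the number it prints is a lower bound.

```python
import os

MAX_PATH = 260  # classic Windows path limit, in characters


def longest_path(root):
    """Return the longest absolute file path found under root ('' if none)."""
    longest = ""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.abspath(os.path.join(dirpath, name))
            if len(candidate) > len(longest):
                longest = candidate
    return longest


if __name__ == "__main__":
    path = longest_path(r"C:\hadoop")  # source root used in this post
    print(len(path), "of", MAX_PATH, "characters:", path)
```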
Add C:\cygwin\bin to the Windows system PATH variable.
Microsoft Windows SDK v7.1
If the Microsoft Windows SDK installation fails with error 5100, check out this:
Add MAVEN_HOME system variable:
Protocol Buffers 2.5.0
Add the protoc location to the PATH.
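Since a missing PATH entry only shows up deep into the build, it can help to check first that every command-line tool resolves. The following is a rough sketch in Python; the tool list is my assumption (the Unix utilities as I understand BUILDING.txt to require them, plus mvn and protoc), so adjust it to match your copy of BUILDING.txt.

```python
import shutil

# Assumed list: Cygwin utilities, Maven, and protoc. Adjust as needed.
REQUIRED_TOOLS = ["sh", "mkdir", "rm", "cp", "tar", "gzip", "mvn", "protoc"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be resolved through PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the same command prompt you plan to build in, because each prompt can carry a different PATH.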
This step is missing from the above mentioned blog.
I downloaded zlib 1.2.7 because hadoop-2.4.1-src\BUILDING.txt mentions it as the version the build was tested with.
Add ZLIB_HOME system variable:
The following is important: copy the two header files (zconf.h, zlib.h) from %ZLIB_HOME%\include to %ZLIB_HOME%.
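If you prefer to script the copy rather than do it by hand, something along these lines should work. It is a minimal sketch that assumes ZLIB_HOME is already set and contains an include folder holding both headers; the function name is my own.

```python
import os
import shutil

HEADERS = ("zconf.h", "zlib.h")


def copy_zlib_headers(zlib_home):
    """Copy the two headers from <zlib_home>/include into <zlib_home>."""
    copied = []
    for name in HEADERS:
        source = os.path.join(zlib_home, "include", name)
        copied.append(shutil.copy2(source, os.path.join(zlib_home, name)))
    return copied


if __name__ == "__main__":
    zlib_home = os.environ.get("ZLIB_HOME")
    if zlib_home:
        print(copy_zlib_headers(zlib_home))
    else:
        print("ZLIB_HOME is not set")
```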
You are almost set now. One last step:
set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)
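If you are unsure which of the two values applies, a short Python sketch can print the matching line. The mapping is my own simplification: it treats the machine strings AMD64 and x86_64 as 64-bit and everything else as 32-bit, which covers the two cases above and nothing more. BUILDING.txt also notes that the variable name is case-sensitive, so it must be spelled Platform.

```python
import platform


def msbuild_platform(machine):
    """Map a machine string such as 'AMD64' or 'x86' to x64 or Win32."""
    return "x64" if machine.lower() in ("amd64", "x86_64") else "Win32"


if __name__ == "__main__":
    # Prints the command to type; it does not change the environment itself.
    print("set Platform=" + msbuild_platform(platform.machine()))
```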
To avoid typing this every time, you can set it up as a Windows system variable:
Start cmd from Windows SDK:
mvn package -Pdist,native-win -DskipTests -Dtar
Now you have the Windows native files, which you can copy into the bin folder of the Hadoop binary distribution, hadoop-2.4.1\bin.
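A quick way to confirm the copy worked is to check that winutils.exe, the file the error at the top of this post complains about, now sits in the bin folder. This is a small sketch in Python; the function name is made up, and the install location is the one from my command prompt above.

```python
import os


def has_winutils(hadoop_home):
    """Return True if <hadoop_home>/bin/winutils.exe exists."""
    return os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe"))


if __name__ == "__main__":
    print(has_winutils(r"c:\hadoop-2.4.1"))
```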