Monday, August 11, 2014

Build Hadoop on Windows



I ran into a lot of issues building Hadoop on Windows. I had to build it myself in order to generate the Windows native components, which are not included in the Hadoop binary distribution.

Without these Windows native components, running a Hadoop example on Windows fails with this error:


c:\hadoop-2.4.1>hadoop jar share\hadoop\mapreduce\hadoop-mapreduce-examples-2.4.1.jar pi 10 100
14/08/11 15:11:32 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
 


This blog post http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os is the most detailed step-by-step guide I could find, and yet even it fails to mention some essential steps that I had to take to succeed. So I wrote this post to fill in the gaps.

My Windows system is:

c:\>systeminfo | findstr /B /C:"OS Name" /C:"OS Version" /C:"System Type"
OS Name:                   Microsoft Windows 7 Enterprise
OS Version:                6.1.7601 Service Pack 1 Build 7601
System Type:               x64-based PC 


First of all, carefully read the “Building on Windows” section of hadoop-2.4.1-src\BUILDING.txt.





Hadoop

I am using the latest version (2.4.1) of both the binary distribution and the source distribution.
Unzip the source distribution to C:\hadoop. It is important to use a short path for the source code; otherwise you will run into a “too long path” error when building. In fact, C:\hadoop-2.4.1-src (the default name) already turned out to be too long to build some classes.

Cygwin

Add C:\cygwin\bin to the Windows system PATH variable.

Microsoft Windows SDK v7.1

If the Microsoft Windows SDK installation fails with error 5100, check out this:

Maven

Add MAVEN_HOME system variable:

 







Protocol Buffers 2.5.0

Add the protoc location to the PATH.

 


zlib

This step is missing from the above-mentioned blog post.


I downloaded zlib 1.2.7 because hadoop-2.4.1-src\BUILDING.txt mentions it as the version the build was tested with.

Add ZLIB_HOME system variable:

 

The following is important: copy the two header files (zconf.h, zlib.h) from %ZLIB_HOME%\include to %ZLIB_HOME%.
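
The header copy is easy to get wrong by hand, so here is a minimal Python sketch of it. It only assumes the layout described above (an include folder directly under the zlib directory); the function name copy_zlib_headers is my own and not part of any Hadoop or zlib tooling.

```python
import shutil
from pathlib import Path

# The two header files named in this post.
HEADERS = ("zconf.h", "zlib.h")


def copy_zlib_headers(zlib_home):
    """Copy zconf.h and zlib.h from <zlib_home>/include up into <zlib_home>.

    Hypothetical helper: it assumes the headers live in an "include"
    subfolder, as described in the post.
    """
    root = Path(zlib_home)
    copied = []
    for name in HEADERS:
        src = root / "include" / name
        if not src.is_file():
            raise FileNotFoundError(f"expected header not found: {src}")
        dst = root / name
        shutil.copy2(src, dst)  # copy2 also keeps the file timestamps
        copied.append(dst)
    return copied
```

You would call it with the same directory that ZLIB_HOME points to, for example copy_zlib_headers(os.environ["ZLIB_HOME"]) after importing os.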

Platform variable

You are almost set now. One last step:
set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)

To avoid retyping this in every session, you can set Platform up as a Windows system variable.
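
Since a single missing variable can break the build late, it may help to check everything up front. The following is a rough Python sketch of such a pre-flight check, not an official tool. The function name check_build_env is made up, and the executable names mvn, protoc and sh are my assumption of what the Maven, Protocol Buffers and Cygwin steps above put on the PATH.

```python
import os
import shutil

# Assumed executable names for the tools set up in the earlier sections.
REQUIRED_TOOLS = ("mvn", "protoc", "sh")
# The variables this post asks you to define.
REQUIRED_VARS = ("MAVEN_HOME", "ZLIB_HOME")
# The two Platform values given above.
VALID_PLATFORMS = ("x64", "Win32")


def check_build_env(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper. `env` defaults to the current environment and
    `which` to shutil.which, so both can be swapped out for testing.
    """
    env = os.environ if env is None else env
    problems = []
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for var in REQUIRED_VARS:
        if not env.get(var):
            problems.append(f"{var} is not set")
    platform = env.get("Platform")
    if platform not in VALID_PLATFORMS:
        problems.append(f"Platform must be x64 or Win32, got {platform!r}")
    return problems
```

Running print(check_build_env()) in the same command prompt you plan to build from should print an empty list if everything in this post has been set up.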


Build

Start the command prompt from the Windows SDK:
 


Run command:
mvn package -Pdist,native-win -DskipTests -Dtar

If the build is successful, you will see the Windows native files generated under C:\hadoop\hadoop-common-project\hadoop-common\target\bin:

 

Now you have the Windows native files, which you can copy into the bin folder of the Hadoop binary distribution, hadoop-2.4.1\bin.
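
If you prefer to script that final copy, here is a small Python sketch. It simply copies every file from the build output folder into the distribution's bin folder; the function name copy_native_files is hypothetical, and the sketch does not assume any particular file names.

```python
import shutil
from pathlib import Path


def copy_native_files(build_bin, dist_bin):
    """Copy every regular file from build_bin into dist_bin.

    Hypothetical helper: it copies whatever the build produced and
    overwrites files of the same name in the destination.
    """
    src = Path(build_bin)
    dst = Path(dist_bin)
    if not src.is_dir():
        raise NotADirectoryError(f"build output folder not found: {src}")
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():  # skip any subfolders
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)
    return copied
```

With the paths used in this post, the call would look like copy_native_files(r"C:\hadoop\hadoop-common-project\hadoop-common\target\bin", r"C:\hadoop-2.4.1\bin"); adjust both paths to your own setup.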
 


