Sunday, August 31, 2014

Running Mahout on Windows



Why is Windows so hard to please? I had already fought through plenty of trouble building Hadoop on Windows. Just as I was ready to breathe a sigh of relief, boom, more trouble: running a simple Mahout example failed.

My Mahout location is: c:\EclipseProjects\Libraries\mahout-distribution-0.8\

Here is the failing run, under Cygwin:


$ cd c:/EclipseProjects/Libraries/mahout-distribution-0.8/bin
$ sh mahout seqdirectory -i C:/EclipseProjects/mahout-cookbook/chapter02/Lastfm-ArtistTags2007/original -o C:/EclipseProjects/mahout-cookbook/chapter02/Lastfm-ArtistTags2007/sequence
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:105)
Caused by: java.lang.ClassNotFoundException:org.apache.hadoop.util.ProgramDriver



This suggested that Mahout couldn't find Hadoop. The Mahout distribution ships with a Hadoop jar, mahout-distribution-0.8\lib\hadoop\hadoop-core-1.1.2.jar, but it never ends up on the Mahout classpath; it needs to be added there.
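
Before editing the launcher, it's worth double-checking that the missing class really does live inside that bundled jar. A throwaway probe like the one below (my own sketch, not part of Mahout; ClasspathProbe is just an illustrative name, and the jar path matches my layout above) tries to load the class straight from the jar:

import java.net.URL;
import java.net.URLClassLoader;

// My own throwaway check, not part of Mahout: load the class the
// launcher could not find directly from the jar bundled with Mahout.
public class ClasspathProbe {
    public static void main(String[] args) throws Exception {
        // Jar path from my layout above; adjust to your install location.
        URL jar = new URL("file:/C:/EclipseProjects/Libraries/"
                + "mahout-distribution-0.8/lib/hadoop/hadoop-core-1.1.2.jar");
        URLClassLoader loader = new URLClassLoader(new URL[] { jar }, null);
        // Prints the class if the jar contains ProgramDriver,
        // throws ClassNotFoundException otherwise.
        System.out.println(loader.loadClass("org.apache.hadoop.util.ProgramDriver"));
    }
}

If it prints the class, the jar is fine and the launcher script is to blame for not putting it on the classpath.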

Open C:\EclipseProjects\Libraries\mahout-distribution-0.8\bin\mahout and add the last line below to the classpath-building section:

# add release dependencies to CLASSPATH
  for f in $MAHOUT_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # added: the bundled Hadoop core jar, so ProgramDriver can be found
  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/lib/hadoop/hadoop-core-1.1.2.jar;
 

Rerunning the example failed again, this time with:

Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-apple\mapred\staging\apple1283258319\.staging to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 

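Before patching anything, it helps to see why the call fails on Windows at all. The staging directory wants permissions 0700; since group and other are both empty (and no native library is available), Hadoop takes the pure-Java path and uses java.io.File's permission setters, which on Windows cannot revoke the read or execute bits and simply return false. checkReturnValue() then turns that false into the IOException above. A standalone probe of my own (PermissionProbe is just an illustrative name, not Hadoop code) shows the behavior:

import java.io.File;
import java.io.IOException;

// My own illustration, not Hadoop code: show why setting 0700 fails on
// Windows. For 0700, group/other get no permissions, so Hadoop calls
// File.setReadable(false, false) and friends; on Windows java.io.File
// cannot revoke the read or execute bits, so those calls return false,
// and Hadoop's checkReturnValue() rejects the false return value.
public class PermissionProbe {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("perm-probe", null);
        boolean readable = f.setReadable(false, false);     // false on Windows
        boolean writable = f.setWritable(false, false);     // usually true
        boolean executable = f.setExecutable(false, false); // false on Windows
        System.out.println("setReadable:   " + readable);
        System.out.println("setWritable:   " + writable);
        System.out.println("setExecutable: " + executable);
        f.delete();
    }
}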

There are many threads on the web discussing this issue, and almost all reach the same conclusion: it cannot be fixed by configuration and has to be worked around by changing the source code. So I downloaded the Hadoop 1.1.2 source code and imported hadoop-1.1.2\src\core into Eclipse. The culprit is org.apache.hadoop.fs.FileUtil; wrap the body of setPermission() in a try/catch that falls back to the fork-based path:



public static void setPermission(File f, FsPermission permission)
    throws IOException {
  FsAction user = permission.getUserAction();
  FsAction group = permission.getGroupAction();
  FsAction other = permission.getOtherAction();

  // use the native/fork if the group/other permissions are different
  // or if the native is available
  if (group != other || NativeIO.isAvailable()) {
    execSetPermission(f, permission);
    return;
  }

  try {
    boolean rv = true;

    // read perms
    rv = f.setReadable(group.implies(FsAction.READ), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.READ) != user.implies(FsAction.READ)) {
      rv = f.setReadable(user.implies(FsAction.READ), true);
      checkReturnValue(rv, f, permission);
    }

    // write perms
    rv = f.setWritable(group.implies(FsAction.WRITE), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.WRITE) != user.implies(FsAction.WRITE)) {
      rv = f.setWritable(user.implies(FsAction.WRITE), true);
      checkReturnValue(rv, f, permission);
    }

    // exec perms
    rv = f.setExecutable(group.implies(FsAction.EXECUTE), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.EXECUTE) != user.implies(FsAction.EXECUTE)) {
      rv = f.setExecutable(user.implies(FsAction.EXECUTE), true);
      checkReturnValue(rv, f, permission);
    }
  } catch (IOException ioe) {
    // the Windows workaround: instead of failing the job,
    // fall back to the fork-based chmod path
    LOG.warn("Java file permissions failed to set " + f + " to "
        + permission + " falling back to fork");
    execSetPermission(f, permission);
  }
}

Compile the patched class into hadoop-core-FileUtil-1.1.2.jar, copy it under mahout-distribution-0.8\lib\hadoop\, and prepend it to the Mahout classpath. The order matters: the patched jar must come before the stock hadoop-core-1.1.2.jar so that the JVM loads the fixed FileUtil instead of the original:

# add release dependencies to CLASSPATH
  for f in $MAHOUT_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/lib/hadoop/hadoop-core-1.1.2.jar;
  # prepend the patched jar so its FileUtil shadows the stock one
  CLASSPATH=$MAHOUT_HOME/lib/hadoop/hadoop-core-FileUtil-1.1.2.jar:${CLASSPATH};
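
To make sure the ordering trick actually works, a small check of my own (PatchProbe is a hypothetical name, not part of Mahout or Hadoop) asks the JVM which jar it resolved FileUtil from; compile and run it with the same CLASSPATH the mahout script builds, both hadoop jars included:

import org.apache.hadoop.fs.FileUtil;

// My own sanity check: report which jar the JVM loaded FileUtil from.
// With the prepend in place, this should print the path of
// hadoop-core-FileUtil-1.1.2.jar, not the stock hadoop-core-1.1.2.jar.
public class PatchProbe {
    public static void main(String[] args) {
        System.out.println(FileUtil.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}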
 
At last, my first Mahout example was able to run!


 
