Sunday, August 31, 2014

Running Mahout on Windows



Why is Windows so difficult to please? I overcame a lot of difficulties to build Hadoop on Windows, and just as I was ready to breathe a sigh of relief, boom, more trouble: running a simple Mahout example failed.

My Mahout location is: c:\EclipseProjects\Libraries\mahout-distribution-0.8\

Running a simple Mahout program failed (using Cygwin):


$ cd c:/EclipseProjects/Libraries/mahout-distribution-0.8/bin
$ sh mahout seqdirectory -i C:/EclipseProjects/mahout-cookbook/chapter02/Lastfm-ArtistTags2007/original -o C:/EclipseProjects/mahout-cookbook/chapter02/Lastfm-ArtistTags2007/sequence
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ProgramDriver
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:105)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ProgramDriver



This suggested that Mahout couldn't find Hadoop. The Mahout distribution ships with a Hadoop jar, mahout-distribution-0.8\lib\hadoop\hadoop-core-1.1.2.jar, and this jar needs to be on Mahout's classpath.
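One quick way to confirm this diagnosis is to probe for the missing class by name. Below is a hypothetical stand-alone sketch of my own (ClasspathProbe is not part of Mahout or Hadoop): Class.forName throws exactly the ClassNotFoundException seen in the trace above when a class is absent from the classpath.

```java
// ClasspathProbe.java -- hypothetical helper, not part of Mahout or Hadoop.
// Class.forName throws ClassNotFoundException when a class cannot be found
// on the classpath, which is the root cause of the error above.
public class ClasspathProbe {
    public static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Always present: part of the JDK itself.
        System.out.println(onClasspath("java.lang.String"));
        // Stays false until hadoop-core-1.1.2.jar is added to the classpath.
        System.out.println(onClasspath("org.apache.hadoop.util.ProgramDriver"));
    }
}
```

Running it under the broken setup prints true for the JDK class and false for ProgramDriver, which pins the problem on the classpath rather than on Mahout itself.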

Open C:\EclipseProjects\Libraries\mahout-distribution-0.8\bin\mahout and add the last line below after the loop that builds the classpath:

# add release dependencies to CLASSPATH
  for f in $MAHOUT_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
 
  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/lib/hadoop/hadoop-core-1.1.2.jar;
 

Rerunning the example failed again, this time with:

Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-apple\mapred\staging\apple1283258319\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 


There are many threads on the web discussing this issue, and almost all conclude that it cannot be fixed by configuration alone; it has to be worked around by changing the source code. So I downloaded the Hadoop 1.1.2 source code and imported hadoop-1.1.2\src\core into Eclipse. The culprit is org.apache.hadoop.fs.FileUtil. Wrap the body of this method in a try/catch that falls back to execSetPermission():



public static void setPermission(File f, FsPermission permission)
    throws IOException {
  FsAction user = permission.getUserAction();
  FsAction group = permission.getGroupAction();
  FsAction other = permission.getOtherAction();

  // use the native/fork if the group/other permissions are different
  // or if the native is available
  if (group != other || NativeIO.isAvailable()) {
    execSetPermission(f, permission);
    return;
  }

  try {
    boolean rv = true;

    // read perms
    rv = f.setReadable(group.implies(FsAction.READ), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.READ) != user.implies(FsAction.READ)) {
      f.setReadable(user.implies(FsAction.READ), true);
      checkReturnValue(rv, f, permission);
    }

    // write perms
    rv = f.setWritable(group.implies(FsAction.WRITE), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.WRITE) != user.implies(FsAction.WRITE)) {
      f.setWritable(user.implies(FsAction.WRITE), true);
      checkReturnValue(rv, f, permission);
    }

    // exec perms
    rv = f.setExecutable(group.implies(FsAction.EXECUTE), false);
    checkReturnValue(rv, f, permission);
    if (group.implies(FsAction.EXECUTE) != user.implies(FsAction.EXECUTE)) {
      f.setExecutable(user.implies(FsAction.EXECUTE), true);
      checkReturnValue(rv, f, permission);
    }
  } catch (IOException ioe) {
    LOG.warn("Java file permissions failed to set " + f + " to "
        + permission + " falling back to fork");
    execSetPermission(f, permission);
  }
}
 
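Why does catching the exception and falling back work? The java.io.File permission setters that the method calls report failure through their boolean return value, and Hadoop's checkReturnValue is what converts a false return into the IOException shown above. Here is a minimal stand-alone sketch of that mechanism (PermDemo and its checkReturnValue are my own illustrative stand-ins, not Hadoop code):

```java
import java.io.File;
import java.io.IOException;

// PermDemo.java -- illustrative sketch, not Hadoop code. java.io.File's
// permission setters signal failure by returning false; a helper like
// Hadoop's checkReturnValue turns that false into an IOException.
public class PermDemo {
    // Mirrors the pattern in FileUtil.setPermission: call the setter,
    // then treat a false return as an error.
    static void checkReturnValue(boolean rv, File f) throws IOException {
        if (!rv) {
            throw new IOException("Failed to set permissions of path: " + f);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("permdemo", ".tmp");
        f.deleteOnExit();
        // Second argument: false = apply to everyone, true = owner only --
        // the same distinction FileUtil uses for group/other vs. user bits.
        checkReturnValue(f.setReadable(true, false), f);
        checkReturnValue(f.setWritable(true, true), f);
        System.out.println("permissions set on " + f.getName());
    }
}
```

On Windows some of these bit changes are rejected by the filesystem, so the patched method logs a warning and retries via execSetPermission instead of aborting the whole job.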

Compile the patched file into hadoop-core-FileUtil-1.1.2.jar, copy it under mahout-distribution-0.8\lib\hadoop\, and add it to the Mahout classpath. Note that it is prepended, so the patched FileUtil is found before the original one inside hadoop-core-1.1.2.jar:
# add release dependencies to CLASSPATH
  for f in $MAHOUT_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
 
  CLASSPATH=${CLASSPATH}:$MAHOUT_HOME/lib/hadoop/hadoop-core-1.1.2.jar;
  CLASSPATH=$MAHOUT_HOME/lib/hadoop/hadoop-core-FileUtil-1.1.2.jar:${CLASSPATH};
 
At last, my first Mahout example was able to run!
