Tuesday, November 18, 2014

Hadoop Troubleshooting: Hadoop build errors related to findbugs, Eclipse configuration, protobuf, and AvroRecord

Last week, I was trying to build Hadoop 2.5.0 from source code. I tried two ways to build it: first using Maven in the terminal, and second using Eclipse as my IDE.

1. Related to findbugs (using maven in terminal)

I read the BUILDING.txt that you can find in the root of the Hadoop source code directory, and I ran this command to build a Hadoop package:

$ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Somewhere in the middle of the long build, an error related to the FINDBUGS_HOME environment variable appeared, with this specific error message:


I already had findbugs installed, so I tried to set FINDBUGS_HOME to /usr/bin/findbugs with this command:

$ export FINDBUGS_HOME=/usr/bin/findbugs

But it was no use; the error was still there. So I downloaded the findbugs source code from SourceForge and set FINDBUGS_HOME to the findbugs source code root directory instead:

$ export FINDBUGS_HOME=/path/to/your/<sourcecode>/findbugs

I ran the build command again and it went through this time. :)
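A quick way to tell whether a directory will work as FINDBUGS_HOME is to check for lib/findbugs.jar underneath it: the build loads the tool from there, which is why a bare binary path like /usr/bin/findbugs is rejected while an install or source root works. A small sketch of that check (the helper name is mine):

```shell
# Sketch: a usable FINDBUGS_HOME is an install/source root that
# contains lib/findbugs.jar, not the path of the findbugs binary.
check_findbugs_home() {
  if [ -f "$1/lib/findbugs.jar" ]; then
    echo "ok: $1 looks like a usable FINDBUGS_HOME"
  else
    echo "bad: $1/lib/findbugs.jar not found"
  fi
}

check_findbugs_home /usr/bin/findbugs   # a binary path: reported as bad
```

The downloaded findbugs root passes this check, which is why pointing FINDBUGS_HOME at it fixed the build.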

2. Build path error (Eclipse configuration)

When you import the Hadoop projects into your Eclipse workspace and try to build them all, you will probably get many kinds of errors, among them this specific error message:

Unbound classpath variable: 'M2_REPO/asm/asm/3.2/asm-3.2.jar' in project 'hadoop-tools-dist' hadoop-tools-dist

This error is related to the M2_REPO classpath variable in Eclipse. To solve it, open the Classpath Variables configuration from the Eclipse menu:

"Window -> Preferences". This opens the Preferences dialog. From there, go to:

"Java -> Build Path -> Classpath Variables". Add a new classpath variable, name it M2_REPO, and fill the path with the location of your local Maven repository: 


Rebuild all your projects. You won't see that kind of error again after that, but there will still be many errors in the project.
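For reference, M2_REPO should point at your local Maven repository, which by default lives under ~/.m2/repository (assuming you have not overridden localRepository in settings.xml). A minimal sketch of the value to use (the helper name is mine):

```shell
# Default location of the local Maven repository; this is the value
# the M2_REPO classpath variable should carry (assuming you have not
# overridden localRepository in ~/.m2/settings.xml).
m2_repo_default() {
  echo "$HOME/.m2/repository"
}

echo "Set M2_REPO to: $(m2_repo_default)"
```

If you generated the project files with the maven-eclipse-plugin, it can also register the variable for you with `mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace`.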

3. hadoop-streaming build path error

If you are lucky (or not), you will also find this error message related to the hadoop-streaming build path:

Project 'hadoop-streaming' is missing required source folder: '/hadoop-2.5.0-working/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf'

This error is quite strange, because the build path can't be changed even after you edit it (I tried this many times and it was quite annoying). The solution is stranger still: just remove the offending entry from the build path configuration and the error will disappear. You can do it this way:

To open the configuration, right-click the hadoop-streaming project to open the context menu, then choose "Build Path -> Configure Build Path". This opens the "Properties for hadoop-streaming" dialog. In that dialog, go to the "Source" tab, select the problematic source folder, and press the "Remove" button. After that, just rebuild the projects. The remaining errors you see are related to missing protobuf and Avro files.
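If the dialog keeps reverting your change, the same removal can be done directly in the project's .classpath file, where Eclipse stores source folders as classpathentry elements of kind "src". A sketch with sed (the helper name and the file path in the comment are assumptions; adjust them to your tree):

```shell
# Sketch: drop the classpathentry line that references the missing
# .../hadoop-yarn-server-resourcemanager/conf folder from a .classpath file.
remove_missing_src_entry() {
  sed -i '/hadoop-yarn-server-resourcemanager\/conf/d' "$1"
}

# Typical location of the file (an assumption -- adjust to your checkout):
# remove_missing_src_entry hadoop-tools/hadoop-streaming/.classpath
```

Refresh the project in Eclipse afterwards so it rereads the edited file.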

4. Protobuf and Avro missing files

The remaining errors you might find are related to missing files in the hadoop-common project. The problematic packages are org.apache.hadoop.io.serializer and org.apache.hadoop.ipc.protobuf (you can see that the protobuf directory is empty). I searched for information about the empty protobuf directory, but it only tells you to rebuild the project, which should regenerate the files automatically, using Maven with these commands: 

$ mvn clean
$ mvn compile -Pnative
$ mvn package
$ mvn compile findbugs:findbugs
$ mvn install
$ mvn package -Pdist,docs,src,native -Dtar
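Before rerunning the commands above, one thing worth checking: Hadoop 2.x pins the Protocol Buffers compiler at version 2.5.0, and the protobuf generation step fails if the protoc on your PATH reports anything else. A hedged sketch of that check (the helper name is mine):

```shell
# Sketch: Hadoop 2.5.0 expects protoc 2.5.0 exactly; a different
# compiler version makes the protobuf code-generation step fail.
check_protoc() {
  v=$("$1" --version 2>/dev/null | awk '{print $2}')
  if [ "$v" = "2.5.0" ]; then
    echo "protoc ok"
  else
    echo "found protoc '$v', need 2.5.0"
  fi
}

check_protoc protoc
```

The generated Java sources typically land under each module's target/generated-sources and target/generated-test-sources folders; in Eclipse those folders have to be on the build path for the generated protobuf classes to resolve.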

I don't know if this works for you, but it didn't work in my project. These error messages still lingered:

AvroRecord cannot be resolved to a type
The import org.apache.hadoop.ipc.protobuf.TestRpcServiceProtos cannot be resolved
The import org.apache.hadoop.ipc.protobuf.TestProtos cannot be resolved
TestProtobufRpcProto cannot be resolved to a type
TestProtobufRpc2Proto cannot be resolved to a type
EmptyRequestProto cannot be resolved to a type
EchoResponseProto cannot be resolved to a type
EchoRequestProto cannot be resolved to a type

After that, I found a very good site called GrepCode, where I was able to find all the classes and files I needed (you can even get them from other versions of Hadoop!). For example, you can download the AvroRecord.java file from GrepCode and put it in the corresponding package directory.

That's all. If you found this post useful, please leave a comment below. See you in my next post.
