Build a Custom Hadoop/Giraph Project with Maven

Context


Hey mates! Have you ever had dependency problems when packing a jar file and submitting it to a Hadoop/Giraph cluster? You can see many people in the mailing lists suggesting Apache Ant or Apache Maven, but without any concrete example of how to actually do it. In the next sections I will give you a quick example of how to do exactly that.

Maven startup configuration


Well, let’s assume that you have Maven properly installed already – if not, check here.

To check if Maven was properly installed, run:

mvn --version

You should see output similar to this (yours may differ slightly):

Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 14:51:28+0100)
Maven home: D:\apache-maven-3.0.5\bin\..
Java version: 1.6.0_25, vendor: Sun Microsystems Inc.
Java home: C:\Program Files\Java\jdk1.6.0_25\jre
Default locale: nl_NL, platform encoding: Cp1252
OS name: "windows 7", version: "6.1", arch: "amd64", family: "windows"

Creating the Maven Project


Ok, now create the directory where you would like your project to reside. Once you have done that, run the following command in the empty project directory:

mvn archetype:generate -DgroupId=com.marcolotz.giraph -DartifactId=TripletExample -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Please choose a groupId and an artifactId that match your project. The values above are just the ones I used in my personal project: the code belonged to the package com.marcolotz.giraph and the project was named TripletExample.
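If everything went well, the archetype produces the standard Maven layout. Assuming the IDs above, listing the files should show something like this (App.java and AppTest.java are stubs generated by the quickstart archetype):

```shell
cd TripletExample
find . -type f
# expected (in some order):
#   ./pom.xml
#   ./src/main/java/com/marcolotz/giraph/App.java
#   ./src/test/java/com/marcolotz/giraph/AppTest.java
```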

Modifying the POM file


POM stands for Project Object Model, and the pom.xml file is the core of a project's configuration in Maven. You may have already noticed that Giraph itself contains several POM files. When you created the Maven project with the command above, a pom.xml file was generated as well. We now need to modify this file in order to build Giraph code, so change its content to the following:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.marcolotz.giraph</groupId>
  <artifactId>tripletExample</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>tripletExample</name>
  <url>http://maven.apache.org</url>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.giraph</groupId>
      <artifactId>giraph-core</artifactId>
      <version>1.1.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.203.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <archive>
          </archive>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Should you use this to build a plain Hadoop project instead, just make sure that the POM points at the Hadoop version you are actually running, and remove the Giraph dependency. Also note that the “@Algorithm” annotation used in some Giraph examples is not defined in giraph-core but in giraph-examples, so it will cause build problems if your source code contains it.
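For a plain Hadoop project, the dependencies section could then shrink to the Hadoop artifact alone (the version below is only an example; match it to your cluster):

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>
</dependencies>
```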

Inserting Custom Code and Building


Now you just need to put your Java files in the package folder (for the example above, tripletExample/src/main/java/com/marcolotz/giraph) and build the solution. To build it, run the following command in the folder where the pom.xml file is located:

mvn clean compile assembly:single

This command cleans the target folder (if it is not already empty) and packs a single jar file with all the project dependencies inside. The final product will be in the target folder. Since it contains all the dependencies, this jar file may be quite large. Finally, you can submit the jar file to your Hadoop cluster ;)
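As for the Java files themselves: if you do not have custom code at hand yet, a minimal vertex computation makes a good smoke test for the build. The class below is only a sketch against the Giraph 1.1.0 API; the MaxValueComputation name is made up, and the logic (each vertex converging to the maximum value in the graph) mirrors the style of the examples shipped with Giraph:

```java
package com.marcolotz.giraph;

import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

/** Hypothetical example: every vertex keeps the maximum value seen so far. */
public class MaxValueComputation extends
    BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    double max = vertex.getValue().get();
    for (DoubleWritable message : messages) {
      max = Math.max(max, message.get());
    }
    // Propagate on the first superstep, or whenever the value improved:
    if (getSuperstep() == 0 || max > vertex.getValue().get()) {
      vertex.setValue(new DoubleWritable(max));
      sendMessageToAllEdges(vertex, new DoubleWritable(max));
    }
    vertex.voteToHalt();
  }
}
```

You would then submit the packed jar through org.apache.giraph.GiraphRunner, in the same way as the SimpleShortestPathsComputation example shown further down.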


Please note that there are other ways to reduce the size of the jar file, such as giving a classpath argument to Hadoop/Giraph instead of packing everything inside the jar. This is also an elegant solution when the jar file would be too large to be distributed easily across a cluster.
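A sketch of that approach, assuming Hadoop 1.x and a main class that parses generic options via ToolRunner (the paths, jar names and the MyRunner class are all hypothetical): build a thin jar with plain `mvn clean package`, then hand the dependencies to Hadoop separately:

```shell
# Make giraph-core visible to the client-side JVM:
export HADOOP_CLASSPATH=/opt/apache/giraph/giraph-core/target/giraph-core-1.1.0.jar

# Ship the same jar to the map tasks with -libjars (a comma-separated list):
hadoop jar target/tripletExample-1.0-SNAPSHOT.jar com.marcolotz.giraph.MyRunner \
    -libjars /opt/apache/giraph/giraph-core/target/giraph-core-1.1.0.jar
```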

Giraph configuration details

Heeeey!

Today I had to deploy a single-node Apache Giraph installation on my machine. Giraph has a really good quick start guide (here). A few corrections are needed to keep it up to date with the latest available version (1.1.0); hopefully I will be able to do that this week.

There are, however, a few glitches that can happen during the installation process. Here I will present quick solutions to them.

OpenJDK management.properties problem

I must confess that I'd rather use Oracle Java than OpenJDK. As most of you may know, in order to run Hadoop you need at least Java 1.6 installed. In my current configuration, I was trying to run Hadoop with OpenJDK Java 7 on Ubuntu 14.04.2.

With my $JAVA_HOME variable correctly set, I was getting the following error:

hduser@prometheus-UX32A:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/management$ hadoop namenode -format
Error: Config file not found: /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/management/management.properties

This error also appeared for other Hadoop scripts like start-dfs.sh and start-mapred.sh – and, of course, start-all.sh.

Taking a quick look at the installation, I realized that management.properties was just a broken symbolic link. It also looks like the file is directly related to Java RMI (Remote Method Invocation), which is probably how Hadoop starts all the daemons across the nodes. I must confess that I installed every OpenJDK package available in the Ubuntu repository and none of them solved my problem.

This can be solved with the following workaround:

  1. Download the equivalent Oracle Java version HERE. Keep in mind to download the same major version as your OpenJDK (i.e. download Java 6, 7 or 8 depending on your JDK).
  2. Unpack the Oracle Java archive and copy the contents of its jre/lib/management folder (including management.properties) into the equivalent OpenJDK folder.
  3. Run the Hadoop command again.

Hopefully, this will solve that problem.
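Step 2 boils down to a single copy. A sketch, assuming the Oracle JDK was unpacked to /opt/jdk1.7.0 (adjust both paths to your actual installation directories):

```shell
# The OpenJDK management.properties is a dangling symlink, so drop it first:
sudo rm -f /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/management/management.properties

# Then copy the Oracle version into place:
sudo cp /opt/jdk1.7.0/jre/lib/management/management.properties \
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/management/
```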

Giraph Mapper Allocation problem

Let’s suppose you installed Giraph correctly and the Hadoop MapReduce examples run as expected on your machine.

When you submit a job, however, Giraph prints this very interesting line and then the job completes without doing any actual work:

15/06/04 16:58:03 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers

If you want to make sure that we are talking about the same thing, here is a complete example of the faulty output:

hduser@prometheus-UX32A:~$ hadoop jar /opt/apache/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hduser/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hduser/output/shortestpaths -w 1
15/06/04 16:57:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
15/06/04 16:57:58 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
15/06/04 16:57:59 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
15/06/04 16:58:03 INFO job.GiraphJob: Tracking URL: http://hdhost:50030/jobdetails.jsp?jobid=job_201506041652_0003
15/06/04 16:58:03 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
15/06/04 16:58:31 INFO mapred.JobClient: Running job: job_201506041652_0003
15/06/04 16:58:31 INFO mapred.JobClient: Job complete: job_201506041652_0003
15/06/04 16:58:31 INFO mapred.JobClient: Counters: 5
15/06/04 16:58:31 INFO mapred.JobClient:   Job Counters
15/06/04 16:58:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=31987
15/06/04 16:58:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/06/04 16:58:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/06/04 16:58:31 INFO mapred.JobClient:     Launched map tasks=2
15/06/04 16:58:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

Good news: it is not your fault (unless you think that giving a bad hostname to your machine is your fault). According to this JIRA, there is a problem parsing hostnames: mixed-case names are not recognized. This can be solved by setting a new hostname with only lowercase letters. To do this, run as sudo:

 sudo hostname NEW_HOST_NAME 
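Note that `hostname` alone does not survive a reboot. A sketch of the persistent change on Ubuntu (the name below is just the example from my logs):

```shell
echo prometheusmobile | sudo tee /etc/hostname
```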

Also, do not forget to update your /etc/hosts table with the new hostname. Keep in mind that this may affect other software that relies on your hostname. Once you change it, restart the machine and the Hadoop daemons. Hopefully you will then get the correct output, which in my case is:

hduser@prometheusmobile:/opt/apache/giraph/giraph-examples$ hadoop jar target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hduser/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hduser/output/shortestpaths -w 1
15/06/04 17:06:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
15/06/04 17:06:58 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
15/06/04 17:06:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
15/06/04 17:07:02 INFO job.GiraphJob: Tracking URL: http://hdhost:50030/jobdetails.jsp?jobid=job_201506041705_0002
15/06/04 17:07:02 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
15/06/04 17:07:31 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer prometheusmobile.ironnetwork:22181 --zkNode /_hadoopBsp/job_201506041705_0002/_haltComputation'
15/06/04 17:07:31 INFO mapred.JobClient: Running job: job_201506041705_0002
15/06/04 17:07:32 INFO mapred.JobClient:  map 100% reduce 0%
15/06/04 17:07:43 INFO mapred.JobClient: Job complete: job_201506041705_0002
15/06/04 17:07:43 INFO mapred.JobClient: Counters: 37
15/06/04 17:07:43 INFO mapred.JobClient:   Zookeeper halt node
15/06/04 17:07:43 INFO mapred.JobClient:     /_hadoopBsp/job_201506041705_0002/_haltComputation=0
15/06/04 17:07:43 INFO mapred.JobClient:   Zookeeper base path
15/06/04 17:07:43 INFO mapred.JobClient:     /_hadoopBsp/job_201506041705_0002=0
15/06/04 17:07:43 INFO mapred.JobClient:   Job Counters
15/06/04 17:07:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=40730
15/06/04 17:07:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/06/04 17:07:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/06/04 17:07:43 INFO mapred.JobClient:     Launched map tasks=2
15/06/04 17:07:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/06/04 17:07:43 INFO mapred.JobClient:   Giraph Timers
15/06/04 17:07:43 INFO mapred.JobClient:     Input superstep (ms)=188
15/06/04 17:07:43 INFO mapred.JobClient:     Total (ms)=9260
15/06/04 17:07:43 INFO mapred.JobClient:     Superstep 2 SimpleShortestPathsComputation (ms)=62
15/06/04 17:07:43 INFO mapred.JobClient:     Shutdown (ms)=8795
15/06/04 17:07:43 INFO mapred.JobClient:     Superstep 0 SimpleShortestPathsComputation (ms)=75
15/06/04 17:07:43 INFO mapred.JobClient:     Initialize (ms)=3254
15/06/04 17:07:43 INFO mapred.JobClient:     Superstep 3 SimpleShortestPathsComputation (ms)=44
15/06/04 17:07:43 INFO mapred.JobClient:     Superstep 1 SimpleShortestPathsComputation (ms)=61
15/06/04 17:07:43 INFO mapred.JobClient:     Setup (ms)=32
15/06/04 17:07:43 INFO mapred.JobClient:   Zookeeper server:port
15/06/04 17:07:43 INFO mapred.JobClient:     prometheusmobile.ironnetwork:22181=0
15/06/04 17:07:43 INFO mapred.JobClient:   Giraph Stats
15/06/04 17:07:43 INFO mapred.JobClient:     Aggregate edges=12
15/06/04 17:07:43 INFO mapred.JobClient:     Sent message bytes=0
15/06/04 17:07:43 INFO mapred.JobClient:     Superstep=4
15/06/04 17:07:43 INFO mapred.JobClient:     Last checkpointed superstep=0
15/06/04 17:07:43 INFO mapred.JobClient:     Current workers=1
15/06/04 17:07:43 INFO mapred.JobClient:     Aggregate sent messages=12
15/06/04 17:07:43 INFO mapred.JobClient:     Current master task partition=0
15/06/04 17:07:43 INFO mapred.JobClient:     Sent messages=0
15/06/04 17:07:43 INFO mapred.JobClient:     Aggregate finished vertices=5
15/06/04 17:07:43 INFO mapred.JobClient:     Aggregate sent message message bytes=267
15/06/04 17:07:43 INFO mapred.JobClient:     Aggregate vertices=5
15/06/04 17:07:43 INFO mapred.JobClient:   File Output Format Counters
15/06/04 17:07:43 INFO mapred.JobClient:     Bytes Written=0
15/06/04 17:07:43 INFO mapred.JobClient:   FileSystemCounters
15/06/04 17:07:43 INFO mapred.JobClient:     HDFS_BYTES_READ=200
15/06/04 17:07:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=43376
15/06/04 17:07:43 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
15/06/04 17:07:43 INFO mapred.JobClient:   File Input Format Counters
15/06/04 17:07:43 INFO mapred.JobClient:     Bytes Read=0
15/06/04 17:07:43 INFO mapred.JobClient:   Map-Reduce Framework
15/06/04 17:07:43 INFO mapred.JobClient:     Map input records=2
15/06/04 17:07:43 INFO mapred.JobClient:     Spilled Records=0
15/06/04 17:07:43 INFO mapred.JobClient:     Map output records=0
15/06/04 17:07:43 INFO mapred.JobClient:     SPLIT_RAW_BYTES=88

Electronics for Kids

Hey mates!

Originally I had planned to release this as a Christmas Gift for all the readers and followers, but I ran short on time.

A mate of mine had the amazing idea to write an introductory Electronics book for kids, just to be able to explain to his son, in a really simple way, the principles behind a few well-known components and circuits. When he showed me his work, I found it so brilliant that I asked his permission to translate and adapt the material into other languages.

This material is dedicated to all the electronic engineers/fans/adepts who always wanted to explain, in a simple way, a bit more about their work to their beloved kids.

I hope you enjoy.

Update:

I would like to say a special thanks to Maína Worm Silva for fixing some typos in the Portuguese version of the document. Should you find a typo in any language version, please let me know ;)

Electronics for Kids – German Version
Electronics for Kids – English Version
Electronics for Kids – Portuguese Version

RTX RTOS Adaptation for LPC1343 and LPC1347

Hey mates! A few months ago, I was working on a project that required an RTOS for the ARM Cortex-M3. Long story short, we ended up using the LPC1347 and the RTX RTOS.

The problem was that, at the time, there was no port of that operating system, nor example code, that would run on the LPCXpresso IDE.

I bet you are thinking (if you actually understand what I am talking about): why use LPCXpresso when you can use other IDEs, like IAR, and avoid that problem altogether?

The thing is: I was using Linux. And not an average Linux: I had the brilliant idea to try the unstable Ubuntu 13.10 (which had just been released at the time). The only IDE available for that OS was LPCXpresso (formerly Code Red).

In order to make the OS work properly, I had to modify some of its main source code and also adjust some compiler configurations and flags.

At the time I did this, there was no other solution available for it. So, in case you run into the same problem that I had, let me save you some time and provide you with this solution ;)

Download

LPC1347: Download the project here.

LPC1343: Download the project here.