Creating a Hadoop EC2 AMI

Creating an AMI for Hadoop

There is currently no public AMI that matches the configuration required by the ICB, so one must be created. The following describes the steps used to create an EC2 AMI for hadoop-0.20.0 running on Ubuntu 9.04 with the Sun JDK. This needs to be done once for each architecture type desired (i.e., i386 and x86_64).

 # start a base ubuntu instance corresponding to the desired architecture
 # either a "small" 32 bit version
 user@local $ ec2-run-instances --key campagne-lab ami-0d729464
 # or a "large" 64 bit version
 user@local $ ec2-run-instances --instance-type m1.large --key campagne-lab ami-1f749276
 # login as root to the instance
 # update the base packages (required)
 root@ec2 # apt-get update && apt-get upgrade -y
 # install the jdk
 # NOTE: JAVA_HOME can be "/usr/lib/jvm/java-6-sun" OR just "/usr"
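 # NOTE: the sun-java6-jdk package prompts for the Sun license during installation;
 # pre-seeding debconf as below (a possible workaround, not part of the original recipe)
 # keeps "apt-get install -y" from blocking on that prompt
 root@ec2 # echo "sun-java6-jdk shared/accepted-sun-dlj-v1-1 select true" | debconf-set-selections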
 root@ec2 # apt-get install -y sun-java6-jdk
 # install the ec2 api tools
 root@ec2 # apt-get install -y ec2-api-tools
 # install other common packages
 root@ec2 # apt-get install -y screen
 root@ec2 # apt-get install -y emacs22
 root@ec2 # apt-get install -y zip
 root@ec2 # apt-get install -y ant
 root@ec2 # apt-get install -y subversion
 # install S3 Tools (http://s3tools.org/) - the ubuntu packaged version is too old
 # download the tar file and install with "python setup.py install"
 # s3cmd will get placed into /usr/local/bin (TODO: configure)
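 # for example (the version number and download URL below are assumptions - check http://s3tools.org/ for the current release):
 root@ec2 # cd /tmp
 root@ec2 # wget http://downloads.sourceforge.net/project/s3tools/s3cmd/0.9.9/s3cmd-0.9.9.tar.gz
 root@ec2 # tar zxvf s3cmd-0.9.9.tar.gz
 root@ec2 # cd s3cmd-0.9.9
 root@ec2 # python setup.py install
 # set the S3 access/secret keys interactively
 root@ec2 # s3cmd --configure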
 # install groovy
 root@ec2 # cd /tmp
 root@ec2 # wget http://dist.groovy.codehaus.org/distributions/groovy-binary-1.6.3.zip
 root@ec2 # cd /usr/local
 root@ec2 # unzip /tmp/groovy-binary-1.6.3.zip
 root@ec2 # cd /usr/local/bin
 root@ec2 # ln -s ../groovy-1.6.3/bin/groovy .
 root@ec2 # ln -s ../groovy-1.6.3/bin/grape .
 root@ec2 # ln -s ../groovy-1.6.3/bin/groovyc .
 root@ec2 # ln -s ../groovy-1.6.3/bin/groovyConsole .
 root@ec2 # ln -s ../groovy-1.6.3/bin/groovysh .
 root@ec2 # ln -s ../groovy-1.6.3/bin/java2groovy .
 root@ec2 # ln -s ../groovy-1.6.3/bin/startGroovy .
 # install hadoop
 root@ec2 # cd /tmp
 root@ec2 # wget http://apache.securedservers.com/hadoop/core/hadoop-0.20.0/hadoop-0.20.0.tar.gz
 root@ec2 # cd /usr/local
 root@ec2 # tar zxvf /tmp/hadoop-0.20.0.tar.gz
 root@ec2 # ln -s hadoop-0.20.0 hadoop
 #
 # In /usr/local/hadoop/conf/hadoop-env.sh:
 #   export JAVA_HOME=/usr/lib/jvm/java-6-sun
 #   export GROOVY_HOME=/usr/local/groovy-1.6.3
 #   export HADOOP_OPTS=-server
 #   export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no -o ServerAliveInterval=30"
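 # one way to append the settings above (a sketch; editing the file by hand works just as well):
 root@ec2 # cat >> /usr/local/hadoop/conf/hadoop-env.sh <<'EOF'
 export JAVA_HOME=/usr/lib/jvm/java-6-sun
 export GROOVY_HOME=/usr/local/groovy-1.6.3
 export HADOOP_OPTS=-server
 export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no -o ServerAliveInterval=30"
 EOF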
 # set up the default environment - add the following to ~/.profile
 export JAVA_HOME=/usr/lib/jvm/java-6-sun
 export GROOVY_HOME=/usr/local/groovy-1.6.3
 export HADOOP_HOME=/usr/local/hadoop
 export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
 # copy the keys up to the ec2 instance
 user@local $ scp -i campagnelab.pem -r ~/.ec2/*.pem root@ec2:/tmp
 # bundle the image
 root@ec2 # ec2-bundle-vol --cert /tmp/cert-XXX.pem --privatekey /tmp/pk-XXX.pem --user XXX --prefix hadoop-0.20.0-i386
 # upload to S3
 root@ec2 # ec2-upload-bundle --bucket icb-hadoop-images --manifest /tmp/hadoop-0.20.0-i386.manifest.xml --access-key XXX --secret-key XXX
 # register the image
 user@local $ ec2-register icb-hadoop-images/hadoop-0.20.0-i386.manifest.xml
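
Once ec2-register returns the new AMI ID, the image can be sanity-checked by launching an instance from it. A quick verification sketch, where ami-XXXXXXXX stands for the ID returned above:

 # launch an instance from the newly registered AMI
 user@local $ ec2-run-instances --key campagne-lab ami-XXXXXXXX
 # log in and confirm the toolchain is in place
 root@ec2 # java -version && hadoop version && groovy -v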


  • TODO

Ganglia

 apt-get install -y ganglia-monitor
 apt-get install -y libapr1-dev
 apt-get install -y libconfuse-dev
 apt-get install -y python-dev
 apt-get install -y rrdtool
 apt-get install -y librrd-dev
 apt-get install -y expat
 apt-get install -y apache2
 apt-get install -y php5
 mkdir /etc/ganglia
 gmond --default_config > /etc/ganglia/gmond.conf
 cp gmetad.conf /etc/ganglia/
 mkdir -p /var/lib/ganglia/rrds
 chown nobody /var/lib/ganglia/rrds
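
To have Hadoop report its metrics into Ganglia, the metrics contexts in /usr/local/hadoop/conf/hadoop-metrics.properties can be pointed at gmond. A minimal sketch, assuming gmond listens on its default port 8649 on the same host:

 # /usr/local/hadoop/conf/hadoop-metrics.properties
 dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 dfs.period=10
 dfs.servers=localhost:8649
 mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 mapred.period=10
 mapred.servers=localhost:8649
 jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 jvm.period=10
 jvm.servers=localhost:8649
 # then (re)start the ganglia daemons, e.g.
 /etc/init.d/ganglia-monitor restart
 gmetad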