Apache Hadoop Deployment for EC2

From Cloud Computing Wiki - Kaavo

Overview :

Like the other sample system definitions provided with IMOD, hadoop-multinode-amazon is provided as an example, and users are expected to customize it for their own needs. This page assumes a basic understanding of deploying and starting systems in IMOD: deploying systems from the sample templates and configuring them with the required parameters. For detailed instructions, please watch the following 15-minute video:


Please also check how to use Apache Hadoop (http://hadoop.apache.org/). Hadoop is a large-scale distributed batch processing infrastructure. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores, and it is designed to distribute large amounts of work efficiently across a set of machines.

We tested this setup using a Fedora 8 AMI; however, you can use your own custom image with a different flavor or version of Linux. If you run into issues with a different flavor or version of Linux, please post them on http://forums.kaavo.com. Also refer to the list of supported Linux versions for monitoring in Installing_Monitoring_Agents; monitoring may not work on unsupported versions.

What does it do?

Deploys a fully functional Hadoop cluster with a single click, for automatic data processing at scheduled intervals in the cloud using Hadoop and Kaavo IMOD. When the system is scheduled to run at a regular interval, it starts automatically, processes the new files, and shuts itself down when everything is done.

What deployment-time actions are included in the System Definition?

Brings online a fully functional Hadoop cluster with 3 servers: 1 in the Hadoop master server group with the role master, and 2 in the Hadoop slave server group with the role slave.

List of actions :

Specify volume-id, device-name, and mount-path in the attach EBS volume section to attach an EBS volume to the master node:

   <command type="ec2" name="attach-ebs-vol">[volume-id=][device-name=][mount-path=]</command>  

A file system must already exist on the volume, i.e. mkfs must have been run manually at least once after the volume was first attached to an instance.
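The one-time mkfs step can be guarded so that it never runs against a volume that already holds data. The following is a minimal sketch, not part of the Kaavo system definition: the helper names `has_filesystem` and `format_if_blank` are hypothetical, and the ext3 filesystem type is an assumption.

```shell
# Sketch: create a filesystem on a newly attached EBS volume only if it is blank.
# The ext3 type and the device name in the usage note are assumptions.

# has_filesystem DEVICE: succeeds if blkid reports a filesystem type on DEVICE.
has_filesystem() {
    blkid -o value -s TYPE "$1" 2>/dev/null | grep -q .
}

# format_if_blank DEVICE: run mkfs only when no filesystem is detected,
# so data written by earlier runs is never wiped.
format_if_blank() {
    device=$1
    if has_filesystem "$device"; then
        echo "$device already has a filesystem; leaving it untouched"
        return 0
    fi
    mkfs -t ext3 "$device"
}
```

Run as root on the instance the first time the volume is attached, for example `format_if_blank /dev/sdf`, where /dev/sdf stands in for whatever device-name you configured.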

You will need to provide a value for the following parameter:

   <parameter name="hadoop_tmp_dir" type="literal" value="put_hadoop_data_directory_path"/>

For example: \/mnt\/data-store. Set this to your Hadoop data directory path, where Hadoop will store its processing data.

You need to provide values for the following parameters:

   <parameter name="user" type="literal" value="put_login_user_name"/>  
   <parameter name="password" type="literal" value="put_login_password"/>  
   <parameter name="systemName" type="literal" value="put_system_name"/> 

Formatting the name node :

The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of each machine in your cluster. You need to do this only the first time you set up a Hadoop cluster.

   /usr/local/hadoop/bin/hadoop namenode -format
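Because formatting is a first-time-only step, and reformatting discards the existing HDFS metadata, it can help to wrap the command in a guard. The sketch below is an illustration under assumptions: it assumes dfs.name.dir is left at its default of ${hadoop.tmp.dir}/dfs/name, and the `HADOOP_BIN` variable and `format_namenode_once` function are hypothetical names introduced here.

```shell
# Path to the hadoop launcher; the default matches the command shown above.
HADOOP_BIN=${HADOOP_BIN:-/usr/local/hadoop/bin/hadoop}

# format_namenode_once HADOOP_TMP_DIR
# Formats the name node only when no name node metadata exists yet.
# Assumes dfs.name.dir is the default, ${hadoop.tmp.dir}/dfs/name.
format_namenode_once() {
    hadoop_tmp_dir=$1
    if [ -f "$hadoop_tmp_dir/dfs/name/current/VERSION" ]; then
        echo "name node already formatted; skipping"
        return 0
    fi
    "$HADOOP_BIN" namenode -format
}
```

On the master this would be called with the same directory given for hadoop_tmp_dir, for example `format_namenode_once /mnt/data-store`.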

Testing :

Download any book into the /mnt/data-store/newfiles folder, for example:

   cd /mnt/data-store/newfiles
   wget http://3rdparty-tools.s3.amazonaws.com/gutenburg/pg132.txt

After about a minute, Hadoop will process the file and move it to the processedfiles folder.
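Rather than checking by hand, you could poll for the file to show up in the processed folder. This is only a sketch: `wait_for_processed` is a hypothetical helper, and the location of the processedfiles folder (assumed below to sit next to newfiles) should be checked against your own system definition.

```shell
# wait_for_processed FILE_NAME PROCESSED_DIR [TIMEOUT_SECONDS]
# Polls every 5 seconds until FILE_NAME appears in PROCESSED_DIR.
# Returns 0 when the file is found, 1 if the timeout (default 180s) expires.
wait_for_processed() {
    name=$1
    processed_dir=$2
    timeout=${3:-180}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -e "$processed_dir/$name" ]; then
            echo "$name processed after ${waited}s"
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "$name not processed within ${timeout}s" >&2
    return 1
}
```

For the book downloaded above, a call might look like `wait_for_processed pg132.txt /mnt/data-store/processedfiles`, where the second argument is the assumed folder path.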

Check the processed files:

   /usr/local/hadoop/bin/hadoop dfs -ls
   /usr/local/hadoop/bin/hadoop dfs -ls processed_file_output
   /usr/local/hadoop/bin/hadoop dfs -ls processed_file_output/20110305
   /usr/local/hadoop/bin/hadoop dfs -cat processed_file_output/20110305-0923/part-r-00000
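The dated directory names above are from one particular run, so yours will differ. As a hedged sketch, the helpers below pick the most recent output directory from the `dfs -ls` listing and print its first reducer output. They assume output directories are named with sortable timestamps like the ones shown, and that the listing puts the path in the last column; `HADOOP_BIN`, `latest_output_dir`, and `show_latest_output` are hypothetical names.

```shell
# Path to the hadoop launcher; the default matches the commands shown above.
HADOOP_BIN=${HADOOP_BIN:-/usr/local/hadoop/bin/hadoop}

# latest_output_dir PARENT
# Prints the lexically last entry under PARENT in HDFS. With timestamped
# names (YYYYMMDD-HHMM) the lexically last entry is also the newest.
latest_output_dir() {
    "$HADOOP_BIN" dfs -ls "$1" |
        awk '$1 != "Found" && NF > 0 { print $NF }' |
        sort | tail -n 1
}

# show_latest_output PARENT
# Prints part-r-00000 from the newest output directory under PARENT.
show_latest_output() {
    dir=$(latest_output_dir "$1")
    if [ -z "$dir" ]; then
        echo "no output found under $1" >&2
        return 1
    fi
    "$HADOOP_BIN" dfs -cat "$dir/part-r-00000"
}
```

For example, `show_latest_output processed_file_output` would print the result of the most recent run.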

Retrieved from "http://wiki.kaavo.com/index.php/Apache_Hadoop_Deployment_for_EC2"

This page was last modified on 9 March 2011, at 05:16. Content is available under Copyright 2012 Kaavo, All rights reserved.

