Cloud Computing Made Easy®
N-Tier Guide System Definition
From Kaavo - Wiki
Kaavo’s App-centric approach
Kaavo has developed the world’s first app-centric approach to managing infrastructure and middleware in the cloud. A complete n-tier system is defined in an XML file called the System Definition file. This definition is then deployed into Kaavo’s IMOD application, which orchestrates the execution of the overall system. The System Definition file has two sections: deployment-time and run-time. The deployment section contains all the information needed to provision resources, deploy and configure the middleware, and deploy and configure the application components to bring the application online. The run-time section contains the workflows to be executed in response to defined events, so that application service levels are met automatically at run-time without human intervention. Figure 1 shows the structure of the System Definition file.
Figure 1: Structure of System Definition File used by IMOD
The end result is a single-click way to start a complete, complex system that consists of multiple tiers (e.g. a database tier, an application tier, and a web tier), including the firewall rules between the tiers. Once the system is deployed, you get auto-pilot functionality that streamlines and automates production support for your application. For details on the structure of the System Definition file, please also refer to the latest XSD for the System Definition file.
Designing an N-tier System Definition
You define your n-tier system via an n-tier System Definition file. The task of designing the System Definition file can broadly be divided into three steps:
1. Define your system artifacts: tiers, server types, and servers.
2. Define the tasks that configure the servers: commands and actions.
3. Define the events and their corresponding handlers to perform custom workflows in response to those events.
Finally, you need to wire the actions into the appropriate points of the activity graph for the vital events (e.g. startup and shutdown) and the custom events (e.g. scale-up, scale-down, and recovery) in the lifecycle of the n-tier system.
Structural Aspects
First, let us discuss the structural aspects of the n-tier System Definition file, related to the system artifacts and the vital-events.
N-Tier System
An n-tier system is composed of one or more tiers. Each tier is composed of groups of servers with a specific role, called server types. Each server type contains servers, which may be provided by multiple cloud providers. Currently we support the Amazon EC2 provider; we plan to add support for other providers later.
This structure mirrors the notion of an n-tier system, in which there are multiple tiers and each tier has multiple servers performing specific responsibilities.
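The containment hierarchy above can be modeled as plain data. The following is a minimal Python illustration, not part of IMOD. The class and field names are hypothetical and only mirror the system, tier, server type, and server nesting described above, including the rule that a role is unique within its tier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    # One machine instance; "provider" is an illustrative label such as "aws".
    name: str
    provider: str


@dataclass
class ServerType:
    # A group of servers sharing one role; the role is unique within its tier.
    role: str
    servers: List[Server] = field(default_factory=list)


@dataclass
class Tier:
    name: str
    server_types: List[ServerType] = field(default_factory=list)

    def roles_are_unique(self) -> bool:
        roles = [st.role for st in self.server_types]
        return len(roles) == len(set(roles))


@dataclass
class NTierSystem:
    tiers: List[Tier] = field(default_factory=list)

    def all_servers(self) -> List[Server]:
        # Walk system -> tiers -> server types -> servers.
        return [s for t in self.tiers for st in t.server_types for s in st.servers]


system = NTierSystem(tiers=[
    Tier("web_tier", [ServerType("default", [Server("web-1", "aws"), Server("web-2", "aws")])]),
    Tier("db_tier", [
        ServerType("manager", [Server("mgr-1", "aws")]),
        ServerType("ndbd", [Server("ndb-1", "aws"), Server("ndb-2", "aws")]),
    ]),
])
print(len(system.all_servers()))  # 5
```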
Tier
A tier has one or more “serverTypes” (described below). A tier also has a unique name, which is used to refer to it in the n-tier System Definition file. A tier has a notion of “displayIndex” and “order.” The graphical user interface uses “displayIndex” to render the n-tier system, showing the tiers in a particular stacking order. “order” determines the order in which Kaavo’s n-tier engine instantiates and sets up the tier during execution. This decides which tier comes up first, so that a dependent tier can assume that the tiers it depends on already exist when its own turn comes. With the addition of custom events, we have changed the way scaling and recovery are handled: scaling and recovery can now be defined for each server type by defining events and handling them with corresponding handlers (more on this later). The scalable attribute on the tier element is now deprecated.
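The effect of “order” can be sketched as a simple sort. This Python sketch assumes that lower order values start first, and that ties keep the order in which the tiers were defined. Both are assumptions, not documented engine behavior. The order values here are the kind you would set on each tier's startup element, as in <startup timeout="300" order="3">.

```python
from typing import Dict, List


def startup_sequence(tier_orders: Dict[str, int]) -> List[str]:
    """Return tier names in the order they would be started.

    Assumption: lower "order" values start first. Ties keep definition
    order because Python's sort is stable and dicts preserve insertion order.
    """
    return [name for name, _ in sorted(tier_orders.items(), key=lambda kv: kv[1])]


# Hypothetical order values for the three tiers used in this guide's examples.
tiers = {"web_tier": 3, "app_tier": 2, "db_tier": 1}
print(startup_sequence(tiers))  # ['db_tier', 'app_tier', 'web_tier']
```

Starting the database tier first means that the application and web tiers can rely on it being reachable when their own post-startup actions run.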
ServerTypes and Servers
We group servers with a particular responsibility into a servertype. A server type has a role attribute that is unique within the scope of a tier. Servers are sub-classed by the provider of the server, e.g. awsServer for Amazon EC2.
The system, tier, and servers have a lifetime and they have some vital events in the course of their lifetime. These vital events are startup, shutdown, scale-up and scale-down.
Runtime Aspects – Kaavo’s N-Tier Orchestration Engine
Kaavo’s n-tier orchestration engine, which is part of the IMOD application, takes the n-tier System Definition file, and orchestrates it according to all the instructions in it.
The engine responds to events and executes a workflow to handle each of them. These events can come, for example, from user interaction on the n-tier dashboard to start or stop a system. Events can also come from a monitoring system to trigger scale-up or scale-down workflows.
For example, when the engine encounters the startup event for a server defined in the input n-tier System Definition file, it invokes the startup sequence on that server. It then guides the server through a set of predefined states until the server is started, is accessible, and is optionally configured by executing some post-startup commands. The post-startup commands can be arbitrary shell scripts to be run on the newly started server, or cloud-provider-specific commands (e.g. in the case of AWS, associating an Elastic IP, attaching an EBS volume created from a snapshot or otherwise, or downloading a set of files from S3 to the instance).
Custom events can also be invoked manually from the UI by pressing the events button. If the handler for the event is a scaling or recovery activity, the n-tier engine performs the corresponding actions and the system behaves accordingly (more on this later).
Actions
Actions are units of activity to be performed on each server in a target set. Each action has a unique name, which is used to refer to it from points in the activity graph. To refer to a set of servers, we use an expression such as [tier=aTierName][serverrole=aServerTypeRole]. This returns the set of servers in the named tier that belong to the server type with the given role. The current n-tier engine can scale a (tier, server type) pair by a user-defined number of servers, so the expression has been extended to refer to the servers involved in a scaling operation: [tier=aTierName][serverrole=aServerTypeRole][scalingserver]. The current implementation also supports the recovery event, for the case where a server has died and needs to be replaced by a new one. To refer to the server being recovered, use the expression [tier=aTierName][serverrole=aServerTypeRole][recoveringserver].
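The target expressions above are sequences of bracketed segments. Each segment is either a key=value selector or a bare flag such as scalingserver or recoveringserver. The following Python sketch splits an expression into those two parts. The grammar is inferred from the examples on this page and is not taken from the n-tier engine's own parser.

```python
import re
from typing import Dict, Set, Tuple


def parse_target(expr: str) -> Tuple[Dict[str, str], Set[str]]:
    """Split a target expression into key=value selectors and bare flags.

    "[tier=app_tier][serverrole=default][scalingserver]" becomes
    ({"tier": "app_tier", "serverrole": "default"}, {"scalingserver"}).
    """
    segments = re.findall(r"\[([^\[\]]+)\]", expr)
    # Reject anything that is not made up purely of bracketed segments.
    if not segments or "".join(f"[{s}]" for s in segments) != expr:
        raise ValueError(f"not a target expression: {expr!r}")
    selectors: Dict[str, str] = {}
    flags: Set[str] = set()
    for segment in segments:
        key, sep, value = segment.partition("=")
        if sep:
            selectors[key.strip()] = value.strip()  # e.g. tier=web_tier
        else:
            flags.add(key.strip())  # e.g. scalingserver, recoveringserver
    return selectors, flags
```

For example, parse_target("[tier=db_tier][serverrole=ndbd][recoveringserver]") yields the tier and role selectors plus the recoveringserver flag. The caller would use the flag to narrow the selection to the server being replaced.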
If you want to delete the generated file on the server after executing an action, set the delete-on-exit flag on the action. This is helpful when the file contains sensitive information, such as a username or password, that shouldn't be left on the server. See the example below.
<action name="start-apache" execute="true" delete-on-exit="true" target="[tier=web_tier][serverrole=default]">
<scriptName>apache.sh</scriptName>
<scriptPath>/root</scriptPath>
<scriptTemplate type="inline">
<![CDATA[
#!/bin/sh
/etc/init.d/httpd start
# ...
]]>
</scriptTemplate>
</action>
Executing Actions on a Specific Server
To execute an action on a specific server, use the syntax [tier=app][serverrole=jira][index=2]. For example, if there are 5 jira servers in the app tier, specifying index=2 executes the action on the second jira server. To execute an action on all servers in the group, omit the index: [tier=app][serverrole=jira] executes the action on all 5 jira servers in the app tier.
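The selection rule above can be sketched as a filter followed by an optional 1-based lookup. This Python sketch assumes that the matching servers are listed in their index order ($node.index). The Node class and the resolve function are hypothetical illustrations, not IMOD APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    tier: str
    role: str
    name: str


def resolve(nodes: List[Node], tier: str, role: str, index: Optional[int] = None) -> List[Node]:
    """Select servers by tier and role, optionally narrowing to one 1-based index."""
    matches = [n for n in nodes if n.tier == tier and n.role == role]
    if index is None:
        return matches  # no [index=...]: every server in the group
    if not 1 <= index <= len(matches):
        raise IndexError(f"index {index} out of range for {len(matches)} servers")
    return [matches[index - 1]]  # index=2 selects the second server


nodes = [Node("app", "jira", f"jira-{i}") for i in range(1, 6)]
print([n.name for n in resolve(nodes, "app", "jira")])           # all five servers
print([n.name for n in resolve(nodes, "app", "jira", index=2)])  # ['jira-2']
```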
Parameterized Templates
At its core, an action is a Velocity template (http://velocity.apache.org/engine/releases/velocity-1.5/user-guide.html) with access to some run-time system artifacts (the servers named in the action target) and some user-defined parameters. The template generates a script or configuration file that is transferred to each of the target servers and optionally executed there. This makes it easy to create a configuration file for a newly launched server and upload it to that server. It is equally easy to create a custom shell script, upload it to the newly launched server, and execute it there.
We provide some best-practice templates for common scripts (e.g. MySQL cluster configuration scripts and JBoss cluster scripts). However, users can also write inline templates for the script or configuration in the action definition. Templates receive the parameters defined in the action definition as part of their context. A parameter can be of type “literal”, meaning its string value is used as-is in the Velocity context. It can also be of type “serverref”, meaning it is evaluated to get a set of servers from the System Definition. For each of these servers we can access their properties, including PrivateDNS and PublicDNS.
In the current implementation, actions can be viewed as global functions in a standard programming language: they are defined with some parameters and can be invoked at run-time. We define the action prototype in the actions section. When a command invokes the action (see the Commands section below), we can redefine its target and parameters. In addition to these parameters, certain parameters are passed implicitly to all templates: CurrentTarget (the node on which the script is being created), AllTargets (the set of all nodes that are targets of the action), and OtherTargets (the set of all nodes except CurrentTarget).
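To illustrate the implicit parameters, the following Python sketch builds one template context per target and renders a one-line configuration from it. This is only an analogy. Python's string.Template stands in for Velocity and does plain $name substitution. Targets are represented as address strings rather than server objects, and the port parameter is hypothetical.

```python
from string import Template
from typing import Dict, List


def build_contexts(targets: List[str], params: Dict[str, str]) -> List[Dict[str, object]]:
    """One context per target: user parameters plus the implicit
    CurrentTarget, AllTargets and OtherTargets entries."""
    contexts = []
    for current in targets:
        ctx: Dict[str, object] = dict(params)
        ctx["CurrentTarget"] = current
        ctx["AllTargets"] = list(targets)
        ctx["OtherTargets"] = [t for t in targets if t != current]
        contexts.append(ctx)
    return contexts


# string.Template is a stand-in for Velocity; it only substitutes $name placeholders.
template = Template("node=$CurrentTarget peers=$peers port=$port\n")

for ctx in build_contexts(["10.0.0.1", "10.0.0.2", "10.0.0.3"], {"port": "3306"}):
    peers = ",".join(ctx["OtherTargets"])
    print(template.substitute(ctx, peers=peers), end="")
```

Each target gets its own rendered file, in which it lists its peers but not itself. This is the typical use of OtherTargets when generating cluster configuration.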
List of Server Attributes Available to Velocity Templates
Please also refer to the common server attribute names. The following list gives all server attributes available for each integrated cloud provider. You can use them to dynamically generate scripts, code, or configuration files for the application or workload on dynamic cloud infrastructure.
- AWS Server
  - $node.publicDNS
  - $node.privateDNS
  - $node.publicIP
  - $node.privateIP
  - $node.index: the index of the particular server instance within the same role/servertype
  - $node.roleName
  - $node.name
  - $node.awsInstanceId
  - $node.rootDeviceType
  - $node.amiImage
  - $node.kernelImage
  - $node.ramDisk
  - $node.instanceType
  - $node.securityGroup
  - $node.privateKeyName
  - $node.availabilityZone
  - $node.region
  - $node.platform
- Rackspace Server
  - $node.publicIP
  - $node.privateIP
  - $node.index: the index of the particular server instance within the same role/servertype
  - $node.roleName
  - $node.name
  - $node.serverId
  - $node.imageId
  - $node.flavorId
  - $node.platform
- Eucalyptus Server
  - $node.publicDNS
  - $node.privateDNS
  - $node.publicIP
  - $node.privateIP
  - $node.index: the index of the particular server instance within the same role/servertype
  - $node.roleName
  - $node.name
  - $node.eucaInstanceId
  - $node.emiImage
  - $node.kernelImage
  - $node.ramDisk
  - $node.instanceType
  - $node.securityGroup
  - $node.privateKeyName
  - $node.availabilityZone
  - $node.platform
- IBM Server
  - $node.publicIP
  - $node.index: the index of the particular server instance within the same role/servertype
  - $node.roleName
  - $node.name
  - $node.instanceId
  - $node.imageId
  - $node.instanceType
  - $node.platform
- Terremark Server
  - $node.publicIP
  - $node.index: the index of the particular server instance within the same role/servertype
  - $node.roleName
  - $node.name
  - $node.instanceId
  - $node.templateId
  - $node.instanceType
  - $node.platform
  - $node.cpu
  - $node.vpu
  - $node.memory
- Physical Server
  - $node.host (note: host can be the IP address or the DNS name of the server)
Common Server Attribute Names
To allow reuse of actions, we have implemented common names for server attributes that are supported by all integrated cloud providers but, in some cases, have slightly different names. If you want to reuse your actions with dynamic server attributes across providers, we recommend using the common attribute names.
| Common Name | Amazon EC2 | Rackspace Cloud | IBM Cloud | Eucalyptus Cloud | Terremark vCloud Express |
|---|---|---|---|---|---|
| publicIP | publicIP | publicIP | publicIP | publicIP | publicIP |
| privateIP | privateIP | privateIP | N/A | privateIP | privateIP |
| instanceId | awsInstanceId | serverId | instanceId | eucaInstanceId | serverId |
| imageId | amiImage | imageId | imageId | emiImage | templateId |
| instanceType | instanceType | flavorId | instanceType | instanceType | vpu, memory |
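The table can also be expressed as a lookup structure. The following Python sketch transcribes it into a dictionary, which is handy if you generate or check templates outside IMOD. The provider keys ("aws", "rackspace", and so on) are hypothetical labels. N/A entries are simply omitted, and Terremark's instanceType maps to two attributes, as the table shows.

```python
from typing import Dict, List

_ALL = ("aws", "rackspace", "ibm", "eucalyptus", "terremark")

# Common name -> provider label -> provider-specific attribute name(s), from the table above.
COMMON_NAMES: Dict[str, Dict[str, List[str]]] = {
    "publicIP": {p: ["publicIP"] for p in _ALL},
    "privateIP": {p: ["privateIP"] for p in ("aws", "rackspace", "eucalyptus", "terremark")},
    "instanceId": {"aws": ["awsInstanceId"], "rackspace": ["serverId"], "ibm": ["instanceId"],
                   "eucalyptus": ["eucaInstanceId"], "terremark": ["serverId"]},
    "imageId": {"aws": ["amiImage"], "rackspace": ["imageId"], "ibm": ["imageId"],
                "eucalyptus": ["emiImage"], "terremark": ["templateId"]},
    "instanceType": {"aws": ["instanceType"], "rackspace": ["flavorId"], "ibm": ["instanceType"],
                     "eucalyptus": ["instanceType"], "terremark": ["vpu", "memory"]},
}


def provider_attributes(common: str, provider: str) -> List[str]:
    """Return the provider-specific attribute name(s) behind a common name."""
    try:
        return COMMON_NAMES[common][provider]
    except KeyError:
        raise KeyError(f"{common!r} is not available for provider {provider!r}") from None
```

For example, provider_attributes("instanceId", "eucalyptus") returns ["eucaInstanceId"], and provider_attributes("privateIP", "ibm") raises a KeyError because the table marks it N/A.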
Commands
Commands provide the context for executing actions. In programming terms, actions are methods or procedures, whereas commands are specific calls with parameters. Actions are invoked during the execution of a command, which you will find in the post-startup definitions of the artifacts or in the event-handler sections. The post-startup section is executed after all the servers have been launched. Once the generic startup activities of the respective artifact are complete, these commands are taken up for execution and performed on each of the target servers. Commands of type action have a global context: they can be executed from the post-startup of any artifact, even if the target of the action is a different set of artifacts. For example, suppose we have an action that creates MySQL databases on the database tier. The targets of this action are the servers in the database tier, yet the action can be invoked from the post-startup block of the web tier. The post-scale-up and pre-scale-down sections are now deprecated, because scale-up and scale-down are now defined in the custom events section with a generic command-action mapping.
Commands are like function invocations, in which we can override the target of the invoked action and the parameters used to create its script or configuration. The action thus redefined acts on the new set of target server instances with the changed script or configuration, rather than the one defined in the global actions pool.
Some commands are relevant only in the context of a server (specifically awsServer). These commands associate an Elastic IP address with the enclosing server, attach an EBS volume to the server (either created from an EBS snapshot or as a predefined volume), or set up the server for S3 operations. The post-startup section in <awsServer> can have commands of type “script”, “ec2”, or “s3” only.
Commands are taken up in the order that they are described in the post-startup or event-handler sections and executed accordingly.
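The function-call analogy can be made concrete. In the following Python sketch, an action prototype lives in a pool, and a command produces a per-call copy with an overridden target and parameters, leaving the pool untouched. Two points are assumptions. The first is that overridden parameters are merged over the prototype's parameters rather than replacing them wholesale. The second is that the "create-db" action and its "dbname" parameter exist; both are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Action:
    # Prototype defined once in the global actions pool.
    name: str
    target: str
    params: Dict[str, str] = field(default_factory=dict)


def invoke(pool: Dict[str, Action], name: str,
           target: Optional[str] = None,
           params: Optional[Dict[str, str]] = None) -> Action:
    """Resolve a command against the pool, overriding target and parameters.

    The pool entry is not modified; the override applies only to this call.
    """
    proto = pool[name]
    merged = {**proto.params, **(params or {})}  # assumption: merge, not replace
    return replace(proto, target=target or proto.target, params=merged)


pool = {"create-db": Action("create-db", "[tier=db_tier][serverrole=manager]", {"dbname": "jira"})}
call = invoke(pool, "create-db", params={"dbname": "confluence"})
print(call.params)               # {'dbname': 'confluence'}
print(pool["create-db"].params)  # {'dbname': 'jira'}
```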
Events
Since IMOD 1.4.5, you can define custom events and describe how they are handled. Events are elements in the system definition that define a specific message to the n-tier engine about a state of the user's system, a state that is semantically relevant only to that system. They also tell the engine how to react to the message. The message can be sent to the engine manually through the IMOD Application-Centric dashboard UI, or triggered by an external monitoring system that watches the servers of the user's n-tier system. In either case, the message sent to the n-tier engine contains the event name, the relevant system, the identity of the IMOD customer who owns the system, and any other information that may be required. These message elements define the context of the event, to which the n-tier engine reacts. The engine's behavior on receiving such a message is defined by the event's handler, which can be one of four types: scaleup, scaledown, recovery, or simple. In this way, each event maps to the course of activities the user needs for their individual system. The scaleup, scaledown, and recovery handlers are driven by corresponding specialized workflows that carry out the required steps. All handlers have a sequence of commands (invocations of actions defined in the global pool of actions) that are part of the corresponding activities. For the scaleup handler (where new servers are added to the server type defined in the scope of the event) and the recovery handler (where dead servers are replaced by similar servers of the server type defined in the scope of the event), the commands are executed as post-actions of the workflow. For the scaledown handler (where servers of the server types defined in the scope of the event are killed), the commands are executed as pre-actions of the workflow.
For the simple handler there is no workflow, and the commands are executed sequentially. Release 1.5 introduced an enhanced format for configuring custom events. The new format can configure scale-up, scale-down, recovery, and custom events, and the old formats for scale-up, scale-down, recovery, and simple events are deprecated.
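The ordering rules for the four handler types can be summarized in a few lines. This Python sketch models only where a handler's commands run relative to its built-in workflow, as described above. The function name and the callables standing in for the workflow and the commands are hypothetical.

```python
from typing import Callable, List

# Where a handler's commands run relative to its built-in workflow.
COMMANDS_AFTER_WORKFLOW = {"scaleup", "recovery"}
COMMANDS_BEFORE_WORKFLOW = {"scaledown"}


def handle_event(handler: str, workflow: Callable[[], None],
                 commands: List[Callable[[], None]]) -> None:
    """Run an event handler's workflow and commands in the documented order."""
    if handler in COMMANDS_BEFORE_WORKFLOW:
        for cmd in commands:  # pre-actions, before servers are killed
            cmd()
        workflow()
    elif handler in COMMANDS_AFTER_WORKFLOW:
        workflow()
        for cmd in commands:  # post-actions, on the new or replacement servers
            cmd()
    elif handler == "simple":
        for cmd in commands:  # no workflow: commands only, in sequence
            cmd()
    else:
        raise ValueError(f"unknown handler type: {handler!r}")


log: List[str] = []
handle_event("scaledown", lambda: log.append("workflow"), [lambda: log.append("cmd")])
print(log)  # ['cmd', 'workflow']
```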
Example Cases
Now we walk through the common cases encountered when defining a system.
Defining the ntier section
<ntier>
<workflow>
<startup timeout="1500"/>
<shutdown timeout="900"/>
</workflow>
...
</ntier>
The ntier element is primarily a container for tiers. It has a generic startup and shutdown sequence that the n-tier engine performs in response to the corresponding message from the user. Its main activity is to start or stop the contained tiers.
The generic startup and shutdown activity sequences have a timeout that prevents the corresponding activities from running for too long, and you can customize it with the timeout attribute. Each of the artifacts (ntier, tier, and server) has its own timeouts, which you may want to set according to your needs. The default timeout is 1800 seconds.
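The timeout rule can be sketched as an attribute lookup with a fallback. This Python sketch uses the standard library's xml.etree.ElementTree to read the timeout of a <startup> or <shutdown> element. It falls back to the documented 1800-second default when the attribute is absent. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_TIMEOUT_SECONDS = 1800  # documented default when no timeout attribute is set


def effective_timeout(workflow_xml: str, phase: str) -> int:
    """Return the timeout in seconds for the <startup> or <shutdown> element."""
    element = ET.fromstring(workflow_xml).find(phase)
    if element is None:
        raise ValueError(f"<{phase}> not found")
    return int(element.get("timeout", DEFAULT_TIMEOUT_SECONDS))


workflow = '<workflow><startup timeout="1500"/><shutdown/></workflow>'
print(effective_timeout(workflow, "startup"))   # 1500
print(effective_timeout(workflow, "shutdown"))  # 1800
```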
Startup Tag
The startup sequence can also invoke user-defined actions after the generic startup activities are done. To do this, add a post-startup element under the startup definition. This post-startup block can contain an ordered sequence of commands that refer to actions defined in the actions section (described later).
<startup timeout="1500">
<post-startup>
<command type="action" name="notify-me"/>
</post-startup>
</startup>
These user-defined actions will be executed after the n-tier generic startup sequence is complete.
Defining the tier section
Example:
<tier displayindex="1">
<name>web_tier</name>
<workflow>
<startup timeout="300" order="3">
<post-startup>
<command type="action" name="create-apache-jira-conf"/>
<command type="action" name="start-apache"/>
</post-startup>
</startup>
<shutdown/>
</workflow>
...
</tier>
You can put several tiers inside the ntier definition. Each tier has a displayIndex attribute that determines the order in which the tiers are displayed in the UI. The tier has a name that must be unique among all tiers in the descriptor; the name is used later in actions to locate the servers contained in the tier. A tier, like the ntier, has a generic startup and shutdown sequence with customizable timeouts. The startup element of a tier can also specify the order in which the tier is started relative to the other tiers in the descriptor, and the n-tier startup sequence starts the tiers based on the value of this order attribute. The main responsibility of the tier startup and shutdown sequences is to start up or shut down the servers contained in the tier. A tier can also execute user-defined actions in the post-startup element of its startup definition. This is similar to the post-startup of the ntier; however, it is executed after the generic activities of the tier startup are done.
Defining the server types and servers
<tier displayindex="2">
<name>app_tier</name>
<workflow>
…
</workflow>
<serverTypes>
<serverType role="default" min="2" max="4">
<awsServer>
<name>app node</name>
<awsaccount>147404622121</awsaccount>
<securityGroup>Kaavo-Demo</securityGroup>
<keypair>demo.kaavo.us</keypair>
<machineIdentifier>ami-87c522ee</machineIdentifier>
<instanceType>m1.small</instanceType>
<parameters></parameters>
<startupCount>2</startupCount>
<workflow>
<startup timeout="800">
<post-startup>
<command type="script" name="myscript.sh">
<![CDATA[
#!/bin/sh
rm -f /var/www/html/php-colab/includes/settings.php
cd /var/www/html/php-colab/includes/
wget http://php-collab.s3.amazonaws.com/settings.php
chmod 755 /var/www/html/php-colab/includes/settings.php
]]>
</command>
</post-startup>
</startup>
<shutdown />
</workflow>
</awsServer>
</serverType>
</serverTypes>
…
</tier>
<tier displayindex="3">
<name>db_tier</name>
<workflow>
...
</workflow>
<serverTypes>
<serverType role="manager" min="1" max="1">
<awsServer>
<name>mysql-manager-node</name>
<awsaccount>YOUR AWS ACCOUNT ID</awsaccount>
<securityGroup>YOUR AWS SECURITY GROUP</securityGroup>
<keypair>YOUR AWS SERVER LAUNCH PVT KEYNAME</keypair>
<machineIdentifier>ami-005eba69</machineIdentifier>
<region>us-east-1</region>
<availabilityZone>us-east-1a</availabilityZone>
<!-- Note: when picking the EU region, make sure to pick the appropriate AMI for the EU region -->
<instanceType>m1.small</instanceType>
<parameters></parameters>
<startupCount>1</startupCount>
<workflow>
<startup timeout="300"/>
<shutdown/>
</workflow>
</awsServer>
</serverType>
<serverType role="ndbd" min="2" max="4">
<awsServer>
<name>mysql-ndb-node</name>
<awsaccount>YOUR AWS ACCOUNT ID</awsaccount>
<securityGroup>YOUR AWS SECURITY GROUP</securityGroup>
<keypair>YOUR AWS SERVER LAUNCH PVT KEYNAME</keypair>
<machineIdentifier>ami-005eba69</machineIdentifier>
<region>us-east-1</region>
<availabilityZone>us-east-1a</availabilityZone>
<instanceType>m1.small</instanceType>
<parameters></parameters>
<startupCount>2</startupCount>
<workflow>
<startup timeout="300"/>
<shutdown/>
</workflow>
</awsServer>
</serverType>
...
</serverTypes>
</tier>
Server Types
Server types, as discussed earlier, group the servers in a tier based on the roles they play. Each server type is identified by its role name within the containing tier. There are two other required attributes, min and max, which are respectively the minimum and maximum number of servers of that type in the containing tier. Scale-up is limited by the max count for the server type, and scale-down is limited by the min count.
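The min/max bounds can be pictured as a clamp on the server count. This Python sketch assumes that a scaling request which would cross a bound is capped at that bound. The engine could instead reject such a request outright; this page does not say which. The function name is hypothetical.

```python
def scale(current: int, requested_delta: int, min_count: int, max_count: int) -> int:
    """Return the new server count for a server type after a scaling request.

    Positive deltas scale up (capped at max), negative deltas scale down
    (floored at min). Assumption: out-of-range requests are clamped, not rejected.
    """
    if not 0 <= min_count <= max_count:
        raise ValueError("expected 0 <= min <= max")
    return max(min_count, min(max_count, current + requested_delta))


# serverType role="ndbd" min="2" max="4"
print(scale(2, +3, 2, 4))  # 4: scale-up stops at max
print(scale(4, -5, 2, 4))  # 2: scale-down stops at min
```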
Servers
Servers are the atomic unit of artifacts in the n-tier system definition and represent the actual machine instances that make up the n-tier system. Currently we support AWS EC2 and Rackspace server instances. The awsServer element has all the elements required to launch an EC2 instance, and the rackspaceServer element has all the information required to launch a Rackspace server.
The sub-elements of the awsServer element are self-explanatory to anyone familiar with EC2 terminology. Although EC2 allows multiple security groups when launching an instance, the n-tier engine accepts only one security group. The n-tier engine modifies this security group to authorize access for the IMOD maintenance security group, so that it can configure, monitor, and manage the servers in the system. There is also a startupCount element, which defines the number of servers launched at the startup of this server definition. Rackspace servers follow a similar structure, and it is easy to define a system with servers running on multiple providers for robustness. To support EC2 regions, you can choose the region in which the n-tier engine launches the AWS server. The region element is optional; if you do not specify a region, the server is launched in the us-east-1 region.
Like their ntier and tier counterparts, servers also have generic startup and shutdown sequences with customizable timeouts. Like the other artifacts, the awsServer also has a post-startup section where you can associate globally defined actions. These actions are executed after the generic server startup activity has completed.
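The awsServer rules described above (a single security group, an optional region that defaults to us-east-1, and a startupCount) can be checked before deployment. This Python sketch uses xml.etree.ElementTree to pull out those fields. It is a hypothetical pre-flight check and does not reproduce IMOD's own validation, which is defined by the XSD. Treating a missing startupCount as an error is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DEFAULT_REGION = "us-east-1"  # used when <region> is omitted or empty


def aws_launch_settings(aws_server_xml: str) -> Dict[str, object]:
    """Extract launch-relevant fields from an <awsServer> element."""
    server = ET.fromstring(aws_server_xml)
    if server.tag != "awsServer":
        raise ValueError(f"expected <awsServer>, got <{server.tag}>")
    groups = server.findall("securityGroup")
    if len(groups) != 1:
        # The n-tier engine accepts exactly one security group per server.
        raise ValueError(f"expected one <securityGroup>, found {len(groups)}")
    count = server.findtext("startupCount")
    if count is None:
        raise ValueError("<startupCount> is required in this sketch")
    return {
        "securityGroup": groups[0].text,
        "region": server.findtext("region") or DEFAULT_REGION,
        "startupCount": int(count),
    }


example = """<awsServer>
  <name>lb</name>
  <securityGroup>Kaavo-Demo</securityGroup>
  <machineIdentifier>ami-70c42319</machineIdentifier>
  <instanceType>m1.small</instanceType>
  <startupCount>2</startupCount>
</awsServer>"""
print(aws_launch_settings(example))
```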
The following examples show the server tag structure for each provider.
AWS Server
<awsServer>
<name>lb</name>
<awsaccount>147404622121</awsaccount>
<securityGroup>Kaavo-Demo</securityGroup>
<keypair>demo.kaavo.us</keypair>
<machineIdentifier>ami-70c42319</machineIdentifier>
<instanceType>m1.small</instanceType>
<parameters></parameters>
<startupCount>2</startupCount>
<workflow>
<startup timeout="1200"/>
<shutdown />
</workflow>
</awsServer>
Rackspace Server
<rackspaceServer>
<name>jira rackspace node</name>
<rackspaceAccount>Rackspace-Cloud-username</rackspaceAccount>
<imageIdentifier>8</imageIdentifier>
<flavorIdentifier>1</flavorIdentifier>
<parameters/>
<startupCount>1</startupCount>
<workflow>
<startup timeout="3000">
<post-startup>
<command type="action" name="setup-monitoring"/>
<command type="action" name="install-jdk"/>
</post-startup>
</startup>
<shutdown/>
</workflow>
</rackspaceServer>
Eucalyptus Server
<eucalyptusServer>
<name>apache node</name>
<eucalyptusaccount></eucalyptusaccount>
<securityGroup></securityGroup>
<keypair></keypair>
<machineIdentifier></machineIdentifier>
<kernelImage></kernelImage>
<ramDisk></ramDisk>
<instanceType></instanceType>
<parameters/>
<startupCount>1</startupCount>
<availabilityZone></availabilityZone>
<workflow>
<startup timeout="3000">
<post-startup>
<command type="script" name="install_zabbix.sh"> <![CDATA[
apt-get update
apt-get install -f -y --force-yes perl
apt-get install -f -y --force-yes zabbix-agent
sed -i 's/Server=localhost/Server=monitor2.kaavo.org/' /etc/zabbix/zabbix_agentd.conf
/etc/init.d/zabbix-agent restart
]]> </command>
</post-startup>
</startup>
<shutdown/>
</workflow>
</eucalyptusServer>
IBM Server
<ibmServer>
<name></name>
<ibmAccount></ibmAccount>
<keypair></keypair>
<imageIdentifier></imageIdentifier>
<instanceType></instanceType>
<location>1</location>
<parameters/>
<startupCount>2</startupCount>
<workflow>
<startup timeout="9000">
<post-startup>
<command type="action" name="install-jdk"/>
</post-startup>
</startup>
<shutdown/>
</workflow>
</ibmServer>
Terremark Server
<terremarkServer>
<name>jira-node</name>
<terremarkAccount/>
<password/>
<templateIdentifier/>
<cpu/>
<memory/>
<openPorts>
<openPort protocol="TCP" fromPort="8080" toPort="8080"/>
</openPorts>
<parameters/>
<startupCount>1</startupCount>
<workflow>
<startup timeout="3000">
<post-startup>
<command type="action" name="install-jre"/>
</post-startup>
</startup>
<shutdown/>
</workflow>
</terremarkServer>
Physical Server
A physical server is treated as a cloud provider, so you can add keys to the physical server account on the profile page. This lets a workload be managed as a single system across hybrid deployments consisting of physical servers, private clouds, and public clouds.
<physicalServer>
<name>jira node</name>
<physicalAccount>my-physical-name</physicalAccount>
<privateKey>my-private-key-name</privateKey>
<sshUser>root</sshUser>
<host>555.106.199.555</host>
<parameters/>
<workflow>
<startup timeout="3000">
<post-startup>
<command type="action" name="install-jre"/>
</post-startup>
</startup>
<shutdown/>
</workflow>
</physicalServer>
If the server uses password authentication instead of key pair authentication, replace the privateKey tag with a password tag, e.g.:
<password>mypassword</password>
To add AKI or ARI information for servers on EC2, use the following two tags. They are optional and can be added to the awsServer tag to pass additional information.
<kernelImage>aki-20c12649</kernelImage>
<ramDisk>ari-21c12648</ramDisk>
For Rackspace servers, make sure to give each server type a unique name per Rackspace account to avoid name conflicts on the Rackspace side. Use the following table to specify <imageIdentifier> in <rackspaceServer>.
| Image Name | Rackspace Image ID |
|---|---|
| Debian 5.0 (lenny) | 4 |
| Fedora 10 (Cambridge) | 5 |
| CentOS 5.3 | 7 |
| Ubuntu 9.04 (jaunty) | 8 |
| Arch 2009.02 | 9 |
| Ubuntu 8.04.2 LTS (hardy) | 10 |
| Ubuntu 8.10 (intrepid) | 11 |
| Red Hat EL 5.3 | 12 |
| Fedora 11 (Leonidas) | 13 |
| Red Hat EL 5.4 | 14 |
| Fedora 12 (Constantine) | 17 |
| Ubuntu 9.10 (karmic) | 14362 |
| CentOS 5.4 | 187811 |
Also use the following table to specify <flavorIdentifier> for Rackspace
| Flavor Name | Flavor ID |
|---|---|
| 256MB server | 1 |
| 512MB server | 2 |
| 1GB server | 3 |
| 2GB server | 4 |
| 4GB server | 5 |
| 8GB server | 6 |
| 15.5GB server | 7 |
Deployment Timeout Handling
In version 1.8, we added the ability to handle timeout conditions during deployment and server configuration through the onTimeout attribute of the server startup workflow tag: <startup timeout="300" onTimeout="Continue"/>
- onTimeout="Abort" means the server is shut down and the rest of the system continues.
- onTimeout="Continue" means the server is not shut down and the rest of the system continues.
- If no onTimeout attribute is specified, the default behavior applies: the tier and system deployment time out, and users have to manually abort the system and retry the deployment after addressing the issue that caused the workflow timeout. See the following example of how to configure this within the serverType tag:
<serverType role="loadbalancer" min="1" max="8">
...............................
...............................
...............................
<startupCount>2</startupCount>
<workflow>
<startup timeout="1200" onTimeout="Continue"></startup>
<shutdown/>
</workflow>
...............................
...............................
</serverType>
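The three onTimeout cases can be captured in a small lookup. This Python sketch maps the attribute value read from a <startup> element to the documented outcome. The TimeoutOutcome type and the function name are hypothetical. For the default case, the sketch assumes that the engine does not shut the server down itself, since the user has to abort the system manually.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeoutOutcome:
    shut_down_server: bool     # is the timed-out server shut down by the engine?
    rest_continues: bool       # does the rest of the system continue?
    manual_abort_needed: bool  # must the user abort and retry by hand?


def on_timeout_outcome(on_timeout: Optional[str]) -> TimeoutOutcome:
    """Map a <startup onTimeout="..."> value to the behavior described above."""
    if on_timeout == "Abort":
        return TimeoutOutcome(shut_down_server=True, rest_continues=True, manual_abort_needed=False)
    if on_timeout == "Continue":
        return TimeoutOutcome(shut_down_server=False, rest_continues=True, manual_abort_needed=False)
    if on_timeout is None:
        # Default: tier and system deployment time out; assumed no automatic shutdown.
        return TimeoutOutcome(shut_down_server=False, rest_continues=False, manual_abort_needed=True)
    raise ValueError(f"unsupported onTimeout value: {on_timeout!r}")


startup = ET.fromstring('<startup timeout="1200" onTimeout="Continue"></startup>')
print(on_timeout_outcome(startup.get("onTimeout")))
```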
Deployment Error Handling
In version 2.5, we added the ability to handle server error conditions during deployment and server configuration through the onError attribute of the server startup workflow tag: <startup timeout="300" onError="Continue"/>
- onError="Continue" means the server is not shut down and the rest of the system continues.
If no onError attribute is specified, the default behavior applies: the tier and system deployment go into an error state, and users have to manually abort the system and retry the deployment after addressing the issue that caused the error. See the following example of how to configure this within the serverType tag:
<serverType role="loadbalancer" min="1" max="8">
...............................
...............................
...............................
<startupCount>2</startupCount>
<workflow>
<startup timeout="1200" onError="Continue"></startup>
<shutdown/>
</workflow>
...............................
...............................
</serverType>
AWS EBS Boot Instance
Starting with version 1.9, we added support for EBS boot instances, i.e. instances that boot from EBS. EBS boot instances can be included in the n-tier system by adding optional information to the awsServer tag. Setting terminateOnStopInstance to false ensures that the server is stopped rather than terminated.
<serverType role="default" min="1" max="2">
<awsServer>
..................
..................
<securityGroup></securityGroup>
<keypair></keypair>
<rootDeviceType>ebs</rootDeviceType>
<terminateOnStopInstance>false</terminateOnStopInstance>
<machineIdentifier>ami-df6489b6</machineIdentifier>
..................
..................
</awsServer>
</serverType>
- Note: if you specify an EBS boot instance in the system definition and start the system, the instance is launched the first time by default (i.e. a new instance ID and IP are assigned). On subsequent starts of the system, the EBS boot instances are started with the same instance ID that was assigned when the system was first launched. If you want to terminate or bundle an EBS boot instance, you have to use the Dashboard. Unlike stopping, terminating the instance releases its IP address and instance ID. EBS boot instances currently do not support scale-up or auto-recovery.
Monitoring Custom Images (images not provided by Kaavo)
Starting with version 1.9, IMOD automatically installs monitoring agents on servers running Linux. This allows users to use any custom Linux image instead of being limited to Kaavo-provided images for Linux. All flavors of Linux are supported. By default, whenever a Linux-based server is launched from IMOD N-Tier, the IMOD N-Tier Engine checks whether the monitoring agent is installed on the server and, if it is not, installs it. Users who don't want the monitoring agent installed can disable this default behavior by adding the optional flag agentSetup="manual" to the serverType:
<serverType role="default" min="1" max="4" agentSetup="manual">
- Note: Flags cannot be set on individual servers. The serverType (a group of servers with the same role) is the finest granularity at which flags can be specified.
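The sketch below shows two things: the default agent check, and the fact that agentSetup applies to a whole serverType rather than to a single server. It is a hypothetical Python model, not the engine's code. The LaunchedServer type and its os_family and agent_installed fields are illustrative, and an absent agentSetup attribute is represented as None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchedServer:
    name: str
    os_family: str          # hypothetical field, e.g. "linux" or "windows"
    agent_installed: bool   # result of the engine's "is the agent there?" check


def servers_needing_agent(servers: List[LaunchedServer],
                          agent_setup: Optional[str]) -> List[str]:
    """Names of servers the engine would install the agent on.

    agent_setup is the serverType attribute, so it applies to every server of that type.
    """
    if agent_setup == "manual":
        return []  # opt-out: nothing is installed for this serverType
    return [s.name for s in servers if s.os_family == "linux" and not s.agent_installed]


fleet = [LaunchedServer("web-1", "linux", False),
         LaunchedServer("web-2", "linux", True),
         LaunchedServer("win-1", "windows", False)]
print(servers_needing_agent(fleet, None))      # ['web-1']
print(servers_needing_agent(fleet, "manual"))  # []
```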
Debugging Custom Actions on Servers
IMOD logs the exit codes of actions executed on the servers over SSH in the application-centric system logs, which are accessible from the LOG tab for deployed systems. This gives users more visibility into configuration and run-time management tasks. For debugging, users can also set logMode to verbose to display up to the last 10 lines of the output generated by actions on the servers in the application-centric logs. Detailed messages are logged on each server, in the directory where the action was executed, in a log file named after the action (<action-name>.log). To disable both the exit codes and the action messages, set logMode to none.
<serverType role="default" min="1" max="4" logMode="verbose">
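In verbose mode, IMOD shows at most the last 10 lines of an action's output. The following is a minimal Python sketch of that tailing step. It assumes the log is the plain-text <action-name>.log file in the action's directory. verbose_excerpt is a hypothetical helper, and the demo writes a throwaway log instead of reading one from a real server.

```python
import tempfile
from collections import deque
from pathlib import Path
from typing import List


def verbose_excerpt(action_dir: str, action_name: str, max_lines: int = 10) -> List[str]:
    """Return at most the last `max_lines` lines of <action-name>.log."""
    log_path = Path(action_dir) / f"{action_name}.log"
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        # deque with maxlen keeps only the newest lines while streaming the file
        return [line.rstrip("\n") for line in deque(fh, maxlen=max_lines)]


# Demo: a temporary directory stands in for the action's directory on the server.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "grant-mysql-phpcolab.log").write_text(
        "".join(f"line {i}\n" for i in range(1, 16)), encoding="utf-8"
    )
    excerpt = verbose_excerpt(tmp, "grant-mysql-phpcolab")

print(excerpt[0], "...", excerpt[-1])  # line 6 ... line 15
```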
Connecting to Servers as a non-root User
Some images don’t allow users to connect as the root user over SSH. For such images, configure the non-root username that the IMOD engine uses to connect to the server by setting the sshUser element within the serverType definition:
<serverType role="default" min="1" max="2" >
............................
............................
<sshUser>ubuntu</sshUser>
............................
............................
</serverType>
Manage Servers without SSH Connectivity
Sometimes users only want to launch servers without connecting to them over SSH, or have firewall rules that prevent IMOD from connecting to certain servers over SSH. Users can bring up multi-server systems with IMOD without the IMOD engine waiting for SSH connectivity by adding the optional checkSSH="false" attribute to the startup tag in the serverType workflow:
<serverType role="default" min="1" max="2">
<awsServer>
............................
............................
<startupCount>1</startupCount>
<workflow>
<startup timeout="300" checkSSH="false">
............................
............................
</serverType>
Actions
Actions have a unique name, a target (an expression that resolves to a set of servers at execution time), and a Boolean flag, execute, which indicates whether the action evaluates to an executable script (true) or to a server configuration file (false). The script or configuration file is created dynamically and saved with the name scriptName in the directory scriptPath on the server. Its contents are produced by substituting the parameter values into the template scriptTemplate. Parameters are elements that have a name, referenced in the scriptTemplate, and a value. A parameter's type is either literal or a server expression (type="serverref") that resolves to a set of servers.
<action name="grant-mysql-phpcolab" execute="true" target="[tier=db_tier][serverrole=sql]">
<scriptName>GrantPhpDbAccess.sh</scriptName>
<scriptPath>/root</scriptPath>
<scriptTemplate type="inline">
<![CDATA[
#!/bin/sh
#foreach ($clientNode in $SqlClientNodes)
/usr/local/mysql/bin/mysql -uroot -p${mysqladminpassword} -e "GRANT ALL PRIVILEGES ON ${appdb}.* TO '${appdbuser}'@'${clientNode.PrivateDNS}' IDENTIFIED BY '${appdbpassword}'"
#end
]]>
</scriptTemplate>
<parameters>
<parameter name="mysqladminpassword" type="literal" value="passwd" />
<parameter name="appdb" type="literal" value="php_collab" />
<parameter name="appdbuser" type="literal" value="phpcollab" />
<parameter name="appdbpassword" type="literal" value="apppasswd" />
<parameter name="SqlClientNodes" type="serverref" value="[tier=app_tier][serverrole=default]"/>
</parameters>
</action>
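The #foreach/#end and ${...} syntax in scriptTemplate resembles the Apache Velocity template language. This section does not name the engine IMOD uses, so treat that as an assumption. The toy Python renderer below handles only ${name}, ${name.Property}, and non-nested #foreach blocks. That is enough to show how the grant-mysql-phpcolab template expands into one GRANT line per client node. The PrivateDNS hostnames are made up.

```python
import re
from types import SimpleNamespace

# ${name} or ${name.Property}
PLACEHOLDER = re.compile(r"\$\{(\w+)(?:\.(\w+))?\}")
FOREACH = re.compile(r"\s*#foreach\s*\(\$(\w+)\s+in\s+\$(\w+)\)\s*$")


def substitute(line: str, scope: dict) -> str:
    """Replace ${name} / ${name.Property} with values from scope."""
    def repl(m):
        value = scope[m.group(1)]
        return str(getattr(value, m.group(2)) if m.group(2) else value)
    return PLACEHOLDER.sub(repl, line)


def render(template: str, scope: dict) -> str:
    """Expand a template containing single-level #foreach ... #end blocks."""
    lines = template.strip("\n").splitlines()
    out, i = [], 0
    while i < len(lines):
        m = FOREACH.match(lines[i])
        if m is None:
            out.append(substitute(lines[i], scope))
            i += 1
            continue
        var, collection = m.groups()
        end = next((j for j in range(i + 1, len(lines)) if lines[j].strip() == "#end"), None)
        if end is None:
            raise ValueError("#foreach without matching #end")
        for item in scope[collection]:
            inner = {**scope, var: item}  # loop variable visible inside the body
            out.extend(substitute(body_line, inner) for body_line in lines[i + 1:end])
        i = end + 1
    return "\n".join(out) + "\n"


TEMPLATE = """
#!/bin/sh
#foreach ($clientNode in $SqlClientNodes)
/usr/local/mysql/bin/mysql -uroot -p${mysqladminpassword} -e "GRANT ALL PRIVILEGES ON ${appdb}.* TO '${appdbuser}'@'${clientNode.PrivateDNS}' IDENTIFIED BY '${appdbpassword}'"
#end
"""

scope = {
    "mysqladminpassword": "passwd", "appdb": "php_collab",
    "appdbuser": "phpcollab", "appdbpassword": "apppasswd",
    # Hypothetical app-tier servers resolved from [tier=app_tier][serverrole=default]
    "SqlClientNodes": [SimpleNamespace(PrivateDNS="ip-10-0-0-11.ec2.internal"),
                       SimpleNamespace(PrivateDNS="ip-10-0-0-12.ec2.internal")],
}
script = render(TEMPLATE, scope)
print(script, end="")
```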
Events
The following example describes the four types of handlers. Note: This event format has been deprecated and replaced by the enhanced event format. Please use the new format, which lets you define any of the following custom event types, and more, using a single standard format.
<events>
<event name="jira-service-died" scope="[tier=jira_tier][serverrole=default]">
<handler type="simple">
<command type="action" name="restart-jira" target="[context=event][param=instanceid]"/>
</handler>
</event>
<event name="jira-server-died" scope="[tier=jira_tier][serverrole=default]">
<handler type="recovery" timeout="600" server-to-recover="[context=event][param=instanceid]">
<command type="action" name="start-jira" target="[tier=jira_tier][serverrole=default][recoveringserver]"/>
</handler>
</event>
<event name="jira-tier-overloaded" scope="[tier=jira_tier][serverrole=default]">
<handler type="scaleup" timeout="600" scalecount="1">
<command type="action" name="start-jira" target="[tier=jira_tier][serverrole=default][scalingserver]"/>
</handler>
</event>
<event name="jira-tier-underused" scope="[tier=jira_tier][serverrole=default]">
<handler type="scaledown" timeout="600" scalecount="1">
</handler>
</event>
</events>
The “jira-service-died” event is defined in the scope of the tier (jira_tier) and the servertype (default) and has a handler of type simple; this means that the commands defined in the handler are executed in sequence. The action referenced here, “restart-jira”, has its target overridden by a server expression that takes its reference from the event context. The expression [context=event][param=instanceid] looks for a parameter named instanceid in the event context. Because it is the target of an action, it resolves to a set of servers from the scope of the given event. Such an expression expects a comma-separated string of AWS instance IDs and picks the corresponding servers from the scope of this event.
The “jira-server-died” event is defined in the same scope and has a handler of type recovery; this means that a workflow replaces the server identified by the expression in the handler attribute server-to-recover and then, as post-actions of the workflow, executes the sequence of commands defined in the handler.
The “jira-tier-overloaded” event is defined in the same scope and has a handler of type scaleup; this means that a workflow increases the number of servers in this scope and configures them with the commands defined in the post-startup section of the scope's servertype. The count increases by the value of the handler's scalecount attribute. Also, as part of the post-scaleup actions, the commands defined in the handler section are executed sequentially.
The “jira-tier-underused” event is defined in the same scope and has a handler of type scaledown; this means that a workflow decreases the number of servers in this scope by the value of the handler's scalecount attribute. Also, as part of the pre-scaledown actions, the commands defined in the handler section are executed; in this example no commands are defined. Note that you may give your custom events any name, as long as the name is unique within the system definition. Different systems can use the same event names.
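Resolving [context=event][param=instanceid] amounts to splitting a comma-separated string of instance IDs and keeping the matching servers from the event's scope. The following is a minimal Python sketch. Several parts are assumptions, not documented IMOD behaviour: the Server type, the instance IDs, silently ignoring IDs outside the scope, and tolerating spaces after commas.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    instance_id: str
    tier: str
    role: str


def resolve_event_target(event_context: Dict[str, str], scope: List[Server],
                         param: str = "instanceid") -> List[Server]:
    """Resolve [context=event][param=<param>] against the event's scope."""
    raw = event_context.get(param, "")
    wanted = {part.strip() for part in raw.split(",") if part.strip()}
    # Keep scope order; ids that are not in the scope are ignored (assumption).
    return [s for s in scope if s.instance_id in wanted]


scope = [Server("i-0001", "jira_tier", "default"),
         Server("i-0002", "jira_tier", "default"),
         Server("i-0003", "jira_tier", "default")]
targets = resolve_event_target({"instanceid": "i-0001, i-0003,i-9999"}, scope)
print([s.instance_id for s in targets])  # ['i-0001', 'i-0003']
```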
Event Format Examples
Enhanced format for configuring custom events
<event name="custom-scaleup" description="Jira server custom scaleup" type="custom">
<handler timeout="1200">
<pre-process>
<!-- can have a sequence of actions here. They will be executed sequentially before start/stop/recoverServers -->
<startServers>
<serverType role="[tier=jira_tier][serverrole=default]" count="1" addToEvent="true"/>
</startServers>
</pre-process>
<process>
<command type="action" name="start-jira" target="[tier=jira_tier][serverrole=default][scaleupserver]"/>
</process>
</handler>
</event>
<event name="custom-recovery" description="Jira server has died" type="custom">
<handler timeout="1200">
<pre-process>
<!-- can have a sequence of actions here. They will be executed sequentially before start/stop/recoverServers -->
<recoverServers>
<server server-to-recover="[context=event][param=instanceid]"/>
</recoverServers>
</pre-process>
<process>
<command type="action" name="start-jira" target="[tier=jira_tier][serverrole=default][recoveringserver]"/>
</process>
</handler>
</event>
<event name="custom-scaledown" description="Jira server custom scaledown" type="custom">
<handler timeout="1200">
<pre-process>
<!-- can have a sequence of actions here. They will be executed sequentially before start/stop/recoverServers. For example, <command type="action" name="xyz" target="[tier=jira_tier][serverrole=default][scaledownserver]"/> -->
<stopServers>
<serverType role="[tier=jira_tier][serverrole=default]" count="1"/>
</stopServers>
</pre-process>
<!-- no process section because we don’t need it -->
</handler>
</event>
“custom-scaleup” is an event. The attribute type="custom" indicates that it follows the enhanced format. Such an event can act on different server types in different tiers in different ways; hence, the <event> tag has no scope attribute. The <handler> tag has no type attribute either, because the nature of the handler is defined entirely by its body. The body uses several new tags, namely <pre-process>, <process>, <startServers>, etc. The <pre-process> tag groups operations that run first. <startServers> (or <stopServers>) with one or more nested <serverType> tags represents the operation of starting (or stopping) a number of servers, specified by the count attribute, belonging to a certain role, specified by the role attribute. The <process> tag contains a sequence of actions to be executed after the operations in <pre-process> have completed. Hence, the handler for “custom-scaleup” performs the following:
- Starts one server of role “default” of tier “jira_tier”.
- Executes the action “start-jira” on the newly started server (scaleupserver).
- Adds the new server to the configured event as the attribute addToEvent is set to “true”.
By contrast, a recovered instance is automatically associated with the recovery event, and stopped instances are automatically removed from the events they belong to.
“custom-recovery” is an event whose handler performs auto-recovery of dead servers. It uses the new tag <recoverServers>: <recoverServers> with a nested <server> tag represents the operation of recovering the server specified by the server-to-recover attribute. Hence, the handler for “custom-recovery” performs the following:
- Recovers the dead server.
- Executes the action “start-jira” on the recovered server.
“custom-scaledown” is an event whose handler scales down servers. It uses the new tag <stopServers> to stop servers. To refer to the servers that are about to be scaled down, use the selector [scaledownserver], e.g. target="[tier=jira_tier][serverrole=default][scaledownserver]".
The enhanced format does not yet support event-context-based target expressions such as "[context=event][param=instanceid]" in actions/commands. Within a single event, the same server role should not appear in both the startServers and stopServers blocks.
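Target expressions are sequences of bracketed tokens. These are either key=value pairs such as [tier=jira_tier] or bare markers such as [scaledownserver]. The hypothetical Python parser below only splits that syntax; it knows nothing about how IMOD resolves the tokens to servers. It also rejects a malformed expression, for example one with a missing closing bracket.

```python
import re
from typing import Dict, List, Tuple

# One bracketed token: [key=value] or a bare [marker]
TOKEN = re.compile(r"\[([^\[\]=]+)(?:=([^\[\]]*))?\]")


def parse_selector(expr: str) -> Tuple[Dict[str, str], List[str]]:
    """Split '[k=v]...[marker]' into key/value pairs and bare markers."""
    pairs: Dict[str, str] = {}
    markers: List[str] = []
    pos = 0
    for m in TOKEN.finditer(expr):
        if m.start() != pos:  # text between tokens that is not itself a token
            raise ValueError(f"malformed selector near offset {pos}: {expr!r}")
        key, value = m.group(1), m.group(2)
        if value is None:
            markers.append(key)
        else:
            pairs[key] = value
        pos = m.end()
    if pos == 0 or pos != len(expr):
        raise ValueError(f"malformed selector: {expr!r}")
    return pairs, markers


print(parse_selector("[tier=jira_tier][serverrole=default][scaledownserver]"))
# ({'tier': 'jira_tier', 'serverrole': 'default'}, ['scaledownserver'])
```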
Amazon Specific Commands
Apart from script-like actions, we also provide cloud-provider-specific commands that can be executed in the post-startup sections. For example, the following commands are supported for the Amazon cloud:
Associate an elastic IP address with a running instance
Example:
<post-startup>
<command type="ec2" name="associate-ip">
174.129.251.111
</command>
...
</post-startup>
Note: Because an elastic IP address can be associated with only one instance at a time, make sure the above command is defined in a server definition that results in exactly one server instance at runtime; otherwise the results are unpredictable.
Attach an EBS Volume to a running instance and mount it as a device on some path
Example:
<post-startup>
<command type="ec2" name="attach-ebs-vol">
[volume-id=vol-3de40054][device-name=/dev/sdh][mount-path=/mnt/apache]
</command>
...
</post-startup>
Note: Because a volume can be attached to only one instance, make sure the above command is defined in a server definition that results in exactly one server instance at runtime; otherwise the results are unpredictable. Also make sure the volume is already formatted, since IMOD does not format it yet. Do not put any space characters between the brackets.
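Because the attach-ebs-vol parameters must not contain spaces, a quick pre-flight check on the command body can catch mistakes before deployment. This Python sketch is hypothetical, and the function name is illustrative. One rule is inferred from the two examples in this section rather than from a published schema: a spec names either volume-id, or snapshot-id with volume-size, plus device-name and mount-path.

```python
import re
from typing import Dict

# [key=value] with no whitespace and no nested brackets
PARAM = re.compile(r"\[([a-z-]+)=([^\[\]\s]+)\]")


def parse_attach_ebs_vol(spec: str) -> Dict[str, str]:
    """Parse the body of an attach-ebs-vol command into a dict (sketch)."""
    text = spec.strip()  # whitespace around the whole list is tolerated (assumption)
    if re.search(r"\s", text):
        raise ValueError("whitespace inside or between [...] parameters")
    params: Dict[str, str] = {}
    pos = 0
    for m in PARAM.finditer(text):
        if m.start() != pos:
            raise ValueError(f"malformed parameter near offset {pos}")
        params[m.group(1)] = m.group(2)
        pos = m.end()
    if not params or pos != len(text):
        raise ValueError("malformed parameter list")
    # Inferred from the two examples: either an existing volume,
    # or a snapshot plus a size for a new volume.
    from_volume = "volume-id" in params
    from_snapshot = "snapshot-id" in params and "volume-size" in params
    if from_volume == from_snapshot:
        raise ValueError("give either volume-id, or snapshot-id with volume-size")
    for key in ("device-name", "mount-path"):
        if key not in params:
            raise ValueError(f"missing {key}")
    return params


print(parse_attach_ebs_vol("[volume-id=vol-3de40054][device-name=/dev/sdh][mount-path=/mnt/apache]"))
```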
Attach an EBS Volume to a running instance after creating the volume from a snapshot
Example:
<post-startup>
<command type="ec2" name="attach-ebs-vol">
[snapshot-id=snap-c7f012ae][volume-size=5][device-name=/dev/sdh][mount-path=/mnt/mysql]
</command>
...
</post-startup>
Note: This command creates a volume from the specified snapshot in the same availability zone as the server instance, attaches it as the specified device, and mounts it on the specified path.
Configure the instance to be able to communicate with S3 buckets belonging to the AWS account that launched the instance
Example:
<post-startup>
<command type="s3" name="setup-s3config"/>
...
</post-startup>
Cleanup the S3 configuration from the instance to disable any further communication with S3 from this instance
Example:
<post-startup>
<command type="s3" name="teardown-s3config"/>
...
</post-startup>
Run a normal shell script on the instance
Example:
<post-startup>
<command type="script" name="cloudondemandconfig.sh">
<![CDATA[
echo "copying from s3"
s3cmd -f get s3://your_bucket/your_file /somepath/your_file_on_the_server
]]>
</command>
...
</post-startup>
Creating Custom Events
Along with auto-deployment of an N-tier system, IMOD also automates the management of deployed N-tier systems. It does this by providing a framework for defining custom events and mapping events to actions and workflows in the system definition file. This functionality enables fully automated lifecycle management of deployed applications: IMOD automatically executes predefined actions in response to registered events. Think of this as an autopilot for managing your application's service levels; IMOD automatically takes corrective actions, without requiring any human intervention, to ensure that service levels are met.
Using this functionality requires two main steps: first, define actions and map them to events; second, configure the conditions that trigger the events.
Define Actions and Map Events
In the System Definition file you can define custom actions and map them to events. Events are generated by the Monitoring System or manually from the n-tier Dashboard. To automate the response to events, map the events to actions, as in the example system shown in Figure 2.
Figure 2: Event-to-Action Mapping
Configure Events
You have to configure events so that the Monitoring System can trigger them under specified conditions. In IMOD you can do this from the Monitoring dashboard: click the Configure System Event button to open the Create Event dialog box, as shown in Figure 3. Since release 1.5, IMOD supports persistent events. Persistent events are defined statically, which means they can be defined on systems that have not been started. Statically configured events are persistent because they are saved along with the system definition, whereas dynamically configured events cease to exist when the system is stopped. When a system is undeployed, even statically configured events are deleted. Statically configured events are activated when the system is started and are initially associated with all the servers in the selected server roles. Dynamically configured events, on the other hand, are initially associated with the selected server instances only. In both cases, the server association may change as servers come and go. Perform the following operations to define an event:
• Choose the system.
• Check the servers/server roles whose metric values will be used to evaluate the bounding condition. Since release 1.5, you can select server roles as well as specific servers.
• Select the event you want to configure from the Event Name combo box. Whenever you choose a system, the Event Name combo box is populated with all the events defined in its system definition file.
• Enter comments for the event; they are used by the alert mechanism. Sending alerts for triggered events is optional: select the check box to send alerts and specify their severity and priority. If an alert is configured, you receive an email whenever the trigger occurs. A separate Create Alert button is used to define alerts on single servers; the only action taken by such an alert is sending an email, and it is currently supported for AWS servers only.
Figure 3 : Create Event Dialog Box
• Select the event type. Events may be aggregated or non-aggregated. An aggregated event is triggered based on the aggregated metric values of all the selected servers; a non-aggregated event is triggered based on the individual metric value of each selected server. Generally, scale-up and scale-down events are aggregated, whereas instance recovery and service recovery events are non-aggregated. You can safeguard your scaling mechanism by setting the max and min attributes of the serverType tag in your system definition file appropriately: IMOD will not scale up beyond the max value (the maximum number of servers for that tier) or scale down below the min value (the minimum number of servers for that tier).
• Select or clear the Dynamic? check box to indicate whether you want to configure the event dynamically or statically. Statically configured events can be defined when the system is not running; in this case, server roles are selected in the second step above. Dynamically configured events can be defined when the system is running and has running servers; in this case, running servers are selected in the second step above.
• Choose the metric used for the bounding condition. IMOD supports CPU, Memory, I/O, Disk-Space, Swap Memory, and Number of Requests. The metric ‘Ping to Server (TCP)’ is provided to support instance recovery. If you need any other metrics, please let us know and we will add them.
• Choose the bounding expression for metrics other than ‘Ping to Server (TCP)’. The current release of IMOD supports the following expressions:
o Average value for period of T times < N
o Average value for period of T times > N
o Average value for period of T times = N
o Average value for period of T times NOT N
• Choose the value of N, if required.
• Choose the value of T, if required.
• Click the Submit button.
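The bounding expressions and the aggregated/non-aggregated choice can be modelled in a few lines of Python. This sketch rests on stated assumptions. "Period of T times" is read as the last T metric samples. An aggregated event compares the mean of the per-server averages. NOT N is read as "not equal to N". The min/max clamp mirrors the serverType safeguard described above. All function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# The four bounding expressions offered in the Create Event dialog.
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": lambda avg, n: avg < n,
    ">": lambda avg, n: avg > n,
    "=": lambda avg, n: avg == n,
    "NOT": lambda avg, n: avg != n,  # read here as "not equal to N"
}


def window_average(samples: Sequence[float], t: int) -> float:
    """Average of the most recent t samples."""
    if len(samples) < t:
        raise ValueError("fewer than T samples collected")
    return mean(samples[-t:])


def servers_triggering(metrics: Dict[str, Sequence[float]], op: str,
                       n: float, t: int, aggregated: bool) -> List[str]:
    """Servers the event fires for, given per-server metric samples."""
    check = OPS[op]
    if aggregated:
        # One decision for the whole selection, from the combined averages.
        overall = mean(window_average(s, t) for s in metrics.values())
        return list(metrics) if check(overall, n) else []
    # Non-aggregated: each server is judged on its own average.
    return [name for name, s in metrics.items() if check(window_average(s, t), n)]


def next_server_count(current: int, scalecount: int, direction: str,
                      min_count: int, max_count: int) -> int:
    """Server count after scaling, never above serverType max or below min."""
    if direction == "up":
        return min(current + scalecount, max_count)
    if direction == "down":
        return max(current - scalecount, min_count)
    raise ValueError(f"unknown direction: {direction!r}")


cpu = {"app-1": [70, 85, 90, 95], "app-2": [60, 80, 85, 90]}
print(servers_triggering(cpu, ">", 80, 3, aggregated=True))   # ['app-1', 'app-2']
print(servers_triggering(cpu, ">", 86, 3, aggregated=False))  # ['app-1']
print(next_server_count(7, 2, "up", 1, 8))                    # 8
```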
After you define the event, it appears in the Monitoring dashboard under the Alerts/Events tab. Click a system name under the Systems and Standalone Instances tab to list all the events defined for that system. To delete an event, click the corresponding delete icon.
Figure 4 : List of Custom Events for a System
Event configuration Example
Refer to the sample php-collab system definition template provided as an example in IMOD. php-collab is a PHP-based collaboration application, and its system definition deploys the application in the following 3-tier setup:
• A web tier with two Apache load balancers configured with round-robin DNS.
• An app tier with the php-collab application deployed on the Apache web server, initially consisting of two app nodes.
• A db tier configured with a MySQL cluster consisting of a manager node, an SQL node, and two NDB nodes.
The system definition file has three preconfigured actions as examples: one for recovering from a server failure in the app tier, one for scaling up the app tier, and one for scaling down the app tier.
To test-drive the event-to-action functionality, deploy and run the php-collab system and, on the monitoring page, configure the events defined in the system definition file. You can simulate a server failure by killing one of the servers in the app tier; for an EC2 server, for example, you can terminate it from the ElasticFox Firefox plug-in or with the EC2 command-line tools. IMOD detects the server failure and automatically recovers the system by launching a new server and configuring it in the app tier with the recovery action. If you shut down the system using IMOD, it is treated as a planned shutdown and no recovery event is triggered.
In addition to having events triggered automatically by the monitoring system, you can also fire events manually from the n-tier dashboard: click the Event button, select the appropriate event from the drop-down list, and click the Fire button. Some events may require inputs. For example, depending on how it is defined in the system definition file, a recovery event may need an instance ID (server ID). When firing an event manually, you can pass such inputs as name-value pairs using the Add button.
Figure 5: Manually Firing Events
The following pages show example screenshots of the system UI during scale-up and scale-down events.


