Monday, July 18, 2016

Adding the OPS back into DEVOPS in the Cloud

I agree strongly with the motivation behind so-called DevOps.

From a developer's point of view, the aim is to build applications that serve the business - indeed, we would argue that is the sole purpose of IT. Hence, anything that delays delivery of applications into Production is seen as an impairment to IT, and therefore a blocker to the business. In an age of ruthless innovation and competition, IT needs to be able to deliver quickly, and traditional IT Operations teams are often seen as failing to deliver fast enough to serve the business.

In response to this, DevOps aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.

Key to this is Agile development, testing and deployment - typically Continuous Delivery. Not only that, but if this approach is built on a Cloud environment such as Amazon Web Services (AWS), developers can be encouraged to "experiment".

Cloud services enable Dev environments to be spun up in minutes - in some cases at no cost. This promotes an iterative, innovative approach - hence the mantra "fail fast / fail often": development teams can try out that really Cool new feature and, if it fails, move on to another idea.

However, whilst Dev environments can be created in minutes, I would argue that Ops environments still need a bit more planning. Here are a few reasons why:

Users
Dev view: Me and myself, plus my Agile friend.
Ops view: Systems with millions of users are quite common these days.

Security
Dev view: Who cares? This machine will be destroyed in a couple of months, anyway.
Ops view: Robust security is essential to protect customer data, and thus the business reputation.

Patch control
Dev view: When I am at lunch.
Ops view: Rolling maintenance windows are essential to preserve a 24/7 service.

Scalability
Dev view: My machine is good enough.
Ops view: Use EC2 Auto Scaling groups everywhere, with Elastic Load Balancers and Internet Gateways.

Persistence
Dev view: A small database on the Dev machine is fine for build and test.
Ops view: High volumes may not scale out, so production data needs to be loaded on highly reliable, replicated data stores.

Connectivity
Dev view: SSH is fine; direct access to servers is perfectly acceptable.
Ops view: Access must be controlled to preserve the business reputation.

Bearing in mind the differences between the DEV and the OPS worlds, here are a few recommendations for how DevOps teams can ensure that the OPS side does not lose out:

Assume Infrastructure will fail

This seems counter-intuitive, but the whole Cloud philosophy is that infrastructure will fail. Machines should not be treated like Pets (given loving names and looked after), but rather as Cattle (if they fail, just bring in another one).

So, for example, Data should not be persisted in memory between one session request and the next. Cloud Applications must not rely on load balancer "stickiness". An HTTPS user session which starts by making a request to one server may end up on a totally different server for the second request.

This means a revolution in the ways applications are designed.
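As a minimal sketch of the "no stickiness" point above: session state must live in an external shared store, so that any server can serve any request. The `RedisLikeStore` class below is a stand-in for a real shared store such as Redis or DynamoDB, and the handler shape is illustrative, not taken from any particular framework.

```python
# Sketch: keep session state in an external store, never in server memory.
# "RedisLikeStore" is a hypothetical stand-in for a shared store such as
# Redis or DynamoDB.

class RedisLikeStore:
    """In-memory stand-in for an external, shared session store."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handle_request(server_name, store, session_id):
    """A stateless handler: any server can serve any request, because
    session state lives in the shared store, not on the server."""
    count = store.get(session_id, 0) + 1
    store.put(session_id, count)
    return {"served_by": server_name, "request_number": count}
```

With this shape, a session that starts on one server can continue on a completely different server, because neither one holds the state itself.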

Design for Microservices

The assumptions above mean that a Microservices design approach is essential.

Unlike the old Monolithic Application design, in a microservices architecture, an application consists of small services using lightweight protocols to communicate with each other.

This makes it much easier to change the system and add functions and qualities at any time. If part of an application needs to be re-factored, it can be, provided that its interface is preserved.

One of the benefits of this is that new features can be added to an application just by updating the microservices that comprise it, which supports rapid, continuous release. In a large organisation, different teams can own different microservices, so development can proceed in parallel without teams impacting each other.

The key to this is that the interfaces must remain consistent (or microservice versioning must be used to manage different interfaces from different sources).
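One way to manage interface versioning is to route requests by an explicit version in the path, so old clients keep working while a refactored interface is rolled out alongside them. The handler names, paths, and dispatch scheme below are illustrative assumptions, not a specific framework's API.

```python
# Sketch: versioned microservice interfaces. Paths and handler names
# are hypothetical.

def get_customer_v1(customer_id):
    # Original interface: returns a single flat name field.
    return {"id": customer_id, "name": "Ada Lovelace"}

def get_customer_v2(customer_id):
    # Refactored interface: splits the name into two fields, but v1
    # stays available so existing callers are not broken.
    return {"id": customer_id, "first_name": "Ada", "last_name": "Lovelace"}

HANDLERS = {
    ("GET", "/v1/customers"): get_customer_v1,
    ("GET", "/v2/customers"): get_customer_v2,
}

def dispatch(method, path, customer_id):
    """Route a request to the handler for the requested interface version."""
    handler = HANDLERS.get((method, path))
    if handler is None:
        raise LookupError(f"no handler for {method} {path}")
    return handler(customer_id)
```

Teams can then retire `/v1/` on their own schedule, once monitoring shows no callers still depend on it.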

Don't assume request resources are known

In the old development world, a service was built for a specific application need. In the microservices world, your microservice could be called from anywhere. It may not be called from a specific URL; the call might come from another application somewhere else (even in another part of the Cloud), hidden behind a load balancer, internet gateway or content delivery network.

A microservice needs to be oblivious to who or what is calling it - otherwise it becomes just a module in a Monolithic Application.

Assume everything is hosted somewhere else

When developing applications, it is convenient to build new microservices and their clients on the development machines. But in the Ops world, there may be very valid reasons for moving microservices away from the client software that calls them.

For example, if you develop a new microservice that wants to call out to the Internet to capture some web service data, the sensible Ops deployment might be to host that microservice in a network zone with an Internet Gateway. This could be in a different network zone from other applications or services.

Remember your new interfaces

This is another "gotcha" that can kill an Ops deployment: your new Cool Feature worked in Dev by calling out to the Finance Service, but you forgot to account for that dependency in the Ops world.

So remember that if new interfaces to external systems are introduced, you need to make sure the Ops deployment can access them.

Compartmentalise Persistent Stores

Although most infrastructure can happily "scale out" (add more servers as needed), persistent databases typically "scale up". There are exceptions - Amazon RDS supports read replicas of production instances, so read scalability can be very good. I have even heard of companies throwing away their Oracle Exadata machines when they moved to AWS.

Nevertheless, for Ops deployment, it sometimes makes sense to split logical data sets into separate database instances. For example, the customer data could be hosted on one set of instances, and the marketing reporting on another. This means that applications and microservices should not assume that all data sets are in the same database, and should be designed accordingly.
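A small sketch of that design choice: resolve the database endpoint per logical data set through a registry, rather than hard-coding a single connection. The data-set names and endpoint strings below are hypothetical; in AWS these might be distinct RDS endpoints.

```python
# Sketch: one endpoint per logical data set. Names and hosts are
# hypothetical examples, not real infrastructure.

DATASET_ENDPOINTS = {
    "customer":  "customers-db.example.internal:5432",
    "marketing": "marketing-db.example.internal:5432",
}

def endpoint_for(dataset):
    """Resolve the database endpoint for a logical data set, so that
    no code assumes everything lives in one database."""
    try:
        return DATASET_ENDPOINTS[dataset]
    except KeyError:
        raise LookupError(f"unknown data set: {dataset}")
```

Because callers go through `endpoint_for`, Ops can later move a data set to its own instances by changing one registry entry, not every application.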

Don't allow applications to have unnecessary privilege

We all know the stories where applications were given "drop table" privilege, or could access high-privilege system resources. This may work for developing a proof of concept, but is not recommended for Ops, for security reasons. By all means do this for prototyping, but please re-factor this code before it moves into Production.


Spin Up, experiment, fail fast, and try again.

Then continuously deploy into the Ops world.

But remember to bear in mind some of the key points above.

Now, about that really Cool Idea I had last night....