Monday, July 18, 2016

Adding the OPS back into DEVOPS in Cloud

I agree strongly with the motivation behind so-called DevOps.

From a Developer's point of view, the aim is to build applications that serve the business - indeed, I would argue that is the sole purpose of IT. Hence, anything that delays delivery of applications into Production is seen as an impairment to IT, and so a blocker to the business. In an age of ruthless innovation and competition, IT needs to be able to deliver quickly. Traditional IT Operations teams are typically seen as failing to deliver fast enough to serve the business.

In response to this, DevOps aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.

Key to this is Agile development, testing and deployment - typically Continuous Delivery. Not only that, but if this approach is built on a Cloud environment such as Amazon Web Services (AWS), developers can be encouraged to "experiment".

Cloud services enable Dev environments to be spun up in literally minutes - in some cases at no cost. This promotes an interactive, innovative approach - the mantra "fail fast / fail often" has been used: development teams can try out that really Cool new feature and, if it fails, move on to another idea.

However, whilst Dev environments can be created in minutes, I would argue that Ops environments still need a bit more planning. Here are a few reasons why:

FEATURE | DEV | OPS
Users | Me and myself, plus my Agile friend. | Systems with millions of users are quite common these days.
Security | Who cares? This machine will be destroyed in a couple of months, anyway. | Robust Security is essential to protect customer data, and thus the business reputation.
Patch control | When I am at lunch. | Rolling maintenance windows are essential to preserve a 24/7 service.
Scalability | My machine is good enough. | Use EC2 Auto-Scaling Groups everywhere, with Elastic Load Balancers and Internet Gateways.
Persistence | A small Database on the Dev machine is fine for build and test. | High volumes may not scale out. So the production data needs to be loaded on highly reliable replicated data stores.
Connectivity | SSH is fine. Direct access to servers is perfectly acceptable. | Access must be controlled to preserve the business reputation.

Bearing in mind the differences between the DEV and the OPS worlds, here are a few recommendations for how DevOps teams can ensure that the OPS side does not lose out:

Assume Infrastructure will fail

This seems counter-intuitive, but the whole Cloud philosophy is that infrastructure will fail. Machines should not be treated like Pets (given loving names and looked after), but rather as Cattle (if they fail, just bring in another one).

So, for example, Data should not be persisted in memory between one session request and the next. Cloud Applications must not rely on load balancer "stickiness". An HTTPS user session which starts by making a request to one server may end up on a totally different server for the second request.
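
As a rough illustration of the point above, here is a minimal Python sketch in which session state lives in a shared store rather than in any one server's memory. The Server class, the store and the basket example are my own hypothetical names, and a plain dict stands in for whatever replicated cache or database a real deployment would use.

```python
import json
import uuid

# Stand-in for a shared external store (e.g. a replicated cache or database).
# A plain dict is used here only so the sketch runs on its own.
SHARED_SESSION_STORE = {}


class Server:
    """A hypothetical application server that keeps no session state in memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared by every server in the pool

    def start_session(self, user):
        session_id = str(uuid.uuid4())
        self.store[session_id] = json.dumps({"user": user, "basket": []})
        return session_id

    def add_to_basket(self, session_id, item):
        # State is loaded from the shared store on every request...
        state = json.loads(self.store[session_id])
        state["basket"].append(item)
        # ...and written back before the response is returned.
        self.store[session_id] = json.dumps(state)
        return {"handled_by": self.name, "basket": state["basket"]}


server_a = Server("server-a", SHARED_SESSION_STORE)
server_b = Server("server-b", SHARED_SESSION_STORE)

sid = server_a.start_session("alice")
first = server_a.add_to_basket(sid, "book")
second = server_b.add_to_basket(sid, "pen")  # a different server, same session
```

The second request lands on a different server and still sees the item added by the first, which is the behaviour an application needs if it cannot rely on "stickiness".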

This means a revolution in the ways applications are designed.

Design for Microservices

The assumptions above mean that a Microservices design approach is essential.

Unlike the old Monolithic Application design, in a microservices architecture, an application consists of small services using lightweight protocols to communicate with each other.

This makes it much easier to change and add functions and qualities to the system at any time. If part of an application needs to be re-factored, this can be done provided that its interface is preserved.

One of the benefits of this is that new features can be added to an application just by updating the microservices that comprise it. This helps quick and continuous release. In a large organisation, different teams can own different microservices, which means that parallel development can progress without teams impacting each other.

The key to this is that the interfaces must remain consistent (or that microservices versioning is used to manage different interfaces from different sources).
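
One possible way to picture interface versioning is the small Python sketch below. The customer service, its fields and the version labels are all assumptions made up for illustration; the only point is that an old caller keeps getting the old contract while a new caller can ask for the new one.

```python
def get_customer_v1(customer_id):
    # Original contract: a single "name" field.
    return {"id": customer_id, "name": "Ada Lovelace"}


def get_customer_v2(customer_id):
    # Newer contract: name split into two fields. v1 callers are unaffected.
    return {"id": customer_id, "first_name": "Ada", "last_name": "Lovelace"}


# Hypothetical registry mapping an interface version to its handler.
HANDLERS = {"v1": get_customer_v1, "v2": get_customer_v2}


def handle_request(version, customer_id):
    """Route a call to the handler for the interface version the caller asked for."""
    try:
        handler = HANDLERS[version]
    except KeyError:
        raise ValueError(f"unsupported interface version: {version}") from None
    return handler(customer_id)
```

A team can then re-factor everything behind get_customer_v2 without touching callers that still depend on v1.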

Don't assume request sources are known

In the old development world, a service was built for a specific application need. In the microservices world, your microservice could be called from anywhere. It may not be called from a specific URL; the call might come from another application somewhere else (even in another part of the Cloud), hidden behind a load balancer, internet gateway or content delivery network.

A microservice needs to be oblivious of who or what is calling it - otherwise it becomes just a module in a Monolithic Application.

Assume everything is hosted somewhere else

When developing applications, it is convenient to build new microservices and their clients on the development machines. But in the Ops world, there may be very valid reasons for moving microservices away from the client software that calls them.

For example, if you develop a new microservice that wants to call out to the Internet to capture some web service data, the sensible Ops deployment might be to host that microservice in a network zone with an Internet Gateway. This could be in a different network zone from other applications or services.

Remember your new interfaces

This is another "gotcha" which can kill an Ops deployment: you suddenly remember that your new Cool Feature worked by calling out to the Finance Service, but you forgot to account for that in the Ops world.

So remember that if new interfaces to external systems are introduced, you need to make sure the Ops deployment can access them.
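
One way to avoid forgetting is to have each service declare the interfaces it needs and to check that list before deployment. The Python sketch below is only an illustration: the service names and the idea of a "reachable from Ops" list are my own assumptions, not a feature of any particular tool.

```python
# Interfaces the service declares it needs (hypothetical names).
DECLARED_DEPENDENCIES = {"finance-service", "customer-service"}

# Interfaces the Ops environment is known to allow (hypothetical names).
OPS_REACHABLE = {"customer-service", "audit-service"}


def missing_interfaces(declared, reachable):
    """Return the declared dependencies that Ops has not made reachable, sorted."""
    return sorted(set(declared) - set(reachable))


def preflight(declared, reachable):
    """Raise before deployment if any declared interface is unavailable."""
    missing = missing_interfaces(declared, reachable)
    if missing:
        raise RuntimeError("not reachable from Ops: " + ", ".join(missing))
    return True
```

With the sample data above, the check reports finance-service as missing, which is exactly the Finance Service gotcha caught before go-live rather than after.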

Compartmentalise Persistent Stores

Although most infrastructure can happily "scale out" (add more servers as needed), persistent databases typically "scale up". There are exceptions - AWS RDS services allow for read replicas of production instances, so that read performance can scale very well. I have even heard of companies throwing away their Oracle Exadata machines when they moved to AWS.

Nevertheless, for Ops deployment, it sometimes makes sense to split logical data sets into separate database instances. For example, the customer data could be hosted on one set of instances, and the marketing reporting on another. This means that applications and microservices should not necessarily assume that all data sets are in the same database, and should be designed accordingly.
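
To make that concrete, here is a small Python sketch in which customer data and marketing data sit behind separate connections. Two in-memory SQLite databases stand in for separate database instances, and the table and function names are hypothetical. The design point is that the report is assembled in application code, so nothing depends on a SQL join across the two stores.

```python
import sqlite3

# One connection per logical data set. In-memory SQLite stands in for what
# would be separate database instances in an Ops deployment.
DATA_STORES = {
    "customer": sqlite3.connect(":memory:"),
    "marketing": sqlite3.connect(":memory:"),
}

DATA_STORES["customer"].execute("CREATE TABLE customers (id INTEGER, name TEXT)")
DATA_STORES["customer"].execute("INSERT INTO customers VALUES (1, 'Ada')")
DATA_STORES["marketing"].execute(
    "CREATE TABLE campaign_views (customer_id INTEGER, views INTEGER)"
)
DATA_STORES["marketing"].execute("INSERT INTO campaign_views VALUES (1, 42)")


def customer_name(customer_id):
    row = DATA_STORES["customer"].execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def campaign_views(customer_id):
    row = DATA_STORES["marketing"].execute(
        "SELECT views FROM campaign_views WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else 0


def customer_report(customer_id):
    # Combined in application code: no SQL join across the two stores.
    return {"name": customer_name(customer_id), "views": campaign_views(customer_id)}
```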

Don't allow applications to have unnecessary privilege

We all know the stories where applications were given "drop table" privilege, or could access high privilege system resources. This may work for developing a proof of concept, but is not recommended for Ops, for security reasons. By all means do this for prototyping, but please re-factor this code before it moves into Production.
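
As a small illustration of the principle, the Python sketch below uses the authorizer hook in the standard sqlite3 module to refuse DROP TABLE on the application's connection. This is a stand-in only: on a production database the same effect would normally come from the grants given to the application's database account, and the table and function names here are hypothetical.

```python
import sqlite3


def deny_drop_table(action, arg1, arg2, db_name, source):
    """Authorizer callback: refuse DROP TABLE, allow everything else."""
    if action == sqlite3.SQLITE_DROP_TABLE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def open_app_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("INSERT INTO orders (item) VALUES ('book')")
    conn.commit()
    # From here on, the application connection cannot drop tables.
    conn.set_authorizer(deny_drop_table)
    return conn


def try_drop(conn):
    """Return True if DROP TABLE succeeded, False if it was refused."""
    try:
        conn.execute("DROP TABLE orders")
        return True
    except sqlite3.DatabaseError:
        return False
```

The application can still read and write its orders, but an accidental (or malicious) DROP TABLE is rejected and the data survives.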

GAIN THE BENEFITS

Spin Up, experiment, fail fast, and try again.

Then continuously deploy into the Ops world.

But remember to bear in mind some of the key points above.

Now, about that really Cool Idea I had last night....

Monday, June 27, 2016

Is Cloud ready for the mainstream?

I wrote about the Cloud approach to IT four years ago in this blog. At the time, I wanted to urge caution, particularly in relation to Production implementation. 

Four years on, how have things changed?

I think the Cloud has finally Grown Up. Always popular with development teams and leading-edge (or should that be bleeding-edge?) deployments, is it now time to see Cloud provisioning as the de-facto approach to Infrastructure requirements?

It could be argued that, for commercial organisations, Cloud really started with Software-as-a-Service (SaaS). Companies such as Salesforce.com established the concept of outsourcing non-core business processes. (Although I suspect my sales & marketing colleagues would take exception to being categorised as "non-core"!) Within IT itself, cloud-based Service Desks such as ServiceNow began to eat into traditional service desk markets.

Subsequently, the concept of Platform-as-a-Service (PaaS) began to be exploited by Development teams who wanted to spin up "Build" and "Test" environments quickly. 

Compute environments were (and still are) another valuable use-case: making use of massive CPU capacity to do data analytics or asset pricing saves having to purchase on-premise processing. This, plus the idea of just provisioning Storage (Infrastructure-as-a-Service, or IaaS), is where the questions of security and reliability came in.

Put simply: why should a company trust a third-party supplier to look after its confidential data? To answer that question is to address the heart of Cloud, be it IaaS, PaaS, or SaaS.

In my view, the questions of Security are now being addressed. Many companies now conform to rigorous security rules regarding data isolation, "Chinese walls" and other practices so that even some Banks are now prepared to trust their secure data to a Cloud service. 

Reliability and Availability are also being addressed. However, this does require a different philosophy to infrastructure. The approach is to view servers not as "pets" (having individual attributes, and to be nursed back to health if they get sick) but rather to treat them as "cattle" - herds of identical animals. If one gets sick, you just kill it and use another one. But this does mean that Applications need a totally different approach.

If you want your application to be able to run on a Cloud solution, you need to recognise that whilst the environment itself may be stable, individual components themselves might fail. This is much more of an "organic" approach to resilience, compared with the older "technocratic" approach of ensuring resiliency by ensuring availability of each and every component. 

So the new approach involves:

- overall infrastructure is "stateless", and runs very small "micro" ACID (Atomic, Consistent, Isolated, Durable) transactions.
- each transaction takes minimal elapsed time and can run on any host. 
- very simple persistent storage mechanisms are used to store user "state" where necessary. 
- failure of infrastructure does not lead to failure of applications. 
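
To give a flavour of what a "micro" transaction might look like, here is a minimal Python sketch using the standard sqlite3 module as a stand-in for the persistent store. The accounts table and the transfer function are hypothetical; the point is only that each unit of work is short, holds no state between calls, and either commits completely or leaves nothing behind.

```python
import sqlite3


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """One small atomic transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

If a transfer would take an account below zero, the whole transaction is rolled back and a retry on any other host would start from the same clean state.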

Of course, all the good things we have always demanded from infrastructure (security, availability, reliability, supportability, etc.) must still apply.

But, in the Cloud world, we deliver them in a different way - using micro-services hosted on anonymous farms of infrastructure. 

Under this new philosophical approach, the focus moves to Supplier Management. Choose your Cloud Supplier with great care - your business data is in their hands.