Cloud Native Software is software that gets its configuration information dynamically.

TL;DR — Too long; didn’t read

Background: Defining Release Management vs. Configuration Management

Release Management. This is what we’ve been doing with software since the earliest days of computing. Release management is naming and managing a collection of versioned files. It’s not uncommon for release management systems to pull directly from GitHub using a Git SHA or a collection of Git SHAs. Named releases of operating systems and of applications are further examples of software under release management.

Configuration Management deals with the information associated with a deployment. The most common examples are the IP addresses and passwords supplied when an operating system or application is installed. Another example is an email program configured with your email address and password. Many people will have the same release of the email application, but your configuration should be unique.

Cloud Native Computing is the art of deriving as much of the configuration information as possible dynamically. It’s more cloud native when the configuration is generated from other applications. Yes, that means that for a given deployment, static configuration information in a text file, git repo, database, etc. makes it less cloud native. When we’re calling an API on an IAAS platform and getting back a reference to a fresh VM, that’s more cloud native. Wrap that API call into something that composes a pattern of hosts that is fault resistant, and that’s even more cloud native.
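To make the static versus dynamic distinction concrete, here is a minimal sketch in Python. It is not taken from any real platform: `fake_iaas_lookup`, `resolve_config`, and the addresses are hypothetical names and values made up for illustration. The idea is only that each configuration key is first requested from a dynamic source, with the static file used as a fallback, and that the share of keys resolved dynamically is a rough indication of how cloud native the deployment is.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A provider is any callable that returns a value for a key, or None if it
# does not know the key. In a real system it might wrap an IAAS or service
# discovery API; here it is a hypothetical stand-in.
Provider = Callable[[str], Optional[str]]


def fake_iaas_lookup(key: str) -> Optional[str]:
    """Hypothetical stand-in for an API call that returns a fresh VM address."""
    answers = {"db_host": "10.0.0.7"}
    return answers.get(key)


def resolve_config(
    keys: List[str],
    providers: List[Provider],
    static_defaults: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Ask the dynamic providers first, then fall back to static values.

    Returns the resolved configuration and the list of keys that were
    resolved dynamically.
    """
    resolved: Dict[str, str] = {}
    dynamic_keys: List[str] = []
    for key in keys:
        value = None
        for provider in providers:
            value = provider(key)
            if value is not None:
                dynamic_keys.append(key)
                break
        if value is None:
            # Static fallback; raises KeyError if no source knows the key.
            value = static_defaults[key]
        resolved[key] = value
    return resolved, dynamic_keys


# The static file is the "classic" source of configuration.
static_file = {"db_host": "192.168.1.10", "db_port": "3306"}
config, dynamic = resolve_config(
    ["db_host", "db_port"], [fake_iaas_lookup], static_file
)
print(config)
print(dynamic)
```

In this toy run the database host comes from the (fake) API and the port still comes from the static file, so the deployment is only partly cloud native by the definition above.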

The classic “hello world” written in C, and in most computer languages since, assumes you’re on a host with no name. That doesn’t work in the cloud because we have to be able to address the hello world program, usually through a URL or API. For completeness, and to make things more confusing, addressing a unit of work without all the tooling associated with making or declaring a URL or API is serverless computing, AKA function as a service (FAAS). An open source implementation of FAAS is OpenWhisk. It allows a hello-world-like program to exist without IP address or URL information at deployment time. OpenWhisk deals with the mechanics of addressing and scaling the workload.

Linux installed from an ISO will require at least your password to be entered at install time. When Linux is installed from a cloud image, cloud-init will be used to supply its configuration information. This is more cloud native.

Current view of the Cloud

IAAS, Infrastructure as a Service, means you call an API and get your VM. This can be done with AWS, GCE, Azure, etc. PAAS, Platform as a Service, means the services your application will need are already up and running for you to call. If the service in question is implemented and deployed as a cloud native application, it will be implemented with many interchangeable workers, read as fault tolerant. If you’re calling an API and getting a SQL database or DNS service implemented with MySQL and BIND respectively, you’re less cloud native and on par with classic hosted solutions. Open source PAAS projects like Apache Mesos and Cloud Foundry have already announced support for Docker. That means PAAS is positioned to fade or morph into container as a service (CAAS).

These definitions are functionally very useful, but they allow software to retain classic failure points. One example is the need to update the kernel of the host your software lives on: your software may go down for the duration of the patch. A Cloud Native application that gets its configuration dynamically can be deployed to a new host, and whatever identifies the software to its clients can be pointed at the new host. The result is that you’re living the cloud dream and enjoying little to no down time during routine operations.

Managing the chaos

In most cases, your business’s secret sauce will need more than one micro-service. Below are some ways to think about decomposing your workload into known patterns.

The Tick-Tock pattern of complexity management from Intel is still relevant. Intel managed the complexity of building microprocessors with a very simple and elegant discipline of change management: every iteration changed either the chip architecture or the die geometry, never both. The benefits were many. Engineers had clarity of focus. Two different groups of engineers could focus on their respective crafts of architecture and die geometry. During implementation, engineers had a huge hint about the source of a problem based on where they were in the Tick-Tock cycle. It effectively cut the potential error space in half. Within Intel, Tick is the die shrink and Tock is the architecture change.

“Lift and Shift” is the cloud version of Tick, the die shrink: deploying the same application on a new infrastructure stack. It doesn’t matter whether you’re using configuration management tools like Puppet, SaltStack, Ansible, or your own favorite. Like any tool, it’s all in the way the user chooses, or is forced, to use it. If we’re loading most or all of the configuration data from a static source, it’s more of a classic deployment. It doesn’t matter that it’s all running on virtual machines on AWS, GCE, Azure, etc. The most Lifted and Shifted application is WordPress, IMHO. WordPress is a Hello World type example for Docker Compose.

Stateless Services

The Cloud Dream is that it should just work. The most straightforward pattern for making a given service more robust is to refactor it into a stateless API. The implementation will have the API fronted by some type of highly available proxy, maybe even HAProxy. The pattern continues with N copies of shared-nothing controllers. This allows any given controller to fail while the service keeps working. The next step in the chain may also include many workers behind each of the controllers. Actual implementations of this pattern vary widely. Lots of services already have something very similar. Think Cassandra, etcd, Kafka, RabbitMQ; the list is too long. Any newer and substantial service will usually require a small team to implement and maintain it. Examples of this can be seen in the OpenStack Nova architecture, the OpenStack Neutron architecture, the other OpenStack micro-services combined, the Kubernetes architecture, Spinnaker, and OpenWhisk. Apologies in advance to any noteworthy project I neglected to mention.
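The proxy-plus-N-controllers pattern can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HAProxy or any of the projects above is implemented: `Controller`, `ControllerDown`, and `RoundRobinProxy` are hypothetical names, and “failure” is just a flag. The point it illustrates is that because the controllers share nothing and keep no state, the proxy can skip a failed one and the caller never notices.

```python
class ControllerDown(Exception):
    """Raised by a controller that cannot serve requests."""


class Controller:
    """A stateless, shared-nothing controller (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ControllerDown(self.name)
        # Stateless: the reply depends only on the request itself.
        return f"{self.name} handled {request}"


class RoundRobinProxy:
    """Fronts N controllers and retries on the next one if one is down."""

    def __init__(self, controllers) -> None:
        self._controllers = list(controllers)
        self._next = 0

    def handle(self, request: str) -> str:
        # Try each controller at most once per request.
        for _ in range(len(self._controllers)):
            controller = self._controllers[self._next]
            self._next = (self._next + 1) % len(self._controllers)
            try:
                return controller.handle(request)
            except ControllerDown:
                continue
        raise RuntimeError("no healthy controllers")


controllers = [Controller("c0"), Controller("c1"), Controller("c2")]
proxy = RoundRobinProxy(controllers)

first = proxy.handle("GET /hello")
controllers[1].healthy = False  # simulate a controller failure
second = proxy.handle("GET /hello")

print(first)
print(second)
```

The second request would have gone to `c1`, but the proxy moves on to `c2` and the service keeps working. A real proxy would add health checks and timeouts rather than discovering failures one request at a time.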

Multiple Micro-services

In the past we had longer debug cycles with a monolithic application. Now that we’ve decomposed large applications into many micro-services, the edge cases of calling these N micro-services in some given order for a given use case result in a feature that marketing will call value added services, until the lawyers get involved. Continuous deployment created the Site Reliability Engineer to be the first point of contact for the issues that are expected when integrating multiple micro-services. This shift in where and how the integration work for multiple micro-services is done is a key difference for cloud native software. Continuous deployment of multiple micro-services is so different and so necessary that Mirantis open sourced and dropped its baby, the OpenStack Fuel installer, bought a software development company, and is now promoting a solution that allows the continuous deployment of OpenStack. A competing method of deploying OpenStack is to use Docker-containerized OpenStack components, known by the project name Kolla. This collection of containers can then be deployed using Ansible or Kubernetes. There is no clear winner among OpenStack deployment patterns. I predict that at some point all the cloud providers will start adopting OpenStack behind the API. It’s just a cost thing that happens when competition starts to squeeze margins.

Who Are the Stakeholders

The end user is the first obvious stakeholder. It’s all about making a better experience for the end user.

Systems Administrator. I would suggest this is the granddaddy use case. Systems administrators are faced with data centers filled with hundreds or thousands of snowflakes built by people who have long since left the company, each computer running applications that look something like a classic LAMP stack. Any routine operational issue, like fixing known vulnerabilities, software updates, power outages, loose cables, or even services external to the host such as network routing and switching, will cause down time. Data center operators seeking a better way would eventually lead to OpenStack. I would also suggest that OpenStack was the first famous open source application refactored into micro-services, at least in my tiny view of the big open source world.

OpenStack is difficult to deploy because each micro-service component within OpenStack was deployed manually. Each OpenStack deployment didn’t necessarily have all the components, and each component may have had custom tweaks for its use case. I don’t know who was the first person to use the snowflake analogy to describe an OpenStack deployment, but it is more accurate than not.

We have Rackspace and NASA to thank for starting and releasing OpenStack.

Site Reliability Engineer. When your web site is a search engine and your compute load is embarrassingly parallel, you pioneer the use of Site Reliability Engineers to work out the issues associated with continuous release and continuous integration. This is also where the use of Docker shines. By having an immutable image that is run hundreds or thousands of times, we’re eliminating the risk that any given instance was modified in production. For projects that are less mature and still rapidly changing, this becomes a velocity-inhibiting issue. For the uninitiated, embarrassingly parallel computing problems are still very difficult to get right at scale. Rinse and repeat this process enough times as an open source project and you’ll have Kubernetes. These contributions are compliments of Google.

Application developer productivity focus. In this view you’re going to remove or minimize all the barriers that application developers encounter in releasing their code all the way to production. No provisioning request portals, emails, or other gating processes for releasing your code. They do nothing but slow down the edit, compile, run cycle. All applications will be different, so the deployment pipeline must be composable for each micro-service. There should also be the ability to make any number of new complete deployment copies, with metadata to tag a given deployment as development, staging, production, or any other arbitrary state. The release of a deployment to production should then support the canary release pattern, blue-green deployment, and Chaos Monkey. Not having these strategies in software means the staff will be doing these things with custom in-house scripts, or even manually. The above application-centric features are a description of Spinnaker. Spinnaker is an open source project that originated at Netflix.
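For readers who have not met the canary release pattern, here is a rough Python sketch of the core idea: send a small, fixed percentage of requests to the new deployment and the rest to the stable one. This is an illustration under my own assumptions, not how Spinnaker does it. `pick_deployment`, `canary_share`, and the request ids are hypothetical, and hashing the request id is just one simple way to keep the routing decision repeatable.

```python
import hashlib


def pick_deployment(request_id: str, canary_percent: int) -> str:
    """Route a request to "canary" or "stable" based on a hash bucket.

    Hashing the request id keeps the decision deterministic, so the same
    id always lands on the same deployment for a given percentage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def canary_share(request_ids, canary_percent: int) -> float:
    """Fraction of the given requests that would be sent to the canary."""
    picks = [pick_deployment(rid, canary_percent) for rid in request_ids]
    return picks.count("canary") / len(picks)


# Made-up request ids, for illustration only.
request_ids = [f"req-{n}" for n in range(10_000)]
share = canary_share(request_ids, 10)
print(f"canary share at 10%: {share:.3f}")
```

Raising `canary_percent` step by step while watching error rates is the canary rollout; jumping straight from 0 to 100 is, in effect, the switch that a blue-green deployment makes between two complete copies.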

Random Thoughts

By focusing on your application, the architecture, the Tock of the Tick-Tock, your secret sauce, you’re maximizing your velocity by delegating the Tick task to a vendor. This does have the external cost of being tied to the next generation of vendor lock-in. It’s the same story since the beginning of technology time: first it was the microprocessor, then the system, then the operating system, then applications like data storage and databases. All companies endeavor to be a monopoly. Open Source is the new battleground for vendor lock-in. Yes, they all work together, the same way a tight group of girlfriends are all friends. Think frenemies … The pattern of delegating IAAS to AWS has been very successful for Netflix. Watch any of their YouTube videos on Spinnaker and you’ll get testimonial after testimonial of how they use AWS.

It’s my opinion that the most influential contributions to the art and science of cloud computing thus far are OpenStack, Kubernetes, and Spinnaker, with maybe OpenWhisk as an honorable mention. Each of these applications is implemented with many APIs using deployment patterns that enhance fault tolerance. It’s worth mentioning that OpenStack started as one large monolithic program. It was refactored in public view into compute, networking, and storage components, followed by many other APIs. OpenStack’s list of APIs is still growing today. Google’s Kubernetes is a refactor of an internal deployment tool known by the code name Borg. Netflix’s Spinnaker is a refactor of its internal deployment tool known by the name Asgard. All of these projects are the result of lots of very talented engineers who worked very hard for many years. If your journey to the cloud has been rocky, you’re in good company.

Cloud native software should get all of its configuration information from the cloud. Software must be refactored to accept this pattern. Lift and Shift is not cloud native. It’s a great step in a journey to being fully cloud native. It’s a Tick in the Tick-Tock cycle. In deciding whether it’s worth the journey to become fully cloud native, the standard business considerations of cost, benefit, and resources must be weighed on a project-by-project basis. It sounds so simple to pull configuration information for an application dynamically. The actual practice, as evidenced by OpenStack, Kubernetes, and Spinnaker, shows us that it’s difficult to do, at least in the year 2017. This blog post was written in the hope that your journey to building a cloud native application is a more informed one.