Monday, October 14, 2013

Help Us Design and Build an Open Source Google Omega

One key Cloud Foundry design goal is that a single operations person should be able to manage hundreds or thousands of machines. We want to bring to the typical enterprise tools similar to those available privately at web scale-out companies, such as Google with Omega. We are looking for a hands-on practitioner who is passionate about creating and operating large systems, wants to act as product owner, and enjoys working shoulder-to-shoulder with an extremely talented and experienced Cloud Foundry BOSH engineering team. You will also join an amazing user community with passionate technologists like Dr Nic Williams. Watch Dr Nic's presentation at PlatformCF in September 2013, where he describes why he fell in love with Cloud Foundry BOSH.


We’re looking for someone passionate about and conceptually aware of:
  • cloud operator user experience
  • cluster and data lifecycle management
  • vm/container orchestration
  • complex network management
  • disk volumes
  • software package management
  • basic monitoring

Dr Nic's metaphor for BOSH at PlatformCF 2013
The Cloud Foundry runtime is a PaaS for running apps and services. Underneath the PaaS, there is a whole other aspect of Cloud Foundry named BOSH (Bosh Outer SHell), which was inspired by the systems in use at Google, Amazon, Facebook and Twitter to deploy and manage their software across many data centers around the world. Nothing else available in open source has the same scope and capabilities. Apache Mesos is in the same space, but has a different architecture that doesn't focus on IaaS orchestration, the use of base operating system images, or fine-grained control over network and disk. Cloud Foundry BOSH is the secret sauce that enables running the same Cloud Foundry runtime software on VMware, Amazon and OpenStack infrastructure-as-a-service while changing only a minimal amount of infrastructure-specific configuration. Not only does BOSH deploy these systems across hundreds or thousands of instances, it also keeps them up to date without downtime: new software and fixes, including kernel patches and middleware, are rolled through the machines incrementally as they are released.


Cluster lifecycle management and operator user experience will be the main focus areas for BOSH in 2014. Many stateful scale-out clustered data services that Cloud Foundry users want to deploy and manage with BOSH have nuances around startup sequence, whether cluster nodes can run in a mixed-version cluster, and related considerations. This is a hard problem, and a good solution to it will be unique in the open source community.


Recently, we made it possible to use BOSH on a developer-class machine with bosh-lite. The bosh-lite project uses Vagrant and Linux containers. Instead of a traditional IaaS with multiple VMs for each role, bosh-lite runs Linux containers inside a single Linux host. As a result, developing and testing a BOSH release is much faster and needs fewer resources.


One of the best things about the work is the collaboration process. Pivotal truly values the product owner / product manager (PM) role. This role makes the key product decisions with input from all stakeholders, but ultimately the product owner is responsible for defining the product. The PM is embedded shoulder-to-shoulder with the engineering team, is responsible for representing end-users, and should meet with customers to build user empathy. The PM prioritizes the daily work items for the engineering team with the agile software tool Pivotal Tracker. If the PM prioritizes a feature in the engineering backlog, we can have it in production in days. The feeling of true influence over and collaboration on the product's destiny contrasts sharply with the experience I had with a traditional waterfall process, such as my time in Product Management at Oracle.


Another incredible benefit is that you will be working with a truly world-class product management team including James Watters, Matt Reider, Shannon Coen, Justin Richard, Scott Truitt, Mark Kropf, Ryan Morgan and Tammer Saleh. We work at 875 Howard St in downtown San Francisco, within walking distance of many commute options, and the office has incredible amenities.


If you have read this far, you should get in touch with recruiting@pivotallabs.com and reference the BOSH PM position, or reach out to me on Twitter.


Additional reading:


Google Omega:


Mesos - open source clustering platform in use at Twitter and others:


Yarn - distributed workload management using Hadoop 2.0+


Systems and configuration management automation:


Distributed linux container management:

Saturday, October 5, 2013

A Quick Tour of the Cloud Foundry Router and Sticky Sessions

The Cloud Foundry Router is a key component of the platform that provides:
  • load balancing to multiple instances of applications with a dynamic routing table updated in a fraction of a second every time there is a change in the system
  • sticky session support
  • access log for apps that feeds into an application's Loggregator stream
  • support for WebSocket and other TCP protocols via HTTP CONNECT
Cloud Foundry has had support for "sticky" sessions in both v1 and v2. This week I had a meeting with a very large technology provider that has been using Cloud Foundry for quite a while. They had no idea that sticky sessions were supported or how they worked, and I realized we should have a better explanation and showcase of this capability. Their use case involved a significant performance optimization from sticky sessions, because user-specific data is cached in the warmed-up app instance.

To enable Cloud Foundry sticky sessions, all you need to do in the application is set a cookie named JSESSIONID. Java web applications have a history of using a cookie named JSESSIONID when you enable a session with code like request.getSession( true ), and because many Cloud Foundry users run Java apps, the project decided to use JSESSIONID as the indicator that sticky sessions should be used. To use sticky sessions with another language like Ruby or Node.js, all you need to do is set a cookie named JSESSIONID, and Cloud Foundry will attempt to consistently route requests to the same app instance.
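As a rough illustration of how little the app has to do, here is a minimal sketch of a plain Java app (no servlet container) that opts in by setting the cookie itself. It is not the code behind the demo app described below. The class name, the default port, the cookie attributes and the use of a random UUID as the session ID are all assumptions made for this example; per the explanation above, the only part the router cares about is the cookie name.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch: a tiny app that opts in to sticky sessions
// by setting a cookie named JSESSIONID on the first request.
final class JsessionidCookieSketch {

    // Builds the Set-Cookie header value. The cookie NAME is what matters;
    // the value and the Path/HttpOnly attributes are illustrative choices.
    static String sessionCookie(String sessionId) {
        return "JSESSIONID=" + sessionId + "; Path=/; HttpOnly";
    }

    public static void main(String[] args) throws IOException {
        // The port is arbitrary here (assumption): first argument, else 8080.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            // Check whether the browser already sent a session cookie.
            String cookies = exchange.getRequestHeaders().getFirst("Cookie");
            boolean hasSession = cookies != null && cookies.contains("JSESSIONID=");

            if (!hasSession) {
                // No session yet: start one by setting the cookie.
                exchange.getResponseHeaders().add(
                        "Set-Cookie", sessionCookie(UUID.randomUUID().toString()));
            }

            byte[] body = (hasSession ? "existing session\n" : "new session\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}
```

The same idea should carry over to Ruby, Node.js or any other runtime: emit a Set-Cookie header whose cookie name is JSESSIONID.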

I created a simple Java web application to show how this works and hosted a copy of it at http://sticky-session.cfapps.io. In the first screenshot you can see that sticky sessions are not yet enabled and that there are 4 instances of the app running.


Now visit the app and use something like Chrome Dev Tools to inspect it. Refresh the page several times and observe that the printed port changes as requests are load balanced across the multiple app instances.


Now we'll click the "start a sticky session" link in the app, which creates a Java session in Tomcat and sets a JSESSIONID cookie. The router will notice the JSESSIONID cookie and set a Cloud Foundry specific cookie named __VCAP_ID__ whose value is the app instance GUID. The __VCAP_ID__ cookie is the hint that tells the router to send the request to that specific app instance. With these cookies set, refresh the page and your requests should be consistently routed to the same app instance.


If you use Chrome Dev Tools to delete the __VCAP_ID__ cookie and refresh the page, you should see __VCAP_ID__ get set again, this time to the value of a randomly selected app instance. Subsequent page refreshes will be sticky to the instance reflected in the updated __VCAP_ID__ cookie value.
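The behavior walked through above can be modeled in a few lines. This is a simplified sketch of the decision as described in this post, not the router's actual implementation: the class and method names are hypothetical, the cookies are passed in as a plain map, and the instance identifiers are made-up strings standing in for real app instance GUIDs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical model of the sticky-session decision; not the router's real code.
final class StickyRouterSketch {
    static final String SESSION_COOKIE = "JSESSIONID";
    static final String INSTANCE_COOKIE = "__VCAP_ID__";

    // Outcome for one request: where it goes, and which __VCAP_ID__ value
    // (if any) the router would set on the response.
    static final class Decision {
        final String instanceId;
        final String vcapIdToSet; // null when no cookie needs to be set

        Decision(String instanceId, String vcapIdToSet) {
            this.instanceId = instanceId;
            this.vcapIdToSet = vcapIdToSet;
        }
    }

    private final List<String> instanceIds;
    private final Random random;

    StickyRouterSketch(List<String> instanceIds, Random random) {
        this.instanceIds = instanceIds;
        this.random = random;
    }

    // cookies: the cookies in play for this exchange (assumption: a simple
    // name-to-value map, ignoring paths, expiry and other attributes).
    Decision handle(Map<String, String> cookies) {
        String hint = cookies.get(INSTANCE_COOKIE);
        boolean validHint = hint != null && instanceIds.contains(hint);

        // Follow a valid __VCAP_ID__ hint; otherwise pick an instance at random.
        String chosen = validHint
                ? hint
                : instanceIds.get(random.nextInt(instanceIds.size()));

        // Only pin the client when the app has opted in with JSESSIONID and
        // there is no usable __VCAP_ID__ yet (first visit, or cookie deleted).
        boolean sessionInPlay = cookies.containsKey(SESSION_COOKIE);
        String toSet = (sessionInPlay && !validHint) ? chosen : null;
        return new Decision(chosen, toSet);
    }

    public static void main(String[] args) {
        StickyRouterSketch router = new StickyRouterSketch(
                Arrays.asList("instance-0", "instance-1", "instance-2", "instance-3"),
                new Random());

        Map<String, String> cookies = new HashMap<>();
        cookies.put(SESSION_COOKIE, "example-session");

        // First request with a session: the router pins us to one instance.
        Decision first = router.handle(cookies);
        cookies.put(INSTANCE_COOKIE, first.vcapIdToSet);

        // Later requests carry __VCAP_ID__ and land on the same instance.
        for (int i = 0; i < 3; i++) {
            System.out.println(router.handle(cookies).instanceId);
        }
    }
}
```

Removing the __VCAP_ID__ entry from the map and calling handle again reproduces the deletion experiment: a new instance is picked at random and becomes the new sticky target.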

If you are interested in other features for the Cloud Foundry Router, let us know on the Cloud Foundry mailing list.