Friday, January 2, 2015

boot2docker Work-around for API Limit Error

update Jan 3rd, 2015: see bottom of this post for follow-up tweets from docker project 

i recently got a new laptop and had to reinstall some software, and i kept having trouble with boot2docker. after running boot2docker up followed by docker version, i kept getting an error about an API version mismatch between the docker client and server:
FATA[0000] Error response from daemon: client and server don't have same version (client : 1.16, server: 1.15)
sure enough, boot2docker ssh confirmed that i had an old server despite using the new boot2docker 1.4.1 download package. the standard boot2docker update procedure did not work either; it failed with a cryptic message about an API rate limit being reached.

it turns out that boot2docker relies on github URLs to determine the latest version, and too many unauthenticated users were hitting github from my NAT'd originating IP. boot2docker issue 481 is currently tracking this.
thankfully there is an easy work-around already available that works with the latest 1.4.1 release of docker.
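a minimal sketch of this kind of work-around, which skips the rate-limited version lookup by fetching the ISO directly. it assumes the ISO is published as a release asset at the URL pattern below and that boot2docker loads its ISO from ~/.boot2docker/boot2docker.iso. both are assumptions to check against issue 481, and the function names are made up for illustration:

```shell
#!/bin/sh
# Assumption: release ISOs are published at this URL pattern on github.
b2d_iso_url() {
  echo "https://github.com/boot2docker/boot2docker/releases/download/v$1/boot2docker.iso"
}

# Assumption: boot2docker boots the VM from ~/.boot2docker/boot2docker.iso.
# Download to a temp file first so a failed transfer does not clobber the old ISO.
fetch_b2d_iso() {
  version="$1"
  dest="${2:-$HOME/.boot2docker/boot2docker.iso}"
  mkdir -p "$(dirname "$dest")" &&
    curl -fL -o "$dest.tmp" "$(b2d_iso_url "$version")" &&
    mv "$dest.tmp" "$dest"
}

# usage:
#   boot2docker stop
#   fetch_b2d_iso 1.4.1
#   boot2docker up
```

after boot2docker up, docker version should report matching client and server versions if the new ISO was picked up.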
now my local docker workflow is all happy again.

update Jan 3rd, 2015: the docker team followed up yesterday, explaining that the docker machine project is the preferred approach going forward and that boot2docker will be sunset. i was able to get docker machine working, but the distributed machine binary requires a custom build of docker from a personal repository, which is certainly not ideal if you're concerned about security. they explained that this was the path forward until new identity auth work was merged into docker with .

Monday, February 3, 2014

Remote Dependencies, Convenience, Risk and Other Considerations for Operating Distributed Systems

One principle deeply held by the experienced distributed system operators I have worked with is that your software should have no external dependencies beyond the minimum requirements of the base OS: common system libraries, utilities, and the kernel. This approach should let you recreate a distributed system deployment without depending on the outside world. When something goes wrong, you should have control over your own destiny. Relying on any external dependency that is managed or hosted by someone else introduces the risk that something outside your system can affect your ability to restore and recreate the system whenever you need to.
To use a simple metaphor, imagine your system is a tower of Jenga blocks and it falls over, as Jenga towers inevitably do. However, instead of being able to rebuild your tower, you find that a required component at its base is missing or unavailable, so no matter what you do, you cannot rebuild the tower exactly as it was before. Your new tower is going to behave differently in unexpected ways, and it might topple over because you do not understand all the behaviors that emerge when different building blocks are combined in a different way.
Some of the original designers of BOSH, the software deployment project for Cloud Foundry (Mark Lucovsky, Vadim Spiwak, Derek Collison), embraced this principle and tried to create a prescriptive framework that encouraged this approach. They had experience managing large scale distributed systems at Google (the web services APIs). Kent Skaar did similar work for SaaS provider Zendesk. Given a software release that references specific versions of multiple software packages (known as a BOSH release), an instantiation of that release (a BOSH deployment) can be reconstructed at any time with the deployment configuration (a BOSH deployment manifest), the base OS images (the BOSH stemcells) and the software release (the BOSH packages and job templates for applying configuration). At any point in time, properly implemented BOSH releases of large scale distributed systems can be recreated without external dependencies. This holds true even when the internet is unavailable.
BOSH does give you the framework hooks to break out of this prescriptive principle and use external dependencies, or at least external dependency formats, if you choose to for convenience or other reasons. Dr Nic Williams recently implemented tooling to use apt packages instead of compiling from source. Another example: some of the Pivotal big data software intentionally targets CentOS/RHEL only and therefore ships only rpm packages rather than compiling Hadoop. A guiding principle is that you should be mindful of the tradeoffs you are making between convenience and risk, including the risk of tying your release to only one OS distributor.
Examples of the tradeoffs:
  • relying on an externally hosted package manager like apt-get could affect the availability or correctness of that dependency when you need it most
  • relying on debian packages could prevent someone from using your release unmodified with a CentOS image
A recent real-world example demonstrated the risk of an external dependency changing unexpectedly. Cloud Foundry uses the coreos/etcd project to store stateful configuration data for the new Cloud Foundry Health Manager codebase. One of etcd's dependencies (goraft/raft) force pushed to the master branch of its git repository, overwriting some git history that git requires to work properly. This situation has limited the ability of some users to make code modifications on several previous releases of Cloud Foundry without some tedious intervention.
A common reaction when learning about Cloud Foundry BOSH is to question the prescriptive guidance to compile from source when commonly used package management systems exist in the Linux distributions. My recommendation is to understand the tradeoffs involved and make the best choice for your situation. You should explicitly call out external dependencies if you have them in your system. When your tower inevitably falls over, know how to rebuild it.

Saturday, February 1, 2014

How to Find Java Mission Control on OSX 10.9

i was excited to show some of my colleagues how great Java Mission Control is for debugging, troubleshooting and monitoring local and remote Java applications. on my OSX 10.9 laptop i installed the latest Oracle JDK 7, which now bundles Java Mission Control, and i expected to be able to type jmc on the command line. that didn't work and resulted in a command not found! OSX Finder wouldn't find it either. i found a hint on an OTN community thread. the Java installer only put installation files in the obscure and hard to find /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/ instead of also placing a shortcut in /usr/local/bin. that shortcut should link to /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/jmc, which should change when you use /usr/libexec/java_home. you can see the install location by running this command:
$ find /Library/Java -name jmc
/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/lib/missioncontrol/Java Mission

the solution i chose was to create this executable file in /usr/local/bin/jmc
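a minimal sketch of such a wrapper, not necessarily the exact file i used. it assumes the JDK ships a jmc launcher at bin/jmc under the home directory reported by /usr/libexec/java_home (an assumption; adjust it to whatever path the find command above reports), and write_jmc_wrapper is a made-up helper name:

```shell
#!/bin/sh
# write_jmc_wrapper TARGET: create a small executable launcher at TARGET.
write_jmc_wrapper() {
  target="$1"
  # The quoted heredoc keeps $(...) and "$@" literal so they are
  # evaluated when the wrapper runs, not when it is written.
  cat > "$target" <<'EOF'
#!/bin/sh
# Resolve the active JDK home at run time, then hand off to its jmc launcher.
# Assumption: the JDK provides bin/jmc under its home directory.
exec "$(/usr/libexec/java_home)/bin/jmc" "$@"
EOF
  chmod +x "$target"
}

# usage (may need sudo): write_jmc_wrapper /usr/local/bin/jmc
```

because the wrapper asks /usr/libexec/java_home each time it runs, it should keep following the active JDK after upgrades instead of pinning one version directory.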



maybe it's because i had JDK 7 installed originally before Mission Control was included with it. i'm not sure, but i hope this helps someone else.

Thursday, December 19, 2013

Friendly BOSH Labels in vCenter

thanks to the BOSH team for showing me how to do this. Cloud Foundry BOSH automatically tags vSphere deployed VMs with various attributes including the job and index. this way, instead of just having a GUID as the name in vCenter, you can add additional columns that are already populated with the BOSH job and job index. we'll be adding this to the documentation for pivotal cf soon. click the image to see the full size.

Thursday, December 12, 2013

Send Interactive Commands to a Cloud Foundry App with websocketd

on my way home from work tonight i saw the tweet below about connecting STDIN and STDOUT from remote processes with websocket. i quickly tried it out locally and it worked, streaming the numbers 1 to 10 to STDOUT one second apart over websocket on localhost. can i apply this to cloud foundry easily? it turns out the answer is yes!
you can easily include the small linux 64bit websocketd binary and this script with your app and remotely send commands over websocket that will execute in the app container and stream the STDOUT from the command back over websocket to the browser. this is helpful for sending commands like rake db:migrate or to explore the linux container file system after the buildpack has run. see the screenshot and video below. for the impatient, skip to about 3:00 of the short demo.

more instructions are on github. also see the websocketd project.
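for reference, the local counting test described above can be sketched like this. it assumes websocketd is on the PATH and port 8080 is free; the file name count.sh, the function name and the port are only illustrative:

```shell
#!/bin/sh
# count.sh: print 1..LIMIT to STDOUT, pausing DELAY seconds between lines.
# websocketd forwards each STDOUT line to the browser as one websocket message.
count_to() {
  limit="${1:-10}"
  delay="${2:-1}"
  i=1
  while [ "$i" -le "$limit" ]; do
    echo "$i"
    sleep "$delay"
    i=$((i + 1))
  done
}

# Run only when arguments are given, e.g. ./count.sh 10 1
if [ "$#" -gt 0 ]; then
  count_to "$@"
fi

# usage: websocketd --port=8080 ./count.sh 10 1
```

a browser connecting to ws://localhost:8080/ should then receive one number per message. the same pattern applies to any command that writes line by line to STDOUT.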

Sunday, December 8, 2013

Be Direct When You Communicate

"Be direct when communicating" is a common theme I've been hearing the last few days in Pivotal leadership discussions and other places.

When I listened to the Twitter CEO Dick Costolo fireside chat with PandoMonthly (start listening at around 35:30), it crystallized how important this is and how easy it is to slip out of it to spare the feelings of someone you're meeting with. Dick describes how he has a management / leadership training class where they do exercises for this and how many experienced people still yield to the temptation to "migrate along the Y-axis" and give up clarity instead of "optimize for the X-axis." Doing this definitely takes practice, but I'm going to remind myself to think of these axes in my communication. It's totally great if the person feels great about the discussion, but how they feel about it is not as important as receiving and understanding the message. There are certainly many conversations where someone will feel bad about the message, and that's a fine outcome if they receive the message and were unlikely to feel good about it under any circumstances.

Monday, October 14, 2013

Help Us Design and Build an Open Source Google Omega

One key Cloud Foundry design goal is that a single operations person should be able to manage hundreds or thousands of machines. We want to bring tools similar to those available privately at web scale-out companies, like Google with Omega, to the typical enterprise. We are looking for a hands-on practitioner who is passionate about creating and operating large systems, wants to act as the product owner and enjoys working shoulder-to-shoulder with an extremely talented and experienced Cloud Foundry BOSH engineering team. You will also join an amazing user community with passionate technologists like Dr Nic Williams. Watch Dr Nic describe why he fell in love with Cloud Foundry BOSH in his presentation at PlatformCF in September of 2013.

We’re looking for someone passionate about and conceptually aware of:
  • cloud operator user experience
  • cluster and data lifecycle management
  • vm/container orchestration
  • complex network management
  • disk volumes
  • software package management
  • basic monitoring

Dr Nic metaphor for BOSH at PlatformCF 2013
The Cloud Foundry runtime is a PaaS for running apps and services. Underneath the PaaS, there is a whole other aspect of Cloud Foundry named BOSH (Bosh Outer SHell), which was inspired by the systems in use at Google, Amazon, Facebook and Twitter to deploy and manage their software across many data centers around the world. Nothing else available in open source has the same scope and capabilities. Apache Mesos is in the same space, but has a different architecture that doesn't focus on IaaS orchestration, base operating system images, or fine-grained control over network and disk. Cloud Foundry BOSH is the secret sauce that enables running the same Cloud Foundry runtime software on VMware, Amazon and OpenStack infrastructure-as-a-service while changing only a minimal amount of infrastructure specific configuration. Not only does BOSH deploy these systems across hundreds or thousands of instances, but it is also able to keep them up-to-date without downtime as new software and fixes, including kernel patches and middleware, are released and rolled through the machines incrementally.

Cluster lifecycle management and operator user experience will be main focus areas for BOSH in 2014. Many stateful scale-out clustered data services that Cloud Foundry users want to deploy and manage with BOSH have nuances around startup sequence, whether cluster nodes can run in a mixed version cluster, and related considerations. This is a hard problem, and a good solution to it will be unique in the open source community.

Recently, we made it possible to use BOSH on a developer class machine with bosh-lite. The bosh-lite project uses vagrant and Linux Containers. Instead of traditional IaaS with multiple VMs for each role, bosh-lite uses linux containers inside of a single linux host. Therefore it is much faster to develop and test a BOSH release with fewer resources.

One of the best things about the work is the collaboration process. Pivotal truly values the product owner / product manager (PM) role. This role makes the key product decisions with input from all stakeholders, but ultimately the product owner is responsible for defining the product. The PM is embedded shoulder-to-shoulder with the engineering team and is responsible for representing end-users and should meet with customers to build user empathy. The PM prioritizes the daily work items for the engineering team with the agile software tool Pivotal Tracker. If the PM prioritizes a feature in the engineering backlog, we can have it in production in days. The feeling of true influence and collaboration over the product destiny cannot be contrasted enough with the experience I had with a traditional waterfall process, such as my time in Product Management at Oracle.

Another incredible benefit is that you will be working with a truly world class product management team including James Watters, Matt Reider, Shannon Coen, Justin Richard, Scott Truitt, Mark Kropf, Ryan Morgan and Tammer Saleh. We work at 875 Howard St in downtown San Francisco, walking distance from many commute options and have incredible amenities.

If you read this far, then you should get in touch with and reference the BOSH PM position or reach out to me on twitter.
