Before it was called “DevOps”, it was just “magic”

26 January, 2015 (06:41) | Uncategorized | By: seth

One thing I had forgotten in my time away from Amazon was that their code management, packaging, and deployment systems are magic. Seriously impressive internally built tools to manage a large, but usually decoupled, codebase across thousands of servers in data centers around the world. Magic indeed… sometimes black magic, but magic nonetheless.

There are pretty strict limits on what I can say about internal systems, but on the deployment piece at least the cat is out of the bag. Werner Vogels, Amazon CTO, talks about Apollo, the Amazon deployment system, here.

By comparison, the other massively scalable deployment system I know about is Microsoft’s Autopilot, which you can read about here.  Autopilot is good for what it is, but Apollo has the advantage in many areas, including interface (usability), configurability, packaging, and versioning, which is not surprising given its 10-year head start.

And those systems are just deployment.  My ability to search code, have code dependencies just handled for me (at development, build, and debug times), and to build code in a consistent trackable way is simply impressive at Amazon, reflecting Amazon’s long history in the software services space.

Finally, my ability to request new servers at Amazon (generally EC2 virtual servers) is as easy as going to a website and filling out a form.  I remember in my early days at Microsoft having to select SKUs and meet with a vendor to place the order. Then a week or so later I got a phone call telling me my servers were on a pallet on the data center loading dock… what did I want done with them :-)

Amazon was doing DevOps long before the word existed.  Like Microsoft, any company making the move into software services learns quickly how important that is.

How to communicate: Tools of the trade

19 January, 2015 (07:20) | Uncategorized | By: seth

One of the biggest issues I see on struggling software teams (although this is not limited to software) is problems with communication.   Modern software is complex, and therefore our software teams can be complex.   There are many groups within and external to the team that require information for successful delivery:

  • Developers
  • Testers
  • Engineering managers
  • Product managers
  • Business stakeholders
  • Customers
  • Upper level management

Here are my quick thoughts on HOW we choose to communicate.

This list is in descending order of preference: bias toward the options at the top over the ones at the bottom.

Discussion (meetings, phone, IM)

Best for complex updates and impactful events. Maximizes understanding and reduces conflict via interactive communication.

Have a whiteboard (or virtual equivalent) to ensure understanding around complex technical or business logic

Issue management / ticketing systems

Best for long term item tracking, history, and status snapshot (NOT good for conveying major changes unless accompanied by discussion)

Wiki / OneNote / shared collaboration documents

Best for capturing discussion outcomes and project status. May not need if tracking system offers rich enough experience to accommodate these.


Email

Best for “simple” updates.

Email can point to Issues/Reports or Wiki.

It is OK to start discussions which require document or spec review (providing pointers to those) and then complete the discussion outside of email.

I am obviously biased towards agile. The following Agile Manifesto values are represented here:

Individuals and interactions over processes and tools
Customer collaboration over contract negotiation

Note on meetings

Meetings, when done wrong, go to the BOTTOM of the list. Specifically: having too many people present who do not have a clear contribution to the meeting, and/or meetings without clear goals. Bias for short meetings with small groups; a quick ad-hoc two-person face to face (or virtual equivalent) often suffices. Then follow up with one of the other three communication tools to verify all are aligned on what was agreed, and to provide a history.

TiP is misunderstood – perhaps DDQ is Better

12 January, 2015 (06:51) | Uncategorized | By: seth

I spent a long time talking to folks about the merits of a conscientious Testing in Production (TiP) strategy.  But I knew TiP had a bad rap.  I even shared the story of how some would mischaracterize it as a common and costly technical malpractice.

While evangelizing TiP, my Microsoft colleagues and I would happily post this picture wherever we could:

Yet I knew the original poster was not so enthused with TiP.   Comments on TiP supposed this was not a conscientious and risk-mitigated strategy, but instead devs behaving badly:

Then blame all issues on QA -_-

That’s our motto here. Doesn’t work to well in practice, actually.

Now I have returned to Amazon after spending 6 years at Microsoft.  From the following, it looks like I have some education to do.


On the other hand, who can argue with Data-Driven Quality (DDQ)?  (Except maybe a HiPPO.)  DDQ is also more expansive than TiP, leveraging all data streams, whether from production, customer research, or pre-release engineering.  So TiP was fun, but DDQ is the future.

Who is the HiPPO?

8 January, 2015 (19:50) | Uncategorized | By: seth


HiPPO stands for Highest Paid Person’s Opinion

HiPPO-driven decision making is the opposite of data-driven decision making.  The “highest paid” person may be your boss, or the VP with his eye on the project, or even the CEO.  But no matter how many of those big bucks they are pulling down, it turns out that 2/3 of decisions made without the data are the wrong ones.

When promoting data-driven decision making at Microsoft, we distributed 1000s of these little hippo squeeze toys.  Perhaps by squeezing our hippo toy, we are reminded to constrain our HiPPO from making rash decisions without data (somewhat like a voodoo doll).

But another, kinder way to look at the HiPPO is as the person who has final responsibility for product feature decisions, and as a reminder to get that person the data they need to make the data-driven decisions.

Either way, it reminds us that data trumps intuition.


[and remember the hippo is considered to be the most dangerous animal in Africa (not counting the mosquito)] :-)

Better software through SCIENCE!

29 November, 2014 (23:05) | Uncategorized | By: seth

The scientific method is the time-proven way we have learned about the very principles that govern the universe.  It can be summarized by the following sequential steps:

  • Ask a question
  • Construct a hypothesis
  • Test the hypothesis with an experiment
  • Analyze results of the experiment
  • Determine whether hypothesis was correct
  • Answer the question


A hypothesis is a suggested explanation of observed phenomena.  Given such an explanation, one can then make predictions about those phenomena under certain conditions. 

But for a hypothesis to be truly useful it should be a specific, testable prediction about what you expect to happen.

For example, Galileo might ask the question "Is the speed of a falling object dependent on its mass?" The hypothesis Galileo formed was that two objects of different mass, dropped from a height, would strike the ground at different times if falling speed depended on mass. The hypothesis differed from the original question in that the hypothesis predicts an experimental outcome that can be tested. The experiment in turn yielded data indicating nearly simultaneous impact with the ground, and the analysis concluded that falling speed is not dependent on mass.
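To see how that hypothesis becomes something testable, here is a minimal sketch of the idealized (no air resistance) model in Python. The height and masses are made-up numbers for illustration; the point is that the kinematics of free fall, t = √(2h/g), contains no mass term:

```python
import math

def drop_time(height_m: float, mass_kg: float, g: float = 9.81) -> float:
    """Predicted fall time from a given height, ignoring air resistance.

    Note that mass_kg is deliberately unused: in the frictionless model,
    fall time depends only on height and gravitational acceleration.
    """
    _ = mass_kg  # mass does not appear in the kinematics
    return math.sqrt(2 * height_m / g)

# "Experiment": drop a light ball and a heavy ball from ~56 m
light = drop_time(56.0, 0.1)
heavy = drop_time(56.0, 10.0)
print(abs(light - heavy) < 1e-12)  # True: same predicted fall time
```

The model predicts identical impact times, which is exactly what Galileo's data showed; had falling speed depended on mass, the observed times would have diverged and falsified the model.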

Data-driven quality

In my previous blog (Data-Driven Quality explained — part 1: questions? what questions?), I introduced DDQ.  DDQ represents application of the scientific method to software quality. The steps of the scientific method can be mapped to the DDQ model as seen below.  Instead of the nature of the universe, though, we are interested in answering questions about the quality of our software.


Applied to software

For example, we might be interested in learning why fewer and fewer folks are using Internet Explorer as their web browser. 

Based on some preliminary research we may hypothesize that users abandon IE when they encounter web pages that do not function properly, but work better in other browsers.

We might then configure IE to malfunction on a select set of popular pages and assess whether IE abandonment rates are higher for users of those pages. 

Um….no…. that would be pretty stupid.  

So we take a cue from social scientists here. Social scientists do not send out crack teams equipped with highly addictive narcotics to supply certain neighborhoods so that they can contrast the effects with other neighborhoods.   They instead find existing populations that already exhibit the attributes they need for comparison.

In our case, we would compare users of pages known to malfunction in IE against users who do not encounter such pages, and see if the former show a significantly higher abandonment rate. 

If confirmed, we can then dig in, identify the chief offenders of browser compatibility, and fix them… then re-assess.
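A back-of-envelope version of that comparison is a one-sided two-proportion z-test. This is only a sketch with made-up counts (no claim that any real browser telemetry worked this way), using just the standard library:

```python
import math

def two_proportion_ztest(x_a, n_a, x_b, n_b):
    """One-sided two-proportion z-test: is rate A higher than rate B?

    x_a/n_a: abandoners / total users who hit malfunctioning pages
    x_b/n_b: abandoners / total users who did not
    Returns (z, p) where p = P(Z >= z) under the null of equal rates.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)            # shared rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))         # upper-tail normal probability
    return z, p

# Hypothetical counts: 16% abandonment with broken pages vs 12% without
z, p = two_proportion_ztest(320, 2000, 240, 2000)
print(p < 0.05)  # True: the difference is unlikely to be chance alone
```

With these illustrative numbers the difference is statistically significant, which is the "confirmed" branch above; being observational rather than a controlled experiment, it still only shows association, not that broken pages cause the abandonment.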

Software quality

Testability, data-driven, answering questions… these should all sound familiar to any software professional as good practices.  Using DDQ and the scientific method is a powerful way to apply these for your software.

Data-Driven Quality explained – part 1: questions? what questions?

24 November, 2014 (06:00) | Uncategorized | By: seth

The "dictionary" definition of Data-Driven Quality (DDQ) is:

Application of data science techniques to a variety of data sources to assess and drive software quality.

But it is really about questions and answers, specifically using data to find those answers. Trying to derive insights from data without knowing what you are looking for can be a source of new discoveries, but more often will yield mirages instead of insights, such as the following image [Source:, used under CopyLeft license CC 4.0]

(if I only had such data in 1983 I could have wasted even more of my quarter-fueled youth)

So what questions, then? These are the questions to ask about your software. In this diagram, "it" is your software (service or product). The questions are divided into the three layers identified by the categories on the left:

  • Business value: Does your software contribute to the bottom line and/or strategic goals?
  • User experience: Does your software delight customers and beat the competition?
  • System Health: Does your software work as designed?

Each category layer depends on the layer beneath it. Consider that it is difficult to build a good user experience on a slow, error-riddled product. And ultimately it is the top layer, Business value, that we care about. This leads to the "trick question" I will sometimes ask software testers and SDETs: What is your job? To which I answer:

Your job is to enable the creation of software that delivers value to the business. This is the job of the tester. It also happens to be the job of the developer, program manager, designer, manager, etc.

(I explore this idea a bit more here if you are interested) If you think about it, you might want to add to the above statement that you create this business value through:

  • An experience that delights the users
  • The production of high-quality software

True. It also turns out that these are respectively User Experience and System Health, the layers we are dependent on to build Business Value. An interesting note about that word "quality". "High-quality software" above means system health – that it does not break. But as Brent Jensen likes to ask, which is higher quality software:

  • A. Perfect bug-less software that people do not use (or perhaps worse, they hate to use)
  • B. Quirky software with a few glitches making millions happy (and making happy $millions)

If you believe the answer is B, then DDQ will appeal to you with its "Q" for "quality" happily spanning the pyramid above and not just system health. DDQ is about a confluence of what has been called Business Intelligence (BI) and quality. They are not really different things.

Asking the right questions is an important start, but is only one piece of the DDQ puzzle. DDQ works in an environment of iterative improvement (same as Agile). The faster we can spin around these cycles, the faster we improve our software. This is the DDQ Model. I will leave it as an exercise to the reader to map it to the scientific method. (I may help you out and do this in a future blog post.)

Having understood our questions, the next step is to understand the data sources we can use to answer them. I will close by sharing a list of some of the types of data we can use below. You will note much of this data comes from production or near-production (think of private betas or internal dogfood). Production is a great environment to get data from, as it is the most realistic environment for our software, with real users doing real user things.

Business value: Is it successful?

  • Adoption of a new feature
  • New users, unique users
  • Market share, session duration, repeat use
  • Purchase, conversion, ad revenue
  • Minus: COGS, support costs

User experience: Is it useable? valuable?

  • Usage data: feature use, task completion rates
  • Feedback (2nd person): user ratings, user reviews
  • Sentiment analysis (3rd person): from Twitter, forums

System health: Is it available? reliable? performant?

  • Infrastructure data: memory, CPU, network
  • Application data
  • MTTR (Mean Time to Recovery)
  • Test cases (run as part of pre-production test passes, or in production as monitors)
  • Engineering metrics (pre-production): code coverage, code churn, delivery cadence
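To make one of these metrics concrete: MTTR, listed under System health, is just the average outage duration over a set of incidents. A minimal sketch, with hypothetical incident timestamps:

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean Time to Recovery: average outage duration.

    incidents: list of (start, end) datetime pairs, one per outage.
    """
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incident log for one month
log = [
    (datetime(2014, 11, 3, 2, 10), datetime(2014, 11, 3, 2, 40)),    # 30 min
    (datetime(2014, 11, 12, 14, 0), datetime(2014, 11, 12, 15, 0)),  # 60 min
    (datetime(2014, 11, 25, 9, 5), datetime(2014, 11, 25, 9, 35)),   # 30 min
]
print(mttr(log))  # 0:40:00 (average of 30, 60, and 30 minutes)
```

In a real service the (start, end) pairs would come from your monitoring or ticketing system rather than a hand-written list.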

Not covered

As you can see in the DDQ model, there is plenty more to cover. Besides the other boxes in that model, here are some other things that were NOT covered in this blog post:

  • How to determine specific scenarios to frame your questions for your software
  • How this fits into a comprehensive software development life cycle, and specifically the impact on BUFT (Big Up-Front Testing)
  • Impact on team roles and responsibilities. Who does what?
  • The future of the Tester/SDET role
  • What do you need to know about actual Data Science?
  • Tools
  • Dashboards, and actionability
  • Examples :-)

Further reading

If you want to learn more, I recommend the following:

  • My former Microsoft colleague Steve Rowe has a great series of posts on DDQ
  • Adding to the acronym soup, but definitely on target with DDQ, my frequent co-conspirator Ken Johnston explains EaaSy and MVQ.
  • Although I had not quite tightened up the questions into the neat pyramid model above, I do fill in some of the blanks left by this blog post in my Roadmap to DDQ talk.

Finally, I wish to acknowledge and thank Monica Catunda, who collaborated with me on much of this material and co-delivered the Create Your Roadmap to Data-Driven Quality 1-day workshop at STPCon Nov 2014 in Denver.

New blog, first post

18 November, 2014 (11:53) | Uncategorized | By: seth

This is my new blogging home.

I had previously used this space as a place to collect my various presentations and papers (and they are still here), but now I will also use it as my new blogging platform.

My old blog was called Your Software has Bugs, and indeed I am sure your software still does, but for this go I am going to stick with the simpler Seth Eliot’s Blog.

While the old one was primarily about software, this one has a broader scope including software, data science, and whatever I think you might be interested in.

My old blog started with a self-indulgent exploration of me, which was immediately called out by a commenter as being overly self-indulgent and too much about me (looking back at that post, I recall I declined to publish the comment… a decision I now regret).   Now instead I would like to start with an exploration of the “greatest hits” from my previous blog, with some added context.   Sort of a starting point to continue with the new blog.

Presently I am an advocate of (and enjoy helping engineers with) Data-Driven Quality (DDQ).   To get an idea of what DDQ is, peruse the deck or watch the talk Create Your Roadmap to Data-Driven Quality.   How we use data is central to how we produce quality software.   This is certainly not limited to big data, but large unstructured streams of data provide a compelling story – one that we can unlock with modern processing and tools.


So in conclusion, I do indeed like big data :-)


But before there was DDQ, there was TiP — Testing in Production, which I chose to introduce by showing how to do it WRONG:

Feeling TiPsy…Testing in Production Success and Horror Stories

…and also some on how to do it right, such as enabling teenagers to escape reality behind online avatars like this:


I was recently at a Seattle area QA meeting (QASig) where the topic of finding bugs and its place in quality assessment came up.   Years before, I pondered that question, asking sarcastically:

Hooray for Buggy Software???

This was an interesting re-read for me as I saw early signs of DDQ in this diagram from the blog post.

See any resemblance to this one from one of my DDQ talks?


To wrap up on TiP and DDQ I will share some fun I had at the expense of Big Blue:

Testing in Production (TiP), a common and costly technical malpractice???

When I presented this story in a talk, I actually got back a comment that it was unprofessional to make fun of IBM.   I certainly want to stay professional in my interactions, but I think IBM can take it.

Finally, before I close, I would like to share my most popular blog posts… most popular that is in China!   For reasons that I do not quite grasp, the Chinese audience really responded to my posts on the Microsoft change in logo :-)

The Four Colors of the New Microsoft Company Logo

The Four Colors of Microsoft, revisited

Microsoft Logo

That’s it for now.     Look for more timely (for sufficiently broad definitions of “timely”) and compelling (well, I think it’s interesting) content soon (or not… I’ll try).