Designing Dependable Cloud Services

Today, Data Center Knowledge posted another article by my colleague David Bills, chief reliability strategist, covering guiding design principles for cloud services. In the article, he explains the cultural shift and the evolving engineering principles Microsoft employs to help improve the dependability of its services.

David says service providers need to identify as many potential failure conditions as possible in advance and account for them during the service design phase. During this phase, design teams can also consider new dynamics such as technological advances that test performance limits, the interplay of applications, and broader industry trends. This careful planning helps us decide exactly how the service should react if and when the unexpected occurs. The goal is for services to recover from these failure conditions with minimal or zero interruption.

David suggests that cloud services teams employ failure mode and effects analysis (FMEA) to help build redundancy into cloud services. This type of analysis supports efforts to simplify the physical infrastructure and to use software to build resiliency into cloud services. I recommend reading David’s article and his prior Data Center Knowledge article. Both draw on David’s experience with our cloud-based infrastructure, which supports more than 200 services, 1 billion customers, and 20 million businesses in more than 76 markets worldwide.
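For readers unfamiliar with failure mode and effects analysis, a common textbook form scores each failure mode for severity, likelihood of occurrence, and difficulty of detection (each on a 1 to 10 scale) and multiplies the three into a Risk Priority Number (RPN) used to decide which failures to design around first. The sketch below illustrates only that generic scoring. It is not Microsoft's internal process, and the components, failure descriptions, and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet for a cloud service."""
    component: str
    failure: str
    severity: int    # 1 (negligible impact) .. 10 (catastrophic impact)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (almost certainly detected) .. 10 (likely undetected)

    def __post_init__(self) -> None:
        # Reject scores outside the conventional 1..10 FMEA scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical worksheet rows; real scores come from the design team's analysis.
EXAMPLE_MODES = [
    FailureMode("storage node", "disk failure", severity=7, occurrence=6, detection=2),
    FailureMode("network", "top-of-rack switch failure", severity=8, occurrence=3, detection=3),
    FailureMode("deployment", "bad configuration push", severity=9, occurrence=4, detection=6),
]

if __name__ == "__main__":
    for mode in prioritize(EXAMPLE_MODES):
        print(f"{mode.rpn:4d}  {mode.component}: {mode.failure}")
```

With these made-up scores, the configuration push ranks first because it is both severe and hard to detect, which is the kind of result that would push a team toward software safeguards (staged rollouts, automated rollback) rather than more hardware redundancy.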

About the Author
Adrienne Hall

General Manager, Issues & Crisis Management

Adrienne Hall is a General Manager in the Microsoft Trustworthy Computing group, where she leads a team of information technology (IT) professionals who are focused on the security, privacy, reliability, and accessibility of devices and services built on Microsoft technology.