Google infrastructure security design overview  |  Documentation  |  Google Cloud (2022)

This content was last updated in March 2022, and represents the status quo as of the time it was written. Google's security policies and systems may change going forward, as we continually improve protection for our customers.

Introduction

This document provides an overview of how security is designed into Google's technical infrastructure. It is intended for security executives, security architects, and auditors.

This document describes the following:

  • Google's global technical infrastructure, which is designed to provide security through the entire information processing lifecycle at Google. This infrastructure helps provide the following:
    • Secure deployment of services
    • Secure storage of data with end-user privacy safeguards
    • Secure communication between services
    • Secure and private communication with customers over the internet
    • Safe operation by Google engineers
  • How we use this infrastructure to build internet services, including consumer services such as Google Search, Gmail, and Google Photos, and enterprise services such as Google Workspace and Google Cloud.
  • Our investment in securing our infrastructure and operations. We have many engineers who are dedicated to security and privacy across Google, including many who are recognized industry authorities.
  • The security products and services that are the result of innovations that we implemented internally to meet our security needs. For example, BeyondCorp is the direct result of our internal implementation of the zero-trust security model.
  • How the security of the infrastructure is designed in progressive layers. These layers include the following:
    • Low-level infrastructure
    • Service deployment
    • Data storage
    • Internet communication
    • Operations
The remaining sections of this document describe the security layers.

Secure low-level infrastructure

This section describes how we secure the physical premises of our data centers, the hardware in our data centers, and the software stack running on the hardware.

Security of physical premises

We design and build our own data centers, which incorporate multiple layers of physical security. Access to these data centers is tightly controlled. To protect our data center floors, we use biometric identification, metal detection, cameras, vehicle barriers, and laser-based intrusion detection systems. For more information, see Data center security.

We also host some servers in third-party data centers. In these data centers, we ensure that there are Google-controlled physical security measures on top of the security layers that are provided by the data center operator. For example, we operate biometric identification systems, cameras, and metal detectors that are independent from the security layers that the data center operator provides.

Hardware design and provenance

A Google data center consists of thousands of servers connected to a local network. We design the server boards and the networking equipment. We vet the component vendors that we work with and choose components with care. We work with vendors to audit and validate the security properties that are provided by the components. We also design custom chips, including a hardware security chip (called Titan) that we deploy on servers, devices, and peripherals. These chips let us identify and authenticate legitimate Google devices at the hardware level and serve as hardware roots of trust.

Secure boot stack and machine identity

Google servers use various technologies to ensure that they boot the correct software stack. We use cryptographic signatures for low-level components like the baseboard management controller (BMC), BIOS, bootloader, kernel, and base operating system image. These signatures can be validated during each boot or update cycle. The first integrity check for Google servers uses a hardware root of trust. The components are Google-controlled, built, and hardened with integrity attestation. With each new generation of hardware, we strive to continually improve security. For example, depending on the generation of server design, the boot chain's root of trust is in one of the following:

  • The Titan hardware chip
  • A lockable firmware chip
  • A microcontroller running our own security code
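
To illustrate the signature checks on the boot components named above, here is a minimal sketch in Python. It is not Google's implementation: the key, the component images, and the function names are hypothetical, and an HMAC stands in for the asymmetric signatures that a production boot chain would more plausibly use.

```python
import hashlib
import hmac

# Hypothetical secret held by the root of trust. This is an assumption made
# for illustration; a real boot chain would verify asymmetric signatures.
ROOT_KEY = b"example-root-of-trust-key"

# Components are checked in the order in which they run during boot.
BOOT_ORDER = ["bmc", "bios", "bootloader", "kernel", "os_image"]


def sign(image: bytes) -> bytes:
    """Produce the stand-in signature for a component image."""
    return hmac.new(ROOT_KEY, image, hashlib.sha256).digest()


def verify_boot_chain(images: dict, signatures: dict) -> tuple:
    """Return (True, None) if every stage verifies, else (False, stage)."""
    for stage in BOOT_ORDER:
        expected = sign(images[stage])
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(expected, signatures.get(stage, b"")):
            return False, stage
    return True, None


# Signatures are produced once, when the known-good images are released.
images = {stage: f"{stage}-v1".encode() for stage in BOOT_ORDER}
signatures = {stage: sign(image) for stage, image in images.items()}
ok_result = verify_boot_chain(images, signatures)

# A modified kernel no longer matches its released signature.
tampered = dict(images, kernel=b"kernel-modified")
tampered_result = verify_boot_chain(tampered, signatures)
```

In this sketch, verification stops at the first stage that fails, so a tampered kernel is reported even though the earlier stages verified.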

Each server in the data center has its own unique identity. This identity can be tied to the hardware root of trust and the software with which the machine boots. This identity is used to authenticate API calls to and from low-level management services on the machine. This identity is also used for mutual server authentication and transport encryption. We developed the Application Layer Transport Security (ALTS) system for securing remote procedure call (RPC) communications within our infrastructure. These machine identities can be centrally revoked to respond to a security incident. In addition, their certificates and keys are routinely rotated, and old ones revoked.

We developed automated systems to do the following:

  • Ensure that servers run up-to-date versions of their software stacks (including security patches).
  • Detect and diagnose hardware and software problems.
  • Ensure the integrity of the machines and peripherals with verified boot and implicit attestation.
  • Ensure that only machines running the intended software and firmware can access credentials that allow them to communicate on the production network.
  • Remove or re-allocate machines from service when they're no longer needed.

Secure service deployment

Google services are the application binaries that our developers write and run on our infrastructure. Examples of Google services are Gmail servers, Spanner databases, Cloud Storage servers, YouTube video transcoders, and Compute Engine VMs running customer applications. To handle the required scale of the workload, thousands of machines might be running binaries of the same service. A cluster orchestration service, called Borg, controls the services that are running directly on the infrastructure.

The infrastructure does not assume any trust between the services that are running on the infrastructure. This trust model is referred to as a zero-trust security model. A zero-trust security model means that no devices or users are trusted by default, whether they are inside or outside of the network.

Because the infrastructure is designed to be multi-tenant, data from our customers (consumers, businesses, and even our own data) is distributed across shared infrastructure. This infrastructure is composed of tens of thousands of homogeneous machines. The infrastructure does not segregate customer data onto a single machine or set of machines, except in specific circumstances, such as when you are using Google Cloud to provision VMs on sole-tenant nodes for Compute Engine.

Google Cloud and Google Workspace support regulatory requirements around data residency. For more information about data residency and Google Cloud, see Implement data residency and sovereignty requirements. For more information about data residency and Google Workspace, see Data regions: Choose a geographic location for your data.

Service identity, integrity, and isolation

To enable inter-service communication, applications use cryptographic authentication and authorization. Authentication and authorization provide strong access control at an abstraction level and granularity that administrators and services can understand.

Services do not rely on internal network segmentation or firewalling as the primary security mechanism. Ingress and egress filtering at various points in our network helps prevent IP spoofing. This approach also helps us to maximize our network's performance and availability. For Google Cloud, you can add further security mechanisms such as VPC Service Controls and Cloud Interconnect.

Each service that runs on the infrastructure has an associated service account identity. A service is provided with cryptographic credentials that it can use to prove its identity to other services when making or receiving RPCs. These identities are used in security policies. The security policies ensure that clients are communicating with the intended server, and that servers are limiting the methods and data that particular clients can access.

We use various isolation and sandboxing techniques to help protect a service from other services running on the same machine. These techniques include Linux user separation, language-based sandboxes (such as the Sandboxed API) and kernel-based sandboxes, application kernels for containers (such as gVisor), and hardware virtualization. In general, we use more layers of isolation for riskier workloads. Riskier workloads include user-supplied items that require additional processing. For example, riskier workloads include running complex file converters on user-supplied data or running user-supplied code for products like App Engine or Compute Engine.

For extra security, sensitive services, such as the cluster orchestration service and some key management services, run exclusively on dedicated machines.

In Google Cloud, to provide stronger cryptographic isolation for your workloads and to protect data in use, we support Confidential Computing services for Compute Engine VMs and Google Kubernetes Engine (GKE) nodes.

Inter-service access management

The owner of a service can use access-management features provided by the infrastructure to specify exactly which other services can communicate with the service. For example, a service can restrict incoming RPCs solely to an allowed list of other services. That service can be configured with the allowed list of the service identities, and the infrastructure automatically enforces this access restriction. Enforcement includes audit logging, justifications, and unilateral access restriction (for engineer requests, for example).
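
As a rough illustration of an allowed list with audit logging, consider the following Python sketch. The `ServiceConfig` type, the `authorize_rpc` function, and the service names are hypothetical; the source does not describe how the infrastructure represents or enforces these policies internally.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceConfig:
    """Hypothetical per-service policy created by the service owner."""
    name: str
    allowed_callers: frozenset
    audit_log: list = field(default_factory=list)


def authorize_rpc(config: ServiceConfig, caller_identity: str, method: str) -> bool:
    """Allow the call only if the caller is on the allowed list.

    Every decision, allowed or denied, is appended to the audit log.
    """
    allowed = caller_identity in config.allowed_callers
    decision = "ALLOW" if allowed else "DENY"
    config.audit_log.append((caller_identity, method, decision))
    return allowed


# Example: a contacts service that accepts RPCs only from a mail service.
contacts = ServiceConfig(name="contacts", allowed_callers=frozenset({"gmail"}))
```

The point of the sketch is that the check lives outside the service's own business logic, so a service owner only declares the list and the enforcement layer applies it uniformly.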

Google engineers who need access to services are also issued individual identities. Services can be configured to allow or deny their access based on their identities. All of these identities (machine, service, and employee) are in a global namespace that the infrastructure maintains.

To manage these identities, the infrastructure provides a workflow system that includes approval chains, logging, and notification. For example, the security policy can enforce multi-party authorization. This system uses the two-person rule to ensure that an engineer acting alone cannot perform sensitive operations without first getting approval from another, authorized engineer. This system allows secure access-management processes to scale to thousands of services running on the infrastructure.
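
The two-person rule can be expressed as a small predicate. The sketch below is an assumption about how such a check might look, not a description of Google's workflow system; the function name and the engineer names are illustrative.

```python
def approvals_satisfied(requester, approvals, authorized_approvers, required=1):
    """Return True if enough independent, authorized engineers approved.

    An approval counts only if it comes from someone other than the
    requester and that person is authorized to approve the operation.
    """
    independent = {
        approver
        for approver in approvals
        if approver != requester and approver in authorized_approvers
    }
    return len(independent) >= required


authorized = {"alice", "bob"}

self_approved = approvals_satisfied("alice", {"alice"}, authorized)
peer_approved = approvals_satisfied("alice", {"bob"}, authorized)
outsider_approved = approvals_satisfied("alice", {"mallory"}, authorized)
```

Here only `peer_approved` is true: a self-approval and an approval from an unauthorized identity are both ignored.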

The infrastructure also provides services with the canonical service for user, group, and membership management so that they can implement custom, fine-grained access control where necessary.

End-user identities are managed separately, as described in Access management of end-user data in Google Workspace.

Encryption of inter-service communication

The infrastructure provides confidentiality and integrity for RPC data on the network. All Google Cloud virtual networking traffic is encrypted. All communication between infrastructure services is authenticated and most inter-service communication is encrypted, which adds an additional layer of security to help protect communication even if the network is tapped or a network device is compromised. Exceptions to the encryption requirement for inter-service communication are granted only for traffic that has low latency requirements, and that also doesn't leave a single networking fabric within the multiple layers of physical security in our data center.

The infrastructure automatically and efficiently (with the help of hardware offload) provides end-to-end encryption for the infrastructure RPC traffic that goes over the network between data centers.

Access management of end-user data in Google Workspace

A typical Google Workspace service is written to do something for an end user. For example, an end user can store their email on Gmail. The end user's interaction with an application like Gmail might span other services within the infrastructure. For example, Gmail might call a People API to access the end user's address book.

The Encryption of inter-service communication section describes how a service (such as Google Contacts) is designed to protect RPC requests from another service (such as Gmail). However, this level of access control is still a broad set of permissions because Gmail is able to request the contacts of any user at any time.

When Gmail makes an RPC request to Google Contacts on behalf of an end user, the infrastructure lets Gmail present an end-user permission ticket in the RPC request. This ticket proves that Gmail is making the RPC request on behalf of that particular end user. The ticket enables Google Contacts to implement a safeguard so that it only returns data for the end user named in the ticket.

The infrastructure provides a central user identity service that issues these end-user permission tickets. The identity service verifies the end-user login and then issues a user credential, such as a cookie or OAuth token, to the user's device. Every subsequent request from the device to our infrastructure must present that end-user credential.

When a service receives an end-user credential, the service passes the credential to the identity service for verification. If the end-user credential is verified, the identity service returns a short-lived end-user permission ticket that can be used for RPCs related to the user's request. In our example, the service that gets the end-user permission ticket is Gmail, which passes the ticket to Google Contacts. From that point on, for any cascading calls, the calling service can send the end-user permission ticket to the callee as a part of the RPC.
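
The ticket flow above can be sketched as a short-lived, integrity-protected token. The real ticket format is not described in this document, so everything below is an assumption for illustration: the key, the JSON payload, the HMAC tag, and the function names are hypothetical, and sharing one symmetric key between issuer and verifier is a simplification.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical key known to the identity service and to ticket verifiers.
IDENTITY_SERVICE_KEY = b"hypothetical-identity-service-key"

# Hypothetical data held by the callee (the contacts service).
CONTACTS = {"alice": ["bob@example.com"], "bob": ["carol@example.com"]}


def issue_ticket(user, caller_service, ttl_seconds=60, now=None):
    """Identity service: issue a short-lived ticket for one end user."""
    now = time.time() if now is None else now
    claims = {"user": user, "caller": caller_service, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(IDENTITY_SERVICE_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(tag).decode())


def verify_ticket(ticket, now=None):
    """Return the claims if the ticket is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, tag_b64 = ticket.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(IDENTITY_SERVICE_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    claims = json.loads(payload)
    return claims if claims["exp"] >= now else None


def get_contacts(ticket, requested_user, now=None):
    """Callee: return data only for the end user named in the ticket."""
    claims = verify_ticket(ticket, now)
    if claims is None or claims["user"] != requested_user:
        raise PermissionError("ticket does not authorize this user")
    return CONTACTS[requested_user]
```

With this shape, a calling service that holds a ticket for one user cannot use it to read another user's contacts, and an expired or altered ticket is rejected before any data is returned.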

The following diagram shows how Service A and Service B communicate. The infrastructure provides service identity, automatic mutual authentication, encrypted inter-service communication, and enforcement of the access policies that are defined by the service owner. Each service has a service configuration, which the service owner creates. For encrypted inter-service communication, automatic mutual authentication uses caller and callee identities. Communication is only possible when an access rule configuration permits it.

[Diagram: how Service A and Service B communicate]

For information about access management in Google Cloud, see IAM overview.

Secure data storage

This section describes how we implement security for data that is stored on the infrastructure.

Encryption at rest

Google's infrastructure provides various storage services and distributed file systems (for example, Spanner and Colossus), and a central key management service. Applications at Google access physical storage by using storage infrastructure. We use several layers of encryption to protect data at rest. By default, the storage infrastructure encrypts all user data before the user data is written to physical storage.

The infrastructure performs encryption at the application or storage infrastructure layer. Encryption lets the infrastructure isolate itself from potential threats at the lower levels of storage, such as malicious disk firmware. Where applicable, we also enable hardware encryption support in our hard drives and SSDs, and we meticulously track each drive through its lifecycle. Before a decommissioned, encrypted storage device can physically leave our custody, the device is cleaned by using a multi-step process that includes two independent verifications. Devices that do not pass this cleaning process are physically destroyed (that is, shredded) on-premises.

In addition to the encryption done by the infrastructure, Google Cloud and Google Workspace provide key management services. For Google Cloud, Cloud KMS is a cloud service that lets customers manage cryptographic keys. For Google Workspace, you can use client-side encryption. For more information, see Client-side encryption and strengthened collaboration in Google Workspace.

Deletion of data

Deletion of data typically starts with marking specific data as scheduled for deletion rather than actually deleting the data. This approach lets us recover from unintentional deletions, whether they are customer-initiated, are due to a bug, or are the result of an internal process error. After data is marked as scheduled for deletion, it is deleted in accordance with service-specific policies.
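
The mark-then-delete approach can be sketched as follows. This is an illustrative Python model only: the `SoftDeleteStore` class, its grace period, and its method names are hypothetical, and real services apply their own service-specific policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    value: str
    # Time after which the record may be purged; None means "not scheduled".
    delete_after: Optional[float] = None


class SoftDeleteStore:
    """Hypothetical store that schedules deletion instead of deleting."""

    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self.records = {}

    def put(self, key: str, value: str) -> None:
        self.records[key] = Record(value)

    def mark_deleted(self, key: str, now: float) -> None:
        """Schedule the record for deletion; the data stays recoverable."""
        self.records[key].delete_after = now + self.grace_seconds

    def restore(self, key: str, now: float) -> bool:
        """Undo a scheduled deletion if the grace period has not elapsed."""
        record = self.records.get(key)
        if record is None or record.delete_after is None:
            return False
        if now >= record.delete_after:
            return False
        record.delete_after = None
        return True

    def purge(self, now: float) -> list:
        """Permanently remove records whose grace period has elapsed."""
        expired = [
            key for key, record in self.records.items()
            if record.delete_after is not None and now >= record.delete_after
        ]
        for key in expired:
            del self.records[key]
        return expired
```

For example, a record marked at time 0 with a 100-second grace period can be restored at time 50, but is removed by a purge that runs at time 150.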

When an end user deletes their account, the infrastructure notifies the services that are handling the end-user data that the account has been deleted. The services can then schedule the data that is associated with the deleted end-user account for deletion. This feature enables an end user to control their own data.

For more information, see Data deletion on Google Cloud.

Secure internet communication

This section describes how we secure communication between the internet and the services that run on Google infrastructure.

As discussed in Hardware design and provenance, the infrastructure consists of many physical machines that are interconnected over the LAN and WAN. The security of inter-service communication is not dependent on the security of the network. However, we isolate our infrastructure from the internet into a private IP address space. We only expose a subset of the machines directly to external internet traffic so that we can implement additional protections such as defenses against denial of service (DoS) attacks.

Google Front End service

When a service must make itself available on the internet, it can register itself with an infrastructure service called the Google Front End (GFE). The GFE ensures that all TLS connections are terminated with correct certificates and by following best practices such as supporting perfect forward secrecy. The GFE also applies protections against DoS attacks. The GFE then forwards requests for the service by using the RPC security protocol discussed in Access management of end-user data in Google Workspace.

In effect, any internal service that must publish itself externally uses the GFE as a smart reverse-proxy frontend. The GFE provides public IP address hosting of its public DNS name, DoS protection, and TLS termination. GFEs run on the infrastructure like any other service and can scale to match incoming request volumes.

Customer VMs on Google Cloud do not register with GFE. Instead, they register with the Cloud Front End, which is a special configuration of GFE that uses the Compute Engine networking stack. Cloud Front End lets customer VMs access a Google service directly using their public or private IP address. (Private IP addresses are only available when Private Google Access is enabled.)

DoS protection

The scale of our infrastructure enables it to absorb many DoS attacks. To further reduce the risk of DoS impact on services, we have multi-tier, multi-layer DoS protections.

When our fiber-optic backbone delivers an external connection to one of our data centers, the connection passes through several layers of hardware and software load balancers. These load balancers report information about incoming traffic to a central DoS service running on the infrastructure. When the central DoS service detects a DoS attack, the service can configure the load balancers to drop or throttle traffic associated with the attack.

The GFE instances also report information about the requests that they are receiving to the central DoS service, including application-layer information that the load balancers don't have access to. The central DoS service can then configure the GFE instances to drop or throttle attack traffic.
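
One reason to aggregate reports centrally is that an attack spread across many load balancers can look harmless to each of them individually. The Python sketch below illustrates that idea under loud assumptions: the `CentralDosService` class, the per-source request counts, and the fixed threshold are all hypothetical, and real detection is certainly more sophisticated than a single counter.

```python
from collections import Counter


class CentralDosService:
    """Hypothetical aggregator of traffic reports from many reporters."""

    def __init__(self, max_requests_per_window: int):
        self.limit = max_requests_per_window
        self.reports = {}

    def report(self, reporter: str, source_counts: dict) -> None:
        """Store the latest per-source request counts from one reporter."""
        self.reports[reporter] = Counter(source_counts)

    def totals(self) -> Counter:
        """Sum the per-source counts across all reporters."""
        total = Counter()
        for counts in self.reports.values():
            total.update(counts)
        return total

    def throttle_list(self) -> list:
        """Sources whose aggregate volume exceeds the limit."""
        return sorted(
            source for source, count in self.totals().items()
            if count > self.limit
        )


service = CentralDosService(max_requests_per_window=100)
# Each load balancer sees only 60 requests from this source, below the limit.
service.report("lb-1", {"203.0.113.7": 60, "198.51.100.2": 10})
service.report("lb-2", {"203.0.113.7": 60})
to_throttle = service.throttle_list()
```

In this example neither load balancer alone would flag `203.0.113.7`, but the aggregated count of 120 exceeds the limit, so the central service can push a throttle rule back to both.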

User authentication

After DoS protection, the next layer of defense for secure communication comes from the central identity service. End users interact with this service through the Google login page. The service asks for a username and password, and it can also challenge users for additional information based on risk factors. Example risk factors include whether the users have logged in from the same device or from a similar location in the past. After authenticating the user, the identity service issues credentials such as cookies and OAuth tokens that can be used for subsequent calls.

When users sign in, they can use second factors such as OTPs or phishing-resistant security keys such as the Titan Security Key. The Titan Security Key is a physical token that supports the FIDO Universal 2nd Factor (U2F) standard. We helped develop the U2F open standard with the FIDO Alliance. Most web platforms and browsers have adopted this open authentication standard.

Operational security

This section describes how we develop infrastructure software, protect our employees' machines and credentials, and defend against threats to the infrastructure from both insiders and external actors.

Safe software development

Besides the source control protections and two-party review process described elsewhere in this document, we use libraries that prevent developers from introducing certain classes of security bugs. For example, we have libraries and frameworks that help eliminate XSS vulnerabilities in web apps. We also use automated tools such as fuzzers, static analysis tools, and web security scanners to automatically detect security bugs.

As a final check, we use manual security reviews that range from quick triages for less risky features to in-depth design and implementation reviews for the most risky features. The team that conducts these reviews includes experts across web security, cryptography, and operating system security. The reviews can lead to the development of new security library features and new fuzzers that we can use for future products.

In addition, we run a Vulnerability Rewards Program that rewards anyone who discovers and informs us of bugs in our infrastructure or applications. For more information about this program, including the rewards that we've given, see Bug hunters key stats.

We also invest in finding zero-day exploits and other security issues in the open source software that we use. We run Project Zero, which is a team of Google researchers who are dedicated to researching zero-day vulnerabilities, including Spectre and Meltdown. In addition, we are the largest submitter of CVEs and security bug fixes for the Linux KVM hypervisor.

Source code protections

Our source code is stored in repositories with built-in source integrity and governance, where both current and past versions of the service can be audited. The infrastructure requires that a service's binaries be built from specific source code, after it is reviewed, checked in, and tested. Binary Authorization for Borg (BAB) is an internal enforcement check that happens when a service is deployed. BAB does the following:

  • Ensures that the production software and configuration that is deployed at Google is reviewed and authorized, particularly when that code can access user data.
  • Ensures that code and configuration deployments meet certain minimum standards.
  • Limits the ability of an insider or adversary to make malicious modifications to source code and also provides a forensic trail from a service back to its source.
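
A deploy-time check of this kind can be pictured as a function from build provenance to an admit-or-reject decision. The sketch below is not BAB's actual policy model, which this document does not detail; the `BuildProvenance` fields and the `admission_decision` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildProvenance:
    """Hypothetical facts recorded about how a binary was produced."""
    source_commit: str
    code_reviewed: bool
    checked_in: bool
    tests_passed: bool


def admission_decision(provenance: BuildProvenance) -> tuple:
    """Return (admit, reasons); reasons lists every unmet requirement."""
    reasons = []
    if not provenance.code_reviewed:
        reasons.append("source was not reviewed")
    if not provenance.checked_in:
        reasons.append("binary was not built from checked-in source")
    if not provenance.tests_passed:
        reasons.append("tests did not pass")
    return (not reasons, reasons)


good = BuildProvenance("abc123", code_reviewed=True, checked_in=True, tests_passed=True)
unreviewed = BuildProvenance("def456", code_reviewed=False, checked_in=True, tests_passed=True)

good_decision = admission_decision(good)
unreviewed_decision = admission_decision(unreviewed)
```

Returning the list of unmet requirements, rather than a bare boolean, keeps a record of why a deployment was rejected, which fits the forensic-trail goal in the list above.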

Keeping employee devices and credentials safe

We implement safeguards to help protect our employees' devices and credentials from compromise. To help protect our employees against sophisticated phishing attempts, we have replaced OTP second-factor authentication with the mandatory use of U2F-compatible security keys.

We monitor the client devices that our employees use to operate our infrastructure. We ensure that the operating system images for these devices are up to date with security patches and we control the applications that employees can install on their devices. We also have systems that scan user-installed applications, downloads, browser extensions, and web browser content to determine whether they are suitable for corporate devices.

Being connected to the corporate LAN is not our primary mechanism for granting access privileges. Instead, we use zero-trust security to help protect employee access to our resources. Access-management controls at the application level expose internal applications to employees only when employees use a managed device and are connecting from expected networks and geographic locations. A client device is trusted based on a certificate that's issued to the individual machine, and based on assertions about its configuration (such as up-to-date software). For more information, see BeyondCorp.
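
The access conditions listed above combine into a single application-level decision. The following Python sketch shows one way to express that; the `AccessRequest` and `AppPolicy` types, their fields, and the example values are hypothetical and are not taken from BeyondCorp itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical signals available when an employee opens an internal app."""
    user: str
    device_managed: bool
    device_cert_valid: bool
    software_up_to_date: bool
    network: str
    country: str


@dataclass(frozen=True)
class AppPolicy:
    """Hypothetical per-application access policy."""
    allowed_users: frozenset
    expected_networks: frozenset
    expected_countries: frozenset


def allow_access(request: AccessRequest, policy: AppPolicy) -> bool:
    """Grant access only if every user, device, and context check passes."""
    return all([
        request.user in policy.allowed_users,
        request.device_managed,
        request.device_cert_valid,
        request.software_up_to_date,
        request.network in policy.expected_networks,
        request.country in policy.expected_countries,
    ])


policy = AppPolicy(
    allowed_users=frozenset({"alice"}),
    expected_networks=frozenset({"corp-wifi", "home-vpn"}),
    expected_countries=frozenset({"US"}),
)
managed = AccessRequest("alice", True, True, True, "home-vpn", "US")
unmanaged_on_lan = AccessRequest("alice", False, True, True, "corp-wifi", "US")
```

Note that `unmanaged_on_lan` is denied even though it connects from the corporate network, which mirrors the point that network location alone does not grant access.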

Reducing insider risk

We limit and actively monitor the activities of employees who have been granted administrative access to the infrastructure. We continually work to eliminate the need for privileged access for particular tasks by using automation that can accomplish the same tasks in a safe and controlled way. For example, we require two-party approvals for some actions and we use limited APIs that allow debugging without exposing sensitive information.

Google employee access to end-user information can be logged through low-level infrastructure hooks. Our security team monitors access patterns and investigates unusual events.

Threat monitoring

The Threat Analysis Group at Google monitors threat actors and the evolution of their tactics and techniques. The goals of this group are to help improve the safety and security of Google products and share this intelligence for the benefit of the online community.

For Google Cloud, you can use Google Cloud Threat Intelligence for Chronicle and VirusTotal to monitor and respond to many types of malware. Google Cloud Threat Intelligence for Chronicle is a team of threat researchers who develop threat intelligence for use with Chronicle. VirusTotal is a malware database and visualization solution that you can use to better understand how malware operates within your enterprise.

For more information about our threat monitoring activities, see the Threat Horizons report.

Intrusion detection

We use sophisticated data processing pipelines to integrate host-based signals on individual devices, network-based signals from various monitoring points in the infrastructure, and signals from infrastructure services. Rules and machine intelligence built on top of these pipelines give operational security engineers warnings of possible incidents. Our investigation and incident-response teams triage, investigate, and respond to these potential incidents 24 hours a day, 365 days a year. We conduct Red Team exercises to measure and improve the effectiveness of our detection and response mechanisms.

Article information

Author: Tyson Zemlak

Last Updated: 09/16/2022
