CrowdStrike’s Falcon Platform Incident: Root Cause Analysis and Mitigations

August 8, 2024 | Cybersecurity
By Ashwani Mishra, Editor-Technology, 63SATS

CrowdStrike has released its Root Cause Analysis (RCA) Report, shedding light on the factors behind the global outages on July 19, 2024. The report identifies a mix of security flaws and procedural shortcomings as the primary causes of the system crash.

The report provides an in-depth analysis of the incident involving Channel File 291, expanding on the preliminary Post Incident Review. It details the findings, mitigations, technical specifics, and root cause analysis of the incident that affected CrowdStrike’s Falcon platform.

Incident Overview

The CrowdStrike Falcon sensor, which uses AI and machine learning to protect systems, introduced a new Template Type in sensor version 7.11, released in February 2024, for detecting novel attack techniques involving Windows interprocess communication (IPC) mechanisms. The problem arose when new IPC Template Instances required 21 input parameters but the integration code provided only 20, leading to a system crash.

Root Cause Analysis

The root cause of the incident was a mismatch between the number of input parameters expected by the Content Interpreter and those provided by the integration code. This mismatch led to an out-of-bounds memory read, causing system crashes. The issue was not detected during multiple layers of testing due to the use of wildcard matching criteria for the 21st input during development and initial deployments.

Findings and Mitigations

The report outlines several key findings and corresponding mitigations:

Validation of Input Fields: The number of fields in the IPC Template Type was not validated at sensor compile time. A patch was developed to validate the number of inputs provided by a Template Type.

Runtime Array Bounds Check: A missing runtime array bounds check for Content Interpreter input fields led to out-of-bounds reads. Bounds checking was added to prevent this issue.

Template Type Testing: Testing did not cover a wide variety of matching criteria. Automated tests with non-wildcard matching criteria were introduced.

Content Validator Logic Error: The Content Validator contained a logic error, assuming 21 inputs would be provided. Additional checks were added to prevent this.

Template Instance Validation: Stress testing did not reveal the mismatch issue. New test procedures were implemented to ensure thorough validation.

Staged Deployment: Template Instances should have staged deployment to mitigate impact. The deployment process was updated to include additional layers and acceptance checks.
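To make the root cause and the first two mitigations concrete, here is an added, constructed C model (not CrowdStrike's code) of an interpreter that compares template criteria against supplied inputs. It assumes a 21-field Template Type, 20 supplied inputs, and a "*" wildcard criterion that is skipped without being compared; the names `field_count_valid`, `match_checked` and `simulate` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Constructed model of the RCA's field-count mismatch, not CrowdStrike code:
 * the IPC Template Type declares 21 fields, the integration code supplied 20. */
#define DECLARED_FIELDS 21
#define PROVIDED_INPUTS 20
#define WILDCARD "*"   /* criterion the interpreter never compares */

typedef enum { MATCH_YES, MATCH_NO, ERR_OUT_OF_BOUNDS } result_t;

/* Mitigation 1: validate the field count when the Template Type is built,
 * the check the RCA says was missing at sensor compile time. */
bool field_count_valid(size_t declared, size_t provided) {
    return declared == provided;
}

/* Mitigation 2: runtime bounds check inside the interpreter. Wildcard criteria
 * are skipped without touching inputs[i], which is why instances tested with a
 * wildcard 21st criterion never read inputs[20]. A concrete 21st criterion does,
 * and without the i >= n_inputs test that is a read one past the 20 supplied. */
result_t match_checked(const char *criteria[], size_t n_fields,
                       const char *inputs[], size_t n_inputs) {
    for (size_t i = 0; i < n_fields; i++) {
        if (strcmp(criteria[i], WILDCARD) == 0) continue;
        if (i >= n_inputs) return ERR_OUT_OF_BOUNDS;  /* formerly an OOB read */
        if (strcmp(criteria[i], inputs[i]) != 0) return MATCH_NO;
    }
    return MATCH_YES;
}

/* Constructed sample run: 20 inputs against 21 criteria, with the 21st
 * criterion either a wildcard (as in testing) or a concrete value (July 19). */
result_t simulate(bool last_is_wildcard) {
    const char *inputs[PROVIDED_INPUTS];
    const char *criteria[DECLARED_FIELDS];
    for (size_t i = 0; i < PROVIDED_INPUTS; i++) inputs[i] = "ipc";
    for (size_t i = 0; i < DECLARED_FIELDS; i++) criteria[i] = WILDCARD;
    criteria[0] = "ipc";                                           /* real criterion */
    criteria[DECLARED_FIELDS - 1] = last_is_wildcard ? WILDCARD : "ipc";
    return match_checked(criteria, DECLARED_FIELDS, inputs, PROVIDED_INPUTS);
}
```

With a wildcard 21st criterion the instance matches cleanly, which is how the mismatch survived testing; with a concrete 21st criterion the bounds check now reports an error where the original interpreter would have read past the supplied inputs.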

Technical Details

The report delves into the technical aspects of the incident, explaining the components involved in processing regex-based Rapid Response Content on the sensor. It describes the role of the Content Interpreter, Template Types, Template Type Definitions file, Sensor Content, Template Instances, and the Content Configuration System. The report also includes a detailed crash dump analysis, illustrating how the new Template Instances led to a system crash.

Independent Third-Party Review

CrowdStrike engaged two independent third-party software security vendors to review the Falcon sensor code and the end-to-end quality process. These reviews aim to ensure the security and quality of the Falcon platform and prevent similar incidents in the future.