The Multicore Challenge
Timing interference in MultiProcessor systems
Reference multicore platforms (also referred to as MultiProcessor Systems on Chip, or MPSoCs) in embedded critical domains already incorporate complex, high-performance, Commercial Off-The-Shelf (COTS) hardware components, including decentralized and distributed interconnects, deep shared cache hierarchies, DMAs, GPUs and other specialized, vendor-specific accelerators.
Shared hardware resources are inherent means of propagation for multicore interference. For this reason, from the timing analysis perspective, they are often a source of interference channels.
While multicores enable multiple requests to be sent in parallel to shared hardware resources (caches, interconnects, etc.), the latter only support limited parallelism.
Applications are therefore delayed while they wait for the resource to become available, with a potentially inordinate impact on both average and worst-case performance.
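To make the effect of limited parallelism concrete, the following is a minimal sketch rather than a model of any real interconnect or cache. It assumes a hypothetical shared resource that serves one request at a time with a fixed service latency, and that every other core has one request pending ahead of the request under analysis. The function name and the latency value are illustrative assumptions.

```python
def worst_case_wait(num_cores: int, service_cycles: int) -> int:
    """Cycles a request may wait before being served.

    Assumption (not from any vendor manual): the resource serves one
    request at a time, and one request from each other core is queued
    ahead of the request under analysis.
    """
    if num_cores < 1:
        raise ValueError("at least one core is required")
    if service_cycles < 0:
        raise ValueError("service latency cannot be negative")
    # Worst case: the request is served last, after one request
    # from each of the other cores.
    return (num_cores - 1) * service_cycles


if __name__ == "__main__":
    # Hypothetical 10-cycle service latency, for illustration only.
    for cores in (1, 2, 4, 8):
        print(cores, worst_case_wait(cores, service_cycles=10))
```

Even under these simplified assumptions the waiting time grows with the number of cores sharing the resource, which is why shared resources matter for worst-case timing.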
Current MPSoCs are accompanied by an increasingly large amount of technical information in the form of reference manuals, user manuals, and application notes. Nonetheless, the information that is critical for timing characterization is either not fully disclosed (to protect IP) or scattered across several long documents that are unclear, subject to errata, and sometimes inconsistent.
The amount of information to be processed for a comprehensive hardware analysis is simply overwhelming. As an example, to analyze the A53 cluster in the Xilinx Zynq UltraScale+ EG SoC, a team of experts needs to digest at least the following technical documents:
- UG1085 (1,178 pages) Device Technical Reference Manual
- DDI0500J (623 pages) Arm® Cortex® A53 Processor Technical Reference Manual
- DDI0487Fc (8,248 pages) Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile
These documents total more than 10,000 pages. And this is just for starters: if you extend the analysis to the interconnect or other computing units, the total size of the documents to analyze skyrockets.
Example of top level Multicore Contention Analysis
The example below shows the slowdown a reference task suffers when it runs on NVIDIA’s Jetson AGX Xavier CPU complex against a set of micro-benchmarks that generate traffic in the L1, L2, and L3 caches and in memory. The chart also captures how contention varies as the number of contenders increases.
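As a minimal sketch of how such a slowdown figure can be derived, the snippet below divides the execution time observed under contention by the execution time observed in isolation, for each number of contenders. The function names are hypothetical, and the numbers in the usage section are placeholders chosen for readability, not measurements from the Jetson AGX Xavier or any other platform.

```python
from typing import Dict


def slowdown(contended_time: float, isolation_time: float) -> float:
    """Ratio of execution time under contention to time in isolation."""
    if isolation_time <= 0:
        raise ValueError("isolation time must be positive")
    return contended_time / isolation_time


def slowdown_by_contenders(
    isolation_time: float, contended_times: Dict[int, float]
) -> Dict[int, float]:
    """Map each number of contenders to the slowdown it causes."""
    return {
        contenders: slowdown(time, isolation_time)
        for contenders, time in sorted(contended_times.items())
    }


if __name__ == "__main__":
    # Placeholder values (arbitrary time units), NOT real measurements.
    placeholder_times = {1: 150.0, 2: 200.0, 3: 300.0}
    for n, s in slowdown_by_contenders(100.0, placeholder_times).items():
        print(f"{n} contender(s): slowdown x{s:.2f}")
```

A slowdown of 1.0 means the contenders had no measurable effect; larger values indicate stronger interference on the resource the micro-benchmark stresses.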
Do not trust hardware event monitors (HEMs)
Several profiling-based timing analysis approaches build on the evidence gathered through Hardware Event Monitors (HEMs) to confirm or disprove assumptions about the software timing behavior or, more generally, to analyze the system behavior in relation to hardware utilization.
Unfortunately, building on HEMs exposes the analysis to a set of potential flaws:
- HEMs are not necessarily validated by hardware vendors and, in practice, rarely are.
- HEMs typically come with one-liner descriptions in reference manuals, which makes them prone to misinterpretation.
- Errata documents often reveal errors in their description and/or accuracy.
As an example, the L2D_CACHE (L2 data cache accesses) HEM in the Arm Cortex-A53 MPCore counts each cache miss twice, as the count includes both L1 cache refill requests and L2 refills from the interconnect (CCI). This concern is relevant for the Xilinx Zynq UltraScale+ and NXP Layerscape processor families. Similarly, DMOV instructions in some NVIDIA GPUs are counted as miscellaneous instructions (MISC), in direct contradiction to the official documentation. Analogous challenges are found in almost all high-performance multicore processor families that are considered for safety-critical systems.
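The double counting described above can be compensated for once it is understood. The sketch below is one possible correction, under two stated assumptions: every L2 miss adds exactly two counts to the raw L2D_CACHE value, and the number of L2 misses is available from a separate, independently validated source (for instance a refill counter such as L2D_CACHE_REFILL, whose own accuracy would have to be checked first). The function name is hypothetical.

```python
def corrected_l2_accesses(l2d_cache_raw: int, l2_misses: int) -> int:
    """Estimate L2 data cache accesses from a raw count that is
    assumed to include every miss twice.

    Assumed model: raw = hits + 2 * misses, so the corrected number
    of accesses (hits + misses) is raw - misses.
    """
    if l2d_cache_raw < 0 or l2_misses < 0:
        raise ValueError("counter values cannot be negative")
    if l2d_cache_raw < 2 * l2_misses:
        # The readings contradict the assumed double-counting model.
        raise ValueError("raw count is inconsistent with the miss count")
    return l2d_cache_raw - l2_misses


if __name__ == "__main__":
    # Placeholder counter readings, not taken from real hardware.
    print(corrected_l2_accesses(l2d_cache_raw=1200, l2_misses=200))
```

The consistency check matters as much as the subtraction: if the readings do not fit the assumed model, the monitor's behavior has been misunderstood and the correction should not be applied.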
Hardware-centric Multicore Analysis Framework
To cope with the challenges imposed by multicore timing analysis, Maspatechnologies has developed and consolidated a hardware-centric multicore analysis framework for the identification and bounding of timing interference. The overarching objective of our approach is to guide users in understanding and mastering the impact of multicore execution on the system timing behavior, with particular focus on meeting the requirements for CAST-32A and ISO 26262 compliance.
The proposed framework is:
Hardware-centric: heavily builds on hardware expertise injected at key stages of the overall software development process, from planning to verification and validation activities.
Flexible: covers a wide range of timing requirements, from early exploration to full analysis, from high-level platform characterization to application profiling and interference bounding.
Our technologies and services have been successfully implemented and used to meet our customers' requirements in the aerospace and automotive domains.
Analysis and Consultancy Services
Maspatechnologies offers long-term consolidated hardware expertise to cover your analysis needs. A team of hardware and software experts is responsible for analyzing your specific target configuration and producing hardware analysis reports covering:
The Maspatechnologies analysis framework builds on a set of technologies and tools that cooperate in the realization of a comprehensive approach for the verification of multicore systems across different phases of the software development process:
- Task Contention Model (TCM)
- Surrogate Applications (SurApps)
Continuous Support from requirement identification to analysis results
The Maspatechnologies analysis framework captures the diverse verification requirements stemming from the different steps of multicore software verification: from high-level qualitative hardware characterization, to extensive quantitative analysis of specialized hardware components, to requirement-based timing verification that is not limited to exercising the sources of multicore timing interference.
Requirement Analysis: building on long-standing hardware expertise, Maspatechnologies helps in the identification of timing verification requirements for a specific platform and configuration, at different granularity levels.
Interference Analysis: hardware expertise is also leveraged for the identification and classification of the sources of multicore timing interference on the target configuration. Analysis may touch different levels of detail.
Test Design: based on the outcomes of the interference channel analysis, a set of tests is carefully designed to use selected micro-benchmarks to prove or disprove a specific verification hypothesis.
Test Execution: Maspatechnologies supports the end user in the deployment of micro-benchmarks on the target platform against the application under analysis by defining batches of test procedures, which directly emanate from the test designs.
Evidence Gathering: Test procedures are executed on the target platform and evidence is gathered in the form of a set of relevant HEMs (thus not limited to timing).
Assessment: Maspatechnologies experts support end users in the analysis of the test results and use the collected evidence to evaluate the verification requirement.
Results: The whole process and the experimental results are eventually formalized in a concrete set of artifacts, which can be leveraged to support end users’ certification arguments.
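The flow from test design to assessment can be pictured with a small data model. The sketch below is purely illustrative and does not describe the Maspatechnologies tooling: the class names, the fields, and the choice of a maximum tolerated slowdown as the verification hypothesis are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    """One run of the application against a batch of micro-benchmarks."""
    contenders: int
    execution_time: float
    hem_counts: Dict[str, int] = field(default_factory=dict)


@dataclass
class Procedure:
    """Hypothetical test procedure derived from a test design."""
    hypothesis: str
    isolation_time: float
    max_slowdown: float

    def assess(self, evidence: List[Evidence]) -> bool:
        """Return True if all gathered evidence supports the hypothesis."""
        if not evidence:
            # No evidence cannot confirm anything.
            raise ValueError("no evidence gathered")
        return all(
            e.execution_time / self.isolation_time <= self.max_slowdown
            for e in evidence
        )


if __name__ == "__main__":
    # Placeholder values, not real measurements.
    proc = Procedure("slowdown stays below 2x", 100.0, 2.0)
    runs = [Evidence(1, 150.0), Evidence(2, 180.0, {"L2D_CACHE": 1200})]
    print(proc.assess(runs))
```

Keeping the HEM readings next to the timing of each run reflects the point made above that evidence is not limited to execution time.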
Specialized micro-benchmarks for multicore timing analysis
Maspatechnologies owns an extensive catalogue of micro-benchmarks that have been specifically designed and developed to support a wide range of multicore timing analysis objectives and verification requirements.
Each micro-benchmark is tailored to capture the verification requirements of the specific hardware and software configuration of interest. Maspatechnologies builds on outstanding hardware expertise to perform a thorough analysis of the target platform, with focus on the critical sources of multicore timing interference.
Benchmarks undergo a careful verification campaign to gather the necessary evidence on their expected behavior. In turn, micro-benchmark verification builds on low-level hardware event monitors (HEMs), which are themselves necessarily validated.
By deploying micro-benchmarks together with and against the target applications, empirical evidence is collected and used to reason about different aspects of multicore execution, including:
- Characterization of interference channels
- Confirmation of architectural features
- Robustness testing
- Other user-defined requirements