Argus: Resilience-Oriented Safety Assurance Framework for End-to-End ADSs

Abstract

End-to-end autonomous driving systems (ADSs), with their strong capabilities in environmental perception and generalizable driving decisions, are attracting growing attention from both academia and industry. However, once deployed on public roads, ADSs are inevitably exposed to diverse driving hazards that may compromise safety and degrade system performance. This raises a strong demand for resilience of ADSs, particularly the capability to continuously monitor driving hazards and adaptively respond to potential safety violations, which is crucial for maintaining robust driving behaviors in complex driving scenarios.

To bridge this gap, we propose a runtime resilience-oriented framework, Argus, to mitigate the driving hazards, thus preventing potential safety violations and improving the driving performance of an ADS. Argus continuously monitors the trajectories generated by the ADS for potential hazards and, whenever the EGO vehicle is deemed unsafe, seamlessly takes control through a hazard mitigator. We integrate Argus with three state-of-the-art end-to-end ADSs, i.e., TCP, UniAD and VAD. Our evaluation has demonstrated that Argus effectively and efficiently enhances the resilience of ADSs, improving the driving score of the ADS by up to 150.30% on average, and preventing up to 64.38% of the violations, with little additional time overhead.

The paper has been submitted to ASE 2025.

Prototype and Documents

Argus comprises three components: the Takeover Gate, the Hazard Monitor, and the Hazard Mitigator. Every trajectory generated by the ADS passes through the Takeover Gate, which dynamically switches control between the ADS and the Hazard Mitigator; the latter is built upon the intelligent driver model. Using three takeover buffers and one recovery buffer, all maintained by the Hazard Monitor, the Takeover Gate applies a takeover and recovery mechanism to decide whether a given trajectory is safe to execute. If the EGO vehicle (i.e., the vehicle controlled by the ADS) is deemed unsafe, the Hazard Mitigator takes over control to mitigate the hazard, and control is returned to the ADS only once safety is re-established.
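
To make the takeover and recovery idea concrete, here is a minimal Python sketch. It is not the Argus implementation. The acceleration function is the standard intelligent driver model (IDM) formula with illustrative parameter values, and it assumes the Hazard Mitigator is the IDM-based component. The TakeoverGate class, its boolean hazard flags, and the recovery_frames threshold are hypothetical stand-ins for the takeover and recovery buffers, whose actual contents are described in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMParams:
    # Illustrative values only; none of these are taken from Argus.
    v0: float = 13.9     # desired speed (m/s)
    T: float = 1.5       # desired time headway (s)
    s0: float = 2.0      # minimum standstill gap (m)
    a_max: float = 1.5   # maximum acceleration (m/s^2)
    b: float = 2.0       # comfortable deceleration (m/s^2)
    delta: float = 4.0   # acceleration exponent


def idm_acceleration(v, gap, dv, p=IDMParams()):
    """Standard IDM acceleration.

    v: ego speed (m/s), gap: distance to the leader (m, > 0),
    dv: approach rate, i.e. ego speed minus leader speed (m/s).
    """
    # Desired dynamic gap; the max(0, ...) clamp is a common variant.
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


class TakeoverGate:
    """Hypothetical gate: take over on any hazard, recover after a safe streak."""

    def __init__(self, recovery_frames=10):
        self.recovery_frames = recovery_frames  # assumed recovery criterion
        self.mitigating = False
        self.safe_streak = 0

    def step(self, hazard_flags):
        """hazard_flags: one boolean per (assumed) takeover check for this frame."""
        if any(hazard_flags):
            # Any flagged hazard hands control to the mitigator.
            self.mitigating = True
            self.safe_streak = 0
        elif self.mitigating:
            # Count consecutive safe frames before returning control.
            self.safe_streak += 1
            if self.safe_streak >= self.recovery_frames:
                self.mitigating = False
                self.safe_streak = 0
        return "mitigator" if self.mitigating else "ads"


if __name__ == "__main__":
    gate = TakeoverGate(recovery_frames=2)
    for flags in ([False] * 3, [False, True, False], [False] * 3, [False] * 3):
        print(gate.step(flags))
    # Ego at 10 m/s closing on a stopped leader 5 m ahead: strong braking.
    print(idm_acceleration(v=10.0, gap=5.0, dv=10.0) < 0)
```

Separating the takeover condition from the recovery condition gives the gate hysteresis: a single safe frame does not immediately hand control back to the ADS, which avoids rapid switching near the safety boundary.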

We have implemented a prototype of Argus with 6,709 lines of Python code. The prototype of Argus and documents are available on GitHub.

Target ADSs and BEVFormer Checkpoint

We choose three state-of-the-art, representative end-to-end ADSs as our targets. All three ADSs are initially trained on the Bench2Drive training set. Moreover, we adopt BEVFormer, a widely used approach that aggregates multi-view camera inputs to reconstruct the environment surrounding the ADS.

The checkpoints of the ADS models and the BEV perception models are available on Google Drive.

Metric    BEVFormer    Privileged
mAP ↑     0.6167       1.0000
mATE ↓    0.3717       0.0000
mASE ↓    0.0790       0.0000
mAOE ↓    0.0437       0.0000
mAVE ↓    0.8081       0.0000
NDS ↑     0.6423       1.0000

Note: The BEVFormer perception model is less accurate than privileged perception, but Argus with real perception already achieves results comparable to Argus*, which uses privileged perception. Please refer to the paper for details.
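
As a sanity check on the perception metrics above, the following Python sketch recomputes the NDS values from the other rows. The original nuScenes detection score combines mAP with five true-positive error metrics. The numbers reported here are instead consistent with a variant that uses only the four listed error metrics (mATE, mASE, mAOE, mAVE). That variant is an inference from the reported values, not something the source states, and the function name is illustrative.

```python
def detection_score(m_ap, tp_errors):
    """NDS-style score: mAP weighted by 5, plus one term per error metric.

    Each error e contributes 1 - min(1, e); the sum is normalised by the
    total weight (5 + number of error metrics). Assumed variant, see text.
    """
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / (5.0 + len(tp_errors))


if __name__ == "__main__":
    # Error metrics in order: mATE, mASE, mAOE, mAVE.
    bevformer = detection_score(0.6167, [0.3717, 0.0790, 0.0437, 0.8081])
    privileged = detection_score(1.0, [0.0, 0.0, 0.0, 0.0])
    print(round(bevformer, 4))   # 0.6423
    print(round(privileged, 4))  # 1.0
```

Under this assumption, the low NDS of BEVFormer is driven mainly by its mAP and its velocity error (mAVE), while scale and orientation errors contribute little.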

Effectiveness Evaluation (RQ1)

Bench2Drive (Bench2Drive220)

Bench2Drive is a CARLA benchmark proposed in the paper Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving. It consists of 220 very short (~150 m) routes spread across all towns, with one safety-critical scenario per route. As a rough estimate, evaluating one ADS on Bench2Drive takes about 4 days on four RTX 3090 GPUs. For more details on how to aggregate the results, see Bench2Drive.

CARLA leaderboard 2.0 validation set (CVL20)

The CARLA leaderboard 2.0 validation routes are a set of 20 long (~12 km) routes in Town 13. While driving along each route, the agent has to solve around 90 safety-critical scenarios drawn from 21 different types (38 counting variations). Because the routes are long and each contains many scenarios, scores on this benchmark are much lower than on Bench2Drive and other benchmarks.

Efficiency Evaluation (RQ2)

Argus efficiently enhances the resilience of end-to-end ADSs, and each of its components incurs little additional time overhead.

Accuracy Evaluation (RQ3)

Argus can produce appropriate takeover decisions in the face of driving hazards.

Ablation Study (RQ4)

Both steps contribute to hazard mitigation, and their combination effectively enhances the resilience of end-to-end ADSs and improves driving performance.

Parameter Sensitivity Analysis (RQ5)

Argus maintains improved resilience across different parameter settings, though overly aggressive or conservative settings cause false positives or missed hazards.

Generalization Evaluation (RQ6)

Argus shows a strong generalization capability to modular ADSs, consistently outperforming both the original Apollo and Apollo-REDriver across different benchmarks.