[PaRev] INFA-Guard: Infection-Aware MAS Safeguarding
Infection-aware defense for malicious propagation in LLM multi-agent systems.
On May 29, 2026, I reviewed INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems.
[ arXiv 2026 ] INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems
Yijin Zhou, Xiaoya Lu, Dongrui Liu, Junchi Yan, Jing Shao
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, Shanghai Innovation Institute
This paper studies a security problem that becomes visible only when LLM agents are connected as a system. In a single-agent setting, a malicious prompt, poisoned memory, or unsafe tool call can corrupt one model’s response. In a multi-agent system, the corrupted response can become an input to other agents and spread through the communication graph.
INFA-Guard starts from the following observation:
A defense should not only identify the original attack agents. It should also identify infected agents that were originally benign but have been converted into harmful propagators.
The paper moves from a binary view:
benign agent vs. attack agent
to an infection-aware view:
benign agent vs. infected agent vs. attack agent
All figures in this post are converted from the original arXiv source figure files. No synthetic figures or full-slide screenshots are used.
Additional resources: arXiv:2601.14667, paper PDF, and INFA-Guard code.
1. Motivation: Malicious Propagation in MAS
An LLM-based multi-agent system can be modeled as a graph. Each node is an agent, and each edge represents a communication channel. In the paper’s formulation, each agent is a bundle of components:
| Component | Role |
|---|---|
| Base model | The underlying LLM |
| Role | The persona or task responsibility of the agent |
| Memory | Stored context or retrieval state |
| Plugin / tool | External capability such as search, file parsing, or API access |
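To make the graph view concrete, here is a minimal Python sketch of the agent bundle and the communication graph. The class and field names (`Agent`, `MAS`, `connect`, `neighbors`) are my own illustration, not the paper's code, and I assume directed sender-to-receiver edges.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One MAS node: the four components listed in the table above."""
    name: str
    base_model: str                                   # underlying LLM identifier
    role: str                                         # persona or task responsibility
    memory: list[str] = field(default_factory=list)   # stored context
    tools: list[str] = field(default_factory=list)    # plugin / tool names


@dataclass
class MAS:
    """Agents plus directed communication edges (sender, receiver)."""
    agents: dict[str, Agent] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_agent(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def connect(self, sender: str, receiver: str) -> None:
        # Only allow channels between registered agents.
        if sender not in self.agents or receiver not in self.agents:
            raise KeyError("both endpoints must be registered agents")
        self.edges.add((sender, receiver))

    def neighbors(self, name: str) -> list[str]:
        """Agents that directly receive messages from `name`."""
        return sorted(r for s, r in self.edges if s == name)


def build_chain(names: list[str], base_model: str = "some-llm") -> MAS:
    """Build a chain topology: names[0] -> names[1] -> ..."""
    mas = MAS()
    for n in names:
        mas.add_agent(Agent(name=n, base_model=base_model, role="worker"))
    for s, r in zip(names, names[1:]):
        mas.connect(s, r)
    return mas
```

In this representation, anything agent `a` says becomes input to every agent in `neighbors("a")`, which is the propagation surface.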
This structure lets agents specialize and exchange intermediate reasoning. The same structure also creates a propagation surface. The paper considers three attack families:
| Attack | Targeted component |
|---|---|
| Prompt Injection (PI) | System prompt or user input |
| Memory Attack (MA) | Agent memory |
| Tool Attack (TA) | External plugin or tool behavior |
The risk is not limited to one compromised agent. The compromised agent can persuade or contaminate neighboring agents through ordinary MAS communication.
Figure 1. The paper’s comparison between existing binary safeguards and the infection-aware view.
Existing MAS safeguards often focus on the initiating attacker. That works only if malicious behavior remains localized. If an attacker has already converted nearby benign agents, removing only the original attacker is too late. The infected agents may continue to send incorrect or harmful messages.
INFA-Guard targets the propagation process, not only the root source.
2. What Is an Infected Agent?
The paper defines an infected agent as an agent that is not directly controlled by the attacker, but whose behavior changes from safe to compromised after interaction.
Formally, for a communication round $k$, the infected agent set is:
\[\mathcal{I}_k = \mathcal{V} \cap \mathcal{V}_{\text{atk}}^C \cap \{v_i : \mathcal{J}(\mathbf{R}_i^{(0)}) = 1,\ \mathcal{J}(\mathbf{R}_i^{(k)}) = 0\}\]
where $\mathcal{J}(\cdot)$ is a judging function:
| Output | Meaning |
|---|---|
| $\mathcal{J}(\mathbf{R}) = 1$ | The attack did not succeed |
| $\mathcal{J}(\mathbf{R}) = 0$ | The attack succeeded |
So an infected agent satisfies three conditions:
- It belongs to the MAS agent set $\mathcal{V}$.
- It is not in the original attack set $\mathcal{V}_{\text{atk}}$.
- Its response changes from non-compromised at round 0 to compromised at round $k$.
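The three conditions translate almost directly into a set comprehension. The sketch below assumes the judge outputs are already available as dictionaries from agent ID to $\mathcal{J}(\mathbf{R})$; the function and argument names are mine.

```python
def infected_set(agents, attack_agents, judge_round0, judge_roundk):
    """Return the infected set I_k.

    agents:        iterable of agent IDs (the set V)
    attack_agents: set of original attack agent IDs (V_atk)
    judge_round0:  dict agent ID -> J(R^(0)), where 1 means the attack did not succeed
    judge_roundk:  dict agent ID -> J(R^(k)), where 0 means the attack succeeded
    """
    return {
        v
        for v in agents
        if v not in attack_agents      # not an original attacker
        and judge_round0[v] == 1       # safe at round 0
        and judge_roundk[v] == 0       # compromised at round k
    }
```

An agent that was already compromised at round 0 is excluded, because the definition requires a transition from safe to compromised.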
This definition separates two security roles.
| Type | Meaning |
|---|---|
| Attack agent | The original malicious source |
| Infected agent | A previously benign agent converted by malicious influence |
This distinction matters because infected agents are secondary propagators. They may not be the origin of the attack, but they can keep the attack alive.
3. Why Infected Agents Matter
The paper tests whether infected agents act as propagators. It compares three defense settings:
| Setting | Defense behavior |
|---|---|
| No defense | No malicious agents are guarded |
| Attack-only defense | Only original attack agents are guarded |
| Attack + infected defense | Both attack and infected agents are guarded |
Figure 2. Infected agents remain dangerous even when the original attack agent is handled.
Attack-only defense is not enough. Compared with guarding both attack and infected agents, ASR@3 (attack success rate at the third dialogue iteration) is still higher by:
| Attack type | ASR@3 increase when infected agents remain |
|---|---|
| Memory Attack (PoisonRAG) | 11% |
| Tool Attack (InjecAgent) | 30% |
The paper also observes that when only attack agents are defended, ASR can still rise across iterations:
| Attack type | Increase from iteration 1 to iteration 3 |
|---|---|
| Tool Attack | 5% |
| Memory Attack | 7% |
The result supports the paper’s central assumption:
Infection is dynamic. A defense that misses infected agents can leave behind active propagation paths.
The second claim is topological. An infected agent should not appear randomly in the graph. Because infection occurs through communication, an infected agent should lie on or near a path from an attack source. The detector can then use the graph structure:
If an agent looks infected, its neighbors become more suspicious. If an attacker is found, nearby infection predictions become more plausible.
This “guilt by association” idea is later used both in training and in post-adaptation.
4. INFA-Guard Overview
INFA-Guard has three major stages:
- Infection-aware detection. Build a multi-agent utterance graph and detect attack and infected agents.
- Post-adaptation. Refine predictions using temporal trends and topology constraints.
- Remediation. Replace attack agents and rewrite infected responses while preserving the MAS topology.
Figure 3. Overview of INFA-Guard.
The structure can be read as a response to three questions.
| Question | INFA-Guard component |
|---|---|
| How do we observe infection over time? | Temporal features in the utterance graph |
| How do we detect attack and infected agents? | Turn-specific GNN detector with dual heads |
| How do we avoid breaking the MAS graph? | Replacement and response-level remediation |
Rather than stopping at classification, the pipeline detects, adapts, and remediates around propagation dynamics.
5. Infection-Aware Detection
INFA-Guard follows G-Safeguard in treating detection as a graph anomaly detection problem. At dialogue round $k$, it constructs a multi-agent utterance graph:
\[\mathcal{M}^{(k)} = (\mathbf{X}^{(k)}, \mathbf{E}^{(k)})\]
where:
| Symbol | Meaning |
|---|---|
| $\mathbf{X}^{(k)}$ | Time-series embeddings of each agent’s self-replies |
| $\mathbf{E}^{(k)}$ | Time-series embeddings of message exchanges along graph edges |
| $\mathcal{G}^{(k)}$ | MAS communication graph after the previous round's remediation, on which $\mathcal{M}^{(k)}$ is built |
The paper uses text embedding models such as MiniLM to embed utterances. Then it constructs temporal features that try to capture infection dynamics.
For each agent, the temporal feature function is:
\[\Psi(\mathbf{X}_{i}^{(k)}) = \begin{cases} [\mathbf{X}_{i,k}^{(k)},\ \mathbf{X}_{i,k,\text{res}}^{(k)},\ \texttt{AVG}(\mathbf{X}_i^{(k)})], & k \geq 2 \\ [\mathbf{X}_{i,k}^{(k)},\ \mathbf{X}_{i,k}^{(k)},\ \mathbf{X}_{i,k}^{(k)}], & k = 1 \end{cases}\]
This encodes three views:
| Feature | Interpretation |
|---|---|
| Current state | What the agent is saying now |
| Residual change | How the agent changed from the previous turn |
| Average state | The accumulated dialogue tendency |
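A small sketch of $\Psi$ in plain Python may help. I assume here that the residual term is the difference between the current and the previous embedding, which matches the "how the agent changed" reading but is my interpretation rather than a confirmed detail of the paper. Embeddings are plain lists of floats.

```python
def temporal_features(history):
    """Concatenate [current, residual, average] for one agent.

    history: list of embedding vectors for rounds 1..k (each a list of floats).
    For k = 1 the current embedding is repeated three times, as in the formula.
    """
    if not history:
        raise ValueError("history must contain at least one round")
    current = list(history[-1])
    if len(history) == 1:
        return current + current + current
    previous = history[-2]
    # ASSUMPTION: residual = current embedding minus previous-round embedding.
    residual = [c - p for c, p in zip(current, previous)]
    # AVG: element-wise mean over all rounds seen so far.
    average = [sum(col) / len(history) for col in zip(*history)]
    return current + residual + average
```

The output dimension is always three times the embedding dimension, so the detector sees a fixed-size input regardless of the round.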
This part matches the intuition from the presentation:
To detect infection, the model should see not only the current response, but also the moment-to-moment change and accumulated state.
Infection is a transition. An infected agent is defined by a behavioral shift, not only by a static response.
6. Turn-Specific GNN Branches
After temporal embedding, INFA-Guard uses a GNN detector. One GNN layer computes:
\[\mathbf{h}_i^{(k,l)} = \texttt{COMB} \left( \mathbf{h}_i^{(k,l-1)}, \texttt{AGGR} \{ \psi(\mathbf{h}_j^{(k,l-1)}, \mathbf{e}_{ij}^{(k)}) : v_j \in N(v_i) \} \right)\]
In words:
Agent $i$ updates its representation by combining its previous representation with aggregated information from neighboring agents and the messages exchanged with them.
The presentation notes emphasized this interpretation. The GNN does not only look at an agent in isolation. It also models the agent’s neighborhood and the message flow that may have caused infection.
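The update rule can be sketched without any learned weights. In the toy version below I pick concrete stand-ins for the three operators: $\psi$ adds the edge embedding to the neighbor state, $\texttt{AGGR}$ is an element-wise mean, and $\texttt{COMB}$ averages the node's own state with the aggregate. These choices are assumptions for illustration only; the paper's layers are trained.

```python
def gnn_layer(h, edge_feats, neighbors):
    """One untrained message-passing step.

    h:          dict node -> state vector (list of floats)
    edge_feats: dict (i, j) -> embedding of the messages exchanged between i and j
    neighbors:  dict node -> list of neighbor nodes N(v_i)
    """
    updated = {}
    for i, h_i in h.items():
        # psi: combine each neighbor's state with the edge embedding (assumed: sum).
        messages = [
            [hj + e for hj, e in zip(h[j], edge_feats[(i, j)])]
            for j in neighbors.get(i, [])
        ]
        # AGGR: element-wise mean over incoming messages (zeros if no neighbors).
        if messages:
            agg = [sum(col) / len(messages) for col in zip(*messages)]
        else:
            agg = [0.0] * len(h_i)
        # COMB: equal-weight mix of the previous state and the aggregate (assumed).
        updated[i] = [0.5 * a + 0.5 * b for a, b in zip(h_i, agg)]
    return updated
```

Even in this toy form, an agent's new state depends on what its neighbors said and on the messages along the connecting edges, which is the property the detector relies on.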
INFA-Guard then adds turn-specific parallel GNN branches. The motivation is that infection behaves differently across rounds:
| Stage | Detector branch intuition |
|---|---|
| Early infection turn | Shallow GNN branch |
| Middle infection turn | Medium-depth GNN branch |
| Later infection turn | Deeper GNN branch |
The paper uses a selected branch:
\[b^* = \texttt{SelectBranch}(k)\]
and routes the graph through the branch for that dialogue stage. In the slides, this was summarized as:
1-layer GNN for early infection
2-layer GNN for middle infection
3-layer GNN for later infection
The output of the shared layer proceeds exclusively through the selected branch $b^*$:
\[\mathbf{h}_i^{(k, l)} = \mathsf{Branch}_{b^*} \left(\mathbf{h}_i^{(k, l-1)}, (\mathbf{h}_j^{(k, l-1)}, \mathbf{e}_{ij}^{(k)}) : v_j \in N(v_i) \right)\]
The branch depth is capped for cost and noise reasons. After several layers, GNNs can add noise and over-smooth node representations.
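The routing logic can be sketched in a few lines. I assume the simplest mapping consistent with the slide summary: round $k$ uses a branch of depth $k$, capped at three layers. The paper's actual $\texttt{SelectBranch}$ may map rounds to branches differently, and `layer_fn` stands in for whatever trained layer each branch uses.

```python
def select_branch(k, max_depth=3):
    """Map dialogue round k (1-indexed) to a branch depth.

    ASSUMPTION: depth grows one layer per round and is capped at max_depth.
    """
    if k < 1:
        raise ValueError("dialogue rounds are 1-indexed")
    return min(k, max_depth)


def run_branch(h, layer_fn, depth):
    """Apply `layer_fn` to the node states `depth` times."""
    for _ in range(depth):
        h = layer_fn(h)
    return h
```

With this mapping, round 1 uses one layer, round 2 uses two, and every later round uses three, which keeps the cost bounded as the dialogue grows.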
Finally, the detector uses two MLP heads:
\[\hat{\mathbf{y}}_i = [ f_{b^*,\text{atk}}(\mathbf{h}_i^{(k,L)}) \parallel f_{b^*,\text{inf}}(\mathbf{h}_i^{(k,L)}) ] = [ \mathbf{P}_{i,\text{atk}} \parallel \mathbf{P}_{i,\text{inf}} ]\]
The output gives separate probabilities for:
| Head | Prediction |
|---|---|
| Attack head | Probability that the agent is an original attacker |
| Infection head | Probability that the agent is infected |
The dual-head design keeps attackers and infected agents from being collapsed into one malicious class.
7. Topology Constraints and Post-Adaptation
INFA-Guard uses topology constraints in two places:
- During detector training through a topology loss.
- During defense through post-adaptation.
The training loss is:
\[\mathcal{L} = \mathcal{L}_{\text{atk}} + \mathcal{L}_{\text{inf}} + \gamma \mathcal{L}_{\text{topo}}\]
The topology loss penalizes isolated infection predictions. If agent $i$ has high infection probability but none of its neighbors look like attackers or infected agents, that prediction is treated as suspicious.
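To illustrate the idea of penalizing isolated infection predictions, here is one possible penalty. It is a hypothetical form I wrote for intuition, not the paper's $\mathcal{L}_{\text{topo}}$: each agent's infection probability is multiplied by the probability that none of its neighbors is suspicious, and the result is averaged over agents.

```python
def topology_penalty(p_atk, p_inf, neighbors):
    """Hypothetical isolation penalty (not the paper's exact loss).

    p_atk, p_inf: dict agent -> predicted attack / infection probability
    neighbors:    dict agent -> list of neighboring agents
    """
    if not p_inf:
        return 0.0
    total = 0.0
    for i, p in p_inf.items():
        # Probability-like score that no neighbor looks malicious.
        no_suspicious_neighbor = 1.0
        for j in neighbors.get(i, []):
            no_suspicious_neighbor *= 1.0 - max(p_atk[j], p_inf[j])
        # High infection probability with clean neighbors is penalized most.
        total += p * no_suspicious_neighbor
    return total / len(p_inf)
```

Under this form, a confident infection prediction next to a confident attacker costs nothing, while the same prediction surrounded by clean neighbors costs the most.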
Post-adaptation then refines the detected attack and infected sets. The paper describes three steps.
| Step | Role |
|---|---|
| Temporal trend analysis | Smooth infection probability and measure whether it is rising |
| Infected set refinement | Keep, remove, or reinterpret infected predictions based on neighborhood evidence |
| Potential risk discovery | Monitor benign-labeled agents whose infection probability is increasing |
Temporal trend analysis applies an exponential moving average to infection probability:
\[\bar{\mathbf{P}}_i^{(t)} = \alpha \cdot \mathbf{P}_{i,\text{inf}}^{(t)} + (1-\alpha) \cdot \bar{\mathbf{P}}_i^{(t-1)}\]
Then the trend is:
\[\delta_i^{(t)} = \bar{\mathbf{P}}_i^{(t)} - \bar{\mathbf{P}}_i^{(t-1)}\]
This creates three post-adaptation cases:
| Case | Post-adaptation behavior |
|---|---|
| Agent is near detected attack/infected agents | Keep the infection prediction |
| Agent is isolated and trend is weak | Treat it as a likely false positive |
| Agent is isolated but trend is strong | Infer that a nearby source may have been missed |
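The smoothing, the trend, and the three cases fit into a short sketch. The values of `alpha` and `trend_threshold`, the initialization of the moving average at the first observation, and the case labels are all my assumptions; the paper's thresholds and exact refinement rules are not reproduced here.

```python
def ema_trend(probs, alpha=0.5):
    """Return (smoothed probability, trend) after the last round.

    probs: infection probabilities P_inf^(1..t) for one agent.
    ASSUMPTION: the moving average is initialized with the first observation.
    """
    if not probs:
        raise ValueError("need at least one probability")
    smoothed = probs[0]
    trend = 0.0
    for p in probs[1:]:
        new_smoothed = alpha * p + (1 - alpha) * smoothed
        trend = new_smoothed - smoothed      # delta^(t)
        smoothed = new_smoothed
    return smoothed, trend


def post_adapt_case(has_suspicious_neighbor, trend, trend_threshold=0.1):
    """Classify a predicted-infected agent into one of the three cases."""
    if has_suspicious_neighbor:
        return "keep"                # supported by neighborhood evidence
    if trend >= trend_threshold:
        return "missed_source"       # isolated, but infection is rising
    return "false_positive"          # isolated and flat
```

For example, an isolated agent whose probabilities go from 0.2 to 0.6 has a smoothed value of 0.4 and a trend of 0.2 at `alpha=0.5`, so it would be flagged as evidence of a missed source rather than dropped.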
This is where the infection-aware assumption enters the defense phase. The detector logits are refined before remediation instead of being used directly.
8. Remediation
The remediation strategy is different for attack agents and infected agents.
For predicted attack agents, INFA-Guard replaces the malicious agent with a benign one. The replacement copies the benign agent’s base model, role, memory, and plugin configuration. The goal is to stop the attack source while keeping the MAS graph structurally usable.
For predicted infected agents, INFA-Guard performs reply-level remediation:
\[\mathbf{R}_i^{(k)} = \texttt{RF}(\mathbf{R}_i^{(k)}) = \begin{cases} \mathbf{R}_{\texttt{RP}(\mathcal{V}^{(k)})_i}^{(k)}, & v_i \in \hat{\mathcal{V}}_{\text{atk}} \\ \texttt{LM}(\mathbf{R}_i^{(k)}), & v_i \in \hat{\mathcal{I}}^{(k)} \\ \mathbf{R}_i^{(k)}, & \text{otherwise} \end{cases}\]
The $\texttt{LM}(\cdot)$ step uses an LLM to inspect and rewrite the infected response so that malicious context is removed. In the presentation, I summarized this as:
Attack agents are replaced. Infected agents are corrected.
This is less destructive than deleting nodes from the graph. Deleting many nodes can damage the communication topology and reduce response diversity. INFA-Guard tries to preserve the system while removing malicious propagation.
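The case split in $\texttt{RF}$ is essentially a three-way dispatch. In the sketch below, `replacement_reply` and `rewrite` are hypothetical callbacks standing in for the benign replacement agent and the rewriting LLM; in a real system both would call a model.

```python
def remediate(replies, predicted_attack, predicted_infected,
              replacement_reply, rewrite):
    """Apply reply-level remediation for one dialogue round.

    replies:            dict agent -> reply text R_i^(k)
    predicted_attack:   set of agents predicted to be attackers
    predicted_infected: set of agents predicted to be infected
    replacement_reply:  callable(agent) -> reply from the benign replacement
    rewrite:            callable(reply) -> cleaned reply (stand-in for LM)
    """
    remediated = {}
    for agent, reply in replies.items():
        if agent in predicted_attack:
            remediated[agent] = replacement_reply(agent)   # replace the source
        elif agent in predicted_infected:
            remediated[agent] = rewrite(reply)             # correct the propagator
        else:
            remediated[agent] = reply                      # leave benign replies alone
    return remediated
```

Every agent keeps its slot in the output, so the set of nodes and edges is unchanged and only message contents differ.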
9. Experiments
The experiments evaluate three attack families:
| Attack | Dataset / setup |
|---|---|
| Prompt Injection | CSQA, MMLU, GSM8K |
| Tool Attack | InjecAgent |
| Memory Attack | PoisonRAG |
The paper tests multiple MAS topologies:
| Topology | Meaning |
|---|---|
| Chain | Agents communicate in a sequence |
| Tree | Hierarchical communication |
| Star | Centralized hub-like communication |
| Random | Random communication graph |
It also uses two backbone LLMs:
| Backbone | Type |
|---|---|
| GPT-4o-mini | Closed-source |
| Qwen3-235B-A22B | Open-source |
The main metrics are:
| Metric | Meaning |
|---|---|
| ASR | Agent-level Attack Success Rate |
| MDSR | Task-level MAS Defense Success Rate |
Lower ASR is better. Higher MDSR is better.
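As a rough sketch of how the two metrics differ in granularity, the code below computes ASR over per-agent judge outputs and MDSR over per-task majority votes. The majority-vote form follows the PI/MA definition discussed in the limitations section of this post; the function names and tie handling (the first answer to reach the top count wins) are my assumptions.

```python
from collections import Counter


def asr(judgements):
    """Agent-level attack success rate.

    judgements: list of J(R) values, where 0 means the attack succeeded.
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(1 for j in judgements if j == 0) / len(judgements)


def mdsr_majority_vote(tasks):
    """Task-level defense success rate via majority vote.

    tasks: list of (agent_answers, correct_answer) pairs.
    A task counts as defended when the most common answer is correct.
    """
    if not tasks:
        raise ValueError("need at least one task")
    defended = 0
    for answers, correct in tasks:
        majority_answer, _ = Counter(answers).most_common(1)[0]
        defended += int(majority_answer == correct)
    return defended / len(tasks)
```

One consequence is that ASR can be nonzero while MDSR stays at 100%: a minority of compromised agents raises ASR without flipping the majority answer.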
The paper reports that INFA-Guard reduces ASR by an average of 33%. In the main comparison, it outperforms baselines by an average of 4.5% and up to 12.9% across attack scenarios.
Figure 4. Task-level MAS performance across dialogue iterations.
INFA-Guard remains effective after multiple dialogue iterations. That matches the threat model: the attack propagates through rounds of communication.
The paper also reports strong results in the random topology table. For example, on GPT-4o-mini:
| Attack | INFA-Guard ASR@3 | INFA-Guard MDSR@3 |
|---|---|---|
| PI (CSQA) | 23.3 | 76.7 |
| PI (MMLU) | 15.0 | 85.0 |
| PI (GSM8K) | 6.7 | 93.3 |
| TA (InjecAgent) | 2.1 | 98.3 |
| MA (PoisonRAG) | 6.1 | 96.7 |
On Qwen3-235B-A22B:
| Attack | INFA-Guard ASR@3 | INFA-Guard MDSR@3 |
|---|---|---|
| PI (CSQA) | 13.4 | 86.7 |
| PI (MMLU) | 20.0 | 81.7 |
| PI (GSM8K) | 3.3 | 96.7 |
| TA (InjecAgent) | 3.7 | 98.3 |
| MA (PoisonRAG) | 8.7 | 91.7 |
The TA and MA results are especially strong, which is relevant because tools and memory are natural propagation channels for agent systems.
10. Topology Generalization, Cost, and Ablation
The paper also evaluates performance across chain, tree, and star topologies.
Figure 5. Mean and standard deviation of ASR@3 and MDSR@3 across chain, tree, and star topologies.
The authors argue that INFA-Guard does not overfit to one communication pattern. Across the tested topologies, it achieves lower average ASR@3 and higher average MDSR@3 than the baselines.
Some guardrail methods improve safety by adding many extra prompts, checks, or agent interactions. That can make MAS defense expensive.
Figure 6. Token cost versus ASR@3 in the memory attack setting.
Compared with G-Safeguard, the paper reports that INFA-Guard reduces:
| Cost item | Reduction |
|---|---|
| Backbone LLM prompt tokens | 35% |
| Backbone LLM completion tokens | 13% |
At the same time, it achieves a 66% relative reduction in ASR@3. The appendix also reports low overhead relative to backbone LLM cost:
| Token type | Overhead |
|---|---|
| Prompt tokens | 7.2% |
| Completion tokens | 9.3% |
The result supports the claim that localizing the problem first can reduce unnecessary remediation. Instead of rewriting or guarding everything, INFA-Guard tries to identify where intervention is needed.
The ablation study is also consistent with the method design.
Figure 7. Ablation study. TF, GB, ID, TL, PA, and RD denote Temporal Features, GNN Branches, Infection-aware Detection, Topology Loss, Post-Adaptation, and Remediation.
Removing any module hurts performance. The largest degradation comes from removing remediation:
| Variant | ASR@3 | MDSR@3 |
|---|---|---|
| Without remediation | 12.9 | 86.5 |
| Full INFA-Guard | 6.1 | 96.7 |
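Detection alone does not solve propagation. Without remediation, ASR@3 roughly doubles (12.9 vs. 6.1) and MDSR@3 drops by 10.2 points (86.5 vs. 96.7). The system must also repair the graph state and the infected messages.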
11. Limitations and Open Questions
INFA-Guard is a useful direction, but several assumptions deserve careful attention.
The first limitation is dependence on labeled training data. The paper’s own limitation section notes that INFA-Guard relies on synthesized training data and ground-truth annotations. That can become a bottleneck in domains where attack and infection labels are hard to obtain.
The second limitation is detection latency. INFA-Guard is a run-time defense that must observe at least one dialogue round before it can detect infection, so it cannot prevent harm at the first communication step. It works more like a propagation-aware monitoring and treatment system.
The third limitation is the metric definition. MDSR is task-level and varies by attack type. For PI and MA, it is defined through majority-vote task accuracy. For TA, it is defined as whether the majority of agents resist the attack. That means MDSR is not a single universal measurement. For comparisons across tasks, the evaluation criterion should be stated explicitly.
The fourth limitation is the topology constraint assumption. The paper assumes that infected agents should be structurally close to malicious sources. The assumption fits direct message propagation. It becomes less obvious when the attack is mediated by shared memory, shared tools, retrieval state, or other non-local resources. For example, in a memory poisoning setup with shared memory, an agent may be affected by contaminated state without being a direct graph neighbor of the attacker. In that case, topology constraints may need to include resource-sharing edges, not only communication edges.
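One way to picture resource-sharing edges is to extend the neighbor relation so that agents using the same memory store or tool count as adjacent. This is my own illustration of the suggestion above, not something the paper implements; the resource names are hypothetical.

```python
def extended_neighbors(comm_edges, shared_resources):
    """Build an undirected neighbor map from messages plus shared resources.

    comm_edges:       iterable of (u, v) communication links
    shared_resources: dict resource name -> set of agents that use it
    """
    nbrs = {}

    def link(u, v):
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    # Ordinary communication edges.
    for u, v in comm_edges:
        link(u, v)
    # Resource-sharing edges: every pair of agents using the same resource.
    for users in shared_resources.values():
        ordered = sorted(users)
        for idx, u in enumerate(ordered):
            for v in ordered[idx + 1:]:
                link(u, v)
    return nbrs
```

With a map like this, an agent that only shares a vector store with the attacker would no longer look isolated to a topology constraint, at the cost of a denser graph.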
The fifth limitation is source inference under clustered malicious behavior. Post-adaptation can infer a missed source when an isolated infected prediction has a strong trend. But if an agent is surrounded by multiple malicious or infected agents, the “most suspicious neighbor” logic may under-represent ambiguity. There may be multiple sources, colluding sources, or a shared external source.
Finally, reply-level remediation depends on the rewriting LLM. If $\texttt{LM}(\cdot)$ fails to remove malicious context, or removes too much useful content, remediation quality changes. So the safety of the whole pipeline still depends partly on the reliability of the remediation model and prompt.
12. Takeaway
INFA-Guard’s contribution is the infection-aware framing:
In multi-agent systems, malicious behavior can propagate through benign agents that become infected. Defending only the original attacker is not enough.
This framing leads to three design choices:
- Model MAS communication as a temporal utterance graph.
- Detect attack and infected agents as separate categories.
- Use topology-aware remediation to stop propagation while preserving the MAS structure.
From a security perspective, the cleanest idea is the separation between source and propagator. An attack agent is the source. An infected agent is the propagator. Both need to be handled, but not in the same way.
My main remaining question is about the topology assumption under more realistic agent infrastructures:
If infection can travel through shared memory, tools, retrieval stores, or hidden coordination channels, how should the MAS graph be extended so that topology-aware defense still works?
For deployment-style systems, infection-aware safeguarding should model not only agent-to-agent messages, but every shared channel through which malicious state can propagate.