[Claude Code System Prompt Field Tests] The Claude Code attribution block and third-party cache misses: how a dynamic prefix breaks the prompt cache

非线智能 (Nonlinear) API Notes [Claude Code System Prompt Field Tests], Part 10

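Summary

The Claude Code attribution block is the most easily underestimated cache-breaking point in third-party gateway setups. According to the official documentation, Claude Code may prepend an attribution block to the system prompt, containing the client version and a prompt fingerprint. Anthropic's own API strips it before processing, so Anthropic API caching is unaffected. But when ANTHROPIC_BASE_URL points to a custom gateway, a local model proxy, or a third-party Anthropic-compatible endpoint, that stripping logic may not exist.

If the gateway computes its cache key over the full request body, the full prompt text, or the full prefix token sequence, the dynamic fingerprint in the attribution block makes the otherwise stable Claude Code built-in prompt, tool definitions, MCP descriptions, and project context lose all reuse value. The symptom: every request appears to share a large identical prefix, yet the cache hit rate stays close to 0.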

Data source: 非线智能Nonlinear official website

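What the attribution block is

According to the official documentation as compiled in our source notes, the Claude Code attribution block may contain the client version and a prompt fingerprint. The form observed by the community looks like this: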

x-anthropic-billing-header: cc_version=<version>; cc_entrypoint=cli; cch=<fingerprint>;

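The header name here comes from community issues and Bedrock error reports; it should not be read as an official, stable prompt specification. The provider reserved-keyword risk is covered separately in Part 11, on the attribution fingerprint block and reserved-keyword errors on compatible endpoints.

What matters here is not the field name but its position and its dynamic nature:

1. It sits in the system prompt, or early in the request prefix.
2. It may contain a fingerprint that changes with the session or with the first user prompt.
3. Anthropic's official API strips it, but a third-party gateway will not necessarily do so.

The risk should therefore be stated precisely: the attribution block breaks caches that are keyed on the full body or the full prompt prefix behind a third-party gateway or a custom ANTHROPIC_BASE_URL; it does not break Anthropic's official prompt cache. The Claude Code LLM gateway documentation gives a direct recommendation as well: if a custom gateway caches prompts by full request body, set CLAUDE_CODE_ATTRIBUTION_HEADER=0.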

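Why one dynamic block invalidates the whole prefix

The basic condition for a prefix cache is that the leading tokens are identical. Suppose the request prefix looks like this: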

L0 attribution fingerprint    changes every turn
L1 Claude Code builtin prompt stable
L2 tools                      stable
L3 MCP descriptors            stable
L4 project context            semi-stable
L5 current user message       dynamic

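If the cache key is computed starting from L0, the per-turn change in L0 keeps the stable content in L1-L4 from hitting as well. Even if L1-L4 contain 30,000 identical tokens, the cache treats the two requests as having different prefixes.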
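The effect can be reproduced with a toy model. The sketch below is illustrative only and is not how any particular gateway works: it assumes a cache that identifies each cumulative prefix by its exact content (a real cache would hash it), and the segment strings and fingerprint values are made up.

```typescript
// Cumulative prefix keys: entry i identifies segments 0..i taken together.
// A real cache would hash these; the exact string keeps the sketch collision-free.
function prefixKeys(segments: string[]): string[] {
  return segments.map((_, i) => JSON.stringify(segments.slice(0, i + 1)));
}

// Number of leading segments that two requests could share in a prefix cache.
function sharedPrefixLength(a: string[], b: string[]): number {
  const keysA = prefixKeys(a);
  const keysB = prefixKeys(b);
  let shared = 0;
  while (shared < keysA.length && shared < keysB.length && keysA[shared] === keysB[shared]) {
    shared++;
  }
  return shared;
}

// Placeholder segment contents; the fingerprints are invented for illustration.
const stable = ["<builtin prompt>", "<tools>", "<MCP descriptors>", "<project context>"];
const turn1 = ["cch=aaa111", ...stable, "user: fix the bug"];
const turn2 = ["cch=bbb222", ...stable, "user: add a test"];

// L0 differs, so nothing after it can match.
console.log(sharedPrefixLength(turn1, turn2)); // 0
// With L0 removed, L1-L4 become reusable.
console.log(sharedPrefixLength(turn1.slice(1), turn2.slice(1))); // 4
```

The point of the cumulative key is that a segment's identity depends on everything before it, which is why a change in the first segment leaves no reusable prefix at all.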

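Any commercial, gateway, or local KV cache that does not strip the dynamic block and keys on the full body or the prefix token sequence can show this kind of miss; only the symptoms differ:

Cache type	How it breaks	User-visible symptom
Anthropic prompt cache	The Anthropic API strips the attribution block, so it is normally unaffected	The official path works normally
Third-party gateway body cache	Body hash differs every turn	Low hit rate
Gateway prompt prefix cache	Hash of the first segment differs	Stable prefix cannot be reused
Local KV cache	Token sequence differs at the start	Full prefill every turn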

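Reading the engineering cases correctly

anthropics/claude-code #50085 pins the problem to the ANTHROPIC_BASE_URL scenario: unless CLAUDE_CODE_ATTRIBUTION_HEADER=0 is set, the first line of the system prompt contains a hash derived from the user prompt, which causes cache misses. The related claude-code-router PR likewise sets CLAUDE_CODE_ATTRIBUTION_HEADER=0 to keep the dynamic header from affecting the prompt cache.

These cases show two things:

1. Claude Code already provides an official environment variable for turning the attribution block off.
2. A third-party gateway has to handle this block explicitly; it cannot assume the upstream service will strip it the way the Anthropic API does.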

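Three handling strategies

A compatibility layer can offer three strategies:

Strategy	Behavior	When to use
`strip`	Remove the attribution block from the model-visible prompt	Recommended default; third-party gateway / local model
`normalize`	Replace it with a fixed placeholder, for example a fixed client family	The "comes from Claude Code" semantics must be kept, but the fingerprint is not needed
`preserve_as_metadata`	Remove it from the prompt and carry it as structured metadata / a header	Cost attribution, auditing, or tenant identification is required

Do not make a fourth strategy, passthrough, the default. Passing the block through is only appropriate when connecting directly to the Anthropic API, or when the downstream is known to strip it.

A fuller treatment of how to implement strip / normalize / preserve_as_metadata is in Part 13, on stripping the attribution fingerprint block and stabilizing the prefix.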
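As a rough illustration of how the three strategies differ, here is a minimal sketch. It assumes the block appears as the first line of a string system prompt in the community-observed `x-anthropic-billing-header: ...` form; the function name `applyStrategy`, the `Result` shape, and the `client-family: claude-code` placeholder are all hypothetical choices for this example, not part of Claude Code or any gateway.

```typescript
type Strategy = "strip" | "normalize" | "preserve_as_metadata";

interface Result {
  prompt: string;                     // what the model will see
  metadata?: Record<string, string>;  // only set by preserve_as_metadata
}

// Matches the community-observed first line plus the newline(s) after it.
const ATTRIBUTION_LINE = /^x-anthropic-billing-header:([^\n]*)\n*/;

// Parse "k1=v1; k2=v2;" into a record, ignoring empty or malformed parts.
function parseFields(raw: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const part of raw.split(";")) {
    const eq = part.indexOf("=");
    if (eq > 0) fields[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return fields;
}

function applyStrategy(systemPrompt: string, strategy: Strategy): Result {
  const match = ATTRIBUTION_LINE.exec(systemPrompt);
  if (!match) return { prompt: systemPrompt };
  const fields = parseFields(match[1] ?? "");
  // Require the fields seen in community reports so unrelated text is left alone.
  if (!("cc_version" in fields) || !("cch" in fields)) return { prompt: systemPrompt };
  const rest = systemPrompt.slice(match[0].length);
  switch (strategy) {
    case "strip":
      return { prompt: rest };
    case "normalize":
      // Assumed placeholder: keeps the client family, drops version and fingerprint.
      return { prompt: "client-family: claude-code\n" + rest };
    case "preserve_as_metadata":
      return { prompt: rest, metadata: fields };
  }
}

// Made-up version and fingerprint values.
const sample =
  "x-anthropic-billing-header: cc_version=0.0.0; cc_entrypoint=cli; cch=abc123;\nYou are a coding assistant.";

console.log(applyStrategy(sample, "strip").prompt); // You are a coding assistant.
console.log(applyStrategy(sample, "preserve_as_metadata").metadata?.cch); // abc123
```

A prompt without the block is returned unchanged under every strategy, so the function is safe to apply unconditionally in front of the cache.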

How the cache key should be designed

The recommended cache key is not simply the hash of the full body; it is computed from the stable segments:

type PromptCacheKey = {
  provider: string;
  model: string;
  adapterVersion: string;
  systemHash: string;
  toolSchemaHash: string;
  mcpDescriptorHash: string;
  projectContextHash: string;
};

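Fields that should not go into the cache key:

• trace_id
• request_id
• session_id
• attribution fingerprint
• the current user message
• the latest tool result
• timestamps
• temporary directory paths

These fields can go into audit logs or metrics labels, but they must not pollute the stable prefix.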
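One way to populate `PromptCacheKey` is sketched below. The `NormalizedRequest` shape and the `buildCacheKey`, `canonical`, and `fnv1a` helpers are hypothetical names for this example. FNV-1a is used only to keep the sketch dependency-free; a production gateway would more likely use a cryptographic hash such as SHA-256.

```typescript
type PromptCacheKey = {
  provider: string;
  model: string;
  adapterVersion: string;
  systemHash: string;
  toolSchemaHash: string;
  mcpDescriptorHash: string;
  projectContextHash: string;
};

// Hypothetical request shape after the attribution block has been removed.
interface NormalizedRequest {
  provider: string;
  model: string;
  adapterVersion: string;
  system: string;
  tools: unknown[];
  mcpDescriptors: unknown[];
  projectContext: string;
  // Volatile fields: present on the request but never read by buildCacheKey.
  sessionId: string;
  requestId: string;
  userMessage: string;
}

// Non-cryptographic 32-bit FNV-1a, a stand-in for a real hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// JSON with sorted object keys, so key order cannot change the hash.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj).sort().map((k) => JSON.stringify(k) + ":" + canonical(obj[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value) ?? "null";
}

function buildCacheKey(req: NormalizedRequest): PromptCacheKey {
  return {
    provider: req.provider,
    model: req.model,
    adapterVersion: req.adapterVersion,
    systemHash: fnv1a(req.system),
    toolSchemaHash: fnv1a(canonical(req.tools)),
    mcpDescriptorHash: fnv1a(canonical(req.mcpDescriptors)),
    projectContextHash: fnv1a(req.projectContext),
  };
}

// Placeholder values throughout.
const base: NormalizedRequest = {
  provider: "example-provider", model: "example-model", adapterVersion: "1",
  system: "You are a coding assistant.",
  tools: [{ name: "read_a", input_schema: { type: "object" } }],
  mcpDescriptors: [], projectContext: "repo: demo",
  sessionId: "s-1", requestId: "r-1", userMessage: "fix the bug",
};
const nextTurn: NormalizedRequest = { ...base, sessionId: "s-2", requestId: "r-2", userMessage: "add a test" };

console.log(canonical(buildCacheKey(base)) === canonical(buildCacheKey(nextTurn))); // true
```

Because the volatile fields are never read, two turns that differ only in session, request ID, or user message map to the same key, while a change to any tool schema changes `toolSchemaHash`.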

Request processing flow

raw Claude Code request
  -> detect attribution block
  -> remove from prompt-visible prefix
  -> save attribution metadata if needed
  -> canonicalize stable segments
  -> compute stable prefix hash
  -> render provider-specific request
  -> record expected cache key

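Do not hard-code a single string as the detection rule. It is advisable to support all of the following:

• An x-anthropic-billing-header text block at the start of the system prompt.
• An attribution text block inside a content block array.
• A structure containing cc_version, cc_entrypoint, and cch.
• Equivalent attribution metadata in provider headers.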
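The first three detection rules can be sketched as follows. This is a hedged example, not a reference implementation: the `SystemPrompt` and `Detection` types and the `detectAttribution` function are hypothetical, the sketch assumes the block is either the first line of a string system prompt or the first text block of an array, and it does not cover provider headers.

```typescript
interface TextBlock {
  type: "text";
  text: string;
}
type SystemPrompt = string | TextBlock[];

type Detection =
  | { found: false }
  | { found: true; where: "string-prefix" | "first-block"; raw: string };

// Field names taken from the community-observed form of the block.
const REQUIRED_KEYS = ["cc_version", "cc_entrypoint", "cch"];

// Match on the fixed header prefix AND the expected fields, never on a loose substring.
function looksLikeAttribution(line: string): boolean {
  if (!line.startsWith("x-anthropic-billing-header:")) return false;
  return REQUIRED_KEYS.every((key) => line.includes(key + "="));
}

function detectAttribution(system: SystemPrompt): Detection {
  if (typeof system === "string") {
    // Only the very first line is inspected, so later user text is never touched.
    const firstLine = system.split("\n", 1)[0] ?? "";
    return looksLikeAttribution(firstLine)
      ? { found: true, where: "string-prefix", raw: firstLine }
      : { found: false };
  }
  const first = system[0];
  if (first && first.type === "text" && looksLikeAttribution(first.text.trim())) {
    return { found: true, where: "first-block", raw: first.text };
  }
  return { found: false };
}

// Made-up version and fingerprint values.
const header = "x-anthropic-billing-header: cc_version=0.0.0; cc_entrypoint=cli; cch=abc123;";

console.log(detectAttribution(header + "\nYou are a coding assistant.").found); // true
console.log(detectAttribution("Explain what " + header + " means").found); // false
```

Restricting the match to the start of the prompt and to the fixed header structure is what keeps the rule from deleting ordinary user text that merely mentions the header.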

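Failure modes

Failure mode	Trigger	Fix
Cache hit rate close to 0	Fingerprint sits in the first prefix segment	Set `CLAUDE_CODE_ATTRIBUTION_HEADER=0` or strip
Local model prefill is slow on every turn	KV cache prefix tokens differ	Normalize the dynamic block
Cost attribution is lost	Attribution deleted outright with no replacement metadata	preserve as metadata
Cross-tenant cache pollution	Keying only on the system hash, not on tenant / model / policy	Add provider / model / tenant policy to the cache key
Ordinary user text deleted by mistake	Detection rule too broad	Match only at the start of the prompt and on the fixed header structure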

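Verification checklist

• Construct two requests that differ only in the attribution fingerprint; after strip, the stable prefix hash should be identical.
• Construct two requests with different tool schemas; the stable prefix hash should differ.
• Construct a CLAUDE_CODE_ATTRIBUTION_HEADER=0 scenario and confirm the gateway no longer sees the attribution block.
• Construct a preserve_as_metadata scenario and confirm the model prompt contains no attribution while the audit log does contain the attribution metadata.
• For local vLLM / a local runtime, compare prefill latency before and after normalization.
• For the gateway cache, record expected hit versus actual hit, so that a cache miss can be attributed to a specific segment.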
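The last checklist item, attributing a miss to a specific segment, could look roughly like the sketch below. The `SegmentHashes` and `CacheObservation` types and both function names are hypothetical; the sketch assumes the gateway already keeps per-segment hashes for the previous and current request.

```typescript
// Per-segment hashes, e.g. { systemHash: "...", toolSchemaHash: "..." }.
type SegmentHashes = Record<string, string>;

interface CacheObservation {
  expectedHit: boolean;      // true when no stable segment changed
  actualHit: boolean;        // what the cache actually reported
  changedSegments: string[]; // names of the segments whose hash differs
}

// Names of every segment that was added, removed, or changed between two requests.
function changedSegments(previous: SegmentHashes, current: SegmentHashes): string[] {
  const names = Array.from(new Set([...Object.keys(previous), ...Object.keys(current)]));
  return names.filter((name) => previous[name] !== current[name]).sort();
}

function observe(previous: SegmentHashes, current: SegmentHashes, actualHit: boolean): CacheObservation {
  const changed = changedSegments(previous, current);
  return { expectedHit: changed.length === 0, actualHit, changedSegments: changed };
}

// Placeholder hash values.
const previousTurn: SegmentHashes = { systemHash: "a1", toolSchemaHash: "b2", projectContextHash: "c3" };
const currentTurn: SegmentHashes = { systemHash: "zz", toolSchemaHash: "b2", projectContextHash: "c3" };

console.log(JSON.stringify(observe(previousTurn, currentTurn, false).changedSegments)); // ["systemHash"]
```

An observation with `expectedHit: true` and `actualHit: false` is the interesting case: every stable segment matched, so the miss has to come from something outside the key, such as an unstripped dynamic block.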

References

• Claude Code Environment Variables
• Claude Code LLM Gateway
• anthropics/claude-code #50085
• anthropics/claude-code #24168
• musistudio/claude-code-router #1220
• Reddit LocalLLaMA discussion
• Anthropic Prompt Caching
• Prompt Cache: Modular Attention Reuse

This article was compiled and written by the Claude Code industry experts at 非线智能API.