Description

While evaluating your application, we detected that your target system revealed the original content of an encoded prompt. This suggests that the target decoded or inferred the contents of the prompt, which may indicate a weakness in how your system handles input data.

Impact

If your system can decode the prompts used in our red team evaluations, it may expose internal logic or sensitive information, or disregard its configured rules of engagement. An attacker could then reverse-engineer your intended instructions and bypass your guardrails.

Solution

Review how your application processes and responds to prompts, especially when obfuscated, encoded, or indirect instructions are used. If decoding is intentional, ensure it is safe and constrained. If it is unintentional, consider restricting how inputs are parsed or reflected in outputs to prevent leakage.
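
One way to restrict reflection is an output-side check that flags responses that repeat a decoded form of the input. The following is a minimal sketch, not a definitive implementation. It assumes the encodings of interest are Base64, hex, and ROT13, which is only a small sample of possible obfuscations. The function names candidate_decodings and leaks_decoded_prompt and the min_len threshold are hypothetical and chosen for illustration.

```python
import base64
import binascii
import codecs


def candidate_decodings(text: str) -> list[str]:
    """Return plausible plaintext decodings of `text`.

    Assumption: only Base64, hex, and ROT13 are tried. A real deployment
    would likely need to cover more encodings.
    """
    stripped = text.strip()
    results = []
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        results.append(base64.b64decode(stripped, validate=True).decode("utf-8"))
    except (binascii.Error, ValueError):
        pass
    try:
        results.append(bytes.fromhex(stripped).decode("utf-8"))
    except ValueError:
        pass
    # ROT13 always yields a result; for ordinary input it is just gibberish.
    results.append(codecs.decode(stripped, "rot13"))
    return results


def leaks_decoded_prompt(user_input: str, model_output: str, min_len: int = 12) -> bool:
    """Return True if the output contains a decoded form of the input.

    `min_len` (hypothetical default) skips very short decodings that could
    match by coincidence.
    """
    output = model_output.lower()
    for decoded in candidate_decodings(user_input):
        decoded = decoded.strip().lower()
        if len(decoded) >= min_len and decoded in output:
            return True
    return False
```

A caller could run leaks_decoded_prompt on each request and response pair before returning the response, then block or redact flagged responses. A substring match of this kind will miss paraphrased leaks, so it is best treated as one signal among several rather than a complete control.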