Overview
Prompt Injection is the act of obfuscating or masking prompts that could contain malicious instructions
Goal
Evade measures that detect potentially malicious instructions.
Impact
Potentially exposes confidential or personally identifiable data used to train the model.
Can aid in other attacks such as jailbreaks, sql injection, xss.
How do these attacks work?
Example Threat Scenario
A financial institution allows customers to upload documents to support loan applications. The firm uses an AI-powered system to process these documents and extract relevant information. The AI system interacts with a traditional software system for processing documents that is vulnerable to SQL Injection attacks. This whole system is protected by a Web Application Firewall (WAF) designed to detect and block malicious inputs like SQL injections or Remote Code Executions (RCE). However, a sophisticated attacker discovers a method to encode and hide harmful instructions within a seemingly innocuous document. The attacker crafts a loan application document that contains hidden, encoded prompts. These encoded instructions are designed to bypass the WAF and, when processed by the AI system, transform into SQL injection commands that exploit the vulnerable document processing system. If successful, this could trick the AI into facilitating execution of unauthorized database queries, potentially exposing sensitive financial or personal data, or manipulating loan approval processes.
Remediation
Preprocess Input Text
View Guidelines
Implement Guardrails
View Guidelines
Apply Context Windows
View Guidelines
Separate System Instructions
View Guidelines