| Category | What It Detects | Examples | Pattern Count |
|---|---|---|---|
| Direct Injection | Identifies explicit attempts to override system instructions, bypass constraints, or force unauthorized behavior | "ignore previous instructions", "disregard all rules", "bypass safety filters" | 12 |
| Role Manipulation | Detects attempts to change the AI's persona, role, or operational mode to circumvent restrictions | "act as developer", "you are now admin", "enable DAN mode" | 8 |
| Encoding Layers | Decodes and analyzes multi-layer encoded payloads including Base64, URL, Hex, Unicode, and HTML entities | `%69%67%6E%6F%72%65`, `\x69\x67\x6E\x6F\x72\x65`, `SGVsbG8gV29ybGQ=` | 15 |
| Obfuscation | Identifies hidden characters, homoglyph substitution, zero-width spaces, and invisible delimiters | U+200B (zero-width), Cyrillic 'а' vs Latin 'a', hidden Unicode chars | 10 |
| Jailbreak Signatures | Matches known jailbreak patterns including DAN, GCG attacks, persona overrides, and token manipulation | "do anything now", "developer mode", "! ! sure here" | 8 |
| Structural Markers | Analyzes code blocks, comments, JSON keys, XML tags, and markdown formatting for hidden payloads | `` ```code blocks``` ``, `// comments`, `{"ignore": "payload"}` | 7 |
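
To make the Encoding Layers, Obfuscation, and Direct Injection rows more concrete, the following is a minimal Python sketch of how such checks might be chained: peel encoding layers, strip zero-width characters, then match phrases. It is not the tool's actual implementation. All names (`decode_once`, `normalize`, `scan`, `DIRECT_INJECTION`) are hypothetical, the pattern list contains only the three example phrases from the table rather than the full set, and only URL, hex-escape, and Base64 layers are handled. Homoglyph handling is omitted.

```python
import base64
import binascii
import re
import urllib.parse

# Hypothetical pattern list built only from the example phrases in the table.
DIRECT_INJECTION = [
    re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+all\s+rules", re.IGNORECASE),
    re.compile(r"bypass\s+safety\s+filters", re.IGNORECASE),
]

# Translation table that deletes common zero-width characters (values are None).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")
BASE64_TOKEN = re.compile(r"^[A-Za-z0-9+/]{8,}={0,2}$")


def decode_once(text):
    """Peel at most one encoding layer; return the input unchanged if none applies."""
    # Layer 1: URL percent-encoding, e.g. %69%67%6E%6F%72%65
    decoded = urllib.parse.unquote(text)
    if decoded != text:
        return decoded

    # Layer 2: literal \xNN hex escapes, e.g. \x69\x67
    decoded = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    if decoded != text:
        return decoded

    # Layer 3: Base64. This is a heuristic: the whole string must look like
    # Base64 and decode to printable UTF-8, otherwise it is left alone.
    candidate = text.strip()
    if len(candidate) % 4 == 0 and BASE64_TOKEN.match(candidate):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return text
        if decoded.isprintable():
            return decoded
    return text


def normalize(text, max_layers=5):
    """Repeatedly strip zero-width characters and decode, up to max_layers times."""
    for _ in range(max_layers):
        text = text.translate(ZERO_WIDTH)
        decoded = decode_once(text)
        if decoded == text:
            break
        text = decoded
    return text.translate(ZERO_WIDTH)


def scan(text):
    """Return the source of every direct-injection pattern found after normalization."""
    cleaned = normalize(text)
    return [p.pattern for p in DIRECT_INJECTION if p.search(cleaned)]
```

Under these assumptions, `scan` would flag a phrase that has been Base64-encoded and then URL-encoded, because `normalize` keeps decoding until the text stops changing or `max_layers` is reached. The layer cap is a design choice to bound work on adversarial input. The Base64 check can misfire on ordinary words whose length is a multiple of four, so a real detector would likely need stricter heuristics.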