DzGuard

Multi-Layered Defense Against Malicious and Dangerous Prompts

System Architecture

DzGuard-LLM is built on a Layered Cognitive Pipeline designed to safely and accurately analyze user prompts. Each layer progressively cleans, interprets, and evaluates the input, ensuring robust detection even against obfuscated or adversarial content

ARCHITECTURE FLOW Prompt → Decryption/Translation/Cleaning/Normalization → Sentence Embedding → Predicting/Classifying → Understanding Context ⇒ Decision Logic

Pre-Processing

This stage focuses on revealing the true intent of the input before any classification occurs

Recursive Decryption: If encrypted text is given , a heuristic scanner peels back layers of obfuscation (Base64, Hex, ROT13). It includes an extraction engine for AES-CBC (128-bit) and RSA-OAEP keys embedded in code (for decryption the prompt should be constructed this way : for AES : iv = '...(hex)' , key = "...(hex)" , message = '...(b64)' or for RSA (d , privatekey ...) = '-----BEGIN RSA PRIVATE KEY-----...(b64)-----END RSA PRIVATE KEY-----' , cipher = '...').
Neural Normalization: To counter spacing tricks and symbol-based attacks, DzGuard-LLM uses a two-step normalization pipeline:
- Stage A: Probabilistic N-Gram segmentation reconstructs broken spacing ( i g n o r e = ignore)
- Stage B: ByT5 (Byte-Level Transformer) maps symbol-based obfuscation to plain English(h4ck = hack , 1gn0r3 = ignore)

Analysis and Detection

A. Vectorization (all-MiniLM-L6-v2):
Before classification, our classifiers don't understand raw text they only understand mathematical values and vectors, in NLP 'sentence embeddings' are used to convert raw text into dimensional vectors (a dense 384-dimensional vector space), for this project we are using Sentence-Transformers (MiniLM). This open-source Transformer model captures relationships between words (understanding that "hack" and "exploit" are geometrically close), enabling the classifiers to see "meaning" rather than just keywords

B.Predict:
Trained three distinct models on a dataset of 95k+ adversarial prompts (salkhan12 , deepset , geekyrakshit from HuggingFace) to create a robust consensus:

XGBoost (eXtreme Gradient Boosting): 2 prediction models (decision trees) trained on a large dataset of specific known malicious prompts (Salkhan12 +95k lines) built sequentially to focus heavily on "hard" examples that previous trees missed(these two models are strict)
Random Forest (Bagging): a model trained on dataset that contains general malicious prompts to constructs a multitude of decorrelated decision trees during training. It reduces variance and prevents the "overfitting" that can be caused by the previous two strict models

C. Meta-Learner:
This Logistic Regression model is trained to assign 'Trust Weights' to each classifier, in other words it learns who to trust between the three previous models (trained it by making the three previous models make predictions on a labeled dataset, those predictions were the training set of this model) It takes the probability outputs [p1 , p2 , p3] as inputs and outputs the final risk score. This allows the system to prioritize XGBoost for specific attack patterns while relying on Random Forest for general stability

Context

If The models are not sure of the decision we logically make the 'GREY_ZONE' decision (usually this happens when a prompt contains dangerous words used in different contexts ), the solution is to call the DeBERTa-v3-Large NLI model, It calculates the Logical Entailment between the prompt and two hypotheses (Attack vs. Education) (shortly we use it to know if the prompt by the user implies an attack or an educational request)

Questions ?

Why all-MiniLM-L6-v2?

We prioritized latency without sacrificing intelligence. MiniLM utilizes Knowledge Distillation to mimic the behavior of the massive BERT model. It retains roughly 95% of BERT's semantic accuracy while being 20x faster (22M parameters vs. 340M+). This ensures the user experience remains real-time

Why 2 XGboost models?

XGboost uses the Gradient Boosting method , which means it builds trees sequentially , Tree #2 learns only from the mistakes of the Tree #1 , and the third learns only from the mistakes of the of the second one and so on ; most of the time prompt attacks look safe so we need a model that is greedy and aggressive , so XGBoost by minimizing the Bias can be so accurate on specific, known attack patterns (trained them on two different datasets containing very specific attacks)

Why A random forest Model?

Random Forest uses Bagging (Bootstrap Aggregating): it trains 100 trees independently on random subsets of data and averages their votes Since XGBoost is so aggressive it can sometimes overfit (hallucinate that a prompt is an attack while it is not ) Random Forest is designed to minimize variance , if XGboost panics , Random forest calms it down So finally by combining these models we can achieve Robust decision boundaries that are hard to fool

Why The MetaLearner ?

Not all models are created equal in every context, For example, XGBoost might excel at detecting SQL injections, while Random Forest performs better on social engineering,a simple average ignores this nuance The Logistic Regression layer learns the Trust Weights, effectively saying: "In this vector space, XGBoost is usually right (0.7 weight), but in that space, rely on Random Forest"

Why the context Engine DeBERTa-v3-Large ?

Keywords filters often fail on contexts, if a word like kill had been put in a safe context , the models can panic , so when they 'are not sure' the score falls in the 'GREY ZONE' we call DeBERTa DeBERTa (Decoding-enhanced BERT with Disentangled Attention) outperforms standard BERT by separating content vectors from position vectors. This allows it to understand the "grammar of intent" with near-human precision. We use it strictly as a fallback for "Grey Zone" inputs to resolve ambiguity without slowing down the vast majority of other queries

FOR BETTER DECRYPTION REQUEST

If you want to send a prompt that contains decryption requests using AES or RSA it is better to write the requirements in an explicit way:

AES

your prompt should contain :

key = ...(hex) , iv =...(hex) , cipher = ...(base64)

RSA

should contain :

privatekey = '-----BEGIN RSA PRIVATE KEY-----...(b64)-----END RSA PRIVATE KEY-----' ciphertext = ...(base64)

(Note that you can use other keywords like payload , d , ciphertext , msg ,.... but for better results use known keywords in your prompt)

SCAN CLEANING AND DECRYPTING NORMALIZING ANALYZING

Initializing Analysis...

USER INPUT

DECISION

SAFE

RISK SCORE

0.000

LATENCY

0.05s

CONTEXT

LAYER ANALYSIS

RAW DATA

REINFORCEMENT LEARNING FEEDBACK