Mockup notes (LLM). This is a design preview of an AI Audit Leaflet for LLM-based systems assessed at Post-Processing. It does not contain real audit data.
The leaflet is derived from the audit report. Grades come from three kinds of scores: metric-based scores (computed from controlled testing and production data analysis), evidence-based scores (the auditor verifies countable facts), and judgment-based scores (the auditor evaluates quality and appropriateness).
All three feed into aggregation rules that produce the dimension grades.
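The notes above do not spell out the aggregation rules, so the following is only a minimal sketch of how per-check scores might roll up into a dimension grade. The `CheckScore` type, the weighted mean, and the cut-offs in `GRADE_THRESHOLDS` are all hypothetical choices made for illustration and are not taken from the audit methodology.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: minimum aggregate score required for each grade.
# Anything below the last cut-off falls through to "E".
GRADE_THRESHOLDS = [(0.90, "A"), (0.75, "B"), (0.60, "C"), (0.40, "D")]


@dataclass
class CheckScore:
    name: str
    kind: str            # "metric", "evidence", or "judgment"
    score: float         # assumed to be normalised to [0, 1]
    weight: float = 1.0  # hypothetical per-check weight


def dimension_grade(checks):
    """Map a list of check scores to a letter grade via a weighted mean."""
    if not checks:
        raise ValueError("a dimension needs at least one check")
    total_weight = sum(c.weight for c in checks)
    aggregate = sum(c.score * c.weight for c in checks) / total_weight
    for cutoff, grade in GRADE_THRESHOLDS:
        if aggregate >= cutoff:
            return grade
    return "E"


# Illustrative input only, not audit data.
example = [
    CheckScore("documented retention policy", "evidence", 1.0, weight=3),
    CheckScore("quality of change management", "judgment", 0.0, weight=1),
]
print(dimension_grade(example))  # weighted mean is 0.75, so this prints B
```

A real scheme could just as well use a worst-case rule (the lowest check caps the grade) instead of a mean; the sketch only shows that the step from scores to grade can be fully mechanical once the rule is fixed.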
System: System name
Version: v1.0
Type: LLM
Domain: e.g. Education, Healthcare
Owner: Organization name
Risk level: High / Limited / Low
Assessment results
Bias & Fairness: C
"Stereotype association shows moderate parity across groups. Demographic parity is below threshold: outcome rates differ significantly across groups."
Reliability: B
"System outputs are factually accurate with no evidence of manipulation. Minor prompt sensitivity detected during testing."
Privacy & Confidentiality: B
"No personal data detected in outputs. Memorization risk not formally tested. Conversational data retention policies are documented."
Security & Misuse: B
"Guardrails redirect most misuse attempts. Jailbreak and prompt injection testing identified minor bypasses that have been addressed."
Governance: C
"System prompts and guardrails are documented. Model provider governance relies on published model card. Change management for prompt updates needs formalisation."
Core metrics
Fairness · Stereotype association · 0.85 · Parity score across demographic variations
Fairness · Demographic parity · 0.78 · Outcome proportion ratio across groups
Reliability · Factual accuracy · 94% · Share of outputs free of fabricated information
Reliability · Manipulation rate · 98% · Share of outputs free of undue persuasion
Reliability · Prompt sensitivity · 8% · Output divergence under paraphrasing
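The demographic parity figure is described only as an outcome proportion ratio across groups. One common way to compute such a ratio is to divide the lowest group outcome rate by the highest, which the sketch below illustrates. The function name, the input layout, and the min/max convention are assumptions, and the counts are made up to reproduce the placeholder value of 0.78.

```python
def demographic_parity_ratio(outcomes_by_group):
    """Return lowest / highest positive-outcome rate across groups.

    outcomes_by_group maps a group label to a list of 0/1 outcomes
    (1 = favourable outcome). This layout is an assumption of the sketch.
    """
    rates = {}
    for group, outcomes in outcomes_by_group.items():
        if not outcomes:
            raise ValueError(f"group {group!r} has no outcomes")
        rates[group] = sum(outcomes) / len(outcomes)
    highest = max(rates.values())
    if highest == 0:
        # No group received a favourable outcome, so rates are identical.
        return 1.0
    return min(rates.values()) / highest


# Made-up counts: 50 of 100 favourable outcomes vs. 39 of 100.
sample = {
    "group_a": [1] * 50 + [0] * 50,
    "group_b": [1] * 39 + [0] * 61,
}
print(round(demographic_parity_ratio(sample), 2))  # 0.78
```

A ratio of 1.0 means all groups receive favourable outcomes at the same rate; the further the value falls below 1.0, the larger the gap between the best- and worst-served group.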
How to read this leaflet.
Each risk dimension is graded A (best) to E (worst) based on an independent audit of the deployed system.
Core metrics show measured performance on key fairness and reliability indicators.
Grades are derived mechanically from individual audit checks.
The full technical report with detailed findings is available from the auditor.
Grade scale: A No significant issues · B Minor issues · C Moderate issues · D Critical issues · E Systemic failure