Datasets
Below are the public open-source datasets we use within the platform:
- Do Not Answer (License: Apache-2.0)
  A curated set of prompts that responsible models should refuse to answer. Useful for evaluating refusal and filtering behavior.
- Aegis / Nemotron Content Safety Dataset V2 (License: CC-BY-4.0)
  33,416 annotated human–LLM interactions (30,007 train / 1,445 validation / 1,964 test). Contains diverse safety labels for fine-grained content-safety evaluation.
- HarmBench (License: MIT)
  A collection of benchmarks and datasets for measuring model harms. Our tests use the dataset published by the HarmBench project.
- CySecBench (License: MIT)
  A cybersecurity-focused prompt dataset for benchmarking LLMs. 12,662 prompts covering generative attack/defense scenarios and other cyber threats.
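As a quick sanity check on the Aegis split sizes quoted above, the short sketch below confirms that the three splits add up to the stated total and reports each split's share. The names AEGIS_SPLITS, AEGIS_TOTAL, and split_fractions are illustrative only and are not part of the platform or the dataset tooling.

```python
# Split sizes as quoted in the Aegis dataset description above.
AEGIS_SPLITS = {"train": 30_007, "validation": 1_445, "test": 1_964}
AEGIS_TOTAL = 33_416


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    # The three splits should account for every annotated interaction.
    assert sum(AEGIS_SPLITS.values()) == AEGIS_TOTAL
    for name, share in split_fractions(AEGIS_SPLITS).items():
        print(f"{name}: {share:.1%}")
```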
Mapping
Below are the high-level domains and the tag categories that map into each one. Use these domain names with mindgard (for example, mindgard test --domain toxicity or mindgard test --domain toxicity.harassment).
- harmful: Harmful content
  - violence: physical harm, threats, or glorification of violence
  - dangerous_content: instructions or encouragement of hazardous activities
  - sexually_explicit: explicit sexual content
  - self_harm: instructions or encouragement of self-injury/suicide
  - illegal: content facilitating illegal acts
- toxicity: Toxicity
  - hate_speech: targeted hostility
  - profanity: vulgar or offensive language
  - harassment: personal attacks, bullying, or abuse
  - discrimination: biased or exclusionary content
- business_risk: Business risk
  - copyright: IP infringement, piracy, or unlicensed content
  - pii: exposure or request for personally identifiable information
- cybersecurity: Malicious Planning & Code Generation
  - cyber_crime
  - cloud_attacks
  - control_system_attacks
  - cryptographic_attacks
  - evasion_techniques
  - hardware_attacks
  - intrusion_techniques
  - iot_attacks
  - malware_attacks
  - network_attacks
  - web_application_attacks
- information_disorder: Information disorder
  - misinformation: false or misleading content shared without clear intent to deceive
  - disinformation: false content created or propagated with intent to deceive
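The domain and domain.tag forms accepted by --domain can be checked against the mapping above before a test run. The sketch below is a minimal, hypothetical helper: DOMAIN_TAGS, resolve_domain, and build_command are names introduced here for illustration and are not part of the mindgard CLI. It also assumes the command shape shown in the examples above; any other arguments a real invocation needs are left out.

```python
# Domain -> tag mapping transcribed from the list above.
DOMAIN_TAGS = {
    "harmful": [
        "violence", "dangerous_content", "sexually_explicit",
        "self_harm", "illegal",
    ],
    "toxicity": ["hate_speech", "profanity", "harassment", "discrimination"],
    "business_risk": ["copyright", "pii"],
    "cybersecurity": [
        "cyber_crime", "cloud_attacks", "control_system_attacks",
        "cryptographic_attacks", "evasion_techniques", "hardware_attacks",
        "intrusion_techniques", "iot_attacks", "malware_attacks",
        "network_attacks", "web_application_attacks",
    ],
    "information_disorder": ["misinformation", "disinformation"],
}


def resolve_domain(value):
    """Return the tags selected by 'domain' or 'domain.tag'.

    Raises ValueError if the domain or the tag is not in DOMAIN_TAGS.
    """
    domain, sep, tag = value.partition(".")
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain: {domain!r}")
    if not sep:
        # Bare domain: every tag under it is selected.
        return list(DOMAIN_TAGS[domain])
    if tag not in DOMAIN_TAGS[domain]:
        raise ValueError(f"unknown tag {tag!r} for domain {domain!r}")
    return [tag]


def build_command(value):
    """Validate the value, then return the argument list for the CLI call.

    Only the arguments shown in the examples above are included (assumption).
    """
    resolve_domain(value)
    return ["mindgard", "test", "--domain", value]


if __name__ == "__main__":
    print(resolve_domain("toxicity.harassment"))
    print(" ".join(build_command("toxicity")))
```

Validating locally like this catches a misspelled tag before anything is sent to the platform. Whether the CLI itself rejects unknown names the same way is not stated here.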

