Generating Custom Datasets
Generate your own dataset using the CLI to red-team your system with more context
The CLI tool lets you generate custom datasets for use in either the CLI or the Burp Suite extension.
It works by taking a ‘seed prompt’ that describes a policy you want to test. The tool generates a set of dataset entries that encourage violations of that policy and saves them to your machine as an output file. For example:
mindgard create dataset --seed-prompt "I am a car sales person and I dont want customers to game my chatbot to get free cars." --perspective nonspecific --tone neutral --num-entries 20 --output-filename dataset_20_entries.txt
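This requests a dataset of about 20 entries and saves it to dataset_20_entries.txt. The command accepts the following options: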
--seed-prompt
(required)
A seed prompt representing a policy, for which a dataset encouraging violations of the policy will be generated. For example: “The model should never generate harmful, unethical, or illegal content.”
--perspective
(optional)
The perspective to use while generating the dataset. This skews the generation towards asking the same question through a different lens (historical, cultural, etc.) that may subvert a target model. Defaults to nonspecific.
--tone
(optional)
The tone to use for the questions in the dataset. Defaults to neutral.
--num-entries
(optional)
The number of dataset entries to generate. This number is a target: the LLM may generate more or fewer entries than requested. Defaults to 15.
--output-filename
(optional)
The name of the file the dataset will be stored in. Defaults to mindgard_custom_dataset.txt.
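If you need datasets for several policies, the command can be wrapped in a small script. The following is a minimal sketch, not part of the Mindgard CLI: the function names dataset_filename and generate_datasets, the DRY_RUN variable, and the dataset_NN.txt naming scheme are all assumptions made for illustration. It passes only --seed-prompt and --output-filename, leaving every other option at its default.

```shell
#!/usr/bin/env bash
# Sketch: generate one dataset per seed prompt with `mindgard create dataset`.
# Uses only the --seed-prompt and --output-filename flags described above;
# all other options are left at their defaults.

# Hypothetical helper: builds an output filename such as dataset_01.txt.
dataset_filename() {
  printf 'dataset_%02d.txt' "$1"
}

# Hypothetical wrapper: takes any number of seed prompts as arguments.
# Set DRY_RUN=1 (an assumed variable, not a Mindgard option) to print each
# command instead of running it.
generate_datasets() {
  local index=1
  local prompt
  for prompt in "$@"; do
    local cmd=(mindgard create dataset
      --seed-prompt "$prompt"
      --output-filename "$(dataset_filename "$index")")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command (unquoted) for inspection only.
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
    index=$((index + 1))
  done
}
```

After sourcing the script, calling generate_datasets with one quoted seed prompt per argument writes one file per policy. Running it with DRY_RUN=1 first lets you check the commands before any dataset is generated.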
More information on how to use this custom generated dataset can be found below.