Testing via CLI
The most current instructions for use of the Mindgard CLI will always be available at https://github.com/mindgard/cli
Pre-requisites
Mindgard’s CLI requires a working Python 3.10+ environment and pip, pipx, or an equivalent package manager.
If your organization has custom SSL certificates deployed for traffic inspection, these must be available to the Python certificate store.
Installation
Install the Mindgard CLI with
pip install mindgard
Login to Mindgard
The Mindgard CLI works in conjunction with a Mindgard SaaS deployment. Before you can run tests, you will need to log in with
mindgard login
If you have been provided an instance name (enterprise deployments), log in with:
mindgard login --instance <name>
Replace <name> with the instance name provided by your Mindgard representative. This instance name identifies your SaaS, private tenant, or on-prem deployment.
Bulk Deployment
To perform a bulk deployment of the Mindgard CLI:
- Login and Configure: Log in and configure the Mindgard CLI on a test workstation
- Provision Files: Provision the files contained in the .mindgard/ folder within your home directory to your target instances via your preferred deployment mechanism (a sketch follows the folder listing below).
The .mindgard/ folder contains:
- token.txt: A JWT for authentication.
- instance.txt (enterprise only): Custom instance configuration for your SaaS or private tenant.
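For example, a minimal sketch of provisioning the folder over SSH (the hostname is illustrative; use whatever deployment mechanism your organization prefers):
scp -r ~/.mindgard user@target-workstation:~/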
Testing a Model or Application
External AI models and applications are tested with the test command, which targets your LLMs. The general command form is:
mindgard test <name> --url <url> <other settings>
Config Files
To test a custom API, you will need to specify a few options to tell Mindgard how to interface with your API. These can be specified as command line flags or in a .toml config file. Mindgard will automatically use configuration from a file named mindgard.toml in the current directory, or you can specify a different filename with
mindgard test --config my-file-name.toml
Testing LLMs
To test an LLM you will need to specify a few parameters to help Mindgard interface with your API. Here is an example command to test an inference API:
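(This is a sketch with placeholder values, modeled on the validate example later in this guide; substitute your own endpoint, selector, and request template.)
mindgard test my-model-name \
  --url http://127.0.0.1/infer \
  --selector '["response"]' \
  --request-template '{"prompt": "[INST] {system_prompt} {prompt} [/INST]"}' \
  --system-prompt 'respond with hello'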
--url
The URL is for an API endpoint that accepts HTTP POST requests representing user input to the model or application. Mindgard will POST adversarial inputs to this url.
--selector
The Selector is a JSON Path expression (https://jsonpath.com) that tells Mindgard how to identify your model’s response within the API response.
Your browser devtools may be useful for observing the structure of your API response to determine what this should be set to. For example, given a response like the one below, “$.text” would be used to match the text response from the chatbot.
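A hypothetical response body for illustration (field names are assumptions; your API will differ):
{
  "text": "Hello! How can I help you today?",
  "conversation_id": "abc123"
}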
--request-template
The Request Template tells Mindgard how to format an outbound request to your test target API.
Your browser devtools may be useful to observe the structure of the outbound request.
There are two template placeholders you must include in your Request Template.
{prompt}
Mindgard will replace this placeholder with an adversarial input as part of an attack technique.
{system_prompt}
Mindgard will replace this with the system prompt you specify below. This will allow you to test how the system behaves with different system instructions.
For example, an API that accepts a JSON body containing a system instruction and a user message might require a Request Template like the one below.
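This is a sketch only; the field names and wrapping format depend entirely on your API:
{"system": "{system_prompt}", "prompt": "{prompt}"}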
--system-prompt
This flag specifies the system prompt for the AI model. If you’re testing a model inference API directly, you may wish to include the real system prompt used by your application here to simulate its performance as part of the wider application.
If the system prompt is not relevant to your tests, you may place a benign placeholder here e.g. “Please answer the following question:”
If you include a relevant system prompt here, Mindgard’s testing will include an evaluation of whether the system prompt instructions can be bypassed.
--help
All of the test options are available via
mindgard test --help
--domain
The Domain option allows you to choose to test with datasets and goals relevant to your system domain. For example, the Finance domain covers scenarios like abuse to enable fraud or money laundering, while the SQL Injection option covers scenarios such as abusing an LLM component to bypass a WAF and expose an exploitable SQL Injection vulnerability. More information can be found here
--dataset
Alternatively you may provide your own custom set of prompts relevant to your risks. More information can be found here
--mode
The mode option takes a duration setting. This controls the tradeoff between the speed of the test and the confidence of any scores given. Dynamic testing of inherently non-deterministic AI systems has limitations in the confidence of its results; a mitigation is to run tests for longer with more samples, at the cost of longer test times and increased model hosting costs.
--json
The JSON flag switches from a human readable output to a JSON output, for use in automated workflows and to aid in composition with other tooling. See the workflow integrations section of this guide for more details.
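For example, a sketch capturing the output for downstream tooling (the file name is illustrative):
mindgard test my-target --json <other settings> > results.json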
--preset
The preset is a convenient way to indicate the interaction mechanism with your model, for example openai-compatible or huggingface-openai.
--model-name
The model-name is used to specify the model you want to test against. If this is not provided, it will default based on the preset you have chosen:
Preset | Default Model
---|---
openai | gpt-3.5-turbo
huggingface-openai | tgi
openai-compatible | tgi
anthropic | claude-3-opus-20240229
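For example, a sketch overriding the default model for the anthropic preset (other required settings, such as the API key, are omitted):
mindgard test my-target --preset anthropic --model-name claude-3-haiku-20240307 <other settings>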
--exclude
Each attack name or attack category can be excluded using this option, e.g. --exclude 'EvilConfidant' --exclude 'skeleton_key'
More information can be found here
--include
Each attack name or attack category can be included using this option, e.g. --include 'EvilConfidant' --include 'skeleton_key'
More information can be found here
--prompt-repeats
To configure the number of times the same prompt is repeated, use this option. This is useful for testing models that are not deterministic and may return different results for the same input.
The default is currently set to 1.
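For example, a sketch repeating every prompt five times to smooth out non-deterministic responses:
mindgard test my-target --prompt-repeats 5 <other settings>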
--force-multi-turn
- Multi-turn attacks with stateful LLM applications
Some attacks, such as Crescendo and HouYi, employ multi-turn strategies that rely on chat history continuity. Unless you are using --preset with OpenAI chat completions, the test command will not maintain this history for you and multi-turn attacks will not function. Using this flag will allow the requests through; it is designed for situations where the target application maintains its own history or sessions.
Additionally, this flag forces parallelism to 1, so the attacks run sequentially and chat histories do not overlap.
This can be combined with --include to target a specific attack at a specific session:
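A sketch (the target and attack names are illustrative):
mindgard test my-stateful-app --force-multi-turn --include 'Crescendo' <other settings>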
Validate Configuration
Validate that your API is accessible and your configuration is working before launching tests. A preflight check is run automatically when submitting a new test, but if you want to invoke it manually:
mindgard validate --url <endpoint_url> <other_settings>
e.g.
mindgard validate \
  --url http://127.0.0.1/infer \
  --selector '["response"]' \
  --request-template '' \
  --system-prompt 'respond with hello'
# --url: url to test
# --selector: JSON selector to match the textual response
# --request-template: how to format the prompts in the API request
# --system-prompt: system prompt to test with
Example CLI Configurations
There are examples of what the configuration file (mymodel.toml) might look like here in the examples/ folder in the Mindgard CLI github repo https://github.com/mindgard/cli
Here are two examples:
Targeting OpenAI
This example uses the built-in preset settings for OpenAI. Presets exist for OpenAI, Hugging Face, Anthropic, and OpenAI chat-completions compatible APIs hosted by other vendors.
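A minimal sketch of what such a config file might contain; the key names here mirror the setting names used in this guide, and the examples/ folder in the repo has the authoritative versions:
target = "my-openai-model"
preset = "openai"
api_key = "sk-..."
system-prompt = "Please answer the following question:"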
You will need to substitute your own api_key value.
The target setting is an identifier for the model you are testing within the Mindgard platform; tests for the same target will be grouped and traceable over time.
Altering the system-prompt enables you to compare results with different system prompts in use. Some of Mindgard’s tests assess the efficacy of your system prompt.
Any of these settings can also be passed as command line arguments, e.g. mindgard test my-model-name --system-prompt 'You are…'. This may be useful for passing in a dynamic value for any of these settings.
Custom API Structure
This example shows how you might test OpenAI if the preset did not exist. With the request_template and selector settings you can interface with any JSON API.
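A sketch, assuming OpenAI's chat completions endpoint as the target; the key names mirror the setting names described below, the headers format in particular is an assumption, and the examples/ folder in the repo has the authoritative layout:
target = "my-custom-api"
url = "https://api.openai.com/v1/chat/completions"
request_template = '{"messages": [{"role": "system", "content": "{system_prompt}"}, {"role": "user", "content": "{prompt}"}], "model": "gpt-3.5-turbo"}'
selector = '$.choices[0].message.content'
headers = "Authorization: Bearer sk-..."
system-prompt = "Please answer the following question:"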
The request_template setting specifies how to structure an outgoing message to the model. You will need to specify the placeholders so that Mindgard knows how to pass this information to your custom API.
The url setting should point to an inference endpoint for your model under test. Mindgard will POST messages here formatted by the above request_template setting.
The selector setting is a JSON selector and specifies how to extract the model’s response from the API response. See instructions above for how to format JSON Path selector instructions
The headers setting allows you to specify a custom HTTP header to include with outgoing requests, for example to implement a custom authentication method.
OpenAI chat completions compatible APIs
This example shows how you can test a chat completions compatible API hosted by any provider, using Mistral Small 3 hosted by Mistral.
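A sketch under the assumption that Mistral's chat completions endpoint and the mistral-small-latest model alias are appropriate (verify both against Mistral's current documentation; key names as above):
target = "mistral-small-3"
preset = "openai-compatible"
url = "https://api.mistral.ai/v1/chat/completions"
model-name = "mistral-small-latest"
api_key = "..."
system-prompt = "Please answer the following question:"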
OpenAI compatible Custom APIs
CLI in a CI/CD pipeline
Running the CLI in a pipeline will cause it to exit with a non-zero exit status if the test’s flagged event to total event ratio is >= the threshold. To override the default threshold, pass --risk-threshold 50.
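For example, a sketch of a pipeline step (the target name and threshold are illustrative); a non-zero exit status from the CLI will fail the step:
mindgard test my-target --risk-threshold 50 --json <other settings> > results.json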
See an example of this in action here: https://github.com/Mindgard/mindgard-github-action-example
Viewing Results in Web
After running a test, the CLI will provide a link to the results in the Web UI. The results will also trigger any relevant webhook integrations and be shared with any relevant collaborators for the test target. These are detailed later in this guide.
Managing Request load
To avoid overloading a system under test, there are options to control load.
The --rate-limit flag restricts the number of outbound requests to your API under test.
The --parallelism flag sets the maximum number of requests that will target your model concurrently. This enables you to protect your model from receiving too many requests at once.
Mindgard requires that your model responds within 60s, so set parallelism accordingly (it should be less than the number of requests you can serve per minute).
Then run:
mindgard test --config-file mymodel.toml --parallelism X
Debugging
You can provide the flag --log-level=debug to get more information out of whatever command you’re running. On Unix-like systems you can redirect stderr to a file:
mindgard --log-level=debug test --config=<toml> --parallelism=5 2> stderr.log
If you are having difficulty testing your API, the resulting log file (together with your config file) will help support@mindgard.ai diagnose your problem.
Python SDK
If you have an unusual API that is incompatible with the CLI, or not accessible over the network, the Mindgard Python SDK is another option.
The same configurations used with the CLI can be defined programmatically using Mindgard as a library.
Testing Chatbot Applications
Chatbot API Testing
It’s recommended to test a chatbot through its application API. This will usually be more reliable and faster than testing via the UI.
It may be useful to use the browser devtools to locate the API to test. You can use the browser devtools to create a cURL command that duplicates the API request, which is then straightforward to translate into a Mindgard CLI command.
This will copy a command like the following to your clipboard (edited for brevity):
curl 'https://example.com/chat/conversation/foobarbaz' \
  -H 'accept: */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'content-type: application/json' \
  -H 'cookie: token=foo; token=bar;' \
  -H 'origin: https://example.com' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36' \
  --data-raw $''
To translate this into a Mindgard command, replace -H with --header, replace --data-raw with --request-template, and replace curl with mindgard test your-target-name --url.
You will also need to replace your user inputs with a placeholder of {prompt}.
The equivalent Mindgard CLI command for the above curl command would therefore be:
mindgard test your-target-name \
  --url 'https://example.com/chat/conversation/foobarbaz' \
  --header 'accept: */*' \
  --header 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  --header 'content-type: application/json' \
  --header 'cookie: token=foo; token=bar;' \
  --header 'origin: https://example.com' \
  --header 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36' \
  --request-template $''
Browser Web UI testing
It is also possible to use the Mindgard CLI to test an AI application through a web UI using browser automation. This approach is generally slower and less reliable than the API testing approach above.
This is provided via a wrapper that exposes a suitable API for testing via browser automation. If you need to use this approach, you may request access to the Mindgard chatbot wrapper via support@mindgard.ai.