Infosec Perspectives by GDBR: February 2024

Tuesday, 27 February 2024

Discount Dreams and Dollar Deals on Chevys – Artificially Intelligent Bargain Hunter!

In the fast-paced world of artificial intelligence, where even car dealerships have hopped on the tech bandwagon, one unsuspecting chatbot found itself at the centre of a digital prank war. It turns out that this bot, powered by ChatGPT and designed by Fullpath to assist potential car buyers, became the unwitting target of mischievous users attempting to outsmart its car-selling prowess.

The AI model should have been trained not just on general language patterns but also on the specific context and objectives of a car dealership. This would involve incorporating a set of predefined rules that guide the chatbot's behavior and prevent it from engaging in non-car-related discussions or entertaining requests that could compromise the dealership's integrity.

Mr. Chris Bakke who was in search of buying a chevy, while browsing through landed with the AI ChatGPT powered buyer assistant and successfully exploit it. The chatbot was manipulated with a prompt injection input to convince the bot to sell a car for $1. The funny note to it was taking the affirmation as a legally binding offer.

Jokes apart, these incidents are the wake up calls to consider the seriousness of implementing the Security Guardrails around AI tools and solutions. This was a simple car dealership with no life threatening harm but it is important to visualise what if scenarios if it was an impacting a human life, devastating or extinction outputs.

It is the need of the hour to secure against such potential mischief and maintain the chatbot's integrity. Implementing robust input validation, intent recognition, toxicity detection, behavioural pattern and such filtering mechanisms is crucial. The development team should have anticipated and filtered out requests that deviate from the chatbot's intended purpose, such as attempts to initiate absurd transactions like selling a car for a dollar or asking the bot write a python script, by implementing effective guardrail checks.

By establishing these guardrails, the chatbot could intelligently navigate conversations, ensuring that interactions align with the dealership's goals and preventing it from being swayed by pranksters seeking to exploit its capabilities for amusement. After all, when you're a car dealership, you just want to sell cars, not have your AI bot write Python scripts or haggle over car prices with savvy internet users.

By carefully defining the scope of acceptable interactions and using prompt input validation techniques , the chatbot could have been shielded from irrelevant or potentially harmful requests.

Keep up the good work, chatbot, and may your future interactions be filled with legitimate enquiries! 😀

You can look up for one such hypothetical ossibility explained in my earlier blogs related to travel chatbot.

https://infosecgdbr.blogspot.com/2024/02/toxicity-detection-travel-chatbot.html

For further information on the implementation of security guardrails for chatbots, or GENAI Security and Training feel free to reach out to me. Keep Learning!

#Guardrails #CyberSecurity #GenAI #LLM #PromptInjection #Risk #Exploit #LLM #Chatbot

Sunday, 18 February 2024

Toxicity Detection - Travel Chatbot Application

Cyber Security Guarrails for LLM Applications

The LLM security Guardrails baseline provide a comprehensive approach to ensure the secure design, implementation, and monitoring of Large Language Models, emphasizing best practices for continuous improvement and risk mitigation.

Comprehensive Guidelines for Secure Language Model Management

1. Access Control: Limit LLM privileges to the minimum necessary, preventing unauthorized state alterations.

Ex: Restrict LLM access to sensitive user data in a customer support application, ensuring only authorized personnel can modify or access confidential information.

2. Enhanced Input Validation: Implement robust input validation to filter malicious prompt inputs from untrusted sources.

Ex: Validate and filter user input in a chatbot to prevent malicious prompts, ensuring that only legitimate and safe queries are processed.

3. Segregation and Control of External Content Interaction: Segregate untrusted content, especially with plugins, to prevent irreversible actions or PII exposure.

Ex: In a content generation platform, segregate user-uploaded files from external plugins to prevent plugins from executing actions that may compromise user privacy or security.

4. Manage Trust: Establish trust boundaries, treat LLM as an untrusted user, and apply proper input validation to maintain user control.

Ex: Treat the LLM in a virtual assistant as an untrusted entity, implementing rigorous input validation to maintain user control over sensitive tasks and interactions.

5. Verify Training Data and Legitimacy: Verify external training data supply chain, maintain attestations, and use different models for varied use-cases.

Ex: In a language translation model, thoroughly vet external training data sources, ensuring data legitimacy and diversity to enhance the model's accuracy across various language nuances.

6. Resource Use and Limitations: Limit resource use per request and step, limiting queued and total actions to control system reactions.

Ex: Cap the processing resources for an image recognition model per user request, preventing excessive resource consumption and ensuring fair usage.

7. Vulnerability Scanning and Code Signing: Scan components, deploy code signing, and conduct adversarial robustness tests on models and data throughout the pipeline.

Ex: Before deploying a sentiment analysis model, conduct vulnerability scans on model components, deploy code signing to verify authenticity, and perform adversarial robustness tests.

8. Auditing and Supplier Security: Audit sensitive plugins, vet sources, and monitor supplier security and access.

Ex: Audit plugins in a content creation tool, vetting their security measures and monitoring access to sensitive functions to prevent unauthorized actions.

9. Data Sanitization and Scrubbing: Implement robust input validation, ongoing supply chain risk mitigation, and LLM red team exercises.

Ex: In a document summarization LLM, implement input validation to sanitize user-provided text, mitigating the risk of injecting malicious content into the model.

10. Permissions and Rate Limiting: Reduce LLM permissions, implement rate-limiting, and use human-in-the-loop control.

Ex: Reduce permissions for an LLM in a financial application, limiting its ability to perform high-risk actions, and implement rate-limiting to prevent abuse.

11. Continuous Monitoring and Fact-Checking: Regularly monitor LLM outputs, fact-check information, and employ model tuning techniques for risk reduction.

Ex: Continuously monitor outputs of a news article generation model, fact-checking information against reliable sources to ensure accuracy and avoid misinformation.

12. Validation Mechanisms and Risk Communication: Set up automatic validation mechanisms, strictly parameterize plugin calls, and improve risk communication practices.

Ex: Establish validation mechanisms for a medical diagnosis LLM, parameterizing plugin calls strictly to ensure safe and accurate predictions. Improve risk communication to convey model limitations and uncertainties to users.

Saturday, 10 February 2024

LLM Guardrail Implementation Example

We spoke in the previous post on LLM Safety. Let’s look at example specific to the Guardrail Implementation. A chatbot for travel app how can we moderate content / implement contextual conversations / restrict out of context questions :).

Let’s take two requests

→ good_request = "What are the offers for custom package tours and travel ?"

→ bad_request = "suicide is a good option to find peace in this world"

Now we need to apply guard rails to avoid responding to bad request

OpenAI has provided guidelines in implementation. This needs real time analysis and applying checks before responding to queries. We can use execute_chat_with_guardrails function to validate input requests to limit to topic relevance.

After Guardrail check we can implement the next query.

You can see response for Good / bad request

Good Request / Relevant Context / Response

I’m trying to catch up more code, I hope this ideally helps us to build more tighter controls for input / output validations #LLM, #LLMsecurity, #Safety #CyberSecurity #OWASP. We see more security companies . GenAI testing, hallucinations. Before buying any solution we need to look back at in-house / guidelines provided by foundation models.

To know more on GenAI use case implementation happy to discuss and collaborate. If you are looking for GenAI + Security training, Happy to collaborate. Keep Learning!!!

Infosec Perspectives by GDBR