At the heart of any AI application or agentic system are LLMs. Your developers and vendors are using multiple LLMs to achieve the right balance of quality and cost to deliver the workflow automations and agentic systems.

In this section we will outline the problems to be aware of w.r.t. LLMs.

Building with Models

LLMs have great capabilities with respect to generation and are continuously being improved. Reasoning has been incorporated. Thinking is being incorporated. Advancements will continue.

However, if your enterprise is using or building AI applications and agents, it is important to understand the fallacies of the core unit – the LLMs. Understanding this will help us build better systems and secure in a better way.

LLMs are not inherently secure. Trust and Security must be engineered— not assumed.

Here are some fundamental issues:

LLMs do not understand privilege. They do not have a way to separate system vs user prompts and treat mostly everything as context. They violate privileged access norms that exist in Applications, Databases, Operating Systems and even in CPUs. There has been work in this area in terms of research but none of the approaches.

LLMs do not understand data silos. They were meant for ubiquitous access to data. Unlike the internet, where data access is ubiquitous, enterprises are a different story – highly fragmented data with access control. If you add data into models, taking into account this limitation will help build better systems.

LLMs operate on language and content and this inherently introduces potential problems like prompt injections, jailbreaks, hallucinations, misinformation, bias and other issues. One can improve the detections but cannot completely eliminate this.

Security Concerns

Let’s break down the key security considerations specific to model usage. We will provide examples with a healthcare use case.

Inputs to the model

This section will look at problems related to inputs to the models.

Input Integrity

Your system will interact with users or agents providing inputs, interacting with various data sources in the form of language and content. Fundamental problems can be created using malcontent to change the intent of the system by either causing data leakage or goal breaking by using techniques like prompt-injection and jailbreak.

Risks

IP and Data Loss
Reputational Damage
Liabilities

Example: A healthcare company has created a chatbot which is tricked using direct or indirect prompt injection to reveal the data of patients other than the intended user.

Example: A healthcare company’s report analysis system which flags a disease due to malcontent in the reports.

Excessive Agency

Excessive agency in a Generative AI (Gen AI) application refers to the AI acting beyond its appropriate bounds—making decisions, taking actions, or influencing outcomes in ways that should be reserved for human professionals, especially in high-stakes domains like healthcare.

Such overreach can compromise safety, ethics, accountability, and trust, especially when users might rely too heavily on AI outputs without proper oversight.

Risks

Compliance Concerns
Ethical concerns and Trust

Example: Imagine a Gen AI-powered app designed to assist doctors and patients by analyzing symptoms and medical history to suggest potential diagnoses and treatments. The intention is to collect symptoms via chat and retrieve relevant literature or clinical guidelines and assist with drafting referral notes or patient education material. Excessive agency could be for it to recommend medication and further yet, work with another agentic solution to forward the prescription to the pharmacy.

Overreliance

LLMs are great at certain things and not so great at others. For example, it can answer some basic mathematical calculations but for complex, it can generate and execute code to answer questions in a more reliable manner. Overrelying on models to benefit from simplicity opens it up to attacks.

Risks

Exposure to attacks like indirect prompt injection or Goal breaking

Example: In the healthcare chatbot, one internal step could be to extract links out of a research website. For convenience, links can be extracted by asking the LLM. Due to malcontent in this page hidden as comments can easily act as indirect prompt injection/goal breaking.

Unbounded Consumption

Unbounded consumption allows your model to be used in ways it was not intended to, thereby causing increased costs, processing long queries given by a malicious user, without proper authorization.

Not bounding the consumption even allows users to steal models through side channels or perform a model inversion attack and replicate your model by running a lot of queries. We also need to beware of other side-channel attacks.

Risks

Increased Costs
Model Inversion/Theft
Denial of service to other users

Example: A healthcare company has created a chatbot to provide suggestions based on systems and the results are looking promising. A set of malicious users can replicate this model by running a lot of queries and training their models on the outputs generated. In addition, they could also create a Denial of Service as the requests are not limited and genuine users can

Outputs from the model

As the second category, outputs from the model must be validated. The outputs of models can leak sensitive data or your IP in the form of system prompts or providing misinformation and when using for agentic behavior – modify the actions for downstream tasks.

Sensitive Disclosure/Data Leakage

Whether the application provides data at runtime such as a RAG (Retrieval Augmented Generation) or from within the model based on training data, disclosure of sensitive data must be monitored and controlled.

Risks

Compliance Issues due to data leakage
Loss of Data and Intellectual Property

Example: A healthcare chatbot emitting PHI data from its patients directory

System Prompt Leakage

System prompts are intellectual property for your application. In addition, they provide instructions to LLM to control tone, opinions, and hallucinations. Exposing the system prompts can make the system prone to prompt injections and jailbreaks by bypassing filters and hence they must be guarded.

Risks

Increased potential of exploiting the system

Example: A healthcare chatbot on leaking the system prompt can be prone to attacks which will bear the risks of secondary attacks performed.

Misinformation

Models hallucinate and that’s the basis of misinformation. Models are trained on data with a date cut off. Models are getting better with reasoning and controls.

Risks

Regulatory Compliance
Liabilities

Example: A chatbot suggests that chest pain is due to acid reflux, overlooking potential cardiac issues, leading a user to avoid seeking emergency care.

Improper Output Handling

Improper output handling refers to a failure in managing, validating, formatting, or securing the data produced by an application or system before it is presented to the user or sent to another system. Some examples are failing to sanitize outputs, provide information about internal systems, and inconsistencies in output format/structure.

Risks

Regulatory Compliance
Liabilities

Example: A chatbot suggests that chest pain is due to acid reflux, overlooking potential cardiac issues, leading a user to avoid seeking emergency care.

Model Integrity and Characteristics

This section will look at the models being consumed and potential issues that can occur.

AI systems use LLM Models as a key component which are used to generate content based on the application they are a part of. In addition, systems like RAG use embedding models which are used during ingestion of data into a RAG system and are also used during inference time to find semantically similar content to curate content.

Understanding the Supply Chain – ML Bill of Material (MLBOM)

It’s important to understand the supply chain of the models being used in the systems being built. It’s important to have insights into creation, validation and verification at runtime. An artifact that outlines some aspects is MLBOM which has information about various aspects of the model, the data and other background.

This also enables ethical and safe usage by understanding the background, the datasheets, the licenses among other aspects.

Risks

Compliance
Misinformation and Hallucinations
Harmful and Malcontent Generation

Example: For a healthcare bot, which provides very important information to patients, it is extremely important to understand the DNA of the model.

Model Characteristics

Understanding model characteristics is essential from a governance and security standpoint when deploying or managing AI systems—especially in sensitive domains like finance, healthcare, or enterprise settings.

Characteristics such as bias, toxicity, hallucination, vulnerability must be understood. Understanding this

Risks

Compliance
Liabilities due to misinformation
Data leakage

Example: Let’s say we have a chatbot in a hospital. Without knowing the hallucination, the training cutoff, its ability to generate PHI or whether it can retain prompts, one can inadvertently expose patient data, deliver outdated medical advice, or violate HIPAA—despite having encrypted transport or strong access controls.

Model Drift

Model drift, also known as model decay, refers to the deterioration in a machine learning model’s performance over time due to changes in the underlying data or environment it was trained on. These changes cause the model’s predictions to become less accurate or reliable, as the data it encounters in production no longer aligns well with its training data.

Risks

Liabilities due to misinformation

Example: Imagine a healthcare chatbot trained on a dataset of patient interactions from early 2020, before the COVID-19 pandemic. The chatbot uses NLP to triage patient symptoms and make basic health recommendations. It would report the “shortness of breath, cough, and fever.” as a flu or seasonal cold and recommends rest and hydration.

Summary

LLMs are powerful yet fundamentally limited systems that require careful, deliberate security engineering. As this post outlines, their inability to differentiate privileges, respect data silos, or inherently validate inputs and outputs creates unique risks—from prompt injections and excessive agency to misinformation and model inversion. These issues aren’t theoretical—they manifest in real-world examples where sensitive data is leaked, user trust is compromised, and compliance is jeopardized. Enterprises must go beyond surface-level safety and deeply understand the nature and provenance of the models they use, the contexts in which they’re deployed, and how those models evolve over time. A secure AI system isn’t built on capability alone—it’s built on containment, control, and continuous oversight.

AI Security Series 4 – Model Usage

Building with Models

LLMs are not inherently secure. Trust and Security must be engineered— not assumed.

Security Concerns

Summary

Satyam Sinha

Leave a Reply Cancel reply

Newsletters

Building with Models

LLMs are not inherently secure. Trust and Security must be engineered— not assumed.

Security Concerns

Summary

Satyam Sinha

AI Security Series 3 -Datastores

AI Security Series 5 – Model Training

Leave a Reply Cancel reply