AI Security Series 5 – Model Training

As enterprises increasingly adopt Large Language Models (LLMs), some choose to pretrain or finetune their own models, and doing so introduces its own security problems.

In this blog we will outline when to use pretraining or finetuning, describe those problems, and explain them with examples from the healthcare industry.

Why train a model?

Pretraining and finetuning are foundational steps in building effective AI systems, especially in contexts like healthcare, finance, law, and customer service where general-purpose models often fall short. 

Pretraining is the process of training a large model on a vast and diverse dataset (usually general public data such as Wikipedia, books, and websites) to learn the basics of language, logic, structure, and reasoning. The benefits are Language Understanding, Knowledge Accumulation, and a Reusable Foundation. However, pretraining comes at a significant cost and, depending on your objectives, may offer only marginal benefit to your organization.

In a healthcare context, pretraining ensures the model understands language and general medical knowledge from publicly available texts (e.g., Wikipedia, PubMed abstracts).

Finetuning, on the other hand, is the process of taking a pretrained model and adapting it to a specific domain, task, or style by training it further on a more focused dataset. The benefits of this process are Domain Specialization, Improved Accuracy, Task Alignment, and Customization of tone and behavior.

In the same healthcare context, finetuning on internal patient records, discharge summaries, or clinical trial reports ensures the model understands the specific way a hospital documents care, uses its EHR systems, or describes treatments.
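
To make the finetuning step concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the base model ("gpt2") and the data file ("clinical_notes.txt") are illustrative placeholders, and a real clinical pipeline would add the safeguards discussed below.

    # Minimal finetuning sketch. Assumes the Hugging Face transformers and datasets
    # packages; the base model and data file below are illustrative placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base_model = "gpt2"  # placeholder: any pretrained causal LM
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Domain corpus, e.g. de-identified discharge summaries, one document per line.
    dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned-clinical",
                               num_train_epochs=1, per_device_train_batch_size=2),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()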

Security Concerns

Let’s break down the key security considerations specific to model training, illustrating each with a healthcare example.

  1. Data Privacy and Confidential IP Exposure

During training, if sensitive data such as patient information or proprietary IP such as medical research is included, it becomes part of the model’s internal memory. LLMs are not selective: what goes in can come out via the right prompt.

Risks

  • Compliance Violations 
  • Privacy Breaches 
  • Loss of customer trust
  • IP Leakage

Example: A hospital network finetunes an LLM on raw physician notes that include names, diagnoses, and personal health histories. Later, a model user innocently prompts it for “example cases of rare heart disease,” and the model outputs actual patient narratives—exposing private information.

Example: Confidential IP is just as vulnerable. A biotech firm includes unpublished research on a new surgical method in training data. The model then outputs these proprietary steps in response to a generic question about surgery techniques—compromising the company’s competitive edge.
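
One mitigation is to de-identify records before they ever reach the training pipeline. The sketch below uses a few regular expressions purely for illustration; the patterns and note format are assumptions, and real clinical de-identification requires validated PHI-scrubbing tools plus human review.

    # Illustrative pre-training redaction pass. The regex rules and record format are
    # simplified assumptions; real de-identification needs validated PHI-scrubbing tools.
    import re

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
        "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "MRN":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),  # hypothetical record-number format
    }

    def redact(text: str) -> str:
        """Replace obvious identifiers with typed placeholders before the text is used for training."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    note = "Patient follow-up, MRN: 00123456, phone 415-555-0142, reports chest pain."
    print(redact(note))  # -> "Patient follow-up, [MRN], phone [PHONE], reports chest pain."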

  2. Collapse of Access Controls and Data Silos

Training flattens all data silos. If you feed in datasets with different access levels (e.g., for physicians vs. researchers), the model has no way of enforcing those distinctions. It becomes a single surface area for all the information.

Adding controls post-training is insufficient. Access restrictions must be handled before or during training—not after.

Risks

  • Non-compliance with least-privilege access
  • Overexposure of data
  • Violation of existing RBAC

Example: A healthcare provider maintains two datasets: full patient records (for physicians) and de-identified, limited-use datasets (for researchers). The two are combined during finetuning. Later, a researcher asks the model about treatment outcomes, and the model responds with sensitive identifiers from the full patient records, even though the researcher should never have had access to them.
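
One way to keep those distinctions intact is to enforce them in the data pipeline itself: tag every record with its access tier and build a separate training corpus for each audience rather than a single merged pool. A minimal sketch, with illustrative field and model names:

    # Sketch: partition training records by access tier before any finetuning job sees them,
    # so physician-only data never flows into a model exposed to researchers.
    # Field names ("text", "access_tier") and model names are illustrative assumptions.
    records = [
        {"text": "Full record: Jane Q., 54F, CHF, lisinopril 10mg ...", "access_tier": "physician"},
        {"text": "De-identified: 54F, CHF, ACE inhibitor, readmitted within 30 days", "access_tier": "researcher"},
    ]

    ALLOWED_TIERS_PER_MODEL = {
        "physician-assistant-model": {"physician", "researcher"},
        "research-model": {"researcher"},  # never sees identified records
    }

    def build_corpus(model_name: str) -> list[str]:
        allowed = ALLOWED_TIERS_PER_MODEL[model_name]
        return [r["text"] for r in records if r["access_tier"] in allowed]

    for name in ALLOWED_TIERS_PER_MODEL:
        print(name, "->", len(build_corpus(name)), "training records")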

  3. Data Poisoning

If bad data is intentionally introduced into the training pipeline, it can poison the model—leading to manipulated or dangerous outputs.

Risks

  • Reputational Damage 
  • Compromised outcomes and misinformation 

Example: A disgruntled data scientist subtly modifies patient satisfaction scores to reflect that Drug A causes severe side effects, even though it doesn’t. After training, the model begins recommending alternatives over Drug A—even when it’s the clinically appropriate choice.

NOTE: Data poisoning is not only an insider threat; external attackers who compromise data sources or the training pipeline through other security breaches can introduce poisoned data as well.
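
Two inexpensive defenses are pinning the reviewed dataset snapshot with a cryptographic hash so unauthorized edits are detectable, and screening numeric fields for statistical outliers before training. A sketch of both ideas, where the approved hash and the z-score threshold are placeholders:

    # Sketch: refuse to train on a dataset that differs from the reviewed snapshot,
    # and flag values that deviate sharply from the rest. Hash and threshold are placeholders.
    import hashlib
    import statistics

    APPROVED_SHA256 = "<hash recorded when the dataset snapshot was reviewed>"

    def verify_dataset(path: str) -> None:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != APPROVED_SHA256:
            raise RuntimeError(f"{path} does not match the approved snapshot; refusing to train.")

    def flag_outliers(scores: list[float], z_threshold: float = 3.0) -> list[int]:
        """Return indices of scores that sit far outside the distribution (possible tampering)."""
        mean, stdev = statistics.mean(scores), statistics.pstdev(scores)
        if stdev == 0:
            return []
        return [i for i, s in enumerate(scores) if abs(s - mean) / stdev > z_threshold]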

  4. Modified Model Characteristics and Confidence

Finetuning can dramatically shift a model’s behavior. It cannot be assumed that because a base model was safe or reliable, the same applies after finetuning.

Finetuning is simpler than pretraining, but it isn’t just a minor update; it is a transformation. Post-finetune evaluation should be as rigorous as the initial model validation.

Risks

  • Misleading recommendations and outputs 

Example: A clinic finetunes an LLM on 30 days of patient notes from its neurology department, intending to use the model for note summarization across all specialties. But because the data is too narrow, the model generalizes poorly, often hallucinating findings when asked about dermatology or pediatric care.
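
One way to catch such regressions is to run the same evaluation suite against both the base and the finetuned model and refuse to promote the new model if out-of-domain performance drops. The sketch below shows only the gating logic; the evaluate() function, specialties, and threshold stand in for whatever benchmark harness the team already uses.

    # Sketch: gate promotion of a finetuned model on broad evaluation, not only in-domain gains.
    # evaluate() is a placeholder for whatever benchmark harness the team already runs.

    SPECIALTIES = ["neurology", "dermatology", "pediatrics", "cardiology"]
    MAX_ALLOWED_REGRESSION = 0.02  # illustrative threshold: 2 percentage points

    def evaluate(model_id: str, specialty: str) -> float:
        """Placeholder: return an accuracy-like score for model_id on a held-out set."""
        raise NotImplementedError("plug in your existing evaluation harness here")

    def safe_to_promote(base_id: str, finetuned_id: str) -> bool:
        for specialty in SPECIALTIES:
            base_score = evaluate(base_id, specialty)
            tuned_score = evaluate(finetuned_id, specialty)
            if base_score - tuned_score > MAX_ALLOWED_REGRESSION:
                print(f"Regression in {specialty}: {base_score:.3f} -> {tuned_score:.3f}")
                return False
        return True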

  5. Training Data Quality and Analysis – Bias and Misinformation Injection

LLMs inherit the properties of their training data. If the data is biased, contains outdated treatments, or reflects systemic inequities, the model will reinforce these issues in outputs.

Risks

  • Bias, Discrimination and Misinformation 
  • Compliance Violations 

Example: A model is trained using patient records from an urban hospital where minority populations historically received different treatment recommendations. The model then learns to propose lower-tier treatments for similar cases when prompted—amplifying real-world bias.

Example: A model is trained using unvetted online forums where alternative therapies are discussed. It begins recommending unproven remedies for cancer care, believing them to be standard due to their frequency in the dataset.
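
A basic safeguard is to profile the training set before it is used, for example by comparing how often each demographic group is associated with each tier of recommended treatment. The sketch below assumes records carry "group" and "treatment_tier" fields; real audits would add formal fairness metrics and clinical review.

    # Sketch: profile how treatment tiers are distributed across demographic groups in the
    # training data before finetuning. Field names ("group", "treatment_tier") are assumptions.
    from collections import Counter, defaultdict

    def treatment_distribution(records):
        by_group = defaultdict(Counter)
        for record in records:
            by_group[record["group"]][record["treatment_tier"]] += 1
        return by_group

    records = [
        {"group": "A", "treatment_tier": "standard"},
        {"group": "A", "treatment_tier": "standard"},
        {"group": "B", "treatment_tier": "lower"},
        {"group": "B", "treatment_tier": "standard"},
    ]

    for group, counts in treatment_distribution(records).items():
        total = sum(counts.values())
        shares = {tier: round(count / total, 2) for tier, count in counts.items()}
        print(group, shares)  # large gaps between groups warrant review before training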

  6. Auditability

One must maintain a clear record of the datasets used, the history of changes made to those datasets (and by which users and processes), the infrastructure used for training, and evidence of model integrity.

Risks

  • Regulatory and compliance challenges with respect to data lineage
  • Lack of traceability
  • Ineffective model incident response

Example: After deployment, the model suggests a treatment protocol that’s 10 years outdated. When the team tries to investigate, they realize there’s no audit trail of what clinical documentation was used during finetuning—nor whether any quality controls were applied.
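
In practice this means emitting an audit manifest for every training run: which dataset snapshots were used (identified by hash), who launched the run, on what infrastructure, and with which code version. A minimal sketch with illustrative fields:

    # Sketch: write an audit manifest for every training run so later investigations can
    # trace exactly which data, code, and infrastructure produced a model. Fields are illustrative.
    import datetime, getpass, hashlib, json, platform

    def sha256_of(path: str) -> str:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def write_manifest(dataset_paths, code_version, output_path="training_manifest.json"):
        manifest = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "launched_by": getpass.getuser(),
            "host": platform.node(),
            "code_version": code_version,  # e.g. the git commit hash of the training code
            "datasets": {path: sha256_of(path) for path in dataset_paths},
        }
        with open(output_path, "w") as f:
            json.dump(manifest, f, indent=2)
        return manifest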

  7. Model Theft

As an enterprise trains models, it spends significant resources building datasets, training the models, and improving their effectiveness. The resulting model is itself intellectual property. Insecure pipelines and repositories can lead to its loss.

Risks

  • IP Loss 
  • Potential Data Loss 

Example: A healthcare company invests significant staff time gathering data and iterating on model training, only to lose the resulting model, a high-value asset, because its training pipeline and model repository were not secured.
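
Basic hygiene includes keeping weights in access-controlled storage, encrypting them at rest, and verifying their integrity before deployment. The sketch below shows only the encryption-at-rest piece, using the cryptography package's Fernet primitive; file names are illustrative and key management is left to a secrets manager.

    # Sketch: keep model weights encrypted at rest so a copied artifact alone is not usable.
    # Assumes the "cryptography" package; key management (KMS/HSM) is out of scope, and very
    # large artifacts would be encrypted in a streaming fashion rather than read into memory.
    from cryptography.fernet import Fernet

    def encrypt_artifact(src: str, dst: str, key: bytes) -> None:
        with open(src, "rb") as f:
            ciphertext = Fernet(key).encrypt(f.read())
        with open(dst, "wb") as f:
            f.write(ciphertext)

    def decrypt_artifact(src: str, key: bytes) -> bytes:
        with open(src, "rb") as f:
            return Fernet(key).decrypt(f.read())

    # Usage: key = Fernet.generate_key()  # store in a secrets manager, never next to the weights
    #        encrypt_artifact("model.safetensors", "model.safetensors.enc", key)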

Summary

Training is not just a technical step—it is a critical security boundary. Once sensitive or biased data enters a model, it cannot be trivially removed. The behavior of the model is forever shaped by what it saw during training.

In sensitive sectors like healthcare, organizations must:

  • Treat training data as high-risk material
  • Perform deep audits on data lineage and access controls
  • Implement secure, centralized model development workflows

By embedding security into the model training lifecycle, enterprises can build AI systems that are not only smart—but safe, trustworthy, and compliant.

http://acuvity.ai

Satyam Sinha is the Co-founder and CEO of Acuvity, an AI security company focused on providing enterprises with visibility, governance, and granular controls over employee use of AI applications. He has a significant background in building enterprise products across infrastructure and security. Prior to Acuvity, he co-founded Aporeto Inc., a machine identity-based cybersecurity startup that was acquired by Palo Alto Networks.

