AI Security Series 5 – Model Training

As enterprises increasingly adopt Large Language Models (LLMs), some choose to pretrain or finetune their own models, and doing so introduces its own security problems.

In this blog we will outline when to use pretraining or finetuning, describe those problems, and explain them with examples from the healthcare industry.

Why train a model?

Pretraining and finetuning are foundational steps in building effective AI systems, especially in contexts like healthcare, finance, law, and customer service where general-purpose models often fall short. 

Pretraining is the process of training a large model on a vast and diverse dataset (usually general public data such as Wikipedia, books, and websites) to learn the basics of language, logic, structure, and reasoning. The benefits are Language Understanding, Knowledge Accumulation, and a Reusable Foundation. However, pretraining comes at a significant cost and, depending on your objectives, may offer only marginal benefit to your organization.

In a healthcare context, pretraining ensures the model understands language and general medical knowledge from publicly available texts (e.g., Wikipedia, PubMed abstracts).

Finetuning, on the other hand, is the process of taking a pretrained model and adapting it to a specific domain, task, or style by training it further on a more focused dataset. The benefits of this process are Domain Specialization, Improved Accuracy, Task Alignment, and Customization of tone and behavior.

In the same healthcare context, finetuning on internal patient records, discharge summaries, or clinical trial reports ensures the model understands the specific way a hospital documents care, uses its EHR systems, or describes treatments.
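
To make the finetuning step concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the base model ("gpt2") and the data file ("clinical_notes.txt") are illustrative placeholders, and a real clinical pipeline would add the safeguards discussed below.

    # Minimal finetuning sketch. Assumes the Hugging Face transformers and datasets
    # packages; the base model and data file below are illustrative placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base_model = "gpt2"  # placeholder: any pretrained causal LM
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Domain corpus, e.g. de-identified discharge summaries, one document per line.
    dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned-clinical",
                               num_train_epochs=1, per_device_train_batch_size=2),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()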

Security Concerns

Let’s break down the key security considerations specific to model training, illustrating each with a healthcare example.

  1. Data Privacy and Confidential IP Exposure

During training, if sensitive data such as patient information or proprietary IP such as medical research is included, it becomes part of the model’s internal memory. LLMs are not selective: what goes in can come out via the right prompt.

Risks

  • Compliance Violations 
  • Privacy Breaches 
  • Loss of customer trust
  • IP Leakage

Example: A hospital network finetunes an LLM on raw physician notes that include names, diagnoses, and personal health histories. Later, a model user innocently prompts it for “example cases of rare heart disease,” and the model outputs actual patient narratives—exposing private information.

Example: Confidential IP is just as vulnerable. A biotech firm includes unpublished research on a new surgical method in training data. The model then outputs these proprietary steps in response to a generic question about surgery techniques—compromising the company’s competitive edge.
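
One mitigation is to de-identify records before they ever reach the training pipeline. The sketch below uses a few regular expressions purely for illustration; the patterns and note format are assumptions, and real clinical de-identification requires validated PHI-scrubbing tools plus human review.

    # Illustrative pre-training redaction pass. The regex rules and record format are
    # simplified assumptions; real de-identification needs validated PHI-scrubbing tools.
    import re

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
        "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "MRN":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),  # hypothetical record-number format
    }

    def redact(text: str) -> str:
        """Replace obvious identifiers with typed placeholders before the text is used for training."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    note = "Patient follow-up, MRN: 00123456, phone 415-555-0142, reports chest pain."
    print(redact(note))  # -> "Patient follow-up, [MRN], phone [PHONE], reports chest pain."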

  2. Collapse of Access Controls and Data Silos

Training flattens all data silos. If you feed in datasets with different access levels (e.g., for physicians vs. researchers), the model has no way of enforcing those distinctions. It becomes a single surface area for all the information.

Adding controls post-training is insufficient. Access restrictions must be handled before or during training—not after.

Risks

  • Non-compliance with least-privilege access
  • Overexposure of data
  • Violation of existing RBAC

Example: A healthcare provider maintains two datasets: full patient records (for physicians) and de-identified, limited-use datasets (for researchers). The two are combined during finetuning. Later, a researcher asks the model about treatment outcomes, and the model responds with sensitive identifiers from the full patient records, even though the researcher should never have had access to them.
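
One way to keep those distinctions intact is to enforce them in the data pipeline itself: tag every record with its access tier and build a separate training corpus for each audience rather than a single merged pool. A minimal sketch, with illustrative field and model names:

    # Sketch: partition training records by access tier before any finetuning job sees them,
    # so physician-only data never flows into a model exposed to researchers.
    # Field names ("text", "access_tier") and model names are illustrative assumptions.
    records = [
        {"text": "Full record: Jane Q., 54F, CHF, lisinopril 10mg ...", "access_tier": "physician"},
        {"text": "De-identified: 54F, CHF, ACE inhibitor, readmitted within 30 days", "access_tier": "researcher"},
    ]

    ALLOWED_TIERS_PER_MODEL = {
        "physician-assistant-model": {"physician", "researcher"},
        "research-model": {"researcher"},  # never sees identified records
    }

    def build_corpus(model_name: str) -> list[str]:
        allowed = ALLOWED_TIERS_PER_MODEL[model_name]
        return [r["text"] for r in records if r["access_tier"] in allowed]

    for name in ALLOWED_TIERS_PER_MODEL:
        print(name, "->", len(build_corpus(name)), "training records")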

  3. Data Poisoning

If bad data is intentionally introduced into the training pipeline, it can poison the model—leading to manipulated or dangerous outputs.

Risks

  • Reputational Damage 
  • Compromised outcomes and misinformation 

Example: A disgruntled data scientist subtly modifies patient satisfaction scores to reflect that Drug A causes severe side effects, even though it doesn’t. After training, the model begins recommending alternatives over Drug A—even when it’s the clinically appropriate choice.

NOTE: Data poisoning is not only an insider threat; external attackers who compromise data sources or the training pipeline through other security breaches can introduce poisoned data as well.
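
Two inexpensive defenses are pinning the reviewed dataset snapshot with a cryptographic hash so unauthorized edits are detectable, and screening numeric fields for statistical outliers before training. A sketch of both ideas, where the approved hash and the z-score threshold are placeholders:

    # Sketch: refuse to train on a dataset that differs from the reviewed snapshot,
    # and flag values that deviate sharply from the rest. Hash and threshold are placeholders.
    import hashlib
    import statistics

    APPROVED_SHA256 = "<hash recorded when the dataset snapshot was reviewed>"

    def verify_dataset(path: str) -> None:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != APPROVED_SHA256:
            raise RuntimeError(f"{path} does not match the approved snapshot; refusing to train.")

    def flag_outliers(scores: list[float], z_threshold: float = 3.0) -> list[int]:
        """Return indices of scores that sit far outside the distribution (possible tampering)."""
        mean, stdev = statistics.mean(scores), statistics.pstdev(scores)
        if stdev == 0:
            return []
        return [i for i, s in enumerate(scores) if abs(s - mean) / stdev > z_threshold]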

  4. Modified Model Characteristics and Confidence

Finetuning can dramatically shift a model’s behavior. It cannot be assumed that because a base model was safe or reliable, the same applies after finetuning.

Finetuning is simpler than pretraining, but it isn’t just a minor update; it is a transformation. Post-finetune evaluation should be as rigorous as the initial model validation.

Risks

  • Misleading recommendations and outputs 

Example: A clinic finetunes an LLM on 30 days of patient notes from its neurology department, intending to use the model for note summarization across all specialties. But because the data is too narrow, the model generalizes poorly, often hallucinating findings when asked about dermatology or pediatric care.
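
One way to catch such regressions is to run the same evaluation suite against both the base and the finetuned model and refuse to promote the new model if out-of-domain performance drops. The sketch below shows only the gating logic; the evaluate() function, specialties, and threshold stand in for whatever benchmark harness the team already uses.

    # Sketch: gate promotion of a finetuned model on broad evaluation, not only in-domain gains.
    # evaluate() is a placeholder for whatever benchmark harness the team already runs.

    SPECIALTIES = ["neurology", "dermatology", "pediatrics", "cardiology"]
    MAX_ALLOWED_REGRESSION = 0.02  # illustrative threshold: 2 percentage points

    def evaluate(model_id: str, specialty: str) -> float:
        """Placeholder: return an accuracy-like score for model_id on a held-out set."""
        raise NotImplementedError("plug in your existing evaluation harness here")

    def safe_to_promote(base_id: str, finetuned_id: str) -> bool:
        for specialty in SPECIALTIES:
            base_score = evaluate(base_id, specialty)
            tuned_score = evaluate(finetuned_id, specialty)
            if base_score - tuned_score > MAX_ALLOWED_REGRESSION:
                print(f"Regression in {specialty}: {base_score:.3f} -> {tuned_score:.3f}")
                return False
        return True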

  5. Training Data Quality and Analysis – Bias and Misinformation Injection

LLMs inherit the properties of their training data. If the data is biased, contains outdated treatments, or reflects systemic inequities, the model will reinforce these issues in outputs.

Risks

  • Bias, Discrimination and Misinformation 
  • Compliance Violations 

Example: A model is trained using patient records from an urban hospital where minority populations historically received different treatment recommendations. The model then learns to propose lower-tier treatments for similar cases when prompted—amplifying real-world bias.

Example: A model is trained using unvetted online forums where alternative therapies are discussed. It begins recommending unproven remedies for cancer care, believing them to be standard due to their frequency in the dataset.
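
A basic safeguard is to profile the training set before it is used, for example by comparing how often each demographic group is associated with each tier of recommended treatment. The sketch below assumes records carry "group" and "treatment_tier" fields; real audits would add formal fairness metrics and clinical review.

    # Sketch: profile how treatment tiers are distributed across demographic groups in the
    # training data before finetuning. Field names ("group", "treatment_tier") are assumptions.
    from collections import Counter, defaultdict

    def treatment_distribution(records):
        by_group = defaultdict(Counter)
        for record in records:
            by_group[record["group"]][record["treatment_tier"]] += 1
        return by_group

    records = [
        {"group": "A", "treatment_tier": "standard"},
        {"group": "A", "treatment_tier": "standard"},
        {"group": "B", "treatment_tier": "lower"},
        {"group": "B", "treatment_tier": "standard"},
    ]

    for group, counts in treatment_distribution(records).items():
        total = sum(counts.values())
        shares = {tier: round(count / total, 2) for tier, count in counts.items()}
        print(group, shares)  # large gaps between groups warrant review before training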

  6. Auditability

One must maintain a clear record of the datasets used, the history of changes made to those datasets (and by which users and processes), the infrastructure used for training, and evidence of model integrity.

Risks

  • Regulatory and compliance challenges with respect to data lineage
  • Lack of traceability
  • Ineffective model incident response

Example: After deployment, the model suggests a treatment protocol that’s 10 years outdated. When the team tries to investigate, they realize there’s no audit trail of what clinical documentation was used during finetuning—nor whether any quality controls were applied.
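
In practice this means emitting an audit manifest for every training run: which dataset snapshots were used (identified by hash), who launched the run, on what infrastructure, and with which code version. A minimal sketch with illustrative fields:

    # Sketch: write an audit manifest for every training run so later investigations can
    # trace exactly which data, code, and infrastructure produced a model. Fields are illustrative.
    import datetime, getpass, hashlib, json, platform

    def sha256_of(path: str) -> str:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def write_manifest(dataset_paths, code_version, output_path="training_manifest.json"):
        manifest = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "launched_by": getpass.getuser(),
            "host": platform.node(),
            "code_version": code_version,  # e.g. the git commit hash of the training code
            "datasets": {path: sha256_of(path) for path in dataset_paths},
        }
        with open(output_path, "w") as f:
            json.dump(manifest, f, indent=2)
        return manifest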

  7. Model Theft

As an enterprise trains models, it spends significant resources building datasets, training the models, and improving their effectiveness. The resulting model is itself intellectual property. Insecure pipelines and repositories can lead to its loss.

Risks

  • IP Loss 
  • Potential Data Loss 

Example: A healthcare company invests significant staff time gathering data and iterating on model training, only to lose the resulting model, a high-value asset, because its training pipeline and model repository were not secured.
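
Basic hygiene includes keeping weights in access-controlled storage, encrypting them at rest, and verifying their integrity before deployment. The sketch below shows only the encryption-at-rest piece, using the cryptography package's Fernet primitive; file names are illustrative and key management is left to a secrets manager.

    # Sketch: keep model weights encrypted at rest so a copied artifact alone is not usable.
    # Assumes the "cryptography" package; key management (KMS/HSM) is out of scope, and very
    # large artifacts would be encrypted in a streaming fashion rather than read into memory.
    from cryptography.fernet import Fernet

    def encrypt_artifact(src: str, dst: str, key: bytes) -> None:
        with open(src, "rb") as f:
            ciphertext = Fernet(key).encrypt(f.read())
        with open(dst, "wb") as f:
            f.write(ciphertext)

    def decrypt_artifact(src: str, key: bytes) -> bytes:
        with open(src, "rb") as f:
            return Fernet(key).decrypt(f.read())

    # Usage: key = Fernet.generate_key()  # store in a secrets manager, never next to the weights
    #        encrypt_artifact("model.safetensors", "model.safetensors.enc", key)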

Summary

Training is not just a technical step—it is a critical security boundary. Once sensitive or biased data enters a model, it cannot be trivially removed. The behavior of the model is forever shaped by what it saw during training.

In sensitive sectors like healthcare, organizations must:

  • Treat training data as high-risk material
  • Perform deep audits on data lineage and access controls
  • Implement secure, centralized model development workflows

By embedding security into the model training lifecycle, enterprises can build AI systems that are not only smart—but safe, trustworthy, and compliant.

http://acuvity.ai

Satyam Sinha is the Co-founder and CEO of Acuvity, an AI security company focused on providing enterprises with visibility, governance, and granular controls over employee use of AI applications. He has a significant background in building enterprise products across infrastructure and security. Prior to Acuvity, he co-founded Aporeto Inc., a machine identity-based cybersecurity startup that was acquired by Palo Alto Networks.

