JUNE 17, 2026 · 13 MIN READ · AI AGENTS

AI and GDPR: is it safe for businesses?

AI and GDPR: is it safe to let AI handle your company data? How the legal basis works, the biggest risks, the ChatGPT trap and a practical checklist to follow.

BY FILIP THAI

AI and GDPR is the question that stops more AI projects than any technology does: do you dare let AI near your customer data? The short answer is that AI is not unsafe in itself. The risk sits in how the data flows, which service you choose and what routines you have.

This article sticks to data protection in practice: what GDPR actually requires, where data leaks, whether ChatGPT is safe to use and a checklist you can follow before AI touches a single piece of personal data. The rules in the EU AI Act are a separate matter, and we cover those in the guide to the EU AI Act for European businesses.

Is AI safe for businesses?

AI is not unsafe in itself. Safety is decided by which service you choose and what routines you put in place, not by the technology. According to the Swedish data protection authority's guidance on GDPR and AI, the data protection rules apply as soon as personal data is processed when AI is built or used, and your company is then normally the data controller, responsible for making sure it happens lawfully.

The same underlying model can therefore be both safe and unsafe. ChatGPT on a free private login and the same model under a business agreement with data stored in the EU are technically almost identical, but from a data protection standpoint two completely different worlds. That is where most people go wrong: they ask "is AI safe?" when the real question is "how do we use AI?".

Who is responsible for the data?

GDPR distinguishes between the controller and the processor, and the difference is decided by who determines the purpose, not by who owns the technology. If you decide what the data will be used for, you are the controller. The AI vendor that processes the data on your behalf is the processor. Responsibility cannot be outsourced to a vendor by blaming their model. The rule of thumb for this whole article: when you control the data flow, you control the risk.

GDPR sets three basic requirements before AI touches personal data: you need a legal basis (Article 6), a data processing agreement with the vendor (Article 28) and technical security measures (Article 32). The European Data Protection Board (EDPB) also established in Opinion 28/2024 that an AI model does not automatically count as anonymous.

Legal basis and data minimisation

Before a piece of personal data is entered into an AI tool, you need one of the six legal bases in Article 6 of the GDPR. For businesses it is most often consent, performance of a contract or legitimate interest.

The EDPB confirms that legitimate interest can work, but it then requires a three-step balancing test you can show: that the interest is legitimate, that the processing is genuinely necessary for it, and that it does not override the rights of the individual. The whole assessment should be done before you start, not constructed after the fact if someone complains.

The two principles that are hardest to uphold with AI are purpose limitation and data minimisation: enter what the model actually needs, not everything that happens to be available. A common mistake is uploading an entire customer register for a task that only needs one column.
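As a minimal sketch of data minimisation, the Python snippet below strips every field except the ones a task needs before anything is put into a prompt. The field names, the sample records and the `minimise` helper are hypothetical and only illustrate the idea.

```python
from typing import Iterable

# Hypothetical allow-list: the only field this particular task needs.
ALLOWED_FIELDS = frozenset({"ticket_text"})


def minimise(records: Iterable[dict], allowed: frozenset = ALLOWED_FIELDS) -> list[dict]:
    """Keep only allow-listed fields from each record before building a prompt."""
    return [{k: v for k, v in record.items() if k in allowed} for record in records]


# Made-up sample data standing in for a customer register.
customers = [
    {"name": "Anna Example", "email": "anna@example.com", "ticket_text": "Invoice is missing."},
    {"name": "Bo Example", "email": "bo@example.com", "ticket_text": "Cannot log in."},
]

prompt_rows = minimise(customers)
print(prompt_rows)
```

An allow-list is used rather than a block-list so that a new column added to the register later is excluded by default. Free-text fields can still contain personal data, so this step reduces the exposure rather than removing it.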

Processing agreements and actual responsibility

As soon as a vendor processes personal data on your behalf, you need a written data processing agreement that governs instructions, security and sub-processors. The Italian data protection authority Garante showed that enforcement is real when it fined OpenAI 15 million euros in December 2024, the first GDPR fine against a generative AI tool. This is not a scare tactic. It simply shows that the rules apply even to the largest vendors.

What do the security requirements mean in practice?

Article 32 requires appropriate technical and organisational measures, and for AI that means a few concrete things: encrypt data in transit and at rest, control who may enter and read it, pseudonymise where possible and review regularly that the protection still holds.

What counts as appropriate depends on how sensitive the data is. A tool that summarises public texts needs less protection than one that handles patient records, but some level of measure is always required when personal data is involved. This is also where documentation becomes your friend: if you can show which measures you chose and why, you already have half the answer when a customer or authority asks.
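One way to pseudonymise an identifier before it reaches an AI tool is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the `pseudonymise` helper, the `id_` prefix and the example key are assumptions made for illustration, not a prescribed method.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256) to use in place of an identifier."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 16 hex characters for readability; this is an illustrative choice.
    return f"id_{digest[:16]}"


# Hypothetical key. In practice it would come from a secrets manager, be stored
# separately from the data and never be sent to the AI vendor.
KEY = b"example-key-do-not-use"

token_a = pseudonymise("anna@example.com", KEY)
token_b = pseudonymise("anna@example.com", KEY)
token_c = pseudonymise("bo@example.com", KEY)

print(token_a == token_b)  # same input and key give the same token
print(token_a == token_c)  # different inputs give different tokens
```

The same input always gives the same token, so records can still be linked without exposing the identifier. Pseudonymised data still counts as personal data under GDPR, so this lowers the risk but does not remove the other requirements.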

What are the biggest AI security risks?

The biggest leak is rarely the vendor, but the employee who pastes customer data into a free AI tool. Think in three steps: what data flows into the AI, what can go wrong and how do you reduce the risk? IBM's 2025 Cost of a Data Breach Report shows that such ungoverned use was involved in roughly 20 percent of all data breaches and drove up the cost by an average of 670,000 dollars per breach.

Six places where data can leak

The risks are concrete and can be closed one at a time:

Prompt leakage: sensitive data is pasted into a prompt and ends up outside your control.
Training on your input: the vendor uses your inputs to improve the model.
Storage and retention: prompts and answers are kept longer than you think.
Sub-processors: the vendor passes the data on to a third party.
Weak access control: more people than necessary can enter and read the data.
Shadow AI: employees use their own, ungoverned tools on work data.
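Weak access control, the fifth item above, can be narrowed with a simple allow-list per role plus an audit log. The sketch below is a minimal Python illustration; the role names, data categories and the `may_submit` helper are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("ai_audit")

# Hypothetical mapping of roles to the data categories they may send to the AI tool.
ROLE_PERMISSIONS = {
    "support_agent": {"ticket_text"},
    "hr_manager": {"ticket_text", "employee_record"},
}


def may_submit(role: str, data_category: str) -> bool:
    """Check the allow-list and write an audit entry for every attempt."""
    allowed = data_category in ROLE_PERMISSIONS.get(role, set())
    # Only metadata is logged, never the data itself.
    audit_log.info("role=%s category=%s allowed=%s", role, data_category, allowed)
    return allowed


print(may_submit("support_agent", "ticket_text"))
print(may_submit("support_agent", "employee_record"))
```

Unknown roles get an empty permission set, so the default is to deny. Logging denied attempts as well as allowed ones is what makes a later investigation possible.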

Shadow AI: the hidden leak

The last point is the most dangerous, precisely because it is invisible. When Samsung's engineers pasted source code into ChatGPT three times over roughly three weeks in 2023, the company introduced an internal AI ban.

The pattern is broad: according to several surveys, around three in four employees who use AI enter work data into chatbots, often through private accounts the employer cannot see. It is rarely malice, but convenience, and that is exactly why an outright ban rarely holds in the long run.

IBM notes that a large majority of the organisations hit by AI-related breaches lacked basic access controls. The solution is not a ban, but an approved tool, a clear rule about what may never be pasted in, and a simple alternative that is just as smooth as the free tool.
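A rule about what may never be pasted in is easier to follow when the approved tool checks it automatically. The Python sketch below flags prompts that contain an email address or something shaped like a personal identity number. The two regular expressions and the `find_violations` helper are simplified assumptions and will miss many real cases.

```python
import re

# Illustrative patterns only; a real deployment would need a broader, tested set.
BLOCKED_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Six or eight digits, a separator and four digits (a hypothetical ID format).
    "identity number": re.compile(r"\b\d{6}(?:\d{2})?[-+]\d{4}\b"),
}


def find_violations(prompt: str) -> list[str]:
    """Return the names of the blocked patterns found in the prompt."""
    return [name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(prompt)]


print(find_violations("Summarise our refund policy."))
print(find_violations("Mail anna@example.com about 850101-1234"))
```

A check like this is a safety net rather than a guarantee: it catches the obvious slips, while the rule and the staff training still carry the main load.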

Is ChatGPT safe to use?

It depends on which ChatGPT and how you use it. The consumer version (Free, Plus and Pro) trains on your conversations by default unless you turn it off. The business versions (Enterprise, Business and Edu) and the API do not train on your business data by default, and with a processing agreement and data storage in the EU they can be GDPR-compliant.

Turn off training in the consumer version

If the team uses free ChatGPT on real data, the first measure is to turn off model training under Settings, Data controls, so that your chats are not used to improve the model. It takes ten seconds, but it actually has to be done, and it only solves one of the risks above.

No training is not the same as no storage

Even when a service does not train on your data, it may store it. OpenAI's enterprise terms describe that API data can be kept for a short period for abuse monitoring, with the option of zero retention for eligible customers, plus data storage and processing within the EU. Anthropic similarly describes that Claude does not train on your business data.

The large vendors have also expanded data storage within the EU for business customers, and Claude can run in European regions via cloud platforms such as Amazon Bedrock and Google Cloud Vertex AI. The difference from the consumer version is therefore not the technology, but the agreement, where the data is stored and which settings you choose. The real risk is not the tool, but that it is used without governance.

How do you build AI safely with privacy by design?

Build the protection in from the start instead of patching it afterwards: choose a vendor with data inside the EU, sign a processing agreement, turn off training, set a deletion routine and minimise the data before it is entered. The EDPB places a clear responsibility on whoever uses AI to check how the model was built, so that a breach further up the chain does not rub off on you.

In practice, privacy by design begins before the first prompt. Remove or pseudonymise names and identity numbers the model does not need, put sensitive data behind a separate access control and decide in advance how long data may be kept. The most expensive mistake is collecting everything with the idea of cleaning up later. Clean up before the data even reaches the AI, because what is never entered can never leak.
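The requirements at the start of this section can be written down as a small record per AI service and checked before go-live. The Python sketch below is one possible shape; the `AIServiceSetup` fields, the vendor name and the wording of the issues are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIServiceSetup:
    vendor: str
    data_region: str                # e.g. "EU", "EEA" or "US"
    dpa_signed: bool                # processing agreement in place
    training_on_input: bool         # True if the vendor may train on your input
    retention_days: Optional[int]   # None means no deletion routine is set


def open_issues(setup: AIServiceSetup) -> list[str]:
    """Return the privacy-by-design points that still need attention."""
    issues = []
    if setup.data_region not in {"EU", "EEA"}:
        # Not automatically unlawful, but it needs a transfer mechanism and an assessment.
        issues.append("data stored outside the EU/EEA")
    if not setup.dpa_signed:
        issues.append("no processing agreement signed")
    if setup.training_on_input:
        issues.append("training on input is not turned off")
    if setup.retention_days is None:
        issues.append("no deletion routine set")
    return issues


ready = AIServiceSetup("ExampleVendor", "EU", True, False, 30)
not_ready = AIServiceSetup("ExampleVendor", "US", False, True, None)

print(open_issues(ready))
print(open_issues(not_ready))
```

Keeping one such record per service also gives you a start on the documentation an authority or customer may ask for.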

We are an AI agency ourselves, so we have to practise what we preach. In practice that means all data is processed within the EU, that we sign processing agreements when needed, build with privacy by design and never let your data be used to train AI models. We are also tool-agnostic, so the choice of platform is governed by what is safest for you, not by a licence we happen to sell.

What matters is not that you choose us, but that these are the requirements you should set for whoever you hire. We describe what an AI project looks like from start to finish in the guide to AI agents for European businesses.

Checklist: AI and data protection

Here is a hands-on list to go through before AI touches personal data. Take it from the top down, because the first points decide the later ones.

Map the data flow. List which prompts, documents and customer or personal data go into the AI and where they end up. You cannot protect data you do not know you are sharing.
Decide the legal basis and minimise. Establish the basis under Article 6 before AI touches data, and enter only what is actually needed. Sensitive data (health, ethnicity) also requires an explicit exemption under Article 9.
Choose EU data and sign a processing agreement. Use a vendor that stores data within the EU or EEA and govern the processing in a DPA under Article 28. If data ends up outside the EU, a valid transfer mechanism and your own assessment are required.
Turn off training on your input. Make sure it happens both in the setting and in the agreement. A model is not automatically anonymous, and the risk that it memorises data is real.
Set retention. Decide how long prompts and data are kept, and check that deletion actually happens technically. Many services keep history by default.
Provide an approved tool and train staff. A sanctioned tool with a simple rule about what may never be pasted in is what actually stops shadow AI.
Log and limit access. Control which roles may enter which data, and log usage so that a possible breach can be investigated.
Carry out an impact assessment when needed. For sensitive data, profiling or large-scale processing, a DPIA under Article 35 is required, done before deployment.
Have an incident routine. Prepare the 72-hour notification to the supervisory authority and require in the agreement that the vendor alerts you without undue delay.
Be transparent and review the vendor. Tell customers and employees how AI is used, and check that the model was not obviously built on unlawfully collected data.

If you go through the ten points you have not only reduced the risk, you also have the documentation you need if an authority or a customer asks how you handle data. That is where safe AI goes from a feeling to something you can show.
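The retention point in the checklist can be sketched as a routine that separates stored prompts into those still inside the retention period and those due for deletion. The record layout, the 30-day period and the `split_by_retention` helper are assumptions for illustration. The actual deletion depends on where the data is stored.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(records: list[dict], retention_days: int, now: datetime):
    """Return (keep, delete) lists based on each record's created_at timestamp."""
    cutoff = now - timedelta(days=retention_days)
    keep = [r for r in records if r["created_at"] >= cutoff]
    delete = [r for r in records if r["created_at"] < cutoff]
    return keep, delete


# Made-up stored prompts with timezone-aware timestamps.
now = datetime(2026, 6, 17, tzinfo=timezone.utc)
stored_prompts = [
    {"id": 1, "created_at": datetime(2026, 6, 10, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]

keep, delete = split_by_retention(stored_prompts, retention_days=30, now=now)
print([r["id"] for r in keep], [r["id"] for r in delete])
```

Passing `now` in as a parameter makes the routine easy to test, which helps when you need to show that deletion actually happens.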

Frequently asked questions

01 Are employees allowed to use free ChatGPT at work?

Technically yes, but that is exactly where the biggest leak arises, because the consumer version trains on inputs by default and you cannot see what is shared. The solution is not a ban but an approved tool with data storage in the EU plus a clear rule about what may never be pasted in.

02 Do we need consent to let AI process customer data?

Not necessarily. Consent is one of six legal bases in Article 6, and for businesses performance of a contract or legitimate interest is more often the right one. Legitimate interest requires a documented three-step balancing test. What matters is that you have a basis and can show it, not that it has to be consent.

03 What applies if the AI vendor stores data outside the EU?

Then a valid transfer mechanism is required: either the vendor is certified under the Data Privacy Framework or you use standard contractual clauses, in both cases together with your own assessment of the protection level. The simplest path is to choose a service with data storage within the EU or EEA from the start, which the large vendors now offer.

04 Who is responsible if the AI vendor leaks our data?

Your company is the data controller and bears the main responsibility towards customers and the authority, even when a vendor caused the leak. The processing agreement governs the vendor's obligations, and you must report serious incidents to the supervisory authority within 72 hours. Responsibility cannot be contracted away by blaming the vendor.
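The 72-hour window is simple arithmetic, but it helps to have it in the incident routine so nobody has to count under pressure. The Python sketch below assumes the clock starts when the company becomes aware of the breach. The function names and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority."""
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left until the deadline; negative if it has already passed."""
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600


# Made-up incident time, timezone-aware to avoid ambiguity.
aware = datetime(2026, 6, 17, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(aware)

print(deadline.isoformat())
print(hours_remaining(aware, datetime(2026, 6, 18, 9, 0, tzinfo=timezone.utc)))
```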

05 Is it enough to turn off AI training on our data?

No. Turning off training removes one risk, but data can still be stored, logged and accessed by more people than necessary. No training is not the same as no storage. You also need to set retention, limit access and choose where the data is stored to cover the whole picture.

06 Do we have to carry out an impact assessment for our AI?

It depends on the data. An impact assessment (DPIA) under Article 35 is required for sensitive data, profiling, automated decisions or large-scale processing. For simpler internal AI without personal data it is usually not needed. If you are unsure, a short DPIA is cheap insurance, and it should be done before deployment, not after.

AUTHOR

Filip Thai, CEO & Founder

AI consultant focused on automation and AI agents for SMBs. Builds solutions that actually deliver measurable savings.