
The Hidden Cost of Convenience: Why Your Generative AI Inputs May Never Be Truly Confidential

  • Writer: Jessica Eaves Mathews
  • Jul 29
  • 8 min read

By Jessica Eaves Mathews


[Image: AI is listening and retaining your data]

Generative AI platforms have revolutionized how we work, create, and gather information. Tools like ChatGPT, Perplexity.ai, and Gemini offer incredible convenience, providing instant answers and creative solutions. But beneath the surface of this innovation lies a significant, often overlooked risk: the confidentiality of the data you input. As a recent legal development involving OpenAI demonstrates, even if you are not sharing your inputs to help train the model you are using, your inputs might not be truly confidential. The comfort of knowing that ChatGPT/OpenAI will delete your chats and not use them to train its model may offer a false sense of security: your seemingly private conversations are potentially at risk of being made “public.”


Adding to the growing concerns around data privacy, OpenAI CEO Sam Altman recently voiced his own apprehension regarding the lack of legal confidentiality for inputs into generative AI platforms. Altman warned users – particularly those who might use ChatGPT for "therapy" or to discuss highly sensitive personal matters – that their conversations are not legally privileged in the same way discussions with a human therapist, lawyer, or doctor would be. He explicitly stated that if a lawsuit arises, OpenAI could be legally compelled to produce those chat logs, including deleted content, calling the situation "very screwed up" and emphasizing the urgent need for a new legal or policy framework that provides AI conversations with the same level of privacy protection enjoyed in traditional privileged relationships.


While Altman's statements highlight a genuine and significant user privacy concern, it's also important to acknowledge a potential conflict of interest. Because Altman leads a major AI company, stricter legal protections for user inputs, particularly those preventing compelled disclosure, would directly benefit OpenAI by reducing its legal obligations in litigation and potentially fostering greater user trust, which in turn could lead to increased adoption and data accumulation.

The challenge for lawmakers will be to craft legislation that addresses these legitimate privacy fears, especially concerning attorney-client privilege and the protection of confidential and proprietary inputs, without completely undermining the fundamental obligations of AI companies to respond to legitimate subpoenas, comply with legal discovery processes, and cooperate with essential government oversight. This could involve creating specific statutory definitions for an "AI-client privilege" that mirrors traditional legal protections, establishing clear guidelines for data anonymization and encryption at the legal level, and imposing strict conditions and high evidentiary thresholds for compelling the disclosure of user data, thereby balancing innovation with robust privacy rights and the crucial demands of justice.


The Nuances of Data Handling: Beyond "Training"


When you engage with a generative AI tool, it's crucial to understand the dual nature of data handling. Most reputable platforms employ strong encryption and secure storage, and you can often opt out of allowing your data to "train" their models. This is a critical distinction:


  • Collecting and storing your data: Under their privacy policies, AI platforms often collect and store your prompts and outputs. This is distinct from using your inputs to train their models.

  • Using inputs for training: Many platforms offer settings to prevent your specific inputs from being used to further train their AI models, which can alleviate concerns about your proprietary information becoming part of the broader knowledge base.


However, even if you meticulously review a platform's terms of service and privacy policy, consult cybersecurity experts, and opt out of data training, a fundamental risk remains: the possibility of litigation holds and discovery disputes.


The OpenAI Precedent: A Wake-Up Call


A recent federal court order in the copyright case New York Times v. OpenAI & Microsoft illustrates this very real issue. As of mid-May 2025, Judge Wang in the Southern District of New York has placed a “litigation hold” on OpenAI. What is a litigation hold?


It is, at its core, a legal instruction that tells a person or company to freeze and save all documents and information (including electronic ones) that might be relevant to a pending or potential lawsuit.


Think of it like this:

Imagine you're about to get into a big argument with someone, and you know there's a chance it might end up in court. A litigation hold is like a judge or a lawyer telling you, "Okay, from this moment on, you cannot throw away or delete anything that has to do with this argument. You need to keep every email, every text message, every document, every note – even things you might normally get rid of after a while. We might need to look at it later as evidence."


Here's the key takeaway in layman's terms:

  • It's a "don't touch that!" order: It's a legal command to stop deleting or altering any information that might be needed in a pending or soon-to-be-filed lawsuit.

  • It applies broadly: It covers almost all kinds of information, from physical papers to emails, chat logs, social media posts, and even deleted files.

  • It's mandatory: If you're under a litigation hold, you must follow it, or you could face serious legal penalties.

  • It's about preserving evidence: The whole point is to make sure that all potential evidence is kept safe and can be reviewed if needed for a legal case.


Looping back to OpenAI, the court's order mandates that OpenAI preserve and segregate all ChatGPT chat logs, including those users intended to delete, across all tiers (Free, Plus, Pro, Team, and even API users without Zero Data Retention agreements).


What does this mean for you?

  • Indefinite Retention: Chat histories that OpenAI would normally auto-delete after 30 days under their privacy policy are now kept indefinitely under court order.

  • Broad Scope: This applies broadly, even to users not directly involved in the case. Your conversations, regardless of their content, are now potential legal evidence.

  • Controlled, Not Public, But Still Discoverable: While this doesn't mean every conversation is instantly public, the data is sequestered and accessible to a small, audited legal/security team for potential legal discovery. A second court order could make sensitive data visible to opposing parties.


This situation presents significant privacy and confidentiality risks. Users who believed their chats were temporary or private may find sensitive or even privileged data (especially from legal or healthcare queries) now held under court order. And even where users disabled the setting in ChatGPT's privacy controls that allows non-ZDR (Zero Data Retention) data to be used for training, that same data is now subject to discovery.


Global Compliance Conflicts and Attorney-Client Privilege


This litigation hold also creates potential conflicts with global data privacy regulations. For instance:

  • GDPR (EU): Data retention beyond stated limits could violate user rights under Article 17 ("Right to be Forgotten").

  • CCPA/CPRA (California): This may raise questions regarding consumer consent and deletion rights.


For legal professionals and their clients, the implications are particularly dire. If confidential prompts or privileged communications were entered, they may now exist within litigation-scope data, potentially violating attorney-client privilege. While not automatically visible and currently only subject to a hold and sequestration, such data could become accessible through further court orders.


The Gold Standard: Zero Data Retention (ZDR)


Given these risks, the concept of Zero Data Retention (ZDR) emerges as the strongest safeguard for confidential information. A ZDR policy is a contractual and technical guarantee from an AI provider that it will not store or retain any data you send to or receive from its system.


With ZDR:

  • The provider does not log, store, reuse, or learn from your prompts (inputs) or the AI's responses (outputs).

  • Data is processed in memory only and is discarded immediately after the session.


ZDR is typically available only under enterprise or regulated contracts. Providers like OpenAI (with the API ZDR flag enabled), Microsoft Azure OpenAI Service, Anthropic (with enterprise agreements), and Google Gemini (in some enterprise tiers) offer ZDR settings. Crucially, free or "Plus" versions of tools like ChatGPT do not offer ZDR. This likely makes ZDR-level protections unattainable for all but wealthy individuals or corporations.
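For developers, it is worth seeing that a ZDR-covered request looks no different in code; the protection comes from the account-level contract, not from anything in the request itself. Below is a minimal sketch using OpenAI's Python SDK (the model name is an illustrative choice, and the zero-retention behavior assumed in the comments depends entirely on your negotiated agreement with the provider):

    # Minimal sketch: a chat completion request assumed to run under an
    # organization-level Zero Data Retention agreement. ZDR is a
    # contractual, account-level setting, not a request parameter;
    # confirm coverage with your provider before sending anything sensitive.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You are a careful research assistant."},
            {"role": "user", "content": "Summarize what a litigation hold requires."},
        ],
    )
    print(response.choices[0].message.content)

Under a true ZDR agreement, the provider would process this request in memory and discard both the prompt and the response after returning it; without one, assume both are logged.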

Sidenote:

AI companies often benefit significantly from a 30-day deletion rule for user inputs, as it provides them with a valuable window to analyze and leverage this data, even if they've committed not to use it for direct model training. During this period, companies can use the data for various purposes, such as improving core service reliability, debugging, monitoring for abuse or harmful content, and performing internal research and development to understand user behavior and identify new feature opportunities. For instance, by analyzing aggregated (and often anonymized) data on common queries, user engagement patterns, and error rates, they can optimize their infrastructure, refine their algorithms for better performance (without directly training on the content), and enhance the overall user experience.

This retained data also serves as a crucial resource for security analysis, fraud detection, and compliance auditing, helping them identify and mitigate risks. While they may not "monetize" this data in the traditional sense of selling it directly, the insights gained contribute to product improvement and innovation, which in turn drives user growth, enhances customer retention, and justifies subscription tiers or enterprise solutions, ultimately leading to increased revenue and a stronger market position. Without this temporary retention, such crucial operational and analytical benefits would be severely limited, making it harder to maintain service quality and evolve their offerings.


It's clear that the technical solution to the confidentiality problem, Zero Data Retention, already exists and is implemented for enterprise-tier clients and specific API uses by major generative AI companies. This demonstrates that these companies possess the capability to process user inputs and generate outputs without retaining any of that data. However, instead of making ZDR the default or only option for all users, these companies are actively advocating for regulatory frameworks that permit them to retain user data for at least 30 days. This practice primarily benefits the AI companies by allowing them a window to conduct vital operational tasks such as debugging, abuse monitoring, and improving service reliability through aggregated, anonymized insights.

While these activities are presented as essential for product improvement, they inherently create a continuous risk for users, whose personal and confidential data remains stored and therefore vulnerable to litigation holds, accidental breaches, or misuse during this period. The question then becomes: why should individual users bear this persistent risk for the operational convenience and indirect monetization strategies of AI companies, especially when a demonstrably safer alternative (ZDR) is already available? The onus of protecting user data should unequivocally rest with the companies collecting and processing it, not with the users who are increasingly reliant on these powerful tools.


Best Practices for Confidential Work

In light of these evolving risks, particularly the OpenAI litigation hold, it is paramount to adopt rigorous best practices when using generative AI tools, especially with sensitive information:


  • Prioritize ZDR: If you are handling your own, client, or patient data, processing legal or compliance-sensitive work, or operating under GDPR, HIPAA, or attorney-client privilege, Zero Data Retention is the best and safest option. It means no saving, no logging, no training, and no exposure.

  • Avoid Public/Free Tools for Confidential Data: Unless you are using a product explicitly governed by a ZDR agreement, do not input confidential personal information, proprietary business information (including IP), client names or identifiers, case strategies, privileged communications, internal law firm documentation, or personally identifiable information (PII) into tools like ChatGPT.

  • Understand the "Train Model" Toggle: Even if you toggle off the option to let an AI system use your prompts and data to train its program, your data and prompts might still be saved for a period of time and could be handed over pursuant to a court order.

  • Consult Experts: If you are unsure about the security and privacy implications of an AI tool, always consult your IT and cybersecurity experts.

  • Redact When Possible: If ZDR solutions are not feasible, redact client information from your prompts so you can still leverage AI's benefits without compromising confidentiality (see the sketch after this list).

  • Assume Discoverability: Just as with email, text messages, or platforms like Google Workspace, assume that anything you type into consumer-level AI tools could be retained and discoverable, even if you delete it. Avoid using such platforms for any sensitive or client-specific content unless specific rules are in place to protect that type of information, or unless you are in a protected data environment with ZDR.

  • Maintain Audit Trails: For all AI-assisted outputs, maintain clear audit trails and citations.

  • Stay Current: The landscape of AI and ethics is rapidly evolving. Continuously educate yourself on the latest developments and adjust your practices accordingly.
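
As promised above, here is a minimal sketch of the redaction idea: a lightweight pre-processing pass that strips obvious identifiers before a prompt ever leaves your machine. The patterns, placeholder tokens, and redact_prompt helper are all hypothetical and far from exhaustive; real client matters call for purpose-built redaction tooling and human review.

    import re

    # Hypothetical, minimal redaction pass; illustrative only.
    # Real-world redaction needs far broader patterns and human review.
    PATTERNS = {
        r"\b\d{3}-\d{2}-\d{4}\b": "[REDACTED-SSN]",               # US SSN format
        r"\b[\w.+-]+@[\w-]+\.[\w.]+\b": "[REDACTED-EMAIL]",       # email addresses
        r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b": "[REDACTED-PHONE]",   # US phone numbers
    }

    def redact_prompt(text: str, client_names: list[str]) -> str:
        """Replace known client names and common PII patterns with placeholders."""
        for name in client_names:
            text = re.sub(re.escape(name), "[REDACTED-CLIENT]", text, flags=re.IGNORECASE)
        for pattern, placeholder in PATTERNS.items():
            text = re.sub(pattern, placeholder, text)
        return text

    prompt = "Draft a demand letter for Jane Doe (jane@example.com, 555-867-5309)."
    print(redact_prompt(prompt, client_names=["Jane Doe"]))
    # -> Draft a demand letter for [REDACTED-CLIENT] ([REDACTED-EMAIL], [REDACTED-PHONE]).

The design point is simply that redaction should happen locally, before transmission; once a name or number reaches a provider's servers, you are back to relying on their retention policies.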


The convenience of generative AI is undeniable, but it comes with a significant responsibility to protect confidential information. The OpenAI litigation hold serves as a stark reminder that in the world of AI, there can never be an absolute guarantee of confidentiality, and proactive measures, especially the adoption of Zero Data Retention (ideally by the AI companies, but at least by you as an end-user), are critical for safeguarding sensitive data.

 
 
 
