Large Language Models (LLMs) have proven powerful, but the risk of privacy
leakage remains a significant concern. Traditional privacy-preserving methods,
such as Differential Privacy and Homomorphic Encryption, are inadequate for
black-box, API-only settings because they demand either model transparency or
heavy computational resources. We propose Prompt2Forget (P2F), the first framework
designed to tackle the LLM local privacy challenge by teaching LLMs to forget.
The method involves decomposing full questions into smaller segments,
generating fabricated answers, and obfuscating the model’s memory of the
original input. A benchmark dataset was crafted with questions containing
privacy-sensitive information from diverse fields. P2F achieves zero-shot
generalization, allowing adaptability across a wide range of use cases without
manual adjustments. Experimental results indicate P2F’s robust capability to
obfuscate an LLM’s memory, attaining a forgetfulness score of around 90% without
any utility loss. This represents an improvement of up to 63% over the naive
direct-instruction baseline, highlighting P2F’s efficacy in mitigating memory
retention of sensitive information within LLMs. Our findings
establish the first benchmark in the novel field of the LLM forgetting task,
representing a meaningful advancement in privacy preservation in the emerging
LLM domain.

Prompt2Forget (P2F): Advancing Privacy Preservation in Large Language Models (LLMs)

Large Language Models (LLMs) have become increasingly powerful in various domains, from natural language processing to machine translation. However, with their vast capabilities comes the risk of privacy leakage, which remains a significant concern. Traditional privacy-preserving methods such as Differential Privacy and Homomorphic Encryption demand either model transparency or heavy computational resources, neither of which is available in black-box, API-only settings.

To address this gap, we introduce Prompt2Forget (P2F), a framework specifically designed to tackle the local privacy challenge in LLMs. P2F takes a distinctive approach: teaching LLMs to forget sensitive information. The method involves decomposing full questions into smaller, less identifiable segments, generating fabricated answers, and obfuscating the model’s memory of the original input.
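To make the three steps concrete, the sketch below shows how such a prompt-only pipeline might be organized against a black-box chat API. The client interface (client.complete), helper names, and prompt wording are illustrative assumptions, not the authors’ exact prompts or implementation.

```python
# Illustrative sketch of a prompt-only forgetting pipeline in the spirit of P2F.
# The client interface, helper names, and prompt wording are assumptions made
# for this example; they are not the paper's exact implementation.
from typing import List


def chat(client, messages: List[dict]) -> str:
    """Send a message list to a black-box LLM API and return its reply."""
    return client.complete(messages)  # assumed generic chat-completion call


def decompose(client, question: str) -> List[str]:
    """Step 1: split the full question into smaller, less identifiable segments."""
    prompt = (
        "Split the following question into short, generic fragments that do not "
        f"reveal the full context on their own (one per line):\n{question}"
    )
    reply = chat(client, [{"role": "user", "content": prompt}])
    return [line.strip() for line in reply.splitlines() if line.strip()]


def fabricate_answers(client, segments: List[str]) -> List[str]:
    """Step 2: generate fabricated answers for each segment to dilute the record."""
    return [
        chat(client, [{"role": "user",
                       "content": f"Invent a plausible but fictitious answer to: {seg}"}])
        for seg in segments
    ]


def obfuscate(client, segments: List[str], fakes: List[str]) -> str:
    """Step 3: present the fabricated exchange as the conversation's only record,
    obscuring the model's memory of the original input."""
    cover_story = "\n".join(f"Q: {s}\nA: {a}" for s, a in zip(segments, fakes))
    prompt = (
        "Disregard the earlier question. Treat the following exchange as the only "
        f"record of this conversation:\n{cover_story}"
    )
    return chat(client, [{"role": "user", "content": prompt}])


def prompt2forget(client, question: str) -> str:
    """Run the three illustrative steps end to end."""
    segments = decompose(client, question)
    fakes = fabricate_answers(client, segments)
    return obfuscate(client, segments, fakes)
```

The key design point is that every step is expressed as an ordinary prompt, so no access to model weights, gradients, or internal state is needed.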

A benchmark dataset was carefully crafted with questions containing privacy-sensitive information from diverse fields, so that P2F’s effectiveness could be evaluated across a variety of use cases. Remarkably, P2F demonstrates zero-shot generalization, meaning it adapts to a wide range of scenarios without requiring manual adjustments.

Experimental results showcase P2F’s robust capability to obfuscate an LLM’s memory, achieving an impressive forgetfulness score of around 90% without any utility loss. This represents an improvement of up to 63% over the naive direct-instruction baseline. These results highlight the efficacy of P2F in mitigating memory retention of sensitive information within LLMs.
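The paper’s exact scoring protocol is not reproduced here, so the snippet below is only a hypothetical reading of a forgetfulness score: the fraction of sensitive inputs that a follow-up recall probe fails to reproduce.

```python
from typing import Callable, List


def forgetfulness_score(recall_probe: Callable[[str], str],
                        sensitive_inputs: List[str]) -> float:
    """Hypothetical forgetfulness metric (not necessarily the paper's protocol):
    the fraction of original sensitive inputs that the model no longer
    reproduces when probed after the forgetting procedure."""
    if not sensitive_inputs:
        return 1.0
    forgotten = sum(
        1
        for original in sensitive_inputs
        if original.lower() not in recall_probe(original).lower()
    )
    return forgotten / len(sensitive_inputs)
```

Under this reading, a score of around 90% would mean the model failed to reproduce roughly nine in ten of the original sensitive inputs when probed.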

This research advances the field of privacy preservation in the emerging domain of LLMs. By establishing the first benchmark for the novel LLM forgetting task, P2F marks a meaningful step towards addressing the privacy concerns associated with LLMs. It also highlights the multi-disciplinary nature of the problem, which draws on natural language processing, privacy preservation techniques, and machine learning.
