Publicly-Detectable Watermarking for Language Models

stp2y, January 7, 2025

[Submitted on 27 Oct 2023 (v1), last revised 4 Jan 2025 (this version, v4)]

By Jaiden Fairoze and 4 other authors

Abstract: We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
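
To make the rejection-sampling idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' construction. Everything in it is an assumption made for illustration: the vocabulary, the `BLOCK` size, the helper names (`toy_sample_block`, `block_bit`, `embed`, `extract`), and the use of SHA-256 as the public hash. The `payload` bits stand in for the bits of a cryptographic signature; producing and verifying a real signature, and the error-correction step, are left out. The generator keeps resampling a short block of tokens until a public hash of the text so far yields the next payload bit, so anyone can recover the bits by recomputing the same hashes with no secret.

```python
import hashlib
import random

# Toy vocabulary and block size; both are hypothetical choices for this sketch.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slow"]
BLOCK = 4  # number of tokens used to carry one payload bit


def toy_sample_block(rng, n=BLOCK):
    """Stand-in for sampling n tokens from a language model."""
    return [rng.choice(VOCAB) for _ in range(n)]


def block_bit(context, block):
    """Public hash of the preceding text plus the block, reduced to one bit."""
    data = " ".join(context + block).encode("utf-8")
    return hashlib.sha256(data).digest()[0] & 1


def embed(bits, rng, max_tries=1000):
    """Rejection sampling: resample each block until its hash bit matches."""
    tokens = []
    for bit in bits:
        for _ in range(max_tries):
            block = toy_sample_block(rng)
            if block_bit(tokens, block) == bit:
                tokens.extend(block)
                break
        else:
            # Only reached if every candidate block was rejected.
            raise RuntimeError("no matching block found")
    return tokens


def extract(tokens):
    """Detection needs no secret: recompute the public hash for each block."""
    bits = []
    for i in range(0, len(tokens), BLOCK):
        bits.append(block_bit(tokens[:i], tokens[i:i + BLOCK]))
    return bits


# In a real scheme these bits would come from a signature; here they are arbitrary.
payload = [1, 0, 1, 1, 0, 0, 1, 0]
text_tokens = embed(payload, random.Random(0))
recovered = extract(text_tokens)
```

In this sketch `recovered` equals `payload` by construction, since a block is only accepted once its hash bit matches. With a real model, a stretch of low-entropy text may offer too few distinct candidate blocks for a match to be found, which is the situation the abstract says error-correction is meant to address.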

Submission history

From: Jaiden Fairoze
[v1] Fri, 27 Oct 2023 21:08:51 UTC (1,038 KB)
[v2] Mon, 27 May 2024 09:24:16 UTC (420 KB)
[v3] Tue, 28 May 2024 06:10:45 UTC (420 KB)
[v4] Sat, 4 Jan 2025 13:52:49 UTC (354 KB)

