Literature Review of Provenance.
Membership inference (MI): given a machine learning model and a record, determine whether this record was used as part of the model’s training dataset or not.
The adversary’s access to the model is limited to black-box queries that return the model’s output on a given input.
Backdoor attack: lead a model to produce a desired output when a stealthily embedded trigger is present. Since a backdoor attack can force a desired output pattern, it can be used for watermarking.
MI + Seq-to-Seq: given black-box access to an MT model, is it possible to determine whether a particular sentence pair was in the training set for that model?
Simple one-off attacks based on shadow models, which proved successful in classification problems, are not successful on sequence generation problems: this is a result that favors the defender. Nevertheless, we describe the specific conditions where sequence-to-sequence models still leak private information.
A training set for MT consists of a set of sentence pairs $\{ (f_i^{(d)}, e_i^{(d)}) \}$. We use a label $d \in \{ \ell_1, \ell_2, \cdots \}$ to indicate the domain (the subcorpus or the data source).
Specifically, there is a potential risk to individuals’ privacy if their private data is used to train these LLMs without authorization. This has raised significant concern regarding both data privacy and copyright protection.
To prevent the illegal or unauthorized usage of private text data in LLMs, individuals should be able to verify whether or not a company used their personal text to train a model.
This calls for a secure watermarking system to guarantee copyright protection through leakage tracing or ownership identification.
Our full method improves upon the previous work on robustness by +16.8 percentage points on average across four datasets, three corruption types, and two corruption ratios.
Digital watermarking is a technology that enables the embedding of information into multimedia in an unnoticeable way without degrading the original utility of the content.
Deep watermarking has emerged as a new paradigm that improves the three key aspects of watermarking: payload (i.e., the number of bits embedded), robustness (i.e., the accuracy of the extracted message), and the quality of the embedded media.
Previous research has focused on techniques such as lexical substitution with predefined rules and dictionaries or structural transformation.
Recent work has either replaced the predefined set of rules with learning-based methods, thereby removing heuristics, or vastly improved the quality of lexical substitutions.
A well-known proposition from a classical image watermarking work states that watermarks should “be placed explicitly in the perceptually most significant components” of an image. If this is achieved, the adversary must corrupt the content’s fundamental structure to destroy the watermark, which degrades the utility of the original content and renders piracy futile.
Modifying individual pixels is far less perceptible than modifying individual words. Because of this, while we adhere to the gist of the proposition, we do not embed directly into the most significant components. Instead, we identify features that are semantically or syntactically fundamental components of the text and are thus invariant to minor modifications of the text.
A third party with knowledge of the hash function and random number generator can reproduce the red list for each token and count how many times the red list rule is violated.
$H_0$ : The text sequence is generated with no knowledge of the red list rule.
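A minimal sketch of this test in Python (the hash scheme, the shared key, and a red list covering half the vocabulary are assumptions for illustration; `in_red_list` and `detect` are hypothetical helper names):

```python
import hashlib

def in_red_list(prev_id: int, tok_id: int, key: int = 42) -> bool:
    """Pseudorandomly assign tok_id to the red half of the vocabulary,
    seeded by the previous token and a shared key, so a third party
    holding the key can reproduce the same red list."""
    h = hashlib.sha256(f"{key}:{prev_id}:{tok_id}".encode()).hexdigest()
    return int(h, 16) % 2 == 0  # each token is red with probability 1/2

def detect(token_ids: list[int], key: int = 42) -> float:
    """z-statistic against H0: the text was generated with no knowledge
    of the red list rule, so each token is red with probability 1/2."""
    T = len(token_ids) - 1
    violations = sum(in_red_list(token_ids[t], token_ids[t + 1], key)
                     for t in range(T))
    # Under H0, violations ~ Binomial(T, 1/2); watermarked text violates
    # the rule far less often, giving a strongly negative z.
    return (violations - T / 2) / (T / 4) ** 0.5
```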
Spike Entropy
Define the spike entropy of a distribution $p$ with modulus $z$ as
\[S(p,z) = \sum_k \frac{p_k}{1+z p_k}\]
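A quick numerical sanity check of this quantity (NumPy; the example distributions and the modulus $z=10$ are arbitrary choices):

```python
import numpy as np

def spike_entropy(p: np.ndarray, z: float) -> float:
    """S(p, z) = sum_k p_k / (1 + z * p_k)."""
    return float(np.sum(p / (1.0 + z * p)))

uniform = np.full(100, 0.01)   # mass spread evenly over 100 tokens
peaked = np.zeros(100)
peaked[0] = 1.0                # all mass on a single token

print(spike_entropy(uniform, z=10.0))  # 0.909...: high spike entropy
print(spike_entropy(peaked, z=10.0))   # 0.0909...: low spike entropy
```

Intuitively, high spike entropy means the distribution is spread over many plausible tokens, which is exactly when a red/green list rule can bias token choice with little quality loss.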
Copyright
The watermark enables proactive detection of AI-generated content in the future: a piece of content is flagged as AI-generated if a similar watermark can be extracted from it.
An image is predicted as AI-generated if the bitwise accuracy of the decoded watermark is larger than a threshold $\tau$, where bitwise accuracy is the fraction of matched bits between the decoded watermark and the ground-truth one. The threshold should be larger than 0.5, since the bitwise accuracy of original images without watermarks would be around 0.5.
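A sketch of that baseline (NumPy; the watermark length 256 is an arbitrary choice here):

```python
import numpy as np

def bitwise_accuracy(decoded: np.ndarray, w: np.ndarray) -> float:
    """Fraction of bits of the decoded watermark that match the ground truth."""
    return float(np.mean(decoded == w))

rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=256)        # ground-truth watermark, picked at random
decoded = rng.integers(0, 2, size=256)  # decoding from an unwatermarked image
print(bitwise_accuracy(decoded, w))     # ~0.5, as stated above
```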
Robustness against post-processing of an AI-generated image is crucial for a watermark-based detector.
As the threshold $\tau$ increases, the false positive rate decreases, since fewer images are predicted as positive.
An image $I$ is predicted as AI-generated if the bitwise accuracy of its decoded watermark is larger than a threshold $\tau$, i.e., $BA(D(I), w)> \tau$. Bitwise accuracy is computed by
\[BA(D(I_0), w) = \frac{m}{n}\]
where $m$ is the number of bits of the decoded watermark $D(I_0)$ that match the ground-truth watermark $w$, and $n$ is the watermark length. The service provider should pick the ground-truth watermark $w$ uniformly at random. Thus, the decoded watermark $D(I_0)$ of an original image is not related to the randomly picked $w$, and each bit of $D(I_0)$ matches the corresponding bit of $w$ with probability 0.5. As a result, $m$ is a random variable that follows a binomial distribution $B(n, 0.5)$.
Therefore, the FPR of the single-tail detector with threshold $\tau$ can be calculated as follows:
\[\begin{aligned} FPR(\tau) &= Pr(BA(D(I_0), w) > \tau) \\ &= Pr(m > n\tau) = \sum_{k= \lceil n\tau \rceil}^n \binom{n}{k} \frac{1}{2^n} \end{aligned}\]
Thus, to make the FPR less than $\eta$, we should find $\tau$ such that
\(\tau^{*} = \min \left\{ \tau : \sum_{k= \lceil n\tau \rceil}^n \binom{n}{k} \frac{1}{2^n} < \eta \right\}\). For instance, when $n=256$ and $\eta = 10^{-4}$, we have $\tau \ge \tau^* \approx 0.613$.
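This threshold search is easy to reproduce with the exact binomial tail (a sketch; `smallest_tau` is a hypothetical helper, and the search walks the grid of achievable thresholds $m/n$):

```python
from math import ceil, comb

def single_tail_fpr(tau: float, n: int) -> float:
    """FPR(tau) = Pr(m > n*tau) for m ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(ceil(n * tau), n + 1)) / 2 ** n

def smallest_tau(n: int, eta: float) -> float:
    """Smallest threshold on the grid {m/n} whose FPR falls below eta."""
    for m in range(n + 1):
        if single_tail_fpr(m / n, n) < eta:
            return m / n
    return 1.0

print(smallest_tau(256, 1e-4))  # lands near the tau* ≈ 0.613 quoted above
```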
An adversarial attack can evade the single-tail detector by driving the bitwise accuracy far below 0.5 (e.g., by flipping the decoded bits). The authors therefore proposed a double-tail detector that detects an image $I$ as AI-generated if its decoded watermark has a bitwise accuracy larger than $\tau$ or smaller than $1-\tau$, i.e., $BA(D(I), w) > \tau$ or $BA(D(I), w) < 1-\tau$. The FPR of the double-tail detector with threshold $\tau$ is:
\[\begin{aligned} FPR_{double}(\tau) &= Pr(BA(D(I_0), w) > \tau \text{ or } BA(D(I_0), w) < 1-\tau) \\ &= Pr(m > n\tau \text{ or } m < n - n\tau) = 2 \sum_{k=\lceil n\tau \rceil}^n \binom{n}{k} \frac{1}{2^n} \end{aligned}\]
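Continuing the sketch above (reusing `single_tail_fpr`), the double-tail FPR follows from the symmetry of $B(n, 0.5)$:

```python
def double_tail_fpr(tau: float, n: int) -> float:
    """Pr(m > n*tau or m < n - n*tau) for m ~ Binomial(n, 0.5).
    The two tails are mirror images, so this is twice the single tail."""
    return 2.0 * single_tail_fpr(tau, n)
```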
Copyright

Provenance and watermarking methods can improve traceability, alleviate concerns about the origin of AI-generated or edited content, and promote trust among parties.
If properly vetted against adversarial manipulation, they can also help states use AI-generated products more confidently, knowing that the outcomes can be traced back to their source.
The Coalition for Content Provenance and Authenticity (C2PA), whose members include Adobe, Microsoft, and Intel, is an industry-led initiative that develops technical standards for establishing the source and history of media content.
According to the C2PA specifications, provenance methods can be split into “hard” and “soft” binding.
More Info on PAI’s Responsible Practices for Synthetic Media
Watermarking can serve as a verification mechanism to confirm the authenticity and integrity of AI generations. Watermarking involves embedding low-probability sequences of tokens into the outputs produced by AI systems.
Drawbacks: watermarks are not tamper-proof. Bad actors can (1) use “paraphrasing attacks” to remove text watermarks, (2) use spoofing to infer hidden watermark signatures, or even (3) add watermarks to authentic content.
Removal of Watermark: “Evading Watermark-based Detection of AI-Generated Content”
Open provenance standards and open sourcing AI detection technologies should be encouraged to help reduce the cost of security.
The proliferation of foundation models means that provenance and watermarking are unlikely to be applied evenly by all developers.
Copyright
LLMs are built on the transformer neural network architecture, which in turn relies on a mathematical computation called attention that uses the softmax function.
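For reference, the computation being referred to is the standard scaled dot-product attention built on the softmax (textbook definitions, not notation from the cited work):

\[\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V, \qquad \operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}\]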
To solve copyright regression for the softmax function, we show that the objective function of the softmax copyright regression is convex, and that its Hessian is bounded.
[Gil19] investigates copyright infringement in AI-generated artwork and argues that using copyrighted works during the training phase of AI programs does not result in infringement liability.
[VKB23] proposes a framework that provides stronger protection against sampling protected content by defining near access-freeness (NAF).
($\tau$-Copyright-Protected)
If there is a trained model $f_\theta$ with parameter $\theta$ that satisfies $$ \frac{L(f_\theta(A_1), b_1)}{n_1} \ge \tau + \frac{L(f_\theta(A_2), b_2)}{n_2} $$ then we say this model $f_\theta$ is $\tau$-Copyright-Protected. That is, the model’s average loss on $(A_1, b_1)$ must stay at least $\tau$ above its average loss on $(A_2, b_2)$.
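A minimal check of this condition (hypothetical helper; `L1` and `L2` stand for the total losses $L(f_\theta(A_1), b_1)$ and $L(f_\theta(A_2), b_2)$):

```python
def is_tau_copyright_protected(L1: float, n1: int,
                               L2: float, n2: int,
                               tau: float) -> bool:
    """True iff L(f(A1), b1)/n1 >= tau + L(f(A2), b2)/n2,
    i.e., the average loss on A1 exceeds that on A2 by at least tau."""
    return L1 / n1 >= tau + L2 / n2
```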