Study on Causal Abstraction for Distributed Alignment Search.

What is Causal Abstraction?

Causal abstraction is an approach to simplifying the relationships among the many variables of a complex system so that the key causal relationships are easier to understand. This process involves the following steps:

This process provides several benefits:

Problems

What is interchange intervention accuracy (IIA)?

Interchange intervention accuracy (IIA) is a graded measure of how well a high-level algorithm abstracts a neural network. It is the proportion of aligned interchange interventions, performed on both the algorithm and the network, for which the two produce the same output.
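The definition above can be sketched as a simple proportion. This is a minimal illustration and not a reference implementation: the function name and the two callables (run_algorithm and run_network, each returning the output after an aligned interchange intervention from source into base) are hypothetical stand-ins, and the toy models below are invented only to make the example runnable.

```python
from typing import Callable, Hashable, Sequence, Tuple

Pair = Tuple[tuple, tuple]  # (base input, source input)


def interchange_intervention_accuracy(
    pairs: Sequence[Pair],
    run_algorithm: Callable[[tuple, tuple], Hashable],
    run_network: Callable[[tuple, tuple], Hashable],
) -> float:
    """Fraction of (base, source) pairs on which the intervened
    algorithm and the intervened network give the same output."""
    if not pairs:
        raise ValueError("need at least one (base, source) pair")
    matches = sum(
        run_algorithm(base, source) == run_network(base, source)
        for base, source in pairs
    )
    return matches / len(pairs)


# Toy stand-ins (assumptions for illustration only).
# High-level algorithm: V = x + y, output = V > 0. The intervention
# replaces V in the base run with the value of V from the source run.
def toy_algorithm(base: tuple, source: tuple) -> bool:
    return sum(source) > 0


# A deliberately imperfect stand-in for the intervened network.
def toy_network(base: tuple, source: tuple) -> bool:
    return sum(source) > 1


toy_pairs = [
    ((1, 1), (2, 0)),
    ((0, 0), (1, 0)),
    ((3, -1), (-2, 0)),
    ((0, 1), (0, 0)),
]

iia = interchange_intervention_accuracy(toy_pairs, toy_algorithm, toy_network)
print(iia)
```

In this toy setting the two models disagree on one of the four pairs, so the measure falls below 1. An IIA of 1 would mean the algorithm and the network agree under every tested intervention.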

Distributed Interchange Interventions

Distributed interchange interventions are “soft” interventions in which the causal mechanisms of a group of neurons are edited such that (1) their values are rotated with a change-of-basis matrix; (2) the targeted dimensions of the rotated neural representation are fixed to the corresponding values in the rotated neural representation created for the source inputs; and (3) the representation is rotated back to the standard neuron-aligned basis.

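Remark: The targeted dimensions are coordinates in the rotated basis. Each one corresponds to a basis direction given by a row (or column, depending on convention) of the rotation matrix, rather than to an eigenvector of that matrix.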
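The three steps of a distributed interchange intervention can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name and arguments are hypothetical, the rotation is assumed to be an orthogonal matrix (so its inverse is its transpose), and the rotation is fixed by hand here rather than learned.

```python
import numpy as np


def distributed_interchange_intervention(base, source, rotation, target_dims):
    """Apply the three steps to two hidden vectors of equal length.

    base, source : 1-D activations for the base and source inputs
    rotation     : orthogonal change-of-basis matrix (assumed orthogonal)
    target_dims  : indices of the rotated coordinates to overwrite
    """
    base = np.asarray(base, dtype=float)
    source = np.asarray(source, dtype=float)

    # (1) Rotate both representations into the new basis.
    rotated_base = rotation @ base
    rotated_source = rotation @ source

    # (2) Fix the targeted rotated dimensions to the source values.
    rotated_base[target_dims] = rotated_source[target_dims]

    # (3) Rotate back to the neuron-aligned basis.
    return rotation.T @ rotated_base


# Illustrative 2-D example with a 45-degree rotation.
theta = np.pi / 4
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

result = distributed_interchange_intervention(
    base=[1.0, 0.0], source=[2.0, 0.0], rotation=rotation, target_dims=[0]
)
print(result)
```

With the identity matrix as the rotation, this reduces to an ordinary interchange intervention that copies individual neurons from the source run. With a non-trivial rotation, the swapped quantity is spread across several neurons, which is what makes the intervention distributed.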