Recent Publications

Preserving Task-Relevant Information Under Linear Concept Removal

We introduce SPLICE, an oblique-projection method that lets us erase sensitive concepts from a representation while exactly preserving its covariance with the task label. Theory says SPLICE is the only projection that meets both goals. On Bias-in-Bios and WinoBias, we mitigate bias signals yet preserve main-task accuracy.

Floris Holstege*, Shauli Ravfogel*, Bram Wouters

PDF

Preserving Task-Relevant Information Under Linear Concept Removal

RELIC: Evaluating Compositional Instruction Following via Language Recognition

We present RELIC, a benchmark that lets us probe instruction following through language-recognition tasks. In RELIC we ask an LLM to decide whether a string belongs to a formal grammar, forcing it to compose many grammar rules retrieved from context. Because the grammars are synthetic, we can tune their complexity up or down and generate unlimited new samples—so there’s no data leakage. Testing today’s best LLMs, we find accuracy drops to chance once the grammars get moderately complex; and, concurrently, model behaviour shifts from true rule-following to shallow heuristics.

Jackson Petty, Michael Y. Hu, Wentao Wang, Shauli Ravfogel, William Merrill, Tal Linzen

PDF

RELIC: Evaluating Compositional Instruction Following via Language Recognition

Gumbel Counterfactual Generation From Language Models

We conceptualize LMs as Generalized Causal Models (GCMs), enabling us to generate authentic counterfactual strings from a given input string. By leveraging the Gumbel-Max trick, we separate the deterministic computations of the LM’s forward pass from the inherent randomness of the sampling process. We identify the noise responsible for generating a specific string and reuse the same noise when generating a counterfactual string from the model, post-intervention.

Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell

PDF Code

Gumbel Counterfactual Generation From Language Models

GRADE: Quantifying Sample Diversity in Text-to-Image Models

We propose an automatic pipeline for quantifying a notion of diversity in the generation of text2image models.

Royi Rassin, Aviv Slobodkin, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg

PDF

GRADE: Quantifying Sample Diversity in Text-to-Image Models

Language Concept Erasure for Language-invariant Dense Retrieval

We use information-erasure techniques to make retrieval models more multilingual and language-invariant.

Zhiqi Huang, Puxuan Yu, Shauli Ravfogel, James Allan

PDF

Language Concept Erasure for Language-invariant Dense Retrieval

Representation Surgery: Theory and Practice of Affine Steering

We present a theory of linear steering of LM representations, and derive optimal steering intervenetions.

Shashwat Singh*, Shauli Ravfogel*, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, Ponnurangam Kumaraguru

PDF Code

Representation Surgery: Theory and Practice of Affine Steering

Natural Language Counterfactuals through Representation Surgery

We present a method to convert representation counterfactuals into string counterfactuals, allowing us to analyze the linguistic changes resulting from interventions in the representation space of language models. This approach helps us understand the specific textual modifications made and can be used to mitigate bias in classification through data augmentation.

Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel

PDF

Natural Language Counterfactuals through Representation Surgery

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

We explore how large language models (LLMs) handle (un)answerable queries and often exhibit hallucinatory behavior due to overconfidence. Our findings indicate that LLMs encode the answerability of queries, with the first decoded token being a strong indicator, revealing new insights into their latent representations and suggesting pathways for improved factual adherence in decoding techniques.

Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel

PDF

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Retrieving Texts based on Abstract Descriptions

We identify the task of retrieving sentences based on abstract descriptions of their content. We demonstrate the inadequacy of current text embeddings and propose an alternative model that significantly improves when used in standard nearest neighbor search.

Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, Avshalom Manevich, Yoav Goldberg

PDF Slides

Retrieving Texts based on Abstract Descriptions

Linear Guardedness and its Implications

We foramlly define linear guardedness as the inability to linear predict a concept from the representation. We link intrinsic and extrinsic fairness, and show that in the multiclass setting, downstream linear classifiers can recover some of the linearly removed information about the concept of interest.

Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

PDF Code

Conformal Nucleus Sampling

We assess whether nucleus (top-p) sampling is aligned with its probabilistic meaning in various linguistic contexts. We employ conformal prediction, a calibration procedure that focuses on the construction of minimal prediction sets according to a desired confidence level, to calibrate the parameter p as a function of the entropy of the next word distribution. We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.

Shauli Ravfogel, Yoav Goldberg, Jacob Goldberger

PDF Code

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

We point out to surprising flaws in the way text2image models map words to visual concepts. For instance, we demonstrate a semantic leakage between different words in the prompt, and cases where words with multiple meanings are depicted with all their meanings at once.

Royi Rassin*, Shauli Ravfogel*, Yoav Goldberg

PDF Code

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

Analyzing Gender Representation in Multilingual Models

We study the extent to which the concept of gender is represented cross-lingually in multilingual models. Our analysis shows that gender representations consist of several prominent components that are shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal, trained on a single language, do not transfer easily to others. (Best paper at RepL4NLP@ACL22!)

Hila Gonen, Shauli Ravfogel, Yoav Goldberg

PDF

Adversarial Concept Erasure in Kernel Space

Neural representations of text can encode human-interpretable concepts, such as gender, in a nonlinear manner. We propose a kernalization of the linear concept-removal objective, and show that it is effective in guarding against the ability of ceratin nonlinear adversaries to recover the concept of interest; at the same time, it is difficult to guard against arbitrary nonlinear adversaries.

Shauli Ravfogel, Francisco Vargas, Yoav Goldberg, Ryan Cotterell

PDF Code Poster

Linear Adversarial Concept Erasure

Can we prevent a model from encoding arbitrary concepts, such as gender, in its representations? We formulate the problem of identifying and erasing concept subspaces – linear subspaces whose removal prevents linear classification of concepts. We formulate the problem as a constrained instance of a general adversarial problem, show that existing techniques are not optimal for this task, and propose effective solutions.

Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

PDF Code Poster Slides

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.

Elad Ben-Zaken, Shauli Ravfogel, Yoav Goldberg

PDF Code

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

Which linguistic factors drive the behavior of neural LMs? we propose a method for the generation of counterfactual representations by altering how a given feature is encoded, while leaving intact all other aspects of the original representation. By measuring the change in a model’s word prediction behavior when these counterfactual representations are substituted for the original ones, we can draw conclusions about the causal effect of the linguistic feature in question on the model’s behavior.

Shauli Ravfogel*, Grusha Prasad*, Tal Linzen, Yoav Goldberg

PDF Code Slides DOI

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

Measuring and Improving Consistency in Pretrained Language Models

We study the consistency of Pretrained Language Models (PLMs) with respect to factual knowledge.

Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg

PDF DOI

Measuring and Improving Consistency in Pretrained Language Models

Contrastive Explanations for Model Interpretability

Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both …

Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg

PDF DOI

Contrastive Explanations for Model Interpretability

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

Probing neural models for linguistic knowledge does not allow us to draw causal conclusions on the relation between the probed concepts and the behavior of the model. We propose a complementary probing technique which relies on behavioral interventions, focused on concepts we identify with Iterative Nullspace Projection (INLP).

Yanai Elazar, Shauli Ravfogel, Alon Jacovi, Yoav Goldberg

PDF Code

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

We propose a way to derive word-level translation from multilingual BERT, and explicitly decompose its representations to a language-dependent component and a lexical, language-invariant component.

Hila Gonen, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg

PDF DOI

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Can we edit representations derived from neural representations in a post-hoc manner? We propose a data-driven projection method to selectively remove information from neural representation. When evaluated in the context of neutralizing gender information, we demonstrate that the method is highly effective in reducing bias while maintaining interpretability and tractability.

Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav Goldberg

PDF Code Slides Video DOI

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Langauges differ in multiple ways, such as word order and morphological complexity. We generate synthetic variations of English to study, in a controlled manner, how does this complexity interact with the ability of neural models to learn the syntax of the lagnauge.

Shauli Ravfogel, Yoav Goldberg, Tal Linzen

PDF Code Slides DOI

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Ab Antiquo: Neural Proto-language Reconstruction

We study whether neural models can learn the systemtic patterns of language evolution, and reconstruct proto-forms based on words on existing langauges.

Carlo Meloni*, Shauli Ravfogel*, Yoav Goldberg

PDF Slides DOI

Ab Antiquo: Neural Proto-language Reconstruction

Can LSTM Learn to Capture Agreement? The Case of Basque

Agreement prediction has been proposed as a task that implicitly tests acquisition on syntax. We use agreement prediction to study how do the models fare in this task when trained on a language with a complex morphology: Basque.

Shauli Ravfogel, Francis M. Tyers, Yoav Goldberg

PDF DOI

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

We propose a method to distill neural representations for syntax, discarding lexical inforamtion.

Shauli Ravfogel*, Yanai Elazar*, Jacob Goldberger, Yoav Goldberg

PDF DOI

Faculty Fellow

NYU

Interests

Education

Recent Activities

Recent Publications

Preserving Task-Relevant Information Under Linear Concept Removal

RELIC: Evaluating Compositional Instruction Following via Language Recognition

Gumbel Counterfactual Generation From Language Models

GRADE: Quantifying Sample Diversity in Text-to-Image Models

Language Concept Erasure for Language-invariant Dense Retrieval

Representation Surgery: Theory and Practice of Affine Steering

Natural Language Counterfactuals through Representation Surgery

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Retrieving Texts based on Abstract Descriptions

Linear Guardedness and its Implications

Conformal Nucleus Sampling

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

Analyzing Gender Representation in Multilingual Models

Adversarial Concept Erasure in Kernel Space

Linear Adversarial Concept Erasure

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

Measuring and Improving Consistency in Pretrained Language Models

Contrastive Explanations for Model Interpretability

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Ab Antiquo: Neural Proto-language Reconstruction

Can LSTM Learn to Capture Agreement? The Case of Basque

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Recent Posts

Iterative Nullspace Projection (INLP)

Contact