Faculty Fellow

NYU

Hey! I am a Postdoctoral Researcher and Faculty Fellow (a linear combination of a faculty and a postdoc) at the NYU Center for Data Science. I earned my PhD from the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg.

My research focuses on analyzing and controlling the internal representations of generative models, particularly language models. I study how neural networks encode structured information, use it to solve tasks, and represent interpretable concepts. I try—sometimes even successfully—to develop mathematically-principled approaches to interpretability.

During my PhD, I worked on techniques to selectively control information in neural representations, with some fun linguistic side tours. More recently, I’ve explored framing LMs as causal models and tackled questions of learnability in a controlled setting.

Feel free to reach out if you have questions about my work or if you’re interested in potential collaborations in these areas! You can also find my CV here.

Interests

  • NLP
  • Representation Learning
  • Interpretability

Education

  • MSc in Computer Science

    Bar-Ilan University

  • BSc in Computer Science

    Bar-Ilan University

  • BSc in Chemistry

    Bar-Ilan University

Recent Activities

  • September-December 2024: Visiting ETH Zurich.
  • September 2024: I have received the IAAI Best PhD Thesis Award.
  • June 2024: Happy to have received the Blavatnik Prize for Outstanding Israeli Doctoral Students in Computer Science.
  • July 2023: Invited talks at J.P. Morgan and DeepMind.
  • Summer 2023: Internship at Bloomberg London.
  • December 2022: Happy to have received the Bloomberg Data Science PhD Fellowship!
  • May 2022: Our paper Analyzing Gender Representation in Multilingual Models has received the best-paper award at RepL4NLP@ACL22.
  • May-June 2022: Visiting ETH Zurich (Again!)
  • October 2021-February 2022: Teaching an Introduction to Machine Learning course for brain science students.
  • October 2021: Invited talk at the SIGTYP Lecture Series (recording).
  • August-September 2021: Visiting student at Prof. Ryan Cotterell’s group, ETH Zurich.
  • January 2020: Invited talk at the NLPhD speaker series @ Saarland University.
  • December 2020: Invited talk at Prof. Roi Reichart’s group, Technion (slides).
  • July 2020: Presenting our paper (virtually) at ACL 2020.
  • February-March 2020: Visiting student at Prof. Tal Linzen’s research group, Johns Hopkins University.
  • March 2020: Visited Prof. Bob Frank’s research group, Yale University.
  • January 2020: Started an internship at AI2 Israel.

Recent Publications

Gumbel Counterfactual Generation From Language Models

We conceptualize LMs as Generalized Causal Models (GCMs), enabling us to generate authentic counterfactual strings from a given input string. By leveraging the Gumbel-Max trick, we separate the deterministic computations of the LM’s forward pass from the inherent randomness of the sampling process. We identify the noise responsible for generating a specific string and reuse the same noise when generating a counterfactual string from the model, post-intervention.
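The core idea of reusing sampling noise can be sketched on a single next-token distribution. This is a toy illustration, not the paper's procedure: the logits are made up, the names `sample_gumbel` and `gumbel_max` are hypothetical, and the noise is sampled forward rather than inferred from an observed string.

```python
import math
import random

def sample_gumbel(n, rng):
    """Draw n standard Gumbel samples via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        out.append(-math.log(-math.log(u)))
    return out

def gumbel_max(logits, noise):
    """Gumbel-Max trick: the argmax of (logit + Gumbel noise) is a sample from softmax(logits)."""
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)

# Hypothetical logits over a 3-token vocabulary, before and after an intervention.
original_logits = [2.0, 0.5, -1.0]
intervened_logits = [0.5, 2.0, -1.0]

# The randomness lives entirely in the noise; the model computation is deterministic.
noise = sample_gumbel(len(original_logits), rng)

factual = gumbel_max(original_logits, noise)
# Counterfactual: same noise, different (post-intervention) logits.
counterfactual = gumbel_max(intervened_logits, noise)
print(factual, counterfactual)
```

Because the noise is held fixed, any difference between the two sampled tokens is attributable to the change in logits rather than to sampling randomness.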

GRADE: Quantifying Sample Diversity in Text-to-Image Models

We propose an automatic pipeline for quantifying a notion of diversity in the generation of text2image models.

Language Concept Erasure for Language-invariant Dense Retrieval

We use information-erasure techniques to make retrieval models more multilingual and language-invariant.

Representation Surgery: Theory and Practice of Affine Steering

We present a theory of linear steering of LM representations, and derive optimal steering interventions.

Natural Language Counterfactuals through Representation Surgery

We present a method to convert representation counterfactuals into string counterfactuals, allowing us to analyze the linguistic changes resulting from interventions in the representation space of language models. This approach helps us understand the specific textual modifications made and can be used to mitigate bias in classification through data augmentation.

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

We explore how large language models (LLMs) handle (un)answerable queries and often exhibit hallucinatory behavior due to overconfidence. Our findings indicate that LLMs encode the answerability of queries, with the first decoded token being a strong indicator, revealing new insights into their latent representations and suggesting pathways for improved factual adherence in decoding techniques.

Retrieving Texts based on Abstract Descriptions

We identify the task of retrieving sentences based on abstract descriptions of their content. We demonstrate the inadequacy of current text embeddings and propose an alternative model that performs significantly better when used in standard nearest-neighbor search.

Linear Guardedness and its Implications

We formally define linear guardedness as the inability to linearly predict a concept from the representation. We link intrinsic and extrinsic fairness, and show that in the multiclass setting, downstream linear classifiers can recover some of the linearly removed information about the concept of interest.

Conformal Nucleus Sampling

We assess whether nucleus (top-p) sampling is aligned with its probabilistic meaning in various linguistic contexts. We employ conformal prediction, a calibration procedure that focuses on the construction of minimal prediction sets according to a desired confidence level, to calibrate the parameter p as a function of the entropy of the next word distribution. We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
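For readers unfamiliar with nucleus sampling, the prediction set it builds can be sketched as follows. This shows only the top-p set construction, not the conformal calibration step from the paper; the function name `nucleus` and the probabilities are illustrative assumptions.

```python
def nucleus(probs, p):
    """Return the smallest set of token indices, taken in order of decreasing
    probability, whose total probability mass is at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return chosen

# Hypothetical next-token distribution over a 4-token vocabulary.
next_token_probs = [0.05, 0.5, 0.15, 0.3]

small = nucleus(next_token_probs, 0.7)  # indices covering at least 70% of the mass
large = nucleus(next_token_probs, 0.9)  # indices covering at least 90% of the mass
print(small, large)
```

If the model were perfectly calibrated, the true next token would fall inside the set built with parameter p about a fraction p of the time; the calibration question is whether that holds in practice.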

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

We point out surprising flaws in the way text2image models map words to visual concepts. For instance, we demonstrate a semantic leakage between different words in the prompt, and cases where words with multiple meanings are depicted with all their meanings at once.

Analyzing Gender Representation in Multilingual Models

We study the extent to which the concept of gender is represented cross-lingually in multilingual models. Our analysis shows that gender representations consist of several prominent components that are shared across languages, alongside language-specific components. The existence of language-independent and language-specific components provides an explanation for an intriguing empirical observation we make: while gender classification transfers well across languages, interventions for gender removal, trained on a single language, do not transfer easily to others. (Best paper at RepL4NLP@ACL22!)

Adversarial Concept Erasure in Kernel Space

Neural representations of text can encode human-interpretable concepts, such as gender, in a nonlinear manner. We propose a kernelization of the linear concept-removal objective, and show that it is effective in guarding against the ability of certain nonlinear adversaries to recover the concept of interest; at the same time, it is difficult to guard against arbitrary nonlinear adversaries.

Linear Adversarial Concept Erasure

Can we prevent a model from encoding arbitrary concepts, such as gender, in its representations? We formulate the problem of identifying and erasing concept subspaces: linear subspaces whose removal prevents linear classification of concepts. We cast it as a constrained instance of a general adversarial problem, show that existing techniques are not optimal for this task, and propose effective solutions.

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

Which linguistic factors drive the behavior of neural LMs? We propose a method for the generation of counterfactual representations by altering how a given feature is encoded, while leaving intact all other aspects of the original representation. By measuring the change in a model’s word prediction behavior when these counterfactual representations are substituted for the original ones, we can draw conclusions about the causal effect of the linguistic feature in question on the model’s behavior.

Measuring and Improving Consistency in Pretrained Language Models

We study the consistency of Pretrained Language Models (PLMs) with respect to factual knowledge.

Contrastive Explanations for Model Interpretability

Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both …

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

Probing neural models for linguistic knowledge does not allow us to draw causal conclusions on the relation between the probed concepts and the behavior of the model. We propose a complementary probing technique which relies on behavioral interventions, focused on concepts we identify with Iterative Nullspace Projection (INLP).

It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

We propose a way to derive word-level translation from multilingual BERT, and explicitly decompose its representations into a language-dependent component and a lexical, language-invariant component.

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Can we edit neural representations in a post-hoc manner? We propose a data-driven projection method to selectively remove information from neural representations. When evaluated in the context of neutralizing gender information, we demonstrate that the method is highly effective in reducing bias while maintaining interpretability and tractability.

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Languages differ in multiple ways, such as word order and morphological complexity. We generate synthetic variations of English to study, in a controlled manner, how this complexity interacts with the ability of neural models to learn the syntax of the language.

Ab Antiquo: Neural Proto-language Reconstruction

We study whether neural models can learn the systematic patterns of language evolution, and reconstruct proto-forms based on words in existing languages.

Can LSTM Learn to Capture Agreement? The Case of Basque

Agreement prediction has been proposed as a task that implicitly tests acquisition of syntax. We use agreement prediction to study how models fare on this task when trained on a language with a complex morphology: Basque.

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

We propose a method to distill neural representations for syntax, discarding lexical information.

Recent Posts

Iterative Nullspace Projection (INLP)

This post describes INLP, an algorithm we’ve proposed for removing information from representations, as an alternative to adversarial removal methods. It uses linear algebra to “edit” the representation and control its content, and was found effective in mitigating gender bias.
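A single nullspace-projection step, the building block that INLP repeats, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the data is a made-up toy set, the helper names are hypothetical, and the difference of class means stands in for the weights of a trained linear classifier.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def project_to_nullspace(x, w):
    """Remove from x its component along w, so the result is orthogonal to w."""
    scale = dot(x, w) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x, w)]

# Toy 3-d representations for two values of a hypothetical protected attribute.
group_a = [[1.0, 0.2, 0.5], [0.9, -0.1, 0.4], [1.1, 0.0, 0.6]]
group_b = [[-1.0, 0.1, 0.5], [-0.8, 0.0, 0.3], [-1.2, -0.2, 0.7]]

# Assumption: the difference of class means stands in for classifier weights.
w = [a - b for a, b in zip(mean(group_a), mean(group_b))]

# After projection, a linear classifier with weights w sees no signal at all.
cleaned_a = [project_to_nullspace(x, w) for x in group_a]
cleaned_b = [project_to_nullspace(x, w) for x in group_b]
print(cleaned_a[0], cleaned_b[0])
```

The full algorithm would then train a new classifier on the projected data and repeat, since a single direction rarely captures all linearly decodable information about the attribute.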

Contact