Computer Science PhD Student

Bar Ilan University

Hey! I am a second-year PhD student in the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg. I am also a research intern at AI2 Israel.

I am interested in representation learning, analysis and interpretability of neural models, and the syntactic abilities of NNs. Specifically, I am interested in the way neural models learn distributed representations that encode structured information, in the way they utilize those representations to solve tasks, and in our ability to control their content and map them back to interpretable concepts.

During my master’s, I studied the ability of NNs to acquire syntax without explicit supervision. During my PhD, I have mainly been working on developing techniques to selectively control the information encoded in neural representations, with some fun linguistic side tours.

My CV is available here.

Interests

  • NLP
  • Representation Learning
  • Interpretability

Education

  • MSc in Computer Science

    Bar Ilan University

  • BSc in Computer Science

    Bar Ilan University

  • BSc in Chemistry

    Bar Ilan University

Recent Activities

  • May 2022: Our paper Analyzing Gender Representation in Multilingual Models has received the best-paper award at RepL4NLP@ACL22.
  • May-June 2022: Visiting ETH Zurich (Again!)
  • October 2021-February 2022: Teaching an Introduction to Machine Learning course for brain science students.
  • October 2021: invited talk at the SIGTYP Lecture Series (recording)
  • August-September 2021: Visiting student at Prof. Ryan Cotterell’s group, ETH Zurich.
  • January 2020: invited talk at NLPhD speaker series @ Saarland University.
  • December 2020: invited talk at Prof. Roi Reichart’s group, Technion (slides)
  • July 2020: presenting our paper (virtually) at ACL2020.
  • February-March 2020: Visiting student at Prof. Tal Linzen’s research group, Johns Hopkins University.
  • March 2020: Visited Prof. Bob Frank’s research group, Yale University.
  • January 2020: started an internship at AI2 Israel.

Recent Publications

Adversarial Concept Erasure in Kernel Space

Neural representations of text can encode human-interpretable concepts, such as gender, in a nonlinear manner. We propose a kernelization of the linear concept-removal objective, and show that it is effective in guarding against the ability of certain nonlinear adversaries to recover the concept of interest; at the same time, it is difficult to guard against arbitrary nonlinear adversaries.

Linear Adversarial Concept Erasure

Can we prevent a model from encoding arbitrary concepts, such as gender, in its representations? We formulate the problem of identifying and erasing concept subspaces – linear subspaces whose removal prevents linear classification of concepts. We cast it as a constrained instance of a general adversarial problem, show that existing techniques are not optimal for this task, and propose effective solutions.

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction

Which linguistic factors drive the behavior of neural LMs? We propose a method for the generation of counterfactual representations by altering how a given feature is encoded, while leaving intact all other aspects of the original representation. By measuring the change in a model’s word prediction behavior when these counterfactual representations are substituted for the original ones, we can draw conclusions about the causal effect of the linguistic feature in question on the model’s behavior.

Measuring and Improving Consistency in Pretrained Language Models

We study the consistency of Pretrained Language Models (PLMs) with respect to factual knowledge.

Contrastive Explanations for Model Interpretability

Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both …

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

Probing neural models for linguistic knowledge does not allow us to draw causal conclusions on the relation between the probed concepts and the behavior of the model. We propose a complementary probing technique which relies on behavioral interventions, focused on concepts we identify with Iterative Nullspace Projection (INLP).

It’s not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

We propose a way to derive word-level translation from multilingual BERT, and explicitly decompose its representations to a language-dependent component and a lexical, language-invariant component.

Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Can we edit the representations produced by neural models in a post-hoc manner? We propose a data-driven projection method to selectively remove information from neural representations. When evaluated in the context of neutralizing gender information, we demonstrate that the method is highly effective in reducing bias while maintaining interpretability and tractability.

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Languages differ in multiple ways, such as word order and morphological complexity. We generate synthetic variations of English to study, in a controlled manner, how this complexity interacts with the ability of neural models to learn the syntax of the language.

Ab Antiquo: Neural Proto-language Reconstruction

We study whether neural models can learn the systematic patterns of language evolution, and reconstruct proto-forms based on words in existing languages.

Can LSTM Learn to Capture Agreement? The Case of Basque

Agreement prediction has been proposed as a task that implicitly tests the acquisition of syntax. We use agreement prediction to study how models fare in this task when trained on a language with a complex morphology: Basque.

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

We propose a method to distill neural representations for syntax, discarding lexical information.

Recent Posts

Iterative Nullspace Projection (INLP)

This post describes INLP, an algorithm we’ve proposed for removing information from representations, as an alternative to adversarial removal methods. It uses linear algebra to “edit” the representation and control its content, and was found effective in mitigating gender bias.
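
To give a rough feel for the "linear algebra editing" idea, here is a minimal NumPy sketch of iterated nullspace projection. It is a simplified illustration and not the released implementation: the function names (`nullspace_projection`, `inlp_sketch`) are hypothetical, and a least-squares linear predictor stands in for the trained linear classifier.

```python
import numpy as np


def nullspace_projection(w):
    """Matrix that projects vectors onto the nullspace of the direction w."""
    w = w / np.linalg.norm(w)
    return np.eye(w.shape[0]) - np.outer(w, w)


def inlp_sketch(X, y, n_iters=3):
    """Repeatedly fit a linear predictor of y and project its direction out of X.

    X: (n_samples, dim) representations; y: (n_samples,) binary labels.
    Returns the accumulated projection matrix P and the projected data.
    Assumption: least squares replaces the linear classifier used in practice.
    """
    dim = X.shape[1]
    P = np.eye(dim)
    X_proj = X.copy()
    for _ in range(n_iters):
        # Fit a linear predictor of the (centered) labels on the current data.
        w, *_ = np.linalg.lstsq(X_proj, y - y.mean(), rcond=1e-10)
        # Keep only the part of w orthogonal to directions removed so far.
        w = P @ w
        if np.linalg.norm(w) < 1e-10:
            break  # nothing linearly predictive is left to remove
        P_w = nullspace_projection(w)
        P = P_w @ P
        X_proj = X_proj @ P_w
    return P, X_proj
```

Calling `inlp_sketch(X, y)` on a matrix of representations and a vector of binary labels returns a projection `P` that can then be applied to new representations as `X_new @ P`. Each iteration removes one direction, so `n_iters` controls how much of the space is projected away.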

Contact