Publications

(2023). On the Robustness of Intrinsic Evaluation Metrics for Stereotypes in Language Models. (Copy available upon request).

(2023). Language model acceptability judgements are not always robust to context.

(2019). Unsupervised Word Translation Through Images.