# Ali Ramezani-Kebrya

I am an Associate Professor (with tenure) in the Department of Informatics at the University of Oslo (UiO). Before joining UiO, I was a Senior Scientific Collaborator at EPFL, working with Prof. Volkan Cevher in the Laboratory for Information and Inference Systems (LIONS). Before joining LIONS, I was an NSERC Postdoctoral Fellow at the Vector Institute in Toronto, working with Prof. Daniel M. Roy. I received my Ph.D. from the University of Toronto, where I was very fortunate to be advised by Prof. Ben Liang and Prof. Min Dong. My Ph.D. research focused on developing theory and practices for next-generation large-scale distributed and heterogeneous networks. I am a member of the ELLIS Society.

I work in machine learning and study **scalability**, **robustness**, **privacy**, **generalization**, and **stability** aspects of machine learning algorithms. In particular, I am developing **highly scalable**, **privacy-preserving**, and **robust** algorithms to train very large models in a distributed manner. Our algorithms can be used in so-called *federated learning* settings, where a deep model is trained on data distributed among multiple owners who cannot necessarily share their data, e.g., due to privacy concerns, competition, or legal restrictions. I also study the design of the underlying architecture, e.g., the **neural networks** over which a learning algorithm is applied, and in particular the fundamental question of *How much should we overparameterize a neural network?*

# Recent News

- 6/2023: Our paper *Federated Learning under Covariate Shifts with Generalization Guarantees* has been published in **Transactions on Machine Learning Research**.
- 4/2023: I am now a PI at the Visual Intelligence!
- 3/2023: I gave a talk titled Scalable and Robust Deep Learning at the Visual Intelligence.
- 1/2023: We have an open Ph.D. position. Deadline: 28 Feb, 2023.
- 1/2023: Our paper *Distributed Extra-gradient with Optimal Complexity and Communication Guarantees* has been accepted to **ICLR 2023**.
- 1/2023: I am now a member of the ELLIS Society!
- 1/2023: I joined the Department of Informatics at the University of Oslo!
- 12/2022: I gave a talk titled `Randomization Improves Deep Learning Security` at the Annual Workshop of the VILLUM Investigator Grant at Aalborg University.
- 10/2022: Our paper *MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks* has been published in **Transactions on Machine Learning Research**.
- 8/2022: I gave a talk titled `How Did DL Dominate Today’s ML? What Challenges and Limitations Remain?` at the University of Oslo.
- 6/2022: I gave a talk titled `Scalable ML: Communication-efficiency, Security, and Architecture Design` at the University of Edinburgh.
- 2/2022: I gave a talk titled `Scalable ML: Communication-efficiency, Security, and Architecture Design` at the University of Liverpool.
- 9/2021: Our paper *Subquadratic Overparameterization for Shallow Neural Networks* has been accepted to **NeurIPS 2021**.

# Selected Publications

Even for a single client, the distribution shift between training and test data, i.e., intra-client distribution shift, has been a major challenge for decades. For instance, the scarce disease data available for training at a local hospital can differ from the data encountered at test time. We focus on the **overall generalization** performance on multiple clients and modify the classical ERM to obtain an unbiased estimate of an overall true risk minimizer under **intra-client and inter-client covariate shifts**, develop an efficient density ratio estimation method under the stringent privacy requirements of federated learning, and show that importance-weighted ERM achieves smaller generalization error than classical ERM.

Ali Ramezani-Kebrya*, Fanghui Liu*, Thomas Pethick*, Grigorios Chrysos, and Volkan Cevher, **Federated Learning under Covariate Shifts with Generalization Guarantees**, Transactions on Machine Learning Research, June 2023.

pdf code openreview
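
To make the importance-weighting idea above concrete, here is a minimal single-client sketch, not the method from the paper. It assumes the training and test densities are known 1D Gaussians, so the density ratio can be computed in closed form; in the federated setting the ratio has to be estimated under privacy constraints. All function names (`gaussian_pdf`, `density_ratio`, `importance_weighted_risk`, `estimate_test_risk`) and the toy loss are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a 1D Gaussian; stands in for a known data density (assumption)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def density_ratio(x, train_mean, test_mean):
    """Importance weight w(x) = p_test(x) / p_train(x) for unit-variance Gaussians."""
    return gaussian_pdf(x, test_mean) / gaussian_pdf(x, train_mean)


def importance_weighted_risk(losses, ratios):
    """Average of w(x_i) * loss_i over the training samples."""
    if len(losses) != len(ratios):
        raise ValueError("losses and ratios must have the same length")
    return sum(w * l for w, l in zip(ratios, losses)) / len(losses)


def estimate_test_risk(n=100_000, seed=0):
    """Estimate the test risk of the toy loss x**2 using only training samples.

    Training inputs follow N(0, 1), test inputs follow N(1, 1), so the true
    test risk is E[x**2] = 1 + 1**2 = 2, while the unweighted training risk is 1.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    losses = [x * x for x in xs]
    ratios = [density_ratio(x, 0.0, 1.0) for x in xs]
    return importance_weighted_risk(losses, ratios)
```

Calling `estimate_test_risk()` should return a value close to the test risk of 2 rather than the training risk of 1, which is the sense in which the weighted estimate corrects for covariate shift in this toy example.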

Beyond supervised learning, we accelerate large-scale monotone variational inequality problems with applications such as training GANs in distributed settings. We propose **quantized generalized extra-gradient (Q-GenX) family of algorithms** with the optimal rate of convergence and achieve noticeable speedups when training GANs on multiple GPUs without performance degradation.

Ali Ramezani-Kebrya*, Kimon Antonakopoulos*, Igor Krawczuk*, Justin Deschenaux*, and Volkan Cevher, **Distributed Extra-gradient with Optimal Complexity and Communication Guarantees**, ICLR 2023.

pdf bib code openreview
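
As a rough illustration of the extra-gradient step that the algorithms above build on, the sketch below runs plain, unquantized, single-machine extra-gradient on the toy min-max problem min over x, max over y of x*y, whose operator F(x, y) = (y, -x) is monotone. It omits the quantization and distributed aspects of Q-GenX entirely; the function names, step size, and iteration count are assumptions chosen for illustration.

```python
import math


def operator(x, y):
    """Monotone operator of the bilinear game min_x max_y x*y: F(x, y) = (y, -x)."""
    return y, -x


def gradient_descent_ascent(x, y, step=0.1, iters=1000):
    """Baseline: one operator evaluation per iteration, used at the current point."""
    for _ in range(iters):
        gx, gy = operator(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def extragradient(x, y, step=0.1, iters=1000):
    """Extra-gradient: take a look-ahead step, then update using the operator there."""
    for _ in range(iters):
        gx, gy = operator(x, y)
        x_half, y_half = x - step * gx, y - step * gy  # extrapolation step
        gx, gy = operator(x_half, y_half)
        x, y = x - step * gx, y - step * gy            # update from the original point
    return x, y


def distance_to_solution(point):
    """Distance to the unique solution (0, 0) of the toy game."""
    return math.hypot(point[0], point[1])
```

Starting both methods from `(1.0, 1.0)`, the extra-gradient iterates contract toward the solution at the origin, while simultaneous gradient descent-ascent spirals outward on this problem.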

ML models are vulnerable to various attacks at training and test time, including data/model poisoning and adversarial examples. We introduce MixTailor, a scheme based on randomization of the aggregation strategies that makes it impossible for the attacker to be fully informed. **MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks** increases the computational complexity of designing tailored attacks for an informed adversary.

Ali Ramezani-Kebrya*, Iman Tabrizian*, Fartash Faghri, and Petar Popovski, **MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks**, Transactions on Machine Learning Research, Oct. 2022.

pdf bib code arXiv openreview
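
The randomization idea above can be sketched as a server that draws its aggregation rule at random each round, so an attacker cannot tailor a gradient to one known rule. This is only an illustrative sketch: the pool below (coordinate-wise median and trimmed mean) and all function names are assumptions, not the pool or the implementation used in MixTailor.

```python
import random
import statistics


def coordinate_mean(grads):
    """Plain averaging, shown for comparison; a single outlier can dominate it."""
    return [sum(col) / len(col) for col in zip(*grads)]


def coordinate_median(grads):
    """Coordinate-wise median of the workers' gradient vectors."""
    return [statistics.median(col) for col in zip(*grads)]


def trimmed_mean(grads, trim=1):
    """Coordinate-wise mean after dropping the `trim` smallest and largest values."""
    out = []
    for col in zip(*grads):
        kept = sorted(col)[trim:len(col) - trim]
        out.append(sum(kept) / len(kept))
    return out


def randomized_aggregate(grads, pool, rng):
    """Pick one aggregation rule uniformly at random from the pool and apply it."""
    rule = rng.choice(pool)
    return rule(grads)


# Three honest workers and one worker sending an extreme gradient (toy data).
worker_grads = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [100.0, -100.0]]
robust_pool = [coordinate_median, trimmed_mean]
aggregated = randomized_aggregate(worker_grads, robust_pool, random.Random(0))
```

On this toy input either rule in the pool returns a vector close to the honest gradients, whereas `coordinate_mean` is pulled far away by the single extreme worker.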

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training. In **Subquadratic Overparameterization for Shallow Neural Networks**, we achieve the best known bounds on the number of parameters that is sufficient for gradient descent to converge to a global minimum at a linear rate and with probability approaching one.

Chaehwan Song*, Ali Ramezani-Kebrya*, Thomas Pethick, Armin Eftekhari, and Volkan Cevher, **Subquadratic Overparameterization for Shallow Neural Networks**, NeurIPS 2021.

pdf bib code arXiv openreview

In training deep models over multiple GPUs, the communication time required to share huge stochastic gradients is the main performance bottleneck. We closed the gap between theory and practice of unbiased gradient compression. **NUQSGD** is currently the method offering the highest communication-compression while still converging under regular (uncompressed) hyperparameter values.

Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, and Daniel M. Roy, **NUQSGD: Provably Communication-Efficient Data-Parallel SGD via Nonuniform Quantization**, Journal of Machine Learning Research, vol. 22, pp. 1-43, Apr. 2021.

pdf bib code arXiv
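
For readers unfamiliar with unbiased gradient compression, the following sketch shows one way stochastic rounding onto a nonuniform grid can keep the quantized gradient equal to the original in expectation. It assumes exponentially spaced levels (0, 2^-s, ..., 1/2, 1) applied to coordinates normalized by the Euclidean norm; it leaves out the encoding step and is not the reference NUQSGD implementation. The names `exponential_levels` and `quantize_unbiased` are hypothetical.

```python
import math
import random


def exponential_levels(s):
    """Nonuniform grid on [0, 1]: 0, 2^-s, ..., 1/2, 1 (assumed level layout)."""
    return [0.0] + [2.0 ** -(s - j) for j in range(s + 1)]


def quantize_unbiased(vec, levels, rng):
    """Stochastically round each normalized magnitude to a neighboring level.

    The probability of rounding up is chosen so that the expected quantized
    value equals the original coordinate.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    out = []
    for v in vec:
        r = min(1.0, abs(v) / norm)  # normalized magnitude, clamped against rounding error
        hi_idx = next(i for i, level in enumerate(levels) if level >= r)
        if levels[hi_idx] == r:
            q = r  # already on the grid
        else:
            lo, hi = levels[hi_idx - 1], levels[hi_idx]
            p_up = (r - lo) / (hi - lo)
            q = hi if rng.random() < p_up else lo
        out.append(math.copysign(norm * q, v))
    return out


def average_quantized(vec, levels, trials=20_000, seed=0):
    """Average many independent quantizations to check unbiasedness empirically."""
    rng = random.Random(seed)
    sums = [0.0] * len(vec)
    for _ in range(trials):
        for i, q in enumerate(quantize_unbiased(vec, levels, rng)):
            sums[i] += q
    return [total / trials for total in sums]
```

For example, `average_quantized([3.0, -4.0], exponential_levels(3))` should land close to the original vector, even though each individual quantization only takes values on the coarse grid.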

Communication-efficient variants of SGD are often heuristic and fixed over the course of training. In **Adaptive Gradient Quantization for Data-Parallel SGD**, we empirically observe that the statistics of the gradients of deep models change during training and introduce two adaptive quantization schemes. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.

Fartash Faghri*, Iman Tabrizian*, Ilya Markov, Dan Alistarh, Daniel M. Roy, and Ali Ramezani-Kebrya, **Adaptive Gradient Quantization for Data-Parallel SGD**, NeurIPS 2020.

pdf bib code arXiv

# Students

- Supervision at the University of Oslo
  - Zhiyuan Wu, Ph.D. in progress, University of Oslo.

- Co-supervision at the University of Toronto and EPFL
  - Thomas Michaelsen Pethick, Ph.D. in progress, EPFL.
  - Igor Krawczuk, Ph.D. in progress, EPFL.
  - Fabian Latorre, Ph.D. in progress, EPFL.
  - Xiangcheng Cao, M.Sc. in progress, EPFL.
  - Seydou Fadel Mamar, M.Sc. in progress, EPFL.
  - Mohammadamin Sharifi, summer intern, EPFL.
  - Wanyun Xie, M.Sc., KTH; first job after graduation: Ph.D. at EPFL.
  - Fartash Faghri, Ph.D., University of Toronto; first job after graduation: research scientist at Apple.
  - Iman Tabrizian, M.A.Sc., University of Toronto; first job after graduation: full-time engineer at NVIDIA.