Proceedings 2019 - Part 1     Proceedings 2019 - Part 2

Proceedings 2020 - Part 1     Proceedings 2020 - Part 2

Proceedings 2021 - Part 1     Proceedings 2021 - Part 2

Proceedings 2023 - Part 1     Proceedings 2023 - Part 2


The course project can be categorized as a literature review, original research, or a literature review that leads to original research.

You are encouraged to team up. The optimal team size is 3 students; deviations from this number will be considered depending on the circumstances (e.g., you cannot find teammates, you have your own research you want to push forward, or the project is difficult enough to accommodate more team members).

Milestones


Suggested List of Projects & Papers

Below is a list of potential project ideas. You are welcome to propose your own, subject to instructor approval.


Topic 1: Efficient Model Adaptation and Low-Rank Training

The rise of massive foundation models has made full fine-tuning computationally prohibitive. This area explores parameter-efficient techniques that modify only a small fraction of a model’s weights.

Project Focus: This project is heavily implementation-focused. The goal is to implement and compare three or more recent LoRA-like methods. A successful project would involve fine-tuning a small-to-medium language model (e.g., GPT-2, Llama-2-7B) on a downstream task and benchmarking the methods on both task performance (e.g., accuracy) and computational efficiency (memory usage, training time).
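For orientation, the core LoRA update can be written as W + (alpha/r)·BA, with W frozen and the low-rank factors A, B trainable. Below is a minimal PyTorch-style sketch under that assumption; the class name, rank, and scaling are illustrative choices, not a reference implementation.

```python
# Minimal LoRA-style adapter sketch (assumes PyTorch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weight and bias
            p.requires_grad = False
        # trainable low-rank factors: W_eff = W + (alpha / r) * B @ A
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # frozen path plus trainable low-rank path
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrap, e.g., the attention projections of a pretrained model with
# LoRALinear(original_linear), then fine-tune only parameters with requires_grad=True.
```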


Topic 2: Mixture-of-Experts (MoEs) and Learning to Specialize

MoE models activate only a fraction of their parameters for any given input, enabling massive model scale with constant computational cost. This topic explores the optimization and design of these architectures.

Project Focus: A blend of literature review and implementation. Review the evolution of routing algorithms in MoEs. The implementation could involve building a simple MoE layer on top of a standard transformer block and experimenting with different routing strategies. The report should discuss the trend towards parameter-efficient and dynamic routing.
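As a concrete starting point, here is a minimal sketch of a token-level MoE feed-forward layer with top-k softmax routing, assuming PyTorch; real systems replace the dense expert loop with sparse dispatch and typically add a load-balancing loss.

```python
# Sketch of an MoE feed-forward layer with top-k routing (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # dense loop for clarity; real systems dispatch sparsely
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out
```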


Topic 3: The Quest for the Optimal Learning Rate

The learning rate is arguably the most critical hyperparameter in deep learning. This project area explores the frontier of adaptive and “learning-rate-free” optimization algorithms that aim to automate this choice.

Project Focus: A mix of theoretical review and implementation. The team should compare the theoretical assumptions and guarantees behind 2-3 of the listed methods. The implementation component would involve benchmarking these adaptive methods against standard SGD with momentum and Adam on a well-known vision or language task, analyzing their convergence speed and final performance.
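A simple benchmarking harness along the following lines would suffice. The sketch assumes PyTorch; the model and data-loader builders are placeholders the team would supply, and learning-rate-free methods would be plugged in via their released packages.

```python
# Sketch of a harness for comparing optimizers on a common task (PyTorch assumed).
import time
import torch

def run(optimizer_name, make_model, train_loader, epochs=5, device="cuda"):
    model = make_model().to(device)
    opts = {
        "sgd_momentum": lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9),
        "adam": lambda p: torch.optim.Adam(p, lr=1e-3),
        # add learning-rate-free methods here via their released packages
    }
    opt = opts[optimizer_name](model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    history, start = [], time.time()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            history.append(loss.item())        # track convergence curve
    return history, time.time() - start        # also report wall-clock time
```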


Topic 4: Optimization for Distributed, Federated, and Asynchronous Systems

How do you train a model when data is decentralized and workers are unreliable? This area studies optimization algorithms that are robust to communication delays, client drift, non-IID data, and asynchronicity.

Project Focus: A mix of review and optional implementation. Review the core challenges in federated/distributed optimization. Implement a simple simulation of Federated Averaging (FedAvg) and introduce artificial delays to observe their effect. The report should analyze recent theoretical advances that provide convergence guarantees under these challenging, realistic conditions.
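A minimal simulation could look like the sketch below, which assumes NumPy and uses toy quadratic client objectives; delayed clients simply report stale models, which is one simple way to inject asynchronicity.

```python
# Sketch of a FedAvg simulation with artificial client delays (NumPy assumed;
# quadratic local objectives stand in for real client losses).
import numpy as np

def local_sgd(w, A, b, lr=0.1, steps=5):
    for _ in range(steps):                     # minimize 0.5 * ||A w - b||^2 locally
        w = w - lr * A.T @ (A @ w - b)
    return w

def fedavg(client_data, rounds=50, delay_prob=0.3, dim=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    stale = [w.copy() for _ in client_data]    # last model each client reported
    for _ in range(rounds):
        updates = []
        for i, (A, b) in enumerate(client_data):
            if rng.random() < delay_prob:      # delayed client: server reuses its stale model
                updates.append(stale[i])
            else:
                stale[i] = local_sgd(w.copy(), A, b)
                updates.append(stale[i])
        w = np.mean(updates, axis=0)           # server averages client models
    return w
```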


Topic 5: Scalable Training via Model Decomposition (IST)

Instead of training a monolithic network, what if we could break it into smaller parts that can be trained semi-independently? This topic explores various strategies for model decomposition to enable efficient, distributed training.

Project Focus: A literature review project. Compare and contrast different decomposition strategies, such as Independent Subnetwork Training (IST), vertical decomposition, and layer-wise approaches like ResIST. A strong report will synthesize these ideas and discuss their respective trade-offs in terms of communication cost, scalability, and model performance. A strong report would also consider extending the optimizer used in IST from plain SGD to a momentum- or Adam-type method.
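To make the IST idea concrete, here is a NumPy sketch of one IST round on a one-hidden-layer MLP: hidden units are partitioned across workers, each subnetwork is trained locally with SGD, and the pieces are written back. The partitioning and synchronization schedule are deliberately simplified, and the minibatch sampler is a placeholder.

```python
# Sketch of one IST round on a one-hidden-layer ReLU MLP (NumPy assumed).
import numpy as np

def partition(hidden_dim, n_workers, rng):
    perm = rng.permutation(hidden_dim)
    return np.array_split(perm, n_workers)      # disjoint sets of hidden units

def ist_round(W1, W2, sample_batch, n_workers, local_steps, lr, rng):
    # W1: (hidden, in), W2: (out, hidden); squared loss 0.5 * ||relu(X W1^T) W2^T - Y||^2
    for units in partition(W1.shape[0], n_workers, rng):
        w1, w2 = W1[units], W2[:, units]        # each worker's subnetwork
        for _ in range(local_steps):
            X, Y = sample_batch()               # placeholder minibatch sampler
            H = np.maximum(X @ w1.T, 0)         # ReLU hidden activations
            err = H @ w2.T - Y
            g2 = err.T @ H / len(X)
            g1 = ((err @ w2) * (H > 0)).T @ X / len(X)
            w1, w2 = w1 - lr * g1, w2 - lr * g2
        W1[units], W2[:, units] = w1, w2        # write the trained subnetwork back
    return W1, W2
```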


Topic 6: AI for Scientific Discovery & Material Science

Optimization is at the heart of modern scientific discovery, powering generative models for molecules and crystals, and driving the “active learning” loop in autonomous labs that decide the next experiment to run.

Project Focus: This is a literature review project focusing on problem formulation. The student should select one sub-area (e.g., crystal generation) and analyze in depth how the scientific goal is translated into a tractable optimization problem. What are the objective functions? What are the constraints (e.g., chemical validity)? What specific algorithms (e.g., diffusion models, VAEs) are used to solve them? Teams with students from non-CS/non-ECE backgrounds will be given priority: consider your own discipline and how AI can be used in your field. I am open to discussing more areas: chemistry, civil engineering, weather forecasting, physics, etc.


Topic 7: Second-Order Optimization in Deep Learning

While first-order methods like SGD and Adam dominate, second-order methods, which use curvature (Hessian) information, promise faster convergence. The challenge lies in making them scalable.

Project Focus: This is a theory, implementation, and literature review project. Compare and contrast how different methods (such as K-FAC and Shampoo) approximate the Hessian to make computation feasible. A deep dive into the trade-offs between per-iteration cost, memory, and convergence rate would make for a strong report. Consider implementing K-FAC, Shampoo, etc., on small neural networks.
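As an illustration of the Kronecker-factored idea, below is a simplified Shampoo-style preconditioner for a single 2-D weight matrix, assuming PyTorch; it omits grafting, update intervals, and the other stabilization tricks used in practice, keeping only a damping term.

```python
# Simplified Shampoo-style preconditioner for one weight matrix (PyTorch assumed).
import torch

class SimpleShampoo:
    def __init__(self, W, lr=0.03, eps=1e-4):
        self.W, self.lr, self.eps = W, lr, eps          # W is a plain tensor here
        self.L = torch.zeros(W.shape[0], W.shape[0])    # left statistics,  sum of G G^T
        self.R = torch.zeros(W.shape[1], W.shape[1])    # right statistics, sum of G^T G

    def _inv_quarter_root(self, M):
        # M^{-1/4} via eigendecomposition, with damping for numerical stability
        vals, vecs = torch.linalg.eigh(M + self.eps * torch.eye(M.shape[0]))
        return vecs @ torch.diag(vals.clamp_min(self.eps) ** -0.25) @ vecs.T

    def step(self, G):
        self.L += G @ G.T
        self.R += G.T @ G
        # precondition the gradient on both sides, then take a step
        precond_grad = self._inv_quarter_root(self.L) @ G @ self._inv_quarter_root(self.R)
        self.W -= self.lr * precond_grad
```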


Topic 8: Optimization in Quantum Machine Learning

Training quantum circuits as machine learning models presents unique optimization challenges not found in classical settings, such as the barren plateau phenomenon.

Project Focus: A literature review and an implementation focused on translating classical optimization concepts to the quantum realm. Students should explore the challenges of gradient estimation on quantum hardware and review algorithms designed to navigate the complex loss landscapes of variational quantum algorithms. The emphasis is on teaching the audience about the ideas and settings considered in quantum optimization and how they differ from classical optimization, and on implementing these ideas in code with existing packages such as PennyLane that simulate quantum computation. The topic can be narrowed down later on.
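For a first experiment, a variational circuit can be trained end-to-end in PennyLane using the parameter-shift rule, as in the sketch below; the 2-qubit ansatz and cost function are illustrative choices.

```python
# Sketch of training a variational quantum circuit with parameter-shift gradients
# (PennyLane assumed; the ansatz and cost are toy choices).
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(params):
    qml.RY(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(params[2], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))    # cost = <Z0 Z1>

params = np.array([0.1, 0.2, 0.3], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for step in range(100):
    params = opt.step(circuit, params)                   # parameter-shift gradient step
print("optimized cost:", circuit(params))
```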


Topic 9: Embedding Solvers as Differentiable Neural Network Layers

What if a layer in your network was an optimization solver? This paradigm allows for embedding constrained optimization, combinatorial solvers, or even physics simulations directly into end-to-end trainable models.

Project Focus: Review-heavy with a potential implementation component. The core task is to review the role of the Implicit Function Theorem in enabling backpropagation through a solver. The goal is to move beyond the convex case. A more advanced project could involve implementing a simple differentiable optimization layer using a library like cvxpylayers (for the convex case) or something like julianonconvex (for the nonconvex case).
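A minimal cvxpylayers example might look like the following sketch, which embeds a Euclidean projection onto the probability simplex as a differentiable layer; the specific problem is a toy choice.

```python
# Sketch of a differentiable convex-optimization layer (cvxpy + cvxpylayers assumed).
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 5
y = cp.Parameter(n)                 # layer input
x = cp.Variable(n)                  # layer output: projection of y onto the simplex
objective = cp.Minimize(cp.sum_squares(x - y))
constraints = [cp.sum(x) == 1, x >= 0]
problem = cp.Problem(objective, constraints)

layer = CvxpyLayer(problem, parameters=[y], variables=[x])

y_torch = torch.randn(n, requires_grad=True)
(x_star,) = layer(y_torch)          # forward pass solves the projection problem
x_star.sum().backward()             # backward pass differentiates through the solver
print(y_torch.grad)
```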


Topic 10: The Interplay of Energy-Based Models and Optimization

Energy-Based Models (EBMs) offer a flexible framework for generative modeling but are notoriously difficult to train. This project explores the unique optimization challenges they present.

Project Focus: A theoretical review. What are the connections between the properties of the energy function and the optimization landscape? Review the role of MCMC-based methods for gradient estimation (such as contrastive divergence) and discuss how these sampling methods interact with the optimizer's convergence. What is the twist? To connect EBM training to the broader class of saddle-point optimization problems.
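To ground the discussion, here is a sketch of contrastive-divergence-style training with negative samples drawn by short-run Langevin dynamics, assuming PyTorch; the energy network, synthetic data, and step sizes are illustrative.

```python
# Sketch of EBM training with Langevin negative samples (PyTorch assumed).
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))  # E_theta(x)

def langevin_samples(x, steps=20, step_size=0.01):
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        noise = (2 * step_size) ** 0.5 * torch.randn_like(x)
        x = (x - step_size * grad + noise).detach().requires_grad_(True)
    return x.detach()

opt = torch.optim.Adam(energy.parameters(), lr=1e-4)
for _ in range(1000):
    x_pos = torch.randn(128, 2) * 0.5 + 1.0     # stand-in for real data
    x_neg = langevin_samples(torch.randn(128, 2))
    # maximum-likelihood gradient ~ E_data[dE/dtheta] - E_model[dE/dtheta]
    loss = energy(x_pos).mean() - energy(x_neg).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```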