About

I am currently a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Wisconsin–Madison, advised by Prof. Kangwook Lee and Prof. Robert D. Nowak. Before coming to Madison, I received my B.S. in Electrical and Computer Engineering from Rutgers University, where I was advised by Prof. Waheed U. Bajwa.

For more about my experiences, please check out my CV.

My research interests span the theoretical and practical aspects of signal processing and machine learning. I focus on bridging theory and practice in deep learning by developing theory for modern paradigms such as multi-task learning and implicit neural representations, and using these insights to guide the design of practical systems. More recently, I have become interested in the foundations of large language model pretraining, including optimization dynamics, memorization, and generalization in large-scale models.

Publications

A New Neural Kernel Regime: The Inductive Bias of Multi-Task Learning

Julia Nakhleh, Joseph Shenouda, Robert D. Nowak.

Advances in Neural Information Processing Systems (NeurIPS) 2024

We show that when a neural network is trained on many tasks at once, it behaves much more predictably and produces a unique, smooth solution — similar to classical kernel methods. In contrast, training on a single task can lead to many very different solutions, even when they fit the data equally well.

Figure from A New Neural Kernel Regime: The Inductive Bias of Multi-Task Learning.

Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression

Joseph Shenouda, Rahul Parhi, Kangwook Lee, Robert D. Nowak.

Journal of Machine Learning Research (JMLR) 2024

We introduce a variation space framework that explains how multi-output ReLU networks share structure across tasks in multi-task learning. This leads to principled methods for improving generalization and compressing networks without sacrificing performance.

ReLUs Are Sufficient for Learning Implicit Neural Representations

Joseph Shenouda, Yamin Zhou, Robert D. Nowak.

International Conference on Machine Learning (ICML) 2024

We show that standard ReLU activations yield an ill-conditioned loss landscape for INR problems. We then propose a B-spline wavelet–inspired modification to ReLU and demonstrate its effectiveness on standard INR tasks. Finally, we connect these findings to our prior work on measuring function regularity for efficient hyperparameter selection.

Figure from ReLUs Are Sufficient for Learning Implicit Neural Representations.

A Continuous Transform for Localized Ridgelets

Joseph Shenouda, Rahul Parhi, Robert D. Nowak.

Sampling Theory and Applications Conference (SampTA) 2023 (contributed talk)

A Guide to Computational Reproducibility in Signal Processing and Machine Learning

Joseph Shenouda and Waheed U. Bajwa.

IEEE Signal Processing Magazine 2023

Industry Experience

Amazon AGI Foundations — Applied Scientist Intern

Fall 2025 · Host: Abhishek Kumar

  • Trained and analyzed 0.5B–1B parameter LLMs under Warmup–Stable–Decay (WSD) and cosine LR schedules in controlled pretraining settings (a schedule sketch follows this list).
  • Designed synthetic factual injection datasets to isolate memorization, forgetting, and consolidation across pretraining phases.
  • Demonstrated that factual consolidation under WSD is confined to the cooldown phase, whereas cosine schedules are injection-time invariant at matched loss.
  • Developed scheduler-aware diagnostics based on relative weight update magnitude to predict knowledge acquisition capacity.
  • Built custom datasets, training hooks, and evaluation metrics for token-level factual retention.
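
For readers unfamiliar with these schedules, below is a minimal, self-contained Python sketch of generic WSD and cosine learning-rate schedules. The phase fractions (10% warmup, 10% cooldown), step counts, and peak learning rate are illustrative assumptions for exposition only, not the settings used in these experiments.

import math

def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.10, decay_frac=0.10, min_lr=0.0):
    """Warmup-Stable-Decay (WSD): linear warmup, constant plateau, linear cooldown.
    Phase fractions are illustrative assumptions."""
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:                      # linear warmup
        return peak_lr * step / max(1, warmup_steps)
    if step < stable_end:                        # stable plateau
        return peak_lr
    # linear cooldown (the phase to which factual consolidation is confined above)
    frac = (step - stable_end) / max(1, decay_steps)
    return peak_lr + (min_lr - peak_lr) * frac

def cosine_lr(step, total_steps, peak_lr, warmup_frac=0.10, min_lr=0.0):
    """Cosine schedule: linear warmup followed by a single cosine decay to min_lr."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * frac))

# Illustrative usage: compare the two schedules over a short hypothetical run.
if __name__ == "__main__":
    total, peak = 1000, 3e-4
    for s in (0, 100, 500, 900, 999):
        print(s, round(wsd_lr(s, total, peak), 6), round(cosine_lr(s, total, peak), 6))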

NEC Research Labs — Machine Learning Research Intern

Summer 2025

  • Researched mathematical foundations for in-context learning in Large Language Models (LLMs).
  • Developed theoretical constructions showing how LLMs can learn models in context, and conducted computational experiments to verify the theory.

MIT Lincoln Laboratory — Summer Research Intern

Summer 2021

Teaching

University of Wisconsin–Madison

  • (Teaching Assistant) ECE/CS 761: Mathematical Methods in Machine Learning — Spring 2024. Delivered three lectures during the semester and organized weekly problem-solving sessions.
  • (Teaching Assistant) ECE 203: Signals, Information and Computation — Fall 2024. Prepared weekly lab assignments and assisted students during office hours. Awarded the 2025 ECE TA Excellence Award for outstanding performance.
  • (Teaching Assistant) ECE 888: Nonparametric Methods in Data Science — Spring 2025.
  • (Teaching Assistant) ECE 532: Matrix Methods in Machine Learning — Spring 2026.

Workshop Papers

A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks

Joseph Shenouda, Rahul Parhi, Kangwook Lee, Robert D. Nowak.

International Conference on Machine Learning (ICML) Duality Principles for Modern ML Workshop 2023 (contributed talk)

Watch the talk video on SlidesLive.

A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets

Liu Yang, Jifan Zhang, Joseph Shenouda, Dimitris Papailiopoulos, Kangwook Lee, Robert D. Nowak.

Neural Information Processing Systems (NeurIPS) OPT-ML Workshop 2022

Preprints

PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks

Liu Yang, Jifan Zhang, Joseph Shenouda, Dimitris Papailiopoulos, Kangwook Lee, Robert D. Nowak.

In Review