Architecting Privacy as a Computing Resource in Data Aggregation Workloads: From Browser to Datacenter


Defense

Date: October 7th, 2025
Time: 12:00pm (EST)
Location: CSB 480 conference room, Department of Computer Science, Columbia University (non-Columbia affiliates need a guest pass to access the campus)
Zoom link: Register here to receive the link


Abstract

User privacy is a critical resource consumed by modern aggregation-oriented data workloads, such as machine learning, yet unlike other system resources, it is rarely accounted for, tracked, or managed to ensure that its consumption does not harm users. Differential privacy (DP) offers strong theoretical foundations for tracking this resource and protecting user privacy, but adoption in practice remains limited. This dissertation advances the thesis that DP's practicality and deployability can be substantially improved by following four principles. First, DP should be integrated into shared infrastructure such as browsers, databases, or datacenter platforms so that many applications can benefit, rather than relying on custom-built solutions as is common today. Second, privacy guarantees should be defined in semantics tailored to the context of a system, since traditional DP semantics may not always fit. Third, system design should be rooted in formal analyses that faithfully model system behavior, avoiding the misassessments that result from the oversimplified models customary today. Fourth, privacy should be treated as a first-class computing resource, drawing on resource management techniques from the systems literature, in addition to DP-specific methods, to address challenges such as budget exhaustion.

Applying these principles, we design, prototype, and evaluate three classes of DP systems, each demonstrating how the principles enhance DP's practicality and deployability. In the web domain, Cookie Monster introduces an efficient in-browser budgeting system for ad measurement APIs, leading to its adoption as the core privacy architecture of a W3C draft standard now supported by major browsers. Big Bird extends this foundation with quotas and fair scheduling to protect the shared privacy budget against adversarial depletion, showing how resource allocation techniques can defend privacy in adversarial environments and enabling DP adoption at scale in browsers. In datacenter workloads, PrivateKube integrates privacy as a native resource in Kubernetes with a scheduler tailored to its unique properties, while DPack improves efficiency by maximizing task throughput under fixed privacy guarantees; together, they illustrate how formally grounded design and resource-aware scheduling can overcome the budget exhaustion barrier. Finally, in analytics workloads, Turbo transforms the theoretical private multiplicative weights algorithm into a practical caching layer for DP databases that reuses past results to reduce privacy cost and improve accuracy, demonstrating how systems techniques and DP-specific mechanisms can make previously impractical algorithms usable in practice.

Together, these contributions establish a principled foundation for architecting privacy into modern data workloads, from the browser to the datacenter. By treating privacy as a resource, embedding it into infrastructure, and tailoring privacy guarantees and analyses to systems realities, this dissertation demonstrates how a systems approach to privacy can help advance the practicality, robustness, and deployability of differential privacy.


Resources

Dissertation: In-progress draft (PDF)

Non-technical explainers:


Related Publications