Research Agenda

Today’s computer systems have been continuously evolving to keep up with the demands of modern society. Technological progress is stretching the boundaries of what is possible and creating unprecedented operational challenges. To that end, our work focuses on enhancing computer systems with secure and efficient designs. We conduct rigorous evaluations of computer system vulnerabilities and employ hardware-software codesign techniques to mitigate them.

Research Projects

Microservice Management (Research Areas: Cloud, Distributed Systems)

We are developing a lightweight and efficient microservice resource manager that, unlike existing ML-based approaches, neither needs extensive training nor causes any intentional SLO violations. Because our feedback-based approach does not require training, it can adapt quickly to microservices’ dynamic operating environments.
[Related publications: HPDC’22, SIGMETRICS’22 poster, CLOUD COMPUTING’22]

User-in-the-Loop Management (Research Areas: Economics of Computing, Cloud, HPC)

We are developing a novel market-based resource control mechanism in which users actively participate in HPC management. Our highly scalable approach eases the HPC manager’s burden of power-aware job scheduling in oversubscribed HPC systems.
[Related publications: HPCA’23, HPCA’18, HPCA’16, HPCA’15]

Efficient Federated Learning (Research Areas: Security and Privacy, Edge Computing)

We are developing novel algorithms to efficiently address the data heterogeneity problem in Federated Learning (FL). We propose computation-efficient methods to auto-generate aggregation weights for the central model server. On the client side, we are developing lightweight checkpoints that let clients decide to exit early from FL model updates, saving their computation and communication costs.
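To make the two federated learning mechanisms concrete, here is a minimal Python sketch. It is not our published method: the distance-based softmax weighting on the server and the loss-plateau exit rule on the client are illustrative assumptions, and the function names (`aggregation_weights`, `aggregate`, `should_exit_early`) and thresholds are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aggregation_weights(updates: Sequence[Vector], temperature: float = 1.0) -> List[float]:
    """Hypothetical server-side weighting: updates closer to the mean update get
    larger weights, computed as a softmax over negative distances."""
    n = len(updates)
    dim = len(updates[0])
    mean = [sum(u[i] for u in updates) / n for i in range(dim)]
    scores = [-l2_distance(u, mean) / temperature for u in updates]
    peak = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(updates: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of client updates using the generated weights."""
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


def should_exit_early(
    loss_history: Sequence[float], min_improvement: float = 0.01, patience: int = 2
) -> bool:
    """Hypothetical client-side checkpoint: exit the current round once the local
    loss has improved by less than `min_improvement` for `patience` consecutive steps."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(prev - cur < min_improvement for prev, cur in zip(recent, recent[1:]))


if __name__ == "__main__":
    client_updates = [[1.0, 1.0], [1.2, 0.8], [5.0, -3.0]]  # third client is an outlier
    weights = aggregation_weights(client_updates)
    print([round(w, 3) for w in weights])
    print(aggregate(client_updates, weights))
    print(should_exit_early([1.0, 0.9, 0.895, 0.893]))  # loss has plateaued -> True
```

In this sketch, the outlying third client receives the smallest weight, and a client whose local loss has flattened stops spending computation and communication on the current round.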
[Related publications: IEEE Edge’22, SIGMETRICS’22 poster]

Novel Monitoring for Data Center Reliability (Research Areas: Security, Cyber-Physical Systems, Sensors)

We are developing an acoustic sensing-based system that uses microphone arrays to measure a server’s “true” power consumption from its cooling fan noise. Our system will be able to mitigate the threat of behind-the-meter heat injection attacks launched from servers with integrated batteries.
[Related publications: HPCA’21, SIGMETRICS’18, CCS’17]

Server-Level Power Monitoring in Data Centers (Research Areas: Cyber-Physical Systems, Sensors)

We are developing a first-of-its-kind, ultra-low-cost system for server-level power monitoring in data centers that extracts server power usage information from the conducted electromagnetic interference (EMI) generated by server power supplies. Our approach will significantly lower the data center’s instrumentation cost by eliminating the need for a dedicated power meter for each server. Instead, we use voltage measurements from a single sensor at a single point to infer server-level power consumption. This project is funded by NSF under the three-year $400K grant ECCS 2152357.
[Related publications: SenSys’22 poster, SIGMETRICS’20, CCS’18]

Monitoring Behind-the-Meter Distributed Energy Resources (Research Areas: Sensors, IoT)

We are developing a novel voltage probing and analysis prototype that captures high-frequency (10 kHz to 100 kHz) conducted EMI signatures from grid-tied inverters, enabling utilities to identify the presence and operational status of behind-the-meter (BTM) distributed energy resources (DERs) such as solar. Our approach removes significant barriers to situational awareness of customer-side BTM DERs. Rather than relying on historical patterns or characteristics, we directly monitor the DER inverter’s real-time generation. Our approach can also identify battery-coupled DER systems.
We enable a non-intrusive BTM DER monitoring system that is deployed on the utility side and fully managed by the utility.
[Related publications: SIGMETRICS’20, CCS’18]

Prior Works (before 2019)

Data Center Security

Our work during this period focused on data center security. The sheer scale of the Internet and cloud computing demands massive computer systems housed in mission-critical data centers. Due to the criticality of the hosted services, data centers are emerging as a prime target for malicious attacks. While securing data centers in cyberspace has been widely studied, a complementary and equally important aspect, the security of the data center physical infrastructure, has remained largely unaddressed and has emerged as a threat to data center uptime. In our research, we contribute to data center security by enhancing physical infrastructure security, with a particular focus on mitigating the emerging threat of “power attacks” in multi-tenant “colocation” data centers. Multi-tenant data centers, which account for nearly 40% of all data center energy usage, are shared facilities that house the physical servers of multiple tenants, who pay to use the facility’s non-IT infrastructure (e.g., cooling). We identify serious vulnerabilities lurking in multi-tenant data center infrastructures that expose them to well-timed power load injection attacks (i.e., power attacks). Power attacks can create dangerous capacity overloads resulting in million-dollar losses. We show that a malicious tenant, or an attacker, can extract the runtime power usage of benign tenants by exploiting physical side channels that arise from the unique physical co-residency of multiple tenants in a shared data center. Specifically, we study vulnerabilities and defense strategies for a thermal side channel due to server heat recirculation, an acoustic side channel due to server fan noise, and a voltage side channel due to Ohm’s law.
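The voltage side channel can be illustrated with a deliberately simplified, first-order model. The Python sketch below assumes that all tenants draw current through one shared feeder with a fixed resistance, so Ohm’s law lets an attacker convert the voltage at its own outlet into an estimate of the other tenants’ load and time an attack accordingly. The feeder model, the unity power factor, the threshold rule, and all names and values (`FeederModel`, `attack_windows`, 230 V, 0.05 Ω) are hypothetical; the side channel studied in the cited publications may rely on richer signal features than this average voltage drop.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FeederModel:
    """Simplified shared feeder: every tenant draws current through one line with
    resistance `line_ohms`, so the outlet voltage is source_volts - line_ohms * I_total."""
    source_volts: float
    line_ohms: float


def estimate_others_watts(model: FeederModel, measured_volts: float, own_amps: float) -> float:
    """Estimate the other tenants' aggregate power from one voltage reading."""
    total_amps = (model.source_volts - measured_volts) / model.line_ohms  # Ohm's law: I = dV / R
    others_amps = max(total_amps - own_amps, 0.0)  # remove the attacker's own known draw
    return measured_volts * others_amps  # assumes unity power factor (P = V * I)


def attack_windows(
    model: FeederModel,
    voltage_trace: Sequence[float],
    own_amps: float,
    capacity_watts: float,
    threshold: float = 0.9,
) -> List[int]:
    """Sample indices where the estimated benign load alone reaches `threshold` of the
    shared capacity, i.e., moments when extra attack load is most likely to overload it."""
    return [
        i
        for i, volts in enumerate(voltage_trace)
        if estimate_others_watts(model, volts, own_amps) >= threshold * capacity_watts
    ]


if __name__ == "__main__":
    feeder = FeederModel(source_volts=230.0, line_ohms=0.05)  # hypothetical values
    trace = [229.5, 229.0, 228.0]  # attacker's outlet voltage over time
    print(attack_windows(feeder, trace, own_amps=5.0, capacity_watts=5000.0))  # [2]
```

Only the last sample, where the voltage sag implies roughly 8 kW of benign load against a 5 kW capacity, is flagged. This is exactly the timing information a defense has to deny the attacker.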
[Related publications: CCS’18, SIGMETRICS’18, CCS’17]

Efficient Operation Through Coordination

Data center operators aspire to efficient operation for reasons such as improved infrastructure utilization, lower electricity bills, and reduced carbon emissions. However, in a multi-tenant colocation data center, the servers are owned and operated by individual tenants. This prevents the data center operator from employing many existing centralized techniques for efficient operation. For example, the literature commonly proposes slowing down CPUs, putting servers in low-power modes, or even temporarily shutting them down to reduce power consumption. These techniques use workload information to decide which servers to slow down or turn off with minimal performance impact (e.g., a server or cluster with a low workload is a suitable candidate for power reduction). They cannot be applied to multi-tenant data centers because, first, the operator has no information about tenants’ workloads, and second, it does not control the tenants’ servers. We propose market-based frameworks that establish coordination and communication between the operator and tenants for their mutual benefit.
[Related publications: HPCA’18, HPCA’16, HPCA’15, ICCAD’15]

Data Center Sustainability

Data centers consume massive amounts of water to cool servers through cooling towers. Data centers’ water footprint poses a great sustainability challenge, especially in places stricken by prolonged drought, such as California. While water consumption and energy consumption are related, data center water efficiency varies with time and location because of its strong tie to weather conditions (e.g., outside temperature). We demonstrated that this spatio-temporal variation in water efficiency is a natural fit for data centers’ workload flexibility: workloads can be migrated to locations with higher water efficiency and/or deferred to water-efficient times.
Using these purely software-based approaches, we significantly reduce water consumption without upfront capital investment or facility upgrades.
[Related publications: TCC, IGCC’14]
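A minimal Python sketch of the water-aware scheduling idea: flexible work, expressed as energy demand, is assigned greedily to the (location, time) slots with the lowest water intensity. The `Slot` fields, the liters-per-kWh values, the capacity limits, and the slot names are hypothetical. A real scheduler would also have to account for latency, energy price, and service-level constraints, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slot:
    """A (location, time) option for running flexible work (hypothetical fields)."""
    name: str
    liters_per_kwh: float  # water consumed per kWh of IT energy; lower is better
    spare_kwh: float  # flexible energy this slot can still absorb


def place_flexible_work(demand_kwh: float, slots: List[Slot]) -> Dict[str, float]:
    """Assign flexible energy demand to the most water-efficient slots first.

    Raises ValueError if the slots together cannot absorb the demand.
    """
    plan: Dict[str, float] = {}
    remaining = demand_kwh
    for slot in sorted(slots, key=lambda s: s.liters_per_kwh):
        if remaining <= 0:
            break
        take = min(slot.spare_kwh, remaining)
        if take > 0:
            plan[slot.name] = take
            remaining -= take
    if remaining > 1e-9:
        raise ValueError(f"{remaining:.3f} kWh of demand could not be placed")
    return plan


def water_liters(plan: Dict[str, float], slots: List[Slot]) -> float:
    """Total water consumed by a placement plan."""
    intensity = {s.name: s.liters_per_kwh for s in slots}
    return sum(kwh * intensity[name] for name, kwh in plan.items())


if __name__ == "__main__":
    options = [  # made-up numbers for illustration only
        Slot("site-A-afternoon", liters_per_kwh=2.0, spare_kwh=50.0),
        Slot("site-B-night", liters_per_kwh=0.5, spare_kwh=30.0),
        Slot("site-A-night", liters_per_kwh=1.0, spare_kwh=40.0),
    ]
    plan = place_flexible_work(60.0, options)
    print(plan)  # {'site-B-night': 30.0, 'site-A-night': 30.0}
    print(water_liters(plan, options))  # 45.0
```

Because total water use in this toy model is linear in the energy placed in each slot, and each slot has only a capacity cap, filling the lowest-intensity slots first is optimal. Here it uses 45 liters versus 120 liters if all 60 kWh ran in the least efficient slot. Adding deadlines or latency targets would call for a proper optimization formulation instead.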