Traditional Linux OSPM-based power and performance (PnP) management does not address the highly diverse and dynamic needs of cloud applications. We developed a framework and a set of intelligent algorithms that replace Linux OSPM for cloud server PnP modeling and for selecting control knobs and their parameters. The PnP model automatically learns how the server's compute components (e.g., CPU, GPU) behave under various cloud loads, performs comprehensive, progressive optimization, and outputs control recommendations. Two examples: 1) the model predicts contention between two threads for a shared cache or pipeline before it occurs and informs the OS scheduler so that it can take action; 2) it manages the use of deep C-states (e.g., power-gated states) to minimize or avoid performance penalties for latency-sensitive workloads. We also created an ASIC to enable out-of-band control and to accelerate prediction.
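To make the second example concrete, here is a minimal sketch of one common way to gate deep C-state entry, not the framework described in this talk. It picks the deepest idle state whose exit latency fits a workload's latency budget and whose target residency fits the predicted idle time. The `CState` type, the `pick_c_state` function, and all latency and residency numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: float      # time needed to wake up from this state
    target_residency_us: float  # minimum idle time for the state to pay off


# Hypothetical table, ordered from shallowest to deepest state.
C_STATES = [
    CState("C0-poll", 0.0, 0.0),
    CState("C1", 2.0, 2.0),
    CState("C6", 130.0, 600.0),  # stands in for a power-gated state
]


def pick_c_state(predicted_idle_us, latency_budget_us, states=C_STATES):
    """Return the deepest state that satisfies both constraints.

    predicted_idle_us: how long the core is expected to stay idle
                       (assumed to come from some prediction model).
    latency_budget_us: wake-up delay the running workload can tolerate.
    """
    chosen = states[0]  # the shallowest state is always allowed
    for state in states:
        fits_latency = state.exit_latency_us <= latency_budget_us
        fits_residency = state.target_residency_us <= predicted_idle_us
        if fits_latency and fits_residency:
            chosen = state
    return chosen
```

With these assumed numbers, a latency-sensitive workload that tolerates only a 50 µs wake-up delay stays in C1 even when a long idle period is predicted, while a workload with a 200 µs budget is allowed to enter C6.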
Jun (Justin) Song is chief power architect at Alibaba Group, responsible for server and data center power optimization and management. Before joining Alibaba, Justin worked for 14 years at Intel as a silicon, chip, and platform architect. Justin made technical presentations on public...