Monitoring Machine Learning Applications: Lessons Learned from Monitoring Traditional Software and Beyond

Abstract

Developers insert monitoring probes (i.e., logging statements) in their source code to monitor the runtime behaviors of software systems. Logging statements print runtime log messages, which play a critical role in various operation and maintenance efforts (e.g., anomaly detection or failure diagnosis). However, developers typically insert logging statements in an ad hoc manner, often resulting in fragile logging code: insufficient logging in some code snippets and excessive logging in others. Insufficient logging can significantly increase the difficulty of field failure diagnosis, while excessive logging can cause performance overhead and hide truly important information. To understand and support software logging practices, we surveyed software developers and analyzed software repositories (source code, change history, and issue reports) to study the benefits and costs of logging: where developers place their logging code, how they choose its verbosity level and content, how they maintain it, and how consistent it remains with the surrounding code. We also proposed automated approaches to support developers’ logging practices (e.g., auto-generation of logging code). In this talk, I will discuss lessons learned from traditional software monitoring, how they apply to machine learning applications, and the particular considerations for the monitoring of machine learning applications.
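To illustrate the verbosity-level trade-off the abstract describes, here is a minimal sketch (not from the talk) using Python's standard `logging` module; the `process_payment` function and logger name are hypothetical:

```python
import logging

# The configured level filters which messages are emitted at runtime.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("payment")

def process_payment(amount):
    # DEBUG: detailed tracing; leaving such statements always on is a common
    # source of excessive logging (overhead, noise hiding important messages).
    logger.debug("Entering process_payment with amount=%s", amount)
    if amount <= 0:
        # ERROR: omitting a message here (insufficient logging) would make
        # field failure diagnosis of rejected payments much harder.
        logger.error("Rejected payment: non-positive amount %s", amount)
        return False
    logger.info("Processed payment of %s", amount)
    return True
```

With the level set to `INFO`, the `debug` call is suppressed while the `info` and `error` messages are kept, showing how verbosity levels let operators balance diagnostic detail against overhead.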

Date
Apr 27, 2023 10:30 AM — 11:00 AM
Location
Polytechnique Montreal
2500 Chem. de Polytechnique, Montréal, QC H3T 1J4
Heng Li
Assistant Professor of Software Engineering - Polytechnique Montreal

Heng Li is an assistant professor in the Department of Computer and Software Engineering at Polytechnique Montreal, where he leads the MOOSE lab. He holds a PhD in Computing from Queen’s University. Prior to his academic career, he worked in industry for several years as a software engineer at Synopsys and as a software performance engineer at BlackBerry. His and his students’ research aims to address practical challenges in software monitoring, software quality engineering, intelligent operations of software systems, and quality engineering of machine learning applications. The outcomes of this research have benefited the daily maintenance and operations of software and software-intensive systems in industry and inspired follow-up research in related areas. He is a core organizer of the international Software Engineering for Machine Learning Applications (SEMLA) symposium. He is a recipient of the Discovery Grant from NSERC, the John R. Evans Leaders Fund from CFI, and an NSERC Alliance grant, among others. He is the secretary of the Standard Performance Evaluation Corporation (SPEC) Research Group – DevOps Performance Working Group.