Link to Original: Grounded Research: From Google Brain to MLOps to LLMOps — with Shreya Shankar of UC Berkeley
Summary
Hosts: Alessio (Partner & CTO in Residence at Decibel Partners) and swyx (Writer & Editor of Latent Space)
Guest: Shreya Shankar (formerly of Google and Viaduct, currently unofficially an Entrepreneur in Residence at Amplify while pursuing a PhD in databases at Berkeley)
Key Discussions:
Introduction:
Shankar's academic background was initially mistaken for Stanford, but she clarified that she is pursuing her PhD at Berkeley. She has also interned at Google and worked as a machine learning engineer at Viaduct.
While her LinkedIn lists her as an Entrepreneur in Residence at Amplify, Shankar admits this is an informal title. She is presently immersed in her PhD studies.
Personal Interests:
Beyond her professional life, Shankar is an avid hiker and enjoys exploring different coffees in the Bay Area.
She has also cultivated an interest in cooking, particularly pasta dishes, and recently hosted a dinner for 25 people, navigating the challenges of diverse dietary preferences in the Bay Area.
ML Development vs. Traditional Software Development:
Shankar has conducted extensive research on machine learning operations (MLOps). One notable paper outlined the three V's of ML development: Velocity, Validation, and Versioning. These themes emerged from structured interview studies with ML practitioners.
ML development is experiment-driven, unlike the more linear workflow of conventional software engineering. This results in a high rate of experimentation, even at established companies like Microsoft and Google.
Bridging Development and Production:
A significant challenge in ML is aligning development environments with production environments.
Production environments typically don't facilitate the rapid experimentation that development environments do.
Shankar emphasizes the potential bugs and discrepancies that can arise when transitioning ML models from development to production.
Preventing Data Leakage:
A challenging aspect of ML development is ensuring that information from held-out or future data doesn't unintentionally "leak" into the model training process.
Exploratory Data Analysis (EDA) is crucial, but it can inadvertently introduce biases or errors into the ML process if not conducted cautiously.
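To make the leakage point concrete, here is a minimal sketch (with a hypothetical toy dataset, not from the episode) of a common pitfall: computing normalization statistics over the full dataset before splitting, which lets the held-out data influence training.

```python
import statistics

# Toy feature values; the last point (an outlier) belongs to the test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]

# Leaky: the mean is computed over ALL data, so the test outlier
# shifts the statistic that the model will be trained with.
leaky_mean = statistics.mean(data)

# Correct: normalization statistics are fit on the training split only.
train_mean = statistics.mean(train)

print(leaky_mean)  # 22.0 -- distorted by the held-out point
print(train_mean)  # 2.5  -- reflects only what training is allowed to see
```

The same principle applies to any preprocessing step fit on data (scalers, encoders, imputers): fit on the training split, then apply to the test split.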
The podcast session ends with Shankar hinting at the necessity of reimagining EDA in the context of ML development to avoid potential pitfalls.