Latent Space Podcast 4/6/23 [Summary] - Grounded Research: From Google Brain to MLOps to LLMOps — with Shreya Shankar of UC Berkeley

Explore the evolution of MLOps with Shreya Shankar from Google Brain to UC Berkeley. Dive into bridging development & production, understanding ML models as unique database views. #ML #Research

Prof. Otto Nomos ∙ Oct 05, 2023 ∙ 4 min read

Link to Original: Grounded Research: From Google Brain to MLOps to LLMOps — with Shreya Shankar of UC Berkeley

Summary

Hosts: Alessio (Partner & CTO in Residence at Decibel Partners) and swyx (Writer & Editor of Latent Space)

Guest: Shreya Shankar (formerly of Google and Viaduct; currently, and unofficially, an Entrepreneur in Residence at Amplify while pursuing a PhD in databases at Berkeley)

Key Discussions:

Introduction:
- Shankar's academic affiliation was initially given as Stanford, but she clarified that she is a PhD student at Berkeley. She has also interned at Google and worked as a machine learning engineer at Viaduct.
- Although her LinkedIn lists her as an Entrepreneur in Residence at Amplify, Shankar says the title is informal. Her PhD studies are her main focus at present.
Personal Interests:
- Outside of work, Shankar is an avid hiker and enjoys exploring different coffees in the Bay Area.
- She has recently taken up cooking, particularly pasta dishes, and hosted a dinner for 25 people, which meant accommodating the Bay Area's diverse dietary preferences.
ML Development vs. Traditional Software Development:
- Shankar has conducted extensive research on machine learning operations (MLOps). One notable paper outlined the three V's of ML development: Velocity, Validation, and Versioning. These aspects became evident after structured interview studies.
- ML is experiment-driven, differing from the linear development workflow of conventional software engineering. This results in a high rate of experimentation, even among established companies like Microsoft and Google.
Bridging Development and Production:
- There's a significant challenge in aligning the development environments with the production environments in ML.
- Production environments don't typically facilitate rapid experimentation like development environments do.
- Shankar emphasizes the potential bugs and discrepancies that can arise when transitioning ML models from development to production.
Preventing Data Leakage:
- A challenging aspect of ML development is ensuring that data doesn't unintentionally "leak" during the model training process.
- Exploratory Data Analysis (EDA) is crucial, but it can inadvertently introduce biases or errors into the ML process if not conducted cautiously.
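
One common form of the leakage described above is computing preprocessing statistics over the full dataset before splitting it. The sketch below is illustrative only and is not taken from the episode: the function name `split_then_scale` and its parameters are hypothetical. It splits first and derives the scaling statistics from the training rows alone, so nothing about the held-out rows influences the transformation.

```python
import random
import statistics


def split_then_scale(values, test_fraction=0.25, seed=0):
    """Split first, then compute scaling statistics from the training rows only.

    Hypothetical helper for illustration; not an API from any library.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)

    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Statistics come from the training split only. Computing them on
    # `values` instead would leak information about the test rows.
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0  # guard against a constant column

    def scale(rows):
        return [(x - mean) / std for x in rows]

    return scale(train), scale(test), (mean, std)


if __name__ == "__main__":
    train, test, (mean, std) = split_then_scale([float(i) for i in range(20)])
    print(len(train), len(test), round(mean, 3), round(std, 3))
```

The same rule applies to anything learned from data during exploration, such as imputation values or category vocabularies: fit on the training split, then apply unchanged to everything else.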

This part of the conversation ends with Shankar suggesting that EDA needs to be reimagined for ML development to avoid these pitfalls.

Berkeley's Evolving Research Culture & Integrating ML in Applications

Berkeley's Research Lab Culture

Berkeley is renowned for continuously tackling challenging data projects.
Approximately every five years, a new lab emerges.
Recent developments include RISELab splitting into Sky Lab (focused on multi-cloud programming environments) and EPIC Lab (focused on low-code/no-code and data management tools).
These projects are funded by NSF grants and the specifics of what will come next remain ambiguous.

Transition from Static Data to Dynamically Updated Data in Academia

A trend has emerged where academics, accustomed to static datasets, struggle with the dynamic nature of industry data.
There is a shift, with some in academia now starting to use dynamic data benchmarks.
One challenge faced is the static mindset towards models. Instead of treating models as fixed artifacts, they should be perceived as constantly evolving views on data.

Research Projects and Their Impact

The speaker is developing a system for ML pipeline creation. The aim is to integrate the development and production experience.
Central to this system is the idea that models should be viewed as data transformations, consistently recomputed.
The goal is to ensure consistency between the numbers achieved during development and post-deployment.
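
To make the "model as a view" idea concrete, here is a minimal sketch under stated assumptions. It is not the system discussed in the episode: the class `ModelView`, its methods, and the trivial mean predictor are all hypothetical stand-ins. The fitted state is treated as derived data that is recomputed whenever the underlying rows change, and the same `refresh` path is used whether it is called from a notebook or from a production job.

```python
import hashlib
import json
import statistics


class ModelView:
    """Hypothetical 'model as a view' wrapper (illustration only).

    The fitted state is derived from the data and recomputed whenever
    the underlying rows change, much like refreshing a materialized view.
    """

    def __init__(self):
        self._fingerprint = None
        self._mean = None
        self.refits = 0

    @staticmethod
    def _hash(rows):
        # Content hash of the rows; a real system would track data versions.
        return hashlib.sha256(json.dumps(rows).encode()).hexdigest()

    def refresh(self, rows):
        """Refit only if the data differs from what the model last saw."""
        fingerprint = self._hash(rows)
        if fingerprint != self._fingerprint:
            self._mean = statistics.fmean(rows)  # stand-in for real training
            self._fingerprint = fingerprint
            self.refits += 1
        return self

    def predict(self):
        return self._mean


if __name__ == "__main__":
    view = ModelView().refresh([1.0, 2.0, 3.0])
    view.refresh([1.0, 2.0, 3.0])        # unchanged data: no refit
    view.refresh([1.0, 2.0, 3.0, 6.0])   # new row: the view is recomputed
    print(view.predict(), view.refits)
```

Because development and production call the same code on versioned data, a metric computed in development can be reproduced after deployment by replaying the same data fingerprint.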

Key Principles for ML Practice

Version Everything: Everything, including code and data, should be versioned for traceability, especially during experimentation.
Always Validate Your Data: Despite its importance, many neglect to validate input data. Proper data validation can preemptively identify issues before they affect ML performance.
Choose Model Architecture Wisely: When implementing ML, operational capabilities should be assessed. Some teams might benefit from using APIs, especially if they lack the resources to host and maintain their own models. Large language models like GPT offer high utility but come with higher latency and costs.
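
As a rough illustration of the "Always Validate Your Data" principle, the sketch below checks each incoming row against a small schema before it reaches a model. The function `validate_rows`, the schema layout, and the column names are assumptions made for this example; production teams would more likely use a dedicated validation library.

```python
import math


def validate_rows(rows, schema):
    """Return a list of readable problems; an empty list means the batch passed.

    `schema` maps a column name to (expected_type, low, high).
    Hypothetical helper for illustration only.
    """
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: {column} is missing")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {column} has type {type(value).__name__}")
            elif isinstance(value, float) and math.isnan(value):
                problems.append(f"row {i}: {column} is NaN")
            elif not (low <= value <= high):
                problems.append(f"row {i}: {column}={value} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    schema = {"age": (int, 0, 120), "income": (float, 0.0, 1e7)}
    batch = [
        {"age": 34, "income": 52000.0},
        {"age": -3, "income": 41000.0},
        {"age": 51},
    ]
    for problem in validate_rows(batch, schema):
        print(problem)
```

Running a check like this at ingestion time surfaces bad inputs as explicit errors instead of as an unexplained drop in model performance.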

In essence, Berkeley's dynamic research culture continually pushes boundaries in data science. While academia is slowly adjusting to the evolving nature of data, there's still room for growth in practices, especially around data validation and model selection.

Exploring the Dynamics of LLMOps, Large Language Models, and Academic Research in the Age of Big Tech

The LLMOps Stack and Shadow Models

The LLMOps stack requires a state management tool to accommodate dynamic changes in API usage and prompt management.
While the OpenAI API keeps a history of prompts, there are challenges in ensuring uniformity across various prompt injections.
A pressing need exists for tools that compile and manage prompts.
During the deployment of large language models (LLMs) in production, filters play a vital role in refining outputs.
The discussion further emphasizes the potential utility of shadowing simpler models in production, albeit with limitations based on user feedback requirements.
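
The two ideas in this section, managing prompts as versioned state and shadowing a second model in production, can be sketched as follows. This is an illustrative sketch, not a tool mentioned in the episode: `PromptRegistry`, `serve_with_shadow`, and the callables passed to them are hypothetical, and the "models" are plain Python functions standing in for API calls.

```python
import hashlib


class PromptRegistry:
    """Hypothetical store that versions prompt templates by content hash."""

    def __init__(self):
        self._templates = {}  # (name, version) -> template text
        self._latest = {}     # name -> most recently registered version

    def register(self, name, template):
        version = hashlib.sha256(template.encode()).hexdigest()[:8]
        self._templates[(name, version)] = template
        self._latest[name] = version
        return version

    def render(self, name, version=None, **variables):
        version = version or self._latest[name]
        return self._templates[(name, version)].format(**variables)


def serve_with_shadow(primary, shadow, request, log):
    """Answer with `primary`; run `shadow` on the same input and log both."""
    answer = primary(request)
    try:
        shadow_answer = shadow(request)
    except Exception as exc:  # a shadow failure must never affect the user
        shadow_answer = f"<shadow failed: {exc}>"
    log.append({
        "request": request,
        "primary": answer,
        "shadow": shadow_answer,
        "agree": answer == shadow_answer,
    })
    return answer


if __name__ == "__main__":
    registry = PromptRegistry()
    v1 = registry.register("summarize", "Summarize: {text}")
    registry.register("summarize", "Summarize briefly: {text}")
    print(registry.render("summarize", version=v1, text="hello"))

    log = []
    print(serve_with_shadow(str.upper, str.lower, "Hi", log), log[0]["agree"])
```

Logging agreement between the two models is only a proxy. As the discussion notes, judging which output is actually better still depends on user feedback.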

Keeping Up With Research

The exponential growth of machine learning (ML) research publications makes it challenging to keep abreast of new advancements.
Instead of attempting to read all papers, one recommended approach is to focus on the practical tools and GitHub repositories, which provide insights into the real-world problems users are trying to solve.
Additionally, there's a highlighted preference for a more ethnographic research approach that involves understanding real-world problems and their evolving nuances.

Google Brain vs. Academia

The vast resource availability at big tech companies is shifting the nature of research that PhD students in academia can feasibly undertake.
While reproducing large-scale models, like those from OpenAI, is possible, the incentives in academia, driven by publishing, can hinder this approach.

Navigating AI's Landscape: Insights for New Grads and Addressing Diversity in Tech

Advice for New Graduates

New grads often aim to work on impactful projects and become renowned for their contributions.
The speaker suggests engaging in underrepresented areas like data management research in the industry, emphasizing the need for efficiency in model training and data management.
Companies like OpenAI are doing important work, with models that will evolve as more data and computing resources are allocated.

Support for Minorities in Computer Science

The speaker highlighted the she++ organization, which introduces underrepresented high school students to coding and the tech world and exposes them to opportunities in Silicon Valley.
The speaker emphasized the importance of exposure and awareness, noting that many outside Silicon Valley don't have the same insights into potential careers in tech.
The DARE program at Berkeley was mentioned as a mentoring initiative for underrepresented students in Computer Science.

AI and Technology

The speaker expressed appreciation for Stable Diffusion, which helps generate figures for presentations.
A prediction was made that organizations will face challenges in deploying AI prototypes in real-world scenarios due to data management issues.
An AI tool that can manage and update color palettes for web development projects was identified as a desirable product.
The current period of rapid technological advances in AI was compared to the emergence of platforms like Google and YouTube, suggesting that the AI boom is not just a passing phase.

Personal Anecdotes

The speaker's grandmother's awareness of ChatGPT was cited as evidence of AI's permeation into mainstream culture.

Contact Information

Shreya can be reached via email, but has closed her DMs on Twitter due to an overwhelming volume of messages.
