Latent Space Podcast 6/14/23 [Summary] - Emergency Pod: OpenAI's new Functions API, 75% Price Drop, 4x Context Length (w/ Alex Volkov, Simon Willison, Riley Goodside, Joshua Lochner, Stefania Druga, Eric Elliott, Mayo Oshin et al)

Explore the June 2023 OpenAI updates with top AI engineers from Scale, Microsoft, Pinecone, & Huggingface. Dive into the Code x LLM paradigms and discover Recursive Function Agents.

Prof. Otto Nomos · Oct 05, 2023 · 5 min read

Original Link: Emergency Pod: OpenAI's new Functions API, 75% Price Drop, 4x Context Length (w/ Alex Volkov, Simon Willison, Riley Goodside, Joshua Lochner, Stefania Druga, Eric Elliott, Mayo Oshin et al)

Summary

In this special emergency podcast episode, Swyx dives into the June 2023 updates from OpenAI. Key highlights include:

  • The unveiling of OpenAI's new Functions API, which brings function calling, previously available to ChatGPT Plus subscribers via plugins, to the API. Developers declare callable functions, and a new 'function' role in the conversation makes it much easier to obtain structured output, particularly JSON (see the sketch after this list).

  • A significant 75% price reduction for embeddings, especially noteworthy following a previous 90% reduction and marking OpenAI's push toward making AI more accessible. By that arithmetic, the cost to embed the entire internet drops from roughly $50 million to $12.5 million.

  • A 4x increase in context length for GPT-3.5 Turbo, from 4,000 to 16,000 tokens.

  • Observations that long-context models may not attend to information equally well across the entire span.

  • The podcast also touches on discussions about embedding costs, potential use-cases, and comparisons of GPT versions.
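
To ground the bullets above, here is a minimal sketch of the new function-calling flow against the June 2023 chat completions API (the pre-v1 `openai` Python SDK, which reads `OPENAI_API_KEY` from the environment); the `get_current_weather` function is illustrative:

```python
import json
import openai  # pre-v1 SDK, current at the time of this episode

# One declared function, described with JSON Schema so the model can decide
# when to call it and with what arguments. The function itself is hypothetical.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City, e.g. Boston"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",  # June 2023 snapshot that introduced function calling
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=functions,
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # Arguments arrive as a JSON string; OpenAI fine-tuned for this format
    # rather than hard-guaranteeing it, so json.loads can still fail.
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```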

Guests Alex Volkov and others express excitement about the price drops and new models but also share some reservations about the reliability of the API in certain applications.


OpenAI's New Functions API: Developer Reactions & Comparisons

The discussion primarily revolved around OpenAI's new release, addressing the functionality and the potential implications for developers and other AI services.

Introduction to Riley Goodside - He is known for persuading Bard, an AI model, to return results in JSON format. He praises OpenAI for responding to developer needs and maintaining a "hacker ethos". He notes that rather than hard-guaranteeing syntactically perfect output, OpenAI has fine-tuned the model to emit function calls, so occasional syntax errors remain possible.

Simon Willison's Input - He discusses the pattern of asking the AI to run functions. Simon appreciates OpenAI's ability to identify and deliver what developers truly want. He's excited about the ability to integrate tools with the model. He also mentions the security implications associated with prompt injection and the need for user approval for actions that might modify the world state.
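
A hedged sketch of the pattern Simon describes, including a human-approval gate before a world-modifying action; all function names here are illustrative:

```python
import json
import os
import openai  # pre-v1 SDK; assumes OPENAI_API_KEY is set

# A side-effecting tool -- exactly the kind of action Simon argues should
# require explicit user approval before the model may trigger it.
def delete_file(path: str) -> str:
    os.remove(path)
    return f"deleted {path}"

functions = [{
    "name": "delete_file",
    "description": "Delete a file from disk",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

messages = [{"role": "user", "content": "Clean up temp.log"}]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions
)
message = response["choices"][0]["message"]

if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    # Gate the irreversible action behind a human decision.
    if input(f"Run delete_file({args})? [y/N] ").lower() == "y":
        messages.append(message)
        messages.append({
            "role": "function",  # the new role introduced in this release
            "name": "delete_file",
            "content": delete_file(**args),
        })
        final = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages
        )
        print(final["choices"][0]["message"]["content"])
```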

Eric Elliott's Perspective - Creator of SudoLang, he recommends defining an interface inside prompts for accuracy. He has observed that when the AI is prompted with pseudocode, it tends to be more compliant and accurate.
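
As a flavour of that approach (illustrative only, not Eric's exact SudoLang syntax), an interface can be declared in pseudocode inside the system prompt:

```python
# A pseudocode interface embedded in the system prompt, in the spirit of
# SudoLang. The interface below is illustrative, not Eric's actual syntax.
SYSTEM_PROMPT = """\
interface WeatherReport {
  location: String
  temperatureC: Number
  conditions: String
}

Respond only with a JSON object conforming to WeatherReport.
"""
```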

Discussion on Functions API and Agents - The conversation shifts to how this new release from OpenAI affects the agent-making space, where tools are created and prompting is used to ask the AI to run those tools.

Contrast with Google Vertex - Google's Vertex AI offers similar functionality, letting developers specify a JSON schema for model output.

Overall, the participants discuss the implications, potential challenges, and the value of the new OpenAI release, emphasizing the importance of reliable outputs and the balance between developer needs and security concerns.


Challenges and Future of Code-to-English Interface in AI Systems

Stefania Druga, a researcher with Microsoft Research, discusses her experiences with Fixie AI, emphasizing the spectrum of users from no-code enthusiasts to Python programmers. She raises concerns about the challenges of AI 'hallucinations', controlling outputs, and the balance between natural language prompts and traditional coding.

Riley Goodside remarks on a potential shift from English-language prompting back to more structured prompts, given the ambiguity and unpredictability of natural language. The excitement around language models, he mentions, is their potential to democratize automation. Alex Volkov expresses enthusiasm about a transition akin to moving from JavaScript to TypeScript.

Roie, from Pinecone, focuses on the surprising reduction in embedding costs, pondering the reasons behind the drastic price drop. Swyx chimes in with potential explanations, mentioning infrastructure improvements and a shift in model versions that could have made inference cheaper. He also comments on the strategic significance of reducing embedding costs to potentially lock users into OpenAI's ecosystem. Alex stresses OpenAI's stance that data provided via its API is not used for training.


Embeddings and Dynamic Function Selection in AI

Xenova and Huggingface on AI Embeddings

  • Alex Volkov introduces Joshua (Xenova), creator of Transformers.js and a recent hire at Huggingface.

  • Discussion ensues about the advantages of running embeddings on the client side, such as data privacy and lower costs (a local-embedding sketch follows this list). Xenova acknowledges the benefits of both the large-scale embeddings offered by OpenAI and client-side embeddings, and mentions the constraint on embedding dimensions when models run locally.

  • Alex appreciates the utility of Transformers.js, indicating its value extends beyond just experimental use cases.
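
The discussion centers on Transformers.js running in the browser; as a rough Python analogue of the same run-it-locally idea, here is a sketch using sentence-transformers (the model choice is an assumption, picked for its small local footprint):

```python
# Local embeddings in the spirit of the Transformers.js discussion.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(
    ["OpenAI functions API", "embeddings price drop"],
    normalize_embeddings=True,
)
# The dimension trade-off Xenova alludes to: a small local model yields
# 384-dim vectors versus 1536 for OpenAI's text-embedding-ada-002.
print(vectors.shape)  # (2, 384)
```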

Function Selection in AI Models

  • The panel transitions to discussing dynamic function selection. They address the model's ability to determine which function to run based on user input.

  • Swyx raises concerns about the unclear limit on the number of functions that can be assigned. He discusses the contrast between using many functions and having a few highly capable functions.

  • Simon Willison proposes having sophisticated functions that are domain-specific, suggesting SQL or even JavaScript as potential languages.

  • Alex highlights a new feature where the model can either choose a function on its own or be explicitly told which function to run (see the sketch after this list). He also mentions the new 'function' role in the chat format, which identifies which function produced a given output.

  • The panel concludes with the idea that decision-making in choosing or directing functions will be a critical aspect of future AI applications.
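
A sketch of those two modes via the `function_call` parameter as it shipped in June 2023 (it also accepts "none" to suppress function use); the schema is the same illustrative one as earlier:

```python
import openai  # pre-v1 SDK; assumes OPENAI_API_KEY is set

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]
messages = [{"role": "user", "content": "What's the weather in Boston?"}]

# Let the model decide whether, and which, function to call:
auto = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call="auto",  # the default when functions are present
)

# Or take the routing decision away from the model and force one function:
forced = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call={"name": "get_current_weather"},
)
```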


Leveraging Function API for Advanced Code Agents

  • Alex Volkov highlighted the potential of combining user and system input to shape function outputs. He envisioned a function role adaptable to the user's needs.

  • Swyx identified the challenges developers face with code-generation tools, emphasizing the need for more inference time and flexibility in error correction. He introduced the idea of a 'code agent' created through three pathways: generating new code, testing code, and calling existing code (a minimal sketch appears at the end of this section).

  • Stefania Druga discussed the blurred lines between functions and agents. She mentioned Google's paper, "Reveal", which presented an innovative method of encoding diverse knowledge sources into a memory structure for faster response time.

  • Riley Goodside saw potential in harnessing AI's documentation expertise to predict software behavior based on descriptions.

  • Nisten talked about the limitations of the model's context window, noting that while the input capacity had increased, the output still seemed limited.

  • Mayo expressed skepticism about the hype, questioning the tangible benefits of a larger context window, especially with the rise of retrieval-based approaches.

  • Alex Volkov responded, pointing out the convenience of the larger context window for developers who deal with variable user input sizes.

The discourse suggests that while the Function API has exciting potential, there are several challenges and considerations to account for in its practical application.
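
To make the 'code agent' idea concrete, here is a minimal, hedged sketch of a generate-test-retry loop; everything here is illustrative, and executing model output outside a sandbox is unsafe:

```python
import openai  # pre-v1 SDK; assumes OPENAI_API_KEY is set

def code_agent(task: str, max_attempts: int = 3) -> str:
    """Generate code, 'test' it by executing it, and feed failures back
    to the model for correction. Purely illustrative -- never exec
    untrusted model output outside a sandbox."""
    messages = [{
        "role": "user",
        "content": f"Write Python solving this task, code only: {task}",
    }]
    for _ in range(max_attempts):
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages
        )["choices"][0]["message"]["content"]
        try:
            exec(compile(reply, "<generated>", "exec"), {})  # crude smoke test
            return reply
        except Exception as err:
            messages.append({"role": "assistant", "content": reply})
            messages.append({
                "role": "user",
                "content": f"That raised {err!r}. Please fix and resend code only.",
            })
    raise RuntimeError("no working code after retries")
```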


The 2 Code x LLM Paradigms and Evolving AI Interactions

Roie expresses interest in Sean's perspective on the distinction between two prevailing paradigms related to language models and code. Swyx shares his view that effective utilization of language models demands a solid grounding in code. He emphasizes the growing tension between two approaches:

  1. Retrieval Augmented Generation: where code drives the application and the model plays a specific, limited role (code core with an LLM shell).

  2. Agent World: where the model takes center stage, deciding what to do next while the surrounding code executes its choices (LLM core with a code shell).

Swyx reads OpenAI's recent updates as a shift from the first paradigm toward the second: function calling makes it more seamless for the language model itself to decide when to invoke code.
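
A toy sketch of the two control flows, with all names hypothetical:

```python
# Toy stand-ins (all hypothetical) to contrast the two paradigms.
def llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"

def search(query: str) -> str:
    return "[retrieved documents]"

# Paradigm 1 -- code core, LLM shell (RAG): deterministic code drives the
# flow; the model fills in exactly one generation step.
def rag_answer(question: str) -> str:
    context = search(question)                        # code decides what to fetch
    return llm(f"Context: {context}\nQ: {question}")  # model drafts the answer

# Paradigm 2 -- LLM core, code shell (agents): the model drives the flow,
# repeatedly choosing tools; code merely executes its choices.
def agent_answer(question: str, tools: dict, max_steps: int = 5) -> str:
    transcript = question
    for _ in range(max_steps):
        step = llm(transcript)                              # in practice: a function_call
        name = next((n for n in tools if n in step), None)  # toy routing
        if name is None:
            return step                                     # model answered directly
        transcript += "\n" + tools[name]()                  # code executes the choice
    return llm(transcript)
```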

Mayo and Alex discuss the evolution from traditional prompt engineering to fine-tuning. They observe that earlier, achieving specific outputs required intricate prompting, but now there's a directness to the tooling, which simplifies things but also has its drawbacks.

Riley points out the benefits of the updated model, noting it streamlines certain tasks but may come with challenges, like the model "breaking character." He suggests that the model should be seen as a reasoning tool, emphasizing the potential of structured outputs.

Stefania predicts that smaller AI models are the future due to benefits like security, affordability, and decentralization. She emphasizes the need for models to be efficient for real-time applications and shares concerns about over-reliance on chat interfaces, which may not be scalable for every application.

Swyx and Riley conclude that, despite OpenAI's current focus on chat, it's essential to think beyond this format and consider the broader possibilities of AI and UX interactions.


The Double-Edged Sword: AI's Power and Vulnerabilities Explored

  • Alex Volkov opened the conversation about the security concerns tied to developer-friendly tools, asking specifically whether the new tools could help address prompt injection.

  • Simon Willison expressed concerns, emphasizing that while the new updates reduce the barrier to function access, they could also lead to unintended consequences. He cited the challenge of prompt injections becoming more sophisticated and the need to consider reversibility in AI actions.

  • Discussing Bing Chat, Alex and Simon touched upon the ability of modern systems to detect prompt injections. Simon voiced skepticism about the effectiveness of filters catching prompt injections, particularly when up against dedicated hackers.

  • Swyx suggested marking functions as "reversible" or "requires human input" as potential safety measures (see the sketch after this list).

  • Nisten voiced enthusiasm for the game-changing nature of functions, emphasizing their potential to revolutionize operations and scalability in AI. However, he also acknowledged the possible security holes.

  • Riley Goodside urged caution, stating that while the advancements are exciting, inherent risks such as AI hallucinations remain. He also noted that similar guardrails have appeared in other frameworks, and urged keeping the excitement in proportion.

  • Alex Volkov defended the significance of the advancements, hinting at the success of past OpenAI projects and their potential for transformative change.

  • The panel touched upon the evolution of function integration, with Simon highlighting how its inclusion in the core platform reduces friction and inspires confidence.

  • Stefania Druga mentioned the newly launched Garak, a tool for security probing, and pondered on the ethical implications of exposing vulnerabilities. She left with a thought-provoking question about the future demands from OpenAI.
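
A hedged sketch of swyx's suggestion above: tag each tool with safety metadata and gate execution on it (the field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

# Each tool carries safety metadata; execution is gated on it.
@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    reversible: bool
    requires_human: bool

def run_tool(tool: Tool, **args) -> str:
    # Anything irreversible, or explicitly flagged, needs a human in the loop.
    if tool.requires_human or not tool.reversible:
        if input(f"Allow {tool.name}({args})? [y/N] ").lower() != "y":
            return "action declined by user"
    return tool.fn(**args)

# Illustrative registry: reading a file is harmless; sending email is
# neither reversible nor safe to run unattended.
TOOLS = {
    "read_file": Tool("read_file", lambda path: open(path).read(),
                      reversible=True, requires_human=False),
    "send_email": Tool("send_email", lambda to, body: "sent",
                       reversible=False, requires_human=True),
}
```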

The discussion ended with an invitation for participants to brainstorm on potential projects and innovations using the new tools.


GPT Model Discussions and Future Wishes

  • GPT Model Upgrades: Alex Volkov emphasized the upgrade from the previous 3.5 snapshot to the new 0613 version, curious whether the new model adheres to system messages more faithfully. Simon Willison discussed how the new models are more "steerable" and highlighted the importance of stability. Simon also noted that the easy prompt injections of 3.5 might be mitigated by this upgrade, but clarified that the problem hasn't been officially declared solved.

  • JSONformer Discussion: Mayo introduced JSONformer, a tool for generating structured data. Simon explained how it works: it constrains decoding so the model fills in only the values of a fixed JSON skeleton, making the output valid by construction (a usage sketch appears at the end of this section). The group speculated about why OpenAI implemented fine-tuning rather than a similar hard-constraint method.

  • Closing Comments - Desired Features: Participants expressed their hopes for future developments:

    • Stefania Druga: Wished for knowledge graphs and better data retrieval.

    • Simon Willison: Desired widgets in ChatGPT to make it more interactive.

    • Nisten: Advocated for a 30B open-source model from OpenAI.

    • Far El: Echoed the desire for more open-source models.

    • Roie: Wanted an 80% cost reduction for the 32k GPT-4.

    • Riley Goodside: Expressed excitement for the upcoming virtual hackathon with substantial prizes.

  • OpenAI's Open Source Debate: There was a unanimous desire for OpenAI to lean more into open sourcing their models, as emphasized by Mayo, who pointed out the irony in the company's name.

  • Developer Perspectives and Future Possibilities: swyx (Sean) mentioned the concept of "Franken models" and the potential to use larger models to guide smaller, specialized models. He expressed excitement about exploring this area and rewriting his developer agent.

  • Podcasts and Future Developments: Alex Volkov gave a shoutout to swyx's podcast, Latent Space. swyx also teased an upcoming interview with George Hotz, hinting at some intriguing developments in the tech world.
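
For reference, JSONformer's documented usage looks roughly like this: the library walks a JSON Schema and asks the model to generate only the values, so the output always parses (the model choice mirrors the project's README):

```python
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any local causal LM works; dolly-v2-3b is the model used in the README.
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
    },
}

# Jsonformer emits the JSON skeleton itself and only lets the model
# generate the values, so the result is guaranteed to match the schema.
jsonformer = Jsonformer(model, tokenizer, schema, "Generate a person's details:")
print(jsonformer())
```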
