Stay up to date with your local chapter here.
It's good to see thinking in action - and we think you'll like this edition's Guest Editor.
MLOps Engineer @ Electric Twin // MLOps Community London Co-host
Still buzzing 🤠 after the great Community meetup we had in London last week. We packed in a lightning talk marathon with six speakers - and now I get to guest edit the newsletter!
A quick thanks again to our ⚡ speakers - Pier, Matt, Li, Vinay, Richard, and Wendy - and I'm taking this chance to share some of their materials:
Be sure to check here to see what’s happening near you.
To the newsletter! This week’s chat between Demetrios, Paul and Floris about real-time voice AI was interesting for us at Electric Twin. Their take on balancing latency with responsiveness - especially how turn detection impacts conversation flow - stood out. And going with the flow ties nicely into Médéric’s blog on building more flexible ML pipelines.
But the thing I’m most interested in this week? The stories that’ll come from the ML Confessions section! Please, share your stories, help us all feel better about ourselves!
Before I go, Demetrios asked me to highlight three ways he can help you (just hit reply and let him know):
- Curated intros to other community members
- What problems are you dealing with? Let him help you find the best solutions through his network
- Looking to augment your staff for an MLOps or AI project? He’s got you covered
Enjoy the newsletter!
The Challenge with Voice Agents
Paul van der Boor // Senior Director Data Science @ Prosus Group
Floris Fok // AI Engineer @ Prosus Group
There's a lot of talk about AI stealing jobs, but as a podcast host, I wasn’t too worried. After this chat, I’m starting to think I should have a backup plan.
Voice AI is moving fast, and Paul and Floris walked me through what it takes to build real-time voice agents. OpenAI’s Real-Time API is a big step forward, but challenges like latency, memory, and making interactions feel natural still remain. Unlike text-based LLMs, voice models struggle with context, hallucinations, and handling multiple languages.
One of the biggest hurdles is knowing when to listen and when to speak. Turn detection has to balance avoiding awkward silences with not interrupting the user. A few key things stood out (there’s a toy sketch after the list):
- Adapting to individual speaking styles - A pause might mean “I’m thinking” for one person and “I’m done talking” for another. Fine-tuning for different users is essential.
- Handling interruptions properly - If a user cuts the AI off, it needs to know they didn’t hear the last part. OpenAI’s API trims responses when interrupted, which helps.
- Allowing for natural clarifications - Most voice models are weirdly overconfident and never ask you to repeat yourself, even if they’ve completely misheard. A simple “Sorry, what was that?” would improve things massively.
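To make that trade-off concrete, here’s a toy sketch of the adaptive-threshold idea - the class and the numbers are mine, not from the episode or from OpenAI’s API:

```python
import time


class AdaptiveTurnDetector:
    """Toy end-of-turn detector that adapts its silence threshold per speaker.

    Illustrative only: production voice agents combine voice-activity
    detection, prosody, and semantic cues rather than a single silence timer.
    """

    def __init__(self, base_silence_s: float = 0.7):
        self.base_silence_s = base_silence_s
        self.avg_pause_s = base_silence_s  # running estimate of this user's pauses
        self.last_speech_ts = None         # monotonic time of the last detected speech

    def on_speech(self) -> None:
        """Call whenever voice activity is detected in the incoming audio."""
        now = time.monotonic()
        if self.last_speech_ts is not None:
            pause = now - self.last_speech_ts
            # Only short, mid-utterance gaps update the speaker's pause profile.
            if pause < 2.0:
                self.avg_pause_s = 0.9 * self.avg_pause_s + 0.1 * pause
        self.last_speech_ts = now

    def turn_finished(self) -> bool:
        """True once silence exceeds this speaker's adapted threshold."""
        if self.last_speech_ts is None:
            return False
        silence = time.monotonic() - self.last_speech_ts
        # Pause-heavy speakers get a longer grace period before the agent replies.
        threshold = max(self.base_silence_s, 1.5 * self.avg_pause_s)
        return silence > threshold
```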
We also looked at testing. Custom evals with synthetic voices surfaced some funny issues - certain accents made the model randomly switch languages, and slow speakers accidentally triggered turn detection. Open-source models like Kokoro are making text-to-speech more accessible, but voice AI still needs a rethink of workflows to handle the unpredictability of real conversations. So, my job’s safe for now, but I should probably polish up my resume… just to be safe.
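For a flavor of what those synthetic-voice evals could look like, here’s a hypothetical harness - the `agent` and `synthesize` callables and the reply fields are placeholders I made up, not the actual Prosus setup:

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    text: str             # what the synthetic voice will say
    accent: str           # e.g. "en-GB", "en-IN"
    speaking_rate: float  # 1.0 = normal, <1.0 = a slow speaker


def run_voice_evals(agent, synthesize, cases):
    """Run synthetic-voice eval cases against a voice agent.

    `agent` and `synthesize` stand in for a real-time agent client and a
    TTS function (e.g. an open-source model); only the eval loop is shown.
    """
    results = []
    for case in cases:
        audio = synthesize(case.text, accent=case.accent, rate=case.speaking_rate)
        reply = agent.respond(audio)
        results.append({
            "case": case,
            # Did an unusual accent make the model switch languages mid-reply?
            "stayed_in_english": reply.language == "en",
            # Did a slow speaker trip turn detection before they finished?
            "cut_user_off": reply.started_before_user_finished,
        })
    return results
```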
Stop Building Rigid AI/ML Pipelines: Embrace Reusable Components for Flexible MLOps
With thanks to Médéric Hurier for their contribution.
There’s a lot of talk about AI mimicking humans, and MLOps pipelines seem to follow suit - starting flexible but seizing up over time.
This stiffness makes iteration and experimentation harder, but an artifact-driven approach can fix that. This blog breaks down how modularizing each step into reusable components - Python packages, Docker images, and config files - creates a more adaptable workflow. These artifacts are versioned, stored in repositories, and combined using Directed Acyclic Graphs (DAGs) instead of tightly coupled steps. One of the biggest wins is how DAGs keep workflows flexible (a rough sketch follows the list):
- Easier experimentation – Need to test a new preprocessing method? Swap in a new artifact without reworking everything.
- Less duplication – The same preprocessing or model training code can be used for both training and inference, keeping things clean.
- Better scalability – DAGs allow parallel execution and smart resource allocation, so big jobs don’t slow everything down.
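As a rough illustration of the artifact-driven idea (the names and registry URIs below are my own, not from Médéric’s blog), a DAG step can point at a versioned artifact plus a config, so swapping a component is a one-line change:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    """A versioned, reusable component (e.g. a Python package or Docker image)."""
    name: str
    version: str
    uri: str  # where it lives in your package/container registry


@dataclass
class Step:
    artifact: Artifact
    config: dict                                     # parameters kept outside the code artifact
    depends_on: list = field(default_factory=list)   # upstream step names (the DAG edges)


# Hypothetical training pipeline: every step references a versioned artifact.
pipeline = {
    "preprocess": Step(Artifact("preprocess", "1.4.0", "registry.example.com/preprocess:1.4.0"),
                       config={"impute": "median"}),
    "train": Step(Artifact("trainer", "2.1.0", "registry.example.com/trainer:2.1.0"),
                  config={"max_depth": 8}, depends_on=["preprocess"]),
    "evaluate": Step(Artifact("evaluator", "0.9.2", "registry.example.com/evaluator:0.9.2"),
                     config={"metric": "auc"}, depends_on=["train"]),
}

# Trying a new preprocessing method means swapping one artifact, not rewriting
# the pipeline - and an inference DAG can reuse the exact same "preprocess" artifact.
pipeline["preprocess"].artifact = Artifact(
    "preprocess", "2.0.0rc1", "registry.example.com/preprocess:2.0.0rc1"
)
```

Keeping the config outside the code artifact is what lets the same versioned component be reused across training and inference without rebuilding anything.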
This approach borrows from functional programming, treating each artifact as a self-contained unit, which keeps workflows modular and maintainable. Click below before your pipelines start groaning every time they bend down.