From AI Demos to AI Systems: Notes from ODSC East

Just wrapped up a very full day at ODSC East 2026.

ODSC is a data science and AI conference with a wide mix of practitioners, vendors, researchers, and leaders across the industry. I went in especially interested in how companies are thinking about AI rollout in the real world, where cost constraints, security reviews, unclear ownership, messy data, changing models, and reliability all matter at the same time.

The energy was high. The keynote room was full enough that people were listening from outside the door. There was plenty of excitement, but the questions felt pragmatic. People already understand that AI can do impressive things. The harder questions were about how to run it, how to govern it, how to keep it affordable, and how to know whether it is actually working.

I did not attend every session. Some of this comes from talks I sat in on directly, and some comes from session material and slides I have started reviewing and plan to keep working through over the next few days. A theme-based recap feels more useful than pretending this is a chronological walkthrough of the whole schedule.

The main feeling I left with is that enterprise AI is moving into a more serious phase. The demo phase is still here, of course, but the center of gravity is shifting. The more interesting questions are now about ownership, data, evals, governance, security, model choice, and reliability. Getting something to work once is no longer the bar. The harder work is making it dependable enough to become part of the business.

Adoption Is Still Early

My broad takeaway is that many companies are still early in adoption. That does not mean they are unaware of AI or waiting to experiment. Plenty of people are building. What makes it early is the distance between experimentation and operating capability.

That distance showed up across the day in different forms: cost, security, ownership, efficiency, reliability, data quality, evaluation, and governance. None of these are the flashy parts of AI, but they are the parts that determine whether an AI system becomes useful beyond a pilot.

One framing I liked from the morning was the idea that enterprise AI teams often only see part of the elephant. Business leaders see opportunity and operations. Data teams see quality, lineage, and pipelines. Engineering sees reliability, integration, and runtime concerns. Legal and risk teams see exposure. Executives see pressure to move.

All of those views are valid. The problem is what happens when they only meet at handoff points. By then, the product shape may already be wrong, the data assumptions may already be baked in, and the governance conversation may already feel like a blocker instead of part of the build.

That part felt familiar. A lot of the friction around AI adoption comes from teams lacking a shared definition of how they operate. Who owns the behavior? Who owns the data? Who decides what "good" means? Who approves risk? Who gets paged when the thing fails?

I like the idea of a blueprint, but I think it has to be built around durable principles instead of brittle rules. Things are changing too quickly for a one-time document to stay current. The useful blueprint is more like a shared grammar for decisions: how we evaluate systems, how we handle access, how we document choices, how we decide when humans stay in the loop, and how we revisit those decisions when the technology or workflow changes.

Agents Are Getting More Serious

Agents were everywhere, which is not surprising. The more useful agent conversations were focused on production reality.

There were sessions on CI triage agents, agentic data engineering, multi-agent systems, agent runtimes, control planes, trustworthy deployment, and enterprise workflow orchestration. The shared concern was pretty practical: agents are easy to demo and much harder to operate responsibly.

The agent use cases that feel real to me today tend to have clear boundaries: code generation, documentation, research, automated workflows built from well-defined skills, mentoring or personal curriculum development, and internal tooling where the blast radius is understood.

The risky areas are the ones where teams quietly replace human judgment instead of augmenting it. Pull request review is a good example. I absolutely think AI should be part of the process. More review perspectives can be valuable, especially for catching obvious mistakes, summarizing changes, or asking useful questions. But PR review is also how teams maintain shared understanding of a system. Correctness is only one part of it. Delegating that entirely to agents would miss one of the main reasons the practice exists.

For production agents, I would want a pretty high bar. Simulation and evals before deployment. A clear scope of responsibility. Deterministic gates around bad context or risky actions. Audit trails. An easy way to turn the agent off with no end-user impact. Human approval for high-stakes actions.
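To make that concrete, here is a minimal sketch of what a deterministic gate, kill switch, and audit trail might look like in code. Everything here is illustrative: the action names, the `KILL_SWITCH_ENABLED` flag, and the approval rule are my assumptions, not a reference to any particular agent framework.

```python
# Illustrative only: a deterministic gate, kill switch, and audit trail
# around agent actions. Names and rules are assumptions, not any
# specific framework's API.
import json
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

# Flipping this flag (e.g. via config) disables the agent without a deploy.
KILL_SWITCH_ENABLED = False

# Actions that always require a named human approver.
HIGH_STAKES_ACTIONS = {"merge_change", "modify_prod_config", "send_external_email"}

@dataclass
class AgentAction:
    name: str
    payload: dict

def execute(action: AgentAction, approved_by: str | None = None) -> str:
    """Run an agent action only if it passes deterministic checks."""
    if KILL_SWITCH_ENABLED:
        return "agent disabled; falling back to the manual workflow"

    # Deterministic gate: high-stakes actions wait for explicit approval.
    if action.name in HIGH_STAKES_ACTIONS and approved_by is None:
        audit_log.info("blocked %s pending human approval", action.name)
        return f"{action.name} queued for human approval"

    # Audit trail: record what ran, with what inputs, approved by whom.
    audit_log.info(json.dumps({
        "action": action.name,
        "payload": action.payload,
        "approved_by": approved_by,
    }))
    return f"{action.name} executed"
```

The important property is that none of these checks involve a model. They are plain code, so they behave the same way every time.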

Incident response is a useful mental model. I can imagine an agent investigating an issue, collecting logs, suggesting a patch, and preparing a summary. I would still want the human on call approving the actual change.

Workflow tools like n8n also make sense in this world. They provide a visual abstraction over control logic, and they can be useful for prototyping, integrating with hooks, swapping models, and adding classifier gates where you want more determinism. Code-first frameworks still matter, but not every useful workflow needs to begin as a custom software project.
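As a rough illustration of the classifier-gate idea, here is what deterministic routing in front of a model step might look like sketched in plain Python rather than in a visual tool. The route names, keywords, and thresholds are made up; a real gate might use a small trained classifier instead of keyword rules.

```python
# Illustrative classifier gate: deterministic routing before any LLM step.
# The route names and keyword rules are placeholders.
REFUND_KEYWORDS = {"refund", "chargeback", "billing"}

def route(message: str) -> str:
    """Decide deterministically where a request goes before a model sees it."""
    words = set(message.lower().split())
    if words & REFUND_KEYWORDS:
        return "human_queue"      # high-stakes topics always go to a person
    if len(message) > 2000:
        return "summarize_first"  # long inputs get a cheap preprocessing step
    return "llm_agent"            # everything else is handed to the agent

assert route("I want a refund for last month") == "human_queue"
```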

Data Is Still the Foundation

There was a lot of discussion around AI-ready data, and that phrase is worth taking seriously.

Data can sound like the boring part compared to agents, copilots, and model launches, but it is still the thing that decides whether the system is useful. Garbage in, garbage out has always been true. With AI, the output can be more convincing, more automated, and more deeply embedded in a workflow.

The best AI systems need data that is clean, contextual, connected, and governed. They also need data in the right shape for the job. One point from the session material that stood out to me was that builders should be mindful of the workflow they are supporting and the formats or features that job actually requires. More documents will not automatically create better context. The system needs the information that helps it do the task well.

Access boundaries matter here too. Someone who should not see a piece of data should not be able to access it through a chatbot either. That sounds obvious, but it gets complicated quickly with embeddings, vector search, RAG systems, and agents that can query multiple tools.

Before calling data "AI-ready," I would want to understand who can access it, what parts are in scope, how permissions flow through retrieval, how fresh it is, what context the model sees, and how the system behaves when the data is missing, stale, or ambiguous.
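Here is a minimal sketch of what that can look like at retrieval time, assuming each chunk carries access-control metadata. The `allowed_groups` field is my invention for illustration, not any specific vector store's API.

```python
# Sketch of permission-aware retrieval. The allowed_groups metadata is an
# assumed data model, not a specific vector store's API.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    allowed_groups: set[str]

def retrieve(ranked: list[Chunk], user_groups: set[str], k: int = 5) -> list[Chunk]:
    """Drop chunks the user cannot see, then return the top k that remain."""
    permitted = [c for c in ranked if c.allowed_groups & user_groups]
    return permitted[:k]

# A user outside the "finance" group never sees the salary document,
# even if vector search ranked it first.
ranked = [
    Chunk("Q3 salary bands", {"finance"}),
    Chunk("Engineering onboarding guide", {"engineering", "all_staff"}),
]
visible = retrieve(ranked, user_groups={"engineering", "all_staff"})
assert all("salary" not in c.text for c in visible)
```

The design choice that matters is filtering in code before anything reaches the model, so permissions are enforced by the system rather than by prompt instructions.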

The model choice matters, but the data layer is where a lot of the real product quality comes from.

Evals and Governance Need to Be Part of the Build

Evaluation came up in several sessions, including LLM evaluation with MLflow and AI evaluation from first principles. Governance came up through leadership, security, healthcare, finance, and trustworthy agent discussions.

This is the kind of practice that saves pain when it starts early.

For GenAI systems, "better" is often task-dependent. A system might sound smoother while getting worse at the cases that matter. An agent might complete more tasks while asking too many clarifying questions it should have been able to answer from available context. A model change might reduce cost but introduce subtle quality regressions.

The evaluation surface can include answer quality, latency, cost, policy compliance, security, task completion, user satisfaction, and human override rates. Most teams probably cannot measure everything perfectly on day one. They should still know which risks matter most for the workflow they are shipping.

Good governance should help teams see what is happening. Evals, metrics, analytics, access controls, and review loops should make it easier to build useful systems with confidence. If governance only appears as a final approval step, teams will route around it or resent it.

The practical version is simple to say and annoying to do well: version the important pieces. Prompts, tools, policies, model choices, test sets, retrieval behavior, and human review requirements. Then rerun evals when those things change.
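One lightweight way to approach this, sketched below with entirely illustrative field names: pin everything that affects behavior in a single config, hash it, and treat a changed hash as the trigger to rerun the eval suite.

```python
# Illustrative "version the important pieces" config. Every field name and
# value here is made up; the pattern is what matters.
import hashlib
import json

eval_config = {
    "prompt_version": "support-triage-v14",
    "model": "provider/model-2026-01",
    "retrieval": {"top_k": 5, "index_snapshot": "2026-03-01"},
    "policies": ["no_pii_in_output", "human_approval_for_refunds"],
    "test_set": "golden_cases_v7.jsonl",
}

# A stable hash makes "did anything that affects behavior change?" cheap.
config_hash = hashlib.sha256(
    json.dumps(eval_config, sort_keys=True).encode()
).hexdigest()[:12]

print(f"tag this eval run with config {config_hash}")
# If the hash differs from the last recorded run, rerun the eval suite
# before shipping and store the results keyed by this hash.
```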

Cost and Model Choice Are Becoming Strategic

My favorite session topic was around the sovereign stack and the idea of owning your intelligence.

Local models and specialized intelligence are real considerations now. In some areas, they have caught up enough to be part of the architecture conversation from the beginning. Intelligence is jagged. A smaller or specialized model can be the right choice for a specific task, especially when cost, latency, privacy, or control matter.

I do not think every team needs to rush into hosting every model themselves. There is real operational work involved. Deploying models to shared infrastructure requires MLOps skills, evaluation discipline, maintenance, and a willingness to own more of the stack. Fine-tuning and domain-specific reinforcement learning are their own research and engineering efforts.

The direction is still worth taking seriously. In general, teams should avoid building systems that assume one model or one provider forever.

Smaller, local, or specialized models make sense for high-volume tasks, low-risk transformations, privacy-sensitive workflows, domain-specific extraction or classification, and assistive tasks where latency and cost matter.

The biggest model may still be the right model for complex reasoning or open-ended work. Sometimes the extra generality is useful. Sometimes it is just added cost and operational dependency.

The practical takeaway is modularity. Build for model swapping. Treat model choice like something you can evaluate and deploy, not a permanent bet hidden deep in the architecture.
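Here is a small sketch of what that modularity can look like in Python. The provider classes and task names are placeholders, not real SDK calls; the point is that call sites depend on a narrow interface rather than on any one vendor.

```python
# Sketch of model swapping behind a small interface. Provider classes and
# task names are placeholders, not real SDK calls.
from typing import Protocol

class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedBigModel:
    def complete(self, prompt: str) -> str:
        return f"[hosted] answer to: {prompt}"  # would call a hosted API here

class LocalSmallModel:
    def complete(self, prompt: str) -> str:
        return f"[local] answer to: {prompt}"   # would call a local runtime here

# Model choice becomes configuration you can evaluate and change,
# not a dependency baked into every call site.
MODELS: dict[str, TextModel] = {
    "complex_reasoning": HostedBigModel(),
    "classification": LocalSmallModel(),
}

def answer(task: str, prompt: str) -> str:
    return MODELS[task].complete(prompt)

print(answer("classification", "Is this ticket about billing?"))
```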

What I Am Taking Away

The strongest theme from the day was maturity.

Companies are seeing the gains, but they want an efficient and balanced way to capture them. They want systems that are secure, observable, reliable, and cost-aware. They want to know who owns the output. They want better data. They want evals that tell them whether things are improving. They want governance that helps them move with confidence.

That feels like the next phase of enterprise AI to me.

If I were giving advice to a company starting an AI rollout this quarter, I would keep it pretty grounded: pick narrow workflows with clear success criteria, define ownership early, build evals before scale, treat data access as part of the product, keep humans in the loop for consequential actions, design for model swapping, and measure cost and quality together.

I am leaving Day 2 with a lot to think about and a lot of excitement, especially around local models, specialized intelligence, and what it takes to own more of the AI stack responsibly.

Looking forward to Day 3’s sessions, and I will share another recap soon.

Dakota Kim