ODSC East 2024: Navigating the Frontiers of LLMs, Generative AI, and Data Engineering by Ranjan Bhattacharya

A few of the consultants from EQengineered had the opportunity to attend the recent Open Data Science Conference (ODSC) East 2024 in Boston. The numerous sessions covered multiple facets of the rapid evolution of large language models (LLMs), generative AI techniques, and responsible AI practices.


From left: Ranjan Bhattacharya, Chief Data Officer; Seth Carney, Technical Consultant; Dakota Kim, Principal Technical Consultant; William Ramirez, Senior Technical Consultant; and Ed Lyons, Principal Technical Consultant/Architect.


Here are some of the key highlights for AI/ML practitioners:

Large Language Models

Several talks covered LLM technology and how it is changing information retrieval. Some explored advanced architectures like retrieval-augmented generation (RAG), which combine retrieval models with powerful language generators. Best practices were shared for evaluating LLM performance, prompt engineering strategies, parameter-efficient fine-tuning approaches, and fine-tuning embedding models. The operational complexities of deploying LLMs were also examined, including transitioning to robust LLMOps pipelines and leveraging tools like LangChain agents to integrate LLMs with external data sources and tools.
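To make the RAG idea concrete, here is a deliberately minimal sketch (not from any specific session): a toy retriever ranks documents against the query, and the top hits are stitched into the prompt the LLM would receive. A real system would use learned embeddings and a vector database rather than the bag-of-words stand-in below; all names here are illustrative.

```python
from collections import Counter
from math import sqrt

# Toy document store; a production system would hold chunked,
# embedded documents in a vector database.
DOCUMENTS = [
    "Retrieval-augmented generation grounds LLM answers in retrieved documents.",
    "LangChain agents let an LLM call external tools and data sources.",
    "Parameter-efficient fine-tuning adapts large models with few trainable weights.",
]

def embed(text):
    """Stand-in embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How do LangChain agents use external tools?"))
```

The final prompt carries the retrieved passage as grounding context, which is the core of the RAG pattern regardless of how sophisticated the retriever is.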

Generative AI

A few of the talks spotlighted the latest generative AI breakthroughs for synthesizing images, audio, proteins, and other modalities. Open-source frameworks like Hugging Face and PyTorch were highlighted for developing multimodal generative applications. Techniques for aligning language models using reinforcement learning from human feedback (RLHF) were also covered. Case studies demonstrated innovative generative AI use cases in healthcare and pharma, as well as robust quality-assurance pipelines, while stressing the importance of implementing AI guardrails.

Responsible AI Deployment

As AI capabilities are adopted across multiple domains, numerous sessions emphasized strategies and best practices for responsible, ethical, and trustworthy AI deployment across the full AI lifecycle. This included evaluating generative AI outputs, developing culture-sensitive language models through data diversification, and implementing AI governance frameworks with mechanisms for AI risk management. Formal approaches like reliability engineering were proposed to holistically assess AI systems.

Foundational Machine Learning

Beyond the AI frontiers, there were sessions covering fundamental machine learning libraries (PyTorch, TensorFlow, scikit-learn), algorithms (XGBoost, linear regression), data manipulation (Pandas, SQL), and ML essentials like feature engineering. Emerging areas were also explored, such as topological deep learning, causality, and autonomous AI agents. The biotech/pharma track highlighted ML applications like predicting COVID-19 outcomes and computational drug discovery.
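As a nod to those fundamentals, even the simplest algorithm on that list fits in a few lines of plain Python. The sketch below derives ordinary least squares for a one-variable linear regression; in practice you would reach for scikit-learn or statsmodels, so treat this purely as an illustration of the underlying math.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Perfectly linear data recovers the generating parameters of y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

The covariance-over-variance form is the same closed-form solution scikit-learn's `LinearRegression` generalizes to many features.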

Data Engineering and Architecture

Several talks discussed the evolution of data architectures such as data lakehouses, data meshes, and data fabrics, along with strategies for modernizing outdated monolithic data architectures using open-source frameworks. These sessions reaffirmed that while novel AI frontiers like LLMs and generative models are expanding possibilities, success demands a holistic grasp of fundamental data science, robust MLOps capabilities, ethical AI guardrails, and the ability to translate complex models into clear insights.


The conference allowed us to experience how rapidly the AI/ML landscape is evolving across industries and research frontiers. While LLMs and generative AI stole the limelight, we were reminded of the enduring importance of strong fundamentals, responsible practices, operationalization, and effective data communication. It was an energizing experience to learn from and connect with so many experts pushing the boundaries of what's possible with AI.

Ranjan Bhattacharya