Julian Wiffen on How AI is Revolutionizing Data Engineering

In this episode of The Executive Outlook, we speak with Julian Wiffen, a senior leader in AI and data science at Matillion. Julian is known for taking AI out of theory and turning it into practical tools that real teams can use, especially in data engineering, where speed, accuracy, and usability matter. His journey and examples offer a grounded view of what GenAI can actually deliver today, how it’s changing ETL, and what leaders should focus on to get measurable value.

The Early Days: From Chemistry to Data Science

Julian’s path into AI started unexpectedly. He studied chemistry, and in his final year (a research-focused master’s year in the 1990s), he chose a project in computational chemistry, a relatively new area at the time. His work involved genetic algorithms, experimenting with ways to “breed” solutions that could map input variables to outputs and predict behavior. He recalls the excitement of being able to extract fundamental relationships like the gas laws, something that seems trivial now but was a major milestone then. That early exposure gave him two things that shaped his career: comfort with data-heavy experimentation and an interest in prediction and optimization that later became central to machine learning.

Early Career: Data Warehousing, BI, and the “Messy Prep Work”

After university, Julian went into management consulting, where he stayed close to data—data warehousing, heavy analytics, and early ML at the edges. Like many professionals, he didn’t start as a “data scientist.” His early work was largely BI and data engineering: getting raw, inconsistent data into warehouses and preparing it for reporting. It’s the unglamorous but essential work that enables everything else. He also spent time contracting, which offered great hands-on delivery and variety, but he noticed a common drawback: contractors often don’t get to see whether the solution they built becomes business-critical or ends up unused.

Cisco: Data-Driven Operations and Big-Cost Avoidance

Julian then spent several years at Cisco in data-focused roles, including leadership across software/IT teams. One major area was Cisco’s service catalog—the system people used to order services such as infrastructure (on-prem or AWS) and other internal IT capabilities. A key discovery from that work was counterintuitive: when you make provisioning easy, people over-order less, not more. Julian describes it with a simple analogy: if you live next door to the supermarket, you buy what you need; if it’s far away, you stock up and waste more. Translating this to enterprise services, making ordering clear and convenient reduced excessive capacity requests and helped avoid unnecessary infrastructure spending.

Joining Matillion: A Bold Move into AI

After years at larger companies, Julian decided to join Matillion. He described a difference many leaders feel: in big organizations, you can spend more time making the case to experiment than it would take to do the experiment. In a startup/scale-up environment, there’s often more hunger to move quickly, take risks, and learn through action. His lesson for GenAI innovation was direct: progress often comes from “just doing stuff.” Sometimes it’s faster to build and learn than to wait for perfect alignment. At Matillion, Julian’s role was to explore how AI could improve the product—both by understanding how customers use it and by embedding AI capabilities into the software Matillion provides to data engineers.

Generative AI: The Future of Data Engineering

At Matillion, Julian and his team recognized early that GenAI enables a major shift in data engineering: unstructured data becomes usable. Suddenly there’s real value in processing text documents, audio, video, messy wikis, and other formats that traditionally never entered corporate warehouses. Instead of leaving that information buried in file shares (or requiring analysts to manually read and summarize it), large language models can help extract structured insights—making unstructured data something the business can actually measure, query, and act on.

The Practical Trick: Constrained Outputs (Yes/No)

Julian emphasizes a simple method that makes GenAI far more dependable in pipelines: constrain the output. Instead of asking open questions, ask the model for structured decisions—often yes/no answers. For example, given thousands of customer reviews, you can ask:
  • “Is there an actionable defect mentioned? Yes/No.”
  • “Is there a feature request? Yes/No.”
Those answers become clean, measurable fields that can feed dashboards and workflows. This approach helps teams process large volumes quickly and send humans only the items that truly need attention.

The Unexpected Breakthrough: LLMs Could Work with Matillion’s YAML

A turning point in Matillion’s AI journey came from an internal surprise. Matillion pipelines, built via drag-and-drop components and parameters (joins, filters, aggregations), are stored behind the scenes in a YAML format. The team didn’t expect LLMs to handle it because the YAML configuration is internal to the product, but because YAML is human-readable, the models could read and edit it effectively. That discovery unlocked a major leap: an assistant could go beyond “explaining” pipelines and actually help build and modify them. This became a foundation for Matillion’s copilot approach and ultimately Maia.
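To make the idea concrete, here is a toy sketch of why human-readable pipeline definitions help. The YAML shape below is invented for illustration (it is not Matillion’s actual format), and `call_llm` is a hypothetical stub; the point is that the model edits plain text, and the edit can be sanity-checked before it is accepted.

```python
PIPELINE = """\
components:
  - name: read_orders
    type: table_input
  - name: join_customers
    type: join
"""

def call_llm(prompt: str) -> str:
    # Stub: pretend the model appended the requested filter component.
    return PIPELINE + "  - name: filter_recent\n    type: filter\n"

def component_names(yaml_text: str) -> list[str]:
    # Minimal line-based extraction for validation; a real system
    # would parse the YAML properly.
    return [line.split(":", 1)[1].strip()
            for line in yaml_text.splitlines()
            if line.strip().startswith("- name:")]

edited = call_llm("Add a filter component named filter_recent:\n" + PIPELINE)
# Guardrail: the edit must preserve every existing component.
assert set(component_names(PIPELINE)) <= set(component_names(edited))
```

Because the format is readable text, the same guardrail logic a human would apply (did the edit keep what was already there?) can be automated around the model.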

Two Lenses: “AI in the Pipeline” vs “AI in the Product”

Julian frames Matillion’s work in two buckets:
  • AI in the pipeline: using LLMs to transform data, especially unstructured → structured—so it becomes usable for analytics and operations.
  • AI in the product: embedding an assistant into the platform to help users build, debug, and understand pipelines.
This matters because it clarifies where value comes from: not just a chat interface, but real workflow acceleration and better data outcomes.

A Key Lesson from Support Automation: Models Are Only as Good as Your Docs

One strong example was a customer support workflow. Incoming support tickets are processed through a pipeline that uses RAG (retrieval-augmented generation) against documentation, known issues, and internal knowledge. The system then drafts a first-response email that a support agent can copy, edit, or reject. The big learning wasn’t just that it saved time; it also revealed a foundational truth: ambiguous documentation creates ambiguous answers. In one memorable case, the model answered “yes” to a connector capability, and even humans were split when reading the docs. The final truth was “no,” but the documentation was vague enough that both people and the model were misled. The takeaway was clear: if you write documentation clearly for humans, it becomes far more effective for AI systems too.
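The workflow described above can be outlined roughly as follows. This is a minimal sketch, not the actual system: the retriever is naive keyword overlap where a real RAG pipeline would use embeddings, the documentation snippets are invented, and `call_llm` is a hypothetical stub. The human-in-the-loop step is the part worth noting.

```python
DOCS = [
    "The Salesforce connector supports incremental loads via a key column.",
    "Known issue: the S3 loader retries up to three times on timeout.",
]

def retrieve(ticket: str, docs: list[str], k: int = 1) -> list[str]:
    # Naive retrieval: rank docs by shared words with the ticket.
    words = set(ticket.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def call_llm(prompt: str) -> str:
    # Stub for an LLM call that drafts a reply grounded in the context.
    return "Draft reply citing the retrieved documentation."

def draft_response(ticket: str) -> dict:
    context = retrieve(ticket, DOCS)
    draft = call_llm(f"Context:\n{context}\n\nTicket:\n{ticket}\n\nDraft a reply.")
    # The draft is never sent automatically; an agent edits or rejects it.
    return {"context": context, "draft": draft, "requires_review": True}
```

The quality ceiling of this whole loop is the documentation in `DOCS`: as the anecdote shows, ambiguous source text produces ambiguous drafts no matter how good the model is.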

Turning Free Text into Measurable Insights: Financial Services Feedback

Julian described a customer in financial services that must process large volumes of regulatory feedback every year—a manual process that consumed thousands of person-hours. The data is mostly free text, which historically makes analysis slow and inconsistent. The solution followed a practical multi-pass approach:
  1. Sample feedback to discover major topics
  2. Deduplicate into a stable list (around 20 topic areas)
  3. Reprocess all responses to tag topics, extract only relevant text for each topic, and generate sentiment per topic
This created far more insight than an “average” sentiment score. A single response could be strongly positive about advice quality but highly negative about fees—insight that gets lost when sentiment is averaged into one number.
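The tagging pass (step 3) might look roughly like this. The topic list and `call_llm` stub are illustrative, not from the actual customer solution; the important details are the stable topic list from step 2 and the guardrail that discards any topic the model invents.

```python
import json

# Step 2 output: a stable, deduplicated topic list (illustrative).
TOPICS = ["advice quality", "fees"]

def call_llm(prompt: str) -> str:
    # Stub: a real model would read the response and return one entry
    # per topic actually mentioned, with sentiment per topic.
    return json.dumps([
        {"topic": "advice quality", "sentiment": "positive"},
        {"topic": "fees", "sentiment": "negative"},
    ])

def tag_response(text: str) -> list[dict]:
    raw = call_llm(f"Topics: {TOPICS}\nResponse: {text}\nReturn JSON tags.")
    tags = json.loads(raw)
    # Constrained outputs again: drop any topic outside the known list.
    return [t for t in tags if t["topic"] in TOPICS]
```

Note that the output preserves per-topic sentiment, so a response that praises advice quality while criticizing fees keeps both signals instead of averaging to a misleading neutral.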

A Healthcare Idea: Voice Diaries and Retroactive Insight

Another interesting example was a medical app concept: patients record daily voice notes describing symptoms during a clinical trial. The system transcribes the recording and asks structured questions (e.g., medication taken, appetite, mood indicators). Two benefits stood out:
  1. It captures free, top-of-mind descriptions rather than leading survey responses
  2. You can add new questions later and re-run them across historical recordings—unlocking value even from older datasets that pre-date GenAI
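The second benefit is simple to illustrate: because the transcripts are stored, a question defined today can be run over recordings captured months ago. Everything here is invented for illustration, and `call_llm` is a hypothetical stub with toy keyword logic standing in for a real model.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real model would read the transcript and answer the question.
    return "Yes" if "tablet" in prompt.lower() else "No"

def run_questions(transcripts: dict[str, str],
                  questions: dict[str, str]) -> dict:
    """Apply every structured question to every stored transcript."""
    return {
        day: {
            field: call_llm(f"Transcript: {text}\nQuestion: {q} Answer Yes or No.")
            for field, q in questions.items()
        }
        for day, text in transcripts.items()
    }

transcripts = {"day1": "Took my tablet, slept badly.", "day2": "Felt fine today."}
# A question added months after recording still covers the full history.
answers = run_questions(transcripts, {"took_medication": "Did the patient take medication?"})
```

This retroactive re-run is what unlocks value from datasets collected before anyone knew which questions would matter.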

Maia: The AI Assistant Inside Matillion

On the “AI in the product” side, Matillion built Maia, a product-embedded assistant. Users can type requirements in natural language, and Maia helps with tasks such as connecting to APIs from their documentation, building transformations, generating pipeline steps, explaining logic, and debugging issues. Julian notes Matillion had an earlier copilot precursor focused more on transformations and later launched Maia in June as a more advanced and accurate version.

Measurable Productivity Gains and Legacy Modernization

Julian shared quantified customer outcomes. In one “bake-off” at a large pharmaceutical company, two engineers built the same complex pipeline:
  • Manual build: 10 hours
  • Maia-assisted: 1 hour
Other customers reported 5–10x productivity gains across teams. In one case, a customer said they planned to hire five additional engineers but instead hired one because the assistant removed so much repetitive workload. He also highlights modernization: LLMs are “polyglot,” so they can interpret older tools and languages. Teams can take legacy Informatica jobs, Talend jobs, or long SQL scripts, ask the assistant to explain them in plain language/pseudocode, and then rebuild them in Matillion. A graphical pipeline makes it easier to validate logic step-by-step.

Extra Wins: Multilingual Interfaces

Julian also mentioned an unexpected benefit: Maia can effectively support multilingual interaction. Users can request responses in languages like German, Arabic, or Urdu. Without heavy localization work, this reduces friction for global users who otherwise must translate their work into English mentally while using software.

The Future of Data Engineering: AI and Automation

Looking ahead, Julian Wiffen believes data engineering will shift in two major ways:

1) More Unstructured Data Work

Data engineers will increasingly work with messy formats—transcripts, videos, wikis, audio—and convert them into something structured and useful for analytics and AI.

2) Context and Semantics Become Critical

As “chat with your data” tools grow (text-to-SQL, conversational BI, assistants), success depends on well-curated schemas and context:
  • What does this column really mean?
  • How do tables relate?
  • When someone says “customer,” which of many “customer-like” tables is the right one?
Julian believes data engineers will spend more effort not just wrangling data, but ensuring documentation, semantics, and “gold layers” are well organized so both humans and ML systems can use them reliably.
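One way to picture this curation work is a small semantic layer that records what each table and column actually means, so a text-to-SQL prompt can disambiguate terms like “customer.” The table and column names below are illustrative, not from any real schema.

```python
SEMANTIC_LAYER = {
    "dim_customer": {
        "description": "One row per billed customer (the 'gold' customer table).",
        "columns": {"customer_id": "Stable surrogate key",
                    "signup_date": "Date of first paid contract"},
    },
    "stg_customer_raw": {
        "description": "Raw CRM export; duplicates possible. Do not query directly.",
        "columns": {},
    },
}

def build_context(term: str) -> str:
    """Assemble the schema context a text-to-SQL prompt would include."""
    lines = []
    for table, meta in SEMANTIC_LAYER.items():
        if term in table:
            lines.append(f"{table}: {meta['description']}")
            for col, desc in meta["columns"].items():
                lines.append(f"  {col}: {desc}")
    return "\n".join(lines)
```

With descriptions like these in the prompt, an assistant can steer queries toward the gold table and away from the raw staging copy—exactly the kind of context that both humans and ML systems need.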

Advice for Leaders Starting AI

Julian’s guidance is practical:
  • Experiment early: try things hands-on rather than waiting for perfect conditions.
  • Target the boring work: focus on time-consuming tasks where humans add limited value.
  • Define success upfront: decide what “good” looks like before you build.
  • Involve domain experts: they know what a good answer is and will adopt tools they help shape.
  • Don’t abandon POCs too fast: if it’s close but not accurate enough, park it for 3–6 months and revisit—model improvements may solve what wasn’t possible earlier.

Conclusion

Julian Wiffen’s story—from computational chemistry to AI product leadership—highlights a consistent theme: the most valuable AI is the AI that becomes a daily tool for real teams. At Matillion, that shows up in two ways: GenAI-driven pipelines that structure unstructured data, and product-embedded assistance through Maia that accelerates how data engineers build, debug, and modernize pipelines. His message is clear: try things, constrain outputs for reliability, invest in documentation and context, and let domain experts define what “good” looks like. That’s how AI moves from demos to durable business impact.
For more stories of leaders shaping the future of data, AI, and strategy, stay tuned with The Executive Outlook.

Editor Bio

Isha Taneja

I’m Isha Taneja, serving as the Editor-in-Chief at "The Executive Outlook." Here, I interview industry leaders to share their personal opinions and provide valuable insights to the industry. Additionally, I am the CEO of Complere Infosystem, where I work with data to help businesses make smart decisions. Based in India, I leverage the latest technology to transform complex data into simple and actionable insights, ensuring companies utilize their data effectively.
In my free time, I enjoy writing blog posts to share my knowledge, aiming to make complex topics easy to understand for everyone.