For CDOs, heads of BI, and the C-suite leaders who decide what gets built. Each episode unpacks one pattern from a real AI-on-data project: the architecture decisions, the production trade-offs, and what changed for the business. Occasionally the show steps back to wrestle with the bigger questions about where data work is heading.




The Role of MDM in AI Transformation
with Malcolm Hawker · CDO Profisee
Malcolm Hawker, CDO of Profisee, on why the MDM question has flipped. The interesting problem is no longer how AI augments master data, but how master data governs the 80 to 90 percent of unstructured content your AI now reads.
And we are live. Okay, I believe we can start right now. So Malcolm, before I even say hello, I want to start with the question I get from every customer, every single week that I talk with them. The question that maybe makes every single MDM vendor a little bit nervous. The question is: why should I pay for an MDM license if I can just drop all of my data into Claude and get the answer in 30 seconds? I would like you to answer this question in 30 seconds as well, and later we'll do this properly.
30-second lightning round on: can I use an LLM to do master data management? Well, the short answer is no. You can't. And the number one reason is because you need explainability. What I learned the first time that I ever implemented MDM is that the number one thing that you need to be able to provide is clear answers to your business as to why they're seeing what they are seeing. If you implement MDM and you merge two records together, Acme Incorporated and Acme SA, and create something new, you have to be able to fully explain why. MDM is inherently a deterministic enterprise, meaning it is rules-driven. You need clear rules. You need your organization to align on those rules.
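To make the explainability point concrete, here is a minimal sketch of a deterministic match rule that records why two records were or were not merged. The `registry_id` field, the suffix list, and the rule name `R1` are hypothetical illustrations, not Profisee's actual rule engine.

```python
import re
from dataclasses import dataclass

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "sa", "ltd", "llc", "gmbh", "corp"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


@dataclass
class MatchDecision:
    merged: bool
    rule: str
    explanation: str


def match_records(a: dict, b: dict) -> MatchDecision:
    """Apply one deterministic rule and record the reason for the outcome."""
    name_a, name_b = normalize_name(a["name"]), normalize_name(b["name"])
    same_name = name_a == name_b
    same_registry = a["registry_id"] == b["registry_id"]
    rule = "R1: normalized name and registry_id must both match"
    if same_name and same_registry:
        why = f"names normalize to '{name_a}' and registry_id '{a['registry_id']}' is shared"
        return MatchDecision(True, rule, why)
    why = f"same_name={same_name}, same_registry={same_registry}"
    return MatchDecision(False, rule, why)


decision = match_records(
    {"name": "Acme Incorporated", "registry_id": "R-1"},
    {"name": "Acme SA", "registry_id": "R-1"},
)
print(decision.merged, "|", decision.rule, "|", decision.explanation)
```

The same inputs always produce the same decision and the same explanation string, which is the property an auditor or business user would ask for.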
Okay, welcome.
That was just a teaser. 30 seconds.
Yeah, okay. That's all. We're coming back to it a little bit later, maybe 15 minutes from now, and I'm going to push you even harder. But right now, let me back up a little bit and set the table correctly. So welcome, everyone, to the first episode of No Hallucinations, where AI meets reality. I believe the decisions that data leaders are taking right now will shape every single AI transformation that is going to happen in the next 5 to 10 years. Nobody is talking about the correct foundation, the right setup, for all of these transformations. Everyone is chasing context for AI. But context built on duplicated and untrustworthy data is just a more confident way to be wrong. My guest today is Malcolm Hawker. Malcolm is the CDO of Profisee, with 20 years of experience in master data management, I believe. And I also believe, Malcolm, that you're coming back from the Gartner conference, which happened recently. So let's talk about the elephant in the room and share some insights with us. What exactly happened at this conference? At Gartner in London, I believe.
So London's coming up. London will be in 2 weeks. The Gartner conference that I was at was in Orlando. That was a few weeks ago now. All the days are melting together. But, you know, Gartner is, I would argue, the preeminent conference for data and AI leaders. It is mostly attended by senior directors, VPs, C-level executives who are responsible for AI and for data and for analytics. And the number one thing that I heard over and over again, you said it, Michał, is context, context, context. It seemed to be the word that I heard the most at every session, in the hallways, in the exhibit hall. Everybody's trying to figure out context. And if you ask me, the reason why we're talking about context is because everybody's trying to figure out how to operationalize all of their legacy structured data. We've been doing data a long time, but most of the data that we manage is highly structured, sitting in tables, sitting in relational stores. And that data in and of itself is not very actionable by LLMs. LLMs prefer text. LLMs prefer unstructured data. The more text you can put into a prompt window, the more context, the more detail, the more accurate and consistent and predictable the answers are going to be. So if we're going to use all of this relational data that we have, increasingly what companies are realizing is that we need to be able to add context to it. Probably the number one way most are approaching context these days is through things like knowledge graphs and how to implement them.
Knowledge graphs, RAGs, pipelines, stuff like this. It's almost everywhere right now.
Agreed. The thing that we're not talking enough about is the context that is inherent to MDM. We have hierarchies. We've always had hierarchies within MDM platforms. We've always been a source of truth for context. So the combination of MDM plus data catalogs, often data catalogs will be where these things are defined, where you will have your business glossary. But MDM is where you're operationalizing a lot of this context. So the good news for MDM practitioners, the good news for data and AI practitioners is that context is hot. We've got to figure this out. But it is a little bit challenging because the skill set that is needed to create and manage a lot of context typically within most organizations is happening in what is known as more of a knowledge management function where people are managing complex ontologies and taxonomies. Maybe they're managing your search infrastructure, maybe they're managing your product catalog. There are people in your organization that know this stuff pretty well. We've got to find a way to reach out to those people and pull them into the data and analytics function within our organizations to have a more holistic approach.
This leads to a very good question that I want to ask you right now. So we have the context, and we have agent workflows that can feed data directly into MDM tables. But exactly and very precisely, what does that unlock from the human perspective? How does it impact the work that I'm doing, that my whole organization is doing right now?
Well, the biggest impact to humans is we're going to have to figure out how to master unstructured data. And right now we're not really doing that very much. We may do a little bit of data capture; maybe we're capturing data off of forms for healthcare-related or insurance-related use cases. But Gartner says that 80 to 90 percent of all data in organizations is unstructured. And if we are going to use that data, operationalize that 80 to 90 percent of data for AI, we're going to have to govern it. We're going to have to apply data quality to it. We're going to have to master it, just as we do with traditional data in traditional MDM. So that's a big change coming for a lot of people: how do we do that? How do we adapt our governance policies to managing, mastering, and assuring quality for unstructured data?
How do we do it? From my perspective, every single time we have unstructured data, in order to govern it and model it correctly, we need to squeeze some kind of structure out of it. Let it be tags, enrichment of the data, additional columns, whatever. LLMs can help us do so, but right now we are also running into questions about AI governance, LLM governance, unstructured data governance. Do we even know what AI governance really means? This is a serious question.
It is a serious question, and I'm giving you a serious answer. The answer is no, we don't really know because we've got legacy frameworks, legacy approaches, legacy rules that don't really work that well when it comes to unstructured data, particularly text. If I read a paragraph and come to a certain conclusion about what it meant or its truth, and you read that paragraph and come to a different conclusion, who's right? Who's wrong? These are the questions we need to start to answer. What does it mean to have ethical data? I think we can start to model what ethics mean from the model behavior perspectives, but what does that actually mean? What is ethical data? When we actually look at the data that will be used to train and guide and ground these models, how do we reduce bias in the data? I don't think we really know. We know it when we see it in the models when the models behave a certain way, but in the core data itself, we really don't know. We don't really know what it means to fully govern all of this unstructured data. There's some basic things we can do. Starts with tagging for sure. That's why so many people are focused on data catalogs and increasingly on MDM as well, because we need to tag all of this data. We need to know what's there. We need to apply metadata, create metadata for these video files, for these text files. That's a starting point.
This all reminds me of libraries, basically. The old physical libraries, in which you had huge collections of unstructured data. They are called books. And for every single book, you create the tags, the categories, the indexes that help manage all of this. Does this metaphor or parallel work for you too? Is something like this happening right now?
It's not a metaphor, it's literal. It is literal. Library scientists who work in our companies, many of them who work in our companies, they typically don't sit in a data and analytics function, which is a bit of a problem. But these library scientists are out there. They're building corporate ontologies. They are building corporate taxonomies. We need to figure out how to pull these people in, or at the very least integrate them to our data governance processes so they're sitting at the table when we're having conversations around things like metadata standards. There's a huge opportunity just to have common definitions for things. This is why MDM will continue to play a critical role in organizations because it is where definitions are operationalized. MDM is how today you make sure that you define something in CRM one way and it's defined the exact same way in an ERP system. And we can extend that. It won't just be CRM and ERP. It will be MP4 files sitting in your marketing SharePoint service. It'll be Word docs sitting somewhere else. So MDM needs to start moving into unstructured data and we need to start pulling unstructured data in.
You did ask a really good question. So we'll be building huge libraries, basically. Huge libraries for all organizations. And I would like our listeners to get their heads around this, because it's an easy concept. Everyone has been to a library, so people know how it operates. MDM, ontology, semantics: for people outside our data bubble, those concepts may be difficult to grasp. A library is a very simple one. So let me reframe our discussion a little and put it into a real business case. Something that I've seen in past years, also as a Profisee implementer: we've seen lots of M&A cases, companies being acquired and merged together. It's an extremely important topic for the investors who are purchasing those companies. They want synergy effects, be it, for example, simplification of the vendor structures or better addressing the customers' needs. And then there is post-merger integration, which is an important topic and, I believe, for some of our listeners maybe even a nightmare. How can we better embrace all of these legacy systems right now, put them together, master them? In every single M&A that I've seen in my life, there are a few problems, exactly the same ones: 3 different customer definitions, 4 definitions of vendor, 5 different general ledgers. How can you manage this using the MDM approach, the library approach, the AI approach? Do you have any stories?
I have many personal stories. Of course, at Profisee we have many clients that are using us to support merger and acquisition use cases. You mentioned post. So that's a big thing, actually integrating two separate companies. And that's exactly what MDM does all day, every day. We break silos. MDM is about breaking silos, whether that silo exists at the level of a single database or a single application or whether the silo exists at the level of an entire company. That's just another way to look at a silo. MDM is very, very good at establishing common definitions for things and then enforcing those common definitions across all of these siloed data assets or siloed business applications. So that's post-acquisition. But something equally important is pre-acquisition. Right now, companies spend a ton of money to hire very expensive consultants to do due diligence for a merger and acquisition activity to understand: okay, how much do our customer bases overlap? It's the first question that a company will ask because they don't want to buy a duplicate customer, or maybe they do want to buy a duplicate customer, but they want to understand what revenue is at risk. If we merge these two companies together, what revenue will be at risk? To do that, you need to understand customer base number 1 and customer base number 2, and see what the overlap is. It's like a Venn diagram, very basic stuff.
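The Venn diagram idea can be sketched in a few lines, assuming each customer base is a hypothetical mapping from customer name to annual revenue and that a crude normalized name is an acceptable matching key. Real due diligence would need proper match rules; this only illustrates the shape of the question.

```python
def normalize(name: str) -> str:
    """Crude matching key: lowercase, drop commas and periods, collapse spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def customer_overlap(base_1: dict[str, float], base_2: dict[str, float]) -> dict:
    """Compare two customer bases keyed by name, with revenue as the value.

    If two names in the same base normalize to the same key, the later one
    wins; a real solution would merge them first.
    """
    rev_1 = {normalize(n): r for n, r in base_1.items()}
    rev_2 = {normalize(n): r for n, r in base_2.items()}
    shared = rev_1.keys() & rev_2.keys()
    return {
        "shared": sorted(shared),
        "only_in_1": sorted(rev_1.keys() - rev_2.keys()),
        "only_in_2": sorted(rev_2.keys() - rev_1.keys()),
        # Revenue attached to overlapping customers in each company,
        # used here as a rough stand-in for "revenue at risk".
        "shared_revenue_1": sum(rev_1[k] for k in shared),
        "shared_revenue_2": sum(rev_2[k] for k in shared),
    }


company_a = {"Acme Inc.": 100.0, "Globex": 50.0}
company_b = {"acme inc": 70.0, "Initech": 30.0}
print(customer_overlap(company_a, company_b))
```

The hard part, as the conversation goes on to say, sits inside `normalize`: deciding what counts as the same customer.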
So it's as simple as you're explaining it right now? Is it really that simple?
It's not that simple. One of the reasons why I'm involved in MDM is because on the surface, it seems like such a simple problem to solve. And when you try to solve it, it is a very complex problem. So in terms of a pre-merger, historically, companies will pay consulting companies to go and basically custom build MDM solutions to answer these questions, when you could just go buy an off-the-shelf MDM platform to understand what are our customer overlaps. Yes, you have to define what a customer is, but that's what MDM does. It's very, very good at helping you scale and automate those.
Hold on, hold on, because I believe you've touched on an extremely important topic. I don't know if I understood it correctly, so let me rephrase it. Companies decide to invest a lot to build a one-shot solution, just pre-merger, to understand the potential overlap between the customer bases. So it's a one-shot solution. And later we also have the post-merger activities, which are really painful and can take years to resolve. Why is this? Do you think there is a lack of awareness in the market of how this problem should be approached? Because in my opinion, why shouldn't we just build one solution to tackle them all? It's cheaper, isn't it?
In the case of pre-acquisition or due diligence, that is a world that is dominated by very expensive consultants, and they bill by the hour. They don't make a ton of money by helping you build a solution that is going to last for the next 30 years. They're going to help you build something that will get you through the due diligence and maybe require you to pay ongoing maintenance for it, if you continue to use it, which most companies don't. So, the question of why companies do this and waste money when it comes to M&A activity? Well, partially because large consultancies are involved. And secondly, because I don't think many understand that there is a different way. If you talk to a CIO about what the playbook is for a merger, the first thing they'll talk about is how do we physically integrate systems. How do we take two ERP systems and make them one ERP system? When in reality what you might be able to do is keep them individual but then integrate at a data level. Integrate the data, keep the systems the same, but integrate the data. That's what MDM does, complex integrations across your most important data, the shared data across those two systems. MDM is a great and viable tool in the short run to virtually tie these systems together instead of physically tying these systems together.
I see. Okay. So we discussed MDM and the consultancies a little bit, which is an extremely important thing. Now let's get back to our initial question. Maybe we can find some ways to accelerate as well.
And by the way, I was talking like Big Five consultancies, not Astral Forest. I know Astral Forest would be looking out for customers' best interest and help them build scalable systems that can go into the future. I'm talking about the big ones that focus more on tax-centric and M&A-centric, the giant ones, the Deloittes and the Accentures and the McKinseys. Not Astral Forest.
Yeah, but anyway, let's get back to AI, because I'm using AI every single day and this is the spiciest question that I can ask you. So let's continue the one from the beginning. The reason why I'm really pushing you on it is that this is the question I am being asked almost every single day. My customers come to me and say: you know, Michał, I just dropped my entire customer database into Claude. It dedupes, it answers the questions, it's brilliant. It does everything by itself, like magic. So why should I pay for your services and also for a Profisee license if I can do everything within Claude? This is a serious question, and I would really like to push you on this one.
Well, short answer, you can't do everything in Claude. But let's back up. Can some MDM use cases be fully automated by AI? Maybe. I'm thinking more like maybe CDP, customer data platform use cases, marketing use cases where the cost of being wrong or the expectation of consistency and predictability is low. If the cost of being wrong is low, and the expectation of having the same answer consistently provided over time is low, and the unit costs are reasonably low, okay, maybe you could orchestrate a reasonably complex agentic workflow that ends up looking a lot like MDM and maybe supports some very basic use cases. Maybe, that's a big maybe, but it wouldn't be enterprise class and it wouldn't be used outside of a marketing function for sure. But if you need consistency, if you need predictability, if you need accuracy over time, and most importantly, as I said at the start of our conversation today, explainability, that is a different story. MDM at its core runs on deterministic rules. Those deterministic rules are defined by a data governance council. People will come together and align on how do I define things? How are things related to other things? What are our minimum data quality standards, by use case? What are our match rules? How will we define what a unique corporate or party entity looks like? Those are deterministic rules that are determined by a governance council. When you put those into an MDM, you're going to get the same answer consistently over time. The only thing that's inherently probabilistic is the match process. But even then, you apply data steward resources to it to make sure that what you've got, what you're looking at, is accurate and is consistent in the cases where the probabilities are reasonably low when you run these matches.
Put it all together, and what you have is a system that is inherently deterministic, that is running on rules, that is predictable and consistent, and can stand the scrutiny of audit, can stand the scrutiny of compliance, can stand the scrutiny of use cases where the cost of being wrong is high. If you are wrong about your customer name, or if you are wrong about a product name, the cost can often be extremely high. And as a data leader, the last thing that you want is to look your CEO in the eye when the CEO asks: what happened? Why did we fail the audit? Why are we in trouble with a regulator? The last thing that you want to say is: because Claude. Because Claude. Claude did it.
Because Claude.
Because Claude did it.
Okay, so let's flip this coin, maybe. How, in your opinion, can Claude actually enhance building MDM platforms? What kind of processes can be automated? Just pick one, maybe the best one, from your experience.
I would argue that all MDM critical capabilities, and Gartner says there's 13 of them, all like data modeling, data quality, data governance, I would argue all of them can be augmented by Claude. Claude or any LLM can help you define, and we do this in Profisee. We have an AI orchestrator, we call her Aisi, and she runs on your OpenAI tenant and she can help you define data quality rules. She can help you recommend data models. She can help you recommend match processes. So that's augmentation, but that's very different than full automation. You can drastically scale and accelerate a lot of it.
I believe that right now we are very close to this concept of human in the loop, where the human always stays. Now, in my opinion, the human doesn't need to be in the loop. It's maybe a little bit contrarian to the market consensus right now, but I believe that many processes can be freed from human decision-making. Humans can orchestrate them, humans can monitor them, humans can make them better. What's your take on this?
When it comes to building an MDM platform, I think for the short term that we will continue to have humans in the loop because humans are ultimately accountable. If we build the RACI matrix, especially from a governance perspective around those use cases that I was talking about, audit, compliance, financial data accuracy, on and on, as long as humans are accountable, I think that they will remain in the loop to some degree.
So you think it's mostly an accountability question: right now we just cannot allow a Wild West, let's say, with AI taking decisions by itself. There must be a human because we need to attribute the decision to someone.
It gets tough. In defining rules that you would configure into an MDM platform or data quality platform or data integration platform, I think it's reasonable to assume that we can continue to scale human beings to meet the demand. However, transactionally, that's an area where I don't think it's feasible to always have a human in the loop. If you have an AI-based process, an AI-based chatbot that is doing customer service help, will you always be able to have a human in the loop on every interaction? No, you're not going to. So we need to pivot. This requires a pivot in how we approach governance, from a bottom-up, rules-driven process to more of an exceptions-driven one. There will still be rules. But we need to start working from more of an exceptions-driven process. And this is one of the many ways that we need to adapt our governance processes here because it's not feasible transactionally. If there are agents creating new customer records or new products or new something, and they're doing it in milliseconds and they're doing millions a day, we're using AI because it can go that fast, because it enables that scale. And if you start throwing humans into every single transactional loop, it's going to break. And there'll be no value.
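One way to picture an exceptions-driven process is a router that lets high-confidence transactions through automatically and queues only the ambiguous ones for a data steward. This is a minimal sketch under stated assumptions: the `ExceptionRouter` class, the thresholds 0.95 and 0.20, and the idea of a single confidence score are illustrative, not a description of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class ExceptionRouter:
    # Hypothetical thresholds; a governance council would set these per use case.
    auto_accept: float = 0.95
    auto_reject: float = 0.20
    review_queue: list = field(default_factory=list)

    def route(self, record_id: str, confidence: float) -> str:
        """Decide automatically when confident; otherwise queue for a human."""
        if confidence >= self.auto_accept:
            return "accepted"
        if confidence <= self.auto_reject:
            return "rejected"
        self.review_queue.append(record_id)
        return "needs_review"


router = ExceptionRouter()
for record_id, score in [("cust-1", 0.99), ("cust-2", 0.55), ("cust-3", 0.05)]:
    print(record_id, router.route(record_id, score))
print("steward queue:", router.review_queue)
```

The human workload then scales with the number of exceptions rather than the number of transactions, which is the pivot described above.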
So basically we need to transform the way we do business to embrace the new capabilities of AI. And in order to do so, we will need more projects. Talking about projects, I have one more question for you. This is something that you mentioned, I believe, when we were talking last week. You said something about people in the data landscape and our own difficulty in measuring the impact of our job, the impact of the projects that we implement. Everyone needs to measure, everyone needs to forecast the real impact on the organization. However, very often we refuse to do it. Can you elaborate on this topic and share it with our audience?
The connection to what we were just talking about, humans in the loop and scale, is that in essence it comes down to finding the right AI use case, if I can paraphrase. What is the right use case? What is the use case that is going to be ROI positive? What is the use case that is going to align well to automation? On and on. That requires us to take a more rigorous perspective when it comes to measuring the ROI of our investments in everything. When I was a Gartner analyst, I would talk to CDOs, CIOs, and VPs of data and analytics, and they'd say things like: I'm not getting any business engagement. Nobody's contributing data stewardship resources. I can't get any more people. On and on. Most of the problems most pressing to CDOs are based around the fact that we don't measure the economic impact of the things that we do. And I'm talking about dollars, pounds, actual money in the bank that can be attributed to data quality, MDM, better data integration, AI, whatever data and analytics use case you want. We need to be able to start measuring those. And the thing that drives me a little bit nuts is that you have CDOs out there who are saying: this is impossible. This is impossible. You cannot do this because the benefits are indirect. I can't attribute a dollar in the bank or a euro in the bank to a data quality rule, Malcolm, this is impossible. I hear what you're saying and it sounds like a nice academic exercise, but I can't do it. Can you imagine if you're that chief data officer and you're sitting in a C-level meeting and your CEO asks: what is the value of your function? Everybody else at the table has built attribution models. HR has built some sort of idea of understanding: okay, if we invest in HR and employee retention programs or maybe an employee wellness program, I can reasonably assume that our employee retention will improve 10 percent. HR is doing that.
Finance is doing measurements around what happens when we invest in audit, what risk, how many dollars do we mitigate by investing in better compliance? Everybody else at that table has developed measurements for the effectiveness of their organization. We're in the business of measuring things. We're literally in the business of measuring things. It's what we do. It's what we're hired and paid to do. Yet we're saying impossible. Can't do it. We cannot measure this. And it's ridiculous. Of course we can measure this.
So maybe one message to everyone: a little more courage. Start measuring, in the best way possible, the real impact that we are having, because it's for our own sake. And we can do this. Some kind of approximation is always possible. So the last one for you, Malcolm. 5 years from now, how do you see the MDM landscape?
It's going to look drastically different, but MDM will still play a critical foundational role in the management and governance of data because it will have to. All the things that I talked about: for unstructured data, how do we apply data quality? For structured data, how do we apply consistent definitions? How do we manage complex hierarchies and complex relationships? And how do we enforce those in the data that matters the most, which is shared across the organization? Our organizations will remain highly federated. Marketing will continue to have its own language. Finance will continue to have its own language. And you need a layer in between those that is consistent, accurate, and predictable. That's MDM. It's not going away. We're still going to be here.
Okay. So the MDM market is going to be booming, basically. Because this will be the image generated by AI, I believe, as well.
We're going to be using AI. It'll be AI for MDM. But the bigger use case is MDM for AI, figuring out how to master all that unstructured data.
Thank you all. Thank you for listening to us. Malcolm Hawker from Profisee. Enjoy the day. Enjoy the week.

There's a moment every leader knows. You ask a simple question, "what are our margins this quarter?", and two weeks later three departments send three different answers.
I've spent 15 years inside complex organizations across Europe and beyond, and watched the same loop repeat: data scattered, numbers inconsistent, AI pilots that never reach production. So I co-founded Astral Forest with one mission: make data work for the people running the business, not the other way around.
One client used to spend 2,000 person-days a year producing a quarterly financial report. After working with us, every employee gets the answer they need in 15 seconds. That's what transformation looks like.