
Generative AI Answers: Don’t Trust Your Eyes.
A Conversation with Harvard AI Professor Gil Alterovitz About Solving the Difficult Problems of Producing Accurate AI Analytics
One thing we’ve proven at Citate is that the answer you see when you submit a prompt to a generative AI like ChatGPT, Google AI Overview, Gemini, Meta.ai, Perplexity or Copilot is not the same answer that everyone else sees.
How are marketers, PR professionals or reputation professionals supposed to understand what AIs, soon to reach billions of people daily, are really saying if their answers are inconsistent?
Some very smart people think it’s impossible. But Gil Alterovitz, Chief AI Advisor for Citate and an assistant professor of AI at Harvard Medical School, came up with one of the critical insights that ensures that, unlike spot checking, Citate’s analytics are scientifically valid and evidence-based. I caught up with him so he could explain Citate’s Evidence-Based AI Analytics Methodology.
Josh:
Thank you so much for spending the time and talking about your really colorful background in the space. Let’s jump right in.
Can you trust the response you get to a prompt is the same one every other user is seeing when they ask the same question in the same interface? Is it consistent per user?
Gil:
Not only is the response not consistent across users, but it can also vary for the same user if they ask the same question at a different time. This variation might occur within milliseconds. The reason behind this is the probabilistic approach used by these models, which introduces randomness in generating answers. As a result, both the same user and different users may receive different responses to the exact same question.
Josh:
Okay, great. And as far as the why goes: why is that?
Gil:
So that’s because of the probabilistic nature of how the models typically work. A Large Language Model uses probability to predict the next word in a sequence. Imagine a coin being flipped: it might be 50/50, or it may be 40/60 if you have an unfair coin, in terms of the percentage of landing heads versus tails. Depending on how you flip the coin, one time it may come up heads, another time tails, and that’s how the model works as it predicts the words in the sequence of the response it’s giving you, completing the sentences of its answer. And because it’s probabilistic, the words can change just based on chance.
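Gil’s coin-flip analogy can be sketched in a few lines of Python. The word probabilities below are invented for illustration, not the output of any real model:

```python
import random

# Hypothetical probabilities an LLM might assign to candidate next words
# after a prompt like "The best pet food brand is" (illustrative only).
next_word_probs = {
    "Acme": 0.40,     # the "unfair coin" lands here 40% of the time
    "Brandco": 0.35,
    "Zenith": 0.25,
}

def sample_next_word(probs):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# Two runs of the exact same "prompt" can yield different words by chance.
print(sample_next_word(next_word_probs))
print(sample_next_word(next_word_probs))
```

Run it a few times and the printed words differ from run to run, which is exactly the behavior Gil describes: the same question, different answers, purely by chance.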
Josh:
Wow. So can you explain the difference between a probabilistic and a deterministic model, and why that makes a difference to people who care about the new AI models?
Gil:
Traditional models can be either deterministic or probabilistic. People are more accustomed to deterministic models—like those used in search engines or encyclopedias—where a specific question maps to a single, fixed answer. In contrast, probabilistic models might provide different answers to the same question because they aren’t fully predictable. When a model has a 100% probability of providing a particular answer, it’s deterministic. But if there’s any variation in probability, even a slight chance of a different answer, the model becomes probabilistic. This inherent randomness can lead to different outcomes, which may not be fully predictable.
Josh:
Great. And as far as responses go, just how different might these responses be? Are we talking about just minor wording or entirely different content?
Gil:
Oh, it can be completely different. The differences in responses can range from minor wording changes to entirely different content. In fact, some models have parameters that control this variability. For instance, if you want the model to be more creative, it might produce more varied answers, both in wording and meaning. Adjusting parameters like temperature can make the responses more standard and predictable, but even then, variations can occur as long as the model isn’t deterministic.
Josh:
Nice. Well, how might that affect a marketer or PR person who’s been tracking their brand reputation or their presence on AIs like Google AI Overview, ChatGPT, Meta.ai or Copilot, for example? If you ask an AI for the leading brands of pet foods, is it always going to be the same brands, in the same or a different order, or is there more to it than that?
Gil:
Yeah, that’s a great question. When considering models like the ones you mentioned, they indeed behave differently. However, one commonality among these Large Language Models is their probabilistic nature. This means that when they generate responses, they draw from a vast range of data, leading to variability in the results. As a result, the list of leading brands for something like pet foods might change—not just in order but potentially in the brands included. Over time, this can further vary as the underlying data evolves, affecting the model’s predictions.
Josh:
If there’s no way to exactly know how a Large Language Model will respond, is that the end of the story for people who care about tracking their reputation or brand presence on all the new AIs?
Gil:
Well, I wouldn’t say it’s the “end of the story”. In fact, it’s the beginning of a new paradigm. Just like weather predictions, which aren’t always 100% accurate, these models require us to track trends and patterns over time. Uncertainty in predictions makes it even more crucial to monitor things like brand reputation, as they can change dynamically under certain conditions. By understanding these variables, we can better predict and adapt to changes in brand presence or reputation across these new AI platforms.
Josh:
Ok, well, we are going to be taking a bit of a pivot here. Can you explain the patented Citate Evidence-Based AI Analytics Methodology so an average layman can understand it?
Gil:
Citate’s Evidence-Based AI Analytics, particularly with Large Language Models, involves analyzing a wide array of AI answers for each prompt. The model identifies patterns and relationships, processing this information through multiple layers of neural networks and discovering new relationships. This process is itself probabilistic, so we display confidence levels to clients depending on the sample size and/or the period of data collection.
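The patented methodology itself isn’t spelled out in this interview, but the general statistical idea Gil describes — sampling many answers to the same prompt and reporting a frequency whose confidence tightens as the sample grows — can be sketched like this (the brand name and counts are invented):

```python
import math

def mention_rate_with_ci(mentions, n, z=1.96):
    """Share of sampled AI answers mentioning a brand, with a ~95%
    normal-approximation confidence interval that narrows as n grows."""
    p = mentions / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical tally: "Acme" appeared in 62 of 100 sampled AI answers.
rate, lo, hi = mention_rate_with_ci(62, 100)
print(f"Acme mention rate: {rate:.0%} (95% CI {lo:.0%}-{hi:.0%})")

# The same rate observed across 1,000 samples gives a tighter interval.
rate2, lo2, hi2 = mention_rate_with_ci(620, 1000)
print(f"Acme mention rate: {rate2:.0%} (95% CI {lo2:.0%}-{hi2:.0%})")
```

This is why spot checking a single answer is misleading while repeated sampling is not: one draw from a probabilistic model tells you almost nothing, but a large enough sample pins down the underlying distribution with quantifiable confidence.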
Josh:
So, have you tried this? Does it actually work?
Gil:
Yes, these changes can be observed, not just when you ask the same question repeatedly but also over time. The topics that the model tracks may shift. This means that the ideas or brands tracked in responses can vary, reflecting changes in the underlying data or in how the model interprets that data.
Josh:
Well, once you have the confidence that the distribution you’re seeing is valid or at high-confidence, what kind of stuff can you do with the data when it’s applied to analytics?
Gil:
Once you’re confident in the distribution you’re seeing, you can start making informed decisions based on the data. For example, understanding how often certain concepts appear allows you to see patterns—like how weather predictions give you probabilities for different conditions. By analyzing these distributions, you can determine which themes or brands are more prominent and explore the factors driving their prominence. This kind of analysis helps you understand correlations and associations that could inform strategies in various applications.
Josh:
What are some applications of analytics that you’re most looking forward to in the emerging AI space?
Gil:
The AI space holds a lot of potential, particularly in areas like search where we’ve already seen significant shifts. This new approach is akin to weather prediction—it helps us anticipate and understand trends, allowing for more informed decision-making. For instance, if a particular theme is becoming more prominent, you might investigate why and consider what data or trends are influencing this.
Josh:
Great. Well, are there any last arenas you’d like to cover, anything that you’d like to share in the space that you see as upcoming or emerging that we haven’t covered as a final sign off?
Gil:
Absolutely. I think we’ll see significant advancements in industries where data changes rapidly, such as finance and advertising. In these sectors, information is generated and evolves quickly, making them ideal for the application of AI analytics. These models can help track and interpret these fast-changing data landscapes, providing valuable insights in real time. As a result, AI could have a profound impact, offering more timely and accurate information that can drive decision-making in these dynamic environments. You might have a particular stock index with a particular value, and particular news stories related to it. Or in advertising, you have a particular ad campaign, a particular brand, or particular news that may affect different themes within a campaign. Those are things that can change really rapidly, and we know that perceptions change quite rapidly in those fields, as opposed to other fields where things aren’t changing as fast. These are a couple of application areas where you see a lot of movement, and where some of these approaches can really have a large effect because of their real-time nature.
Josh:
Great. Well, thank you so much for taking the time, Gil. This has been really productive.
Gil:
Thank you for having me, Josh. It’s been a great discussion.
