Sept 29-30, 2025 | New York, NY

Conferences

Analytics & Forecasting 2025

Corporate members – login to register and save your spot!


Overview

Synthetic data—artificially generated data that mirrors real-world patterns—is gaining traction as a tool for enhancing market research while addressing privacy and compliance challenges. It is already being used in concept testing, product development, packaging and pricing studies, market sizing, and consumer trend research.  

For market researchers, its value lies in delivering scalable, rapid, and privacy-compliant insights, particularly when engaging with niche or hard-to-reach segments. It also supports continuous, always-on research through digital twins and real-time simulations.  

In forecasting, synthetic data helps to mitigate issues arising from data sparsity and signal loss due to privacy restrictions and cold start challenges. However, its ability to model complex human decisions and emotional nuances remains uncertain. Challenges persist regarding data quality, ethical use, bias reinforcement, and regulatory inconsistency. There is also a critical need for industry-wide standards and validation frameworks. Significant questions remain: Can synthetic data replicate the richness and unpredictability of real human behavior sufficiently to support high-stakes forecasting and decision-making? Where is the line between synthetic realism and synthetic bias, and how do we manage it? Is synthetic data a bridge—or a barrier—to more effective and reliable forecasting models?  

MSI and the ARF are excited to co-produce the ANALYTICS & FORECASTING conference for the first time. It will be held at Columbia University’s Business School on September 29-30. This is an opportunity to engage in discussions about the opportunities and limitations of modeling in market research applications and to showcase innovative work from academia and the business sector.

This is a complimentary event exclusively for ARF & MSI Members.   Click here to learn more about membership.

MONDAY, SEPTEMBER 29

8:00 – 9:00am Registration & Breakfast 

9:00 – 9:15am Opening Remarks

Tulin Erdem, Ph.D. – MSI Executive Director, Professor of Marketing – Leonard N. Stern Professor of Business

Oded Netzer – Vice Dean of Research and the Arthur J. Samberg Professor of Business, Columbia Business School 

9:15 – 9:45am Managing Product Review in the Age of Gen AI – the Role of AI Detection

In the era of LLMs like ChatGPT, many platforms and policymakers have become increasingly worried about the spread of AI-generated misinformation, including fake reviews created by AI. In response, researchers and companies have invested heavily in developing AI detectors across different fields. Learn how well several leading LLM detectors perform in reliably telling apart human-written reviews, AI-generated fake reviews, and human reviews assisted by AI. While there may be a fundamental challenge in reliably distinguishing unwanted AI-generated fake reviews from human reviews helped by AI, can platforms identify potential bad actors with reasonable accuracy? Is the “arms race” between malicious AI-generated fake reviews and AI detectors likely to continue and become more complex?

Liye Ma – Professor of Marketing, Robert H. Smith School of Business, University of Maryland

9:45 – 10:15am The Next Era of Marketing Mix

While annual planning remains essential for any business, it is no longer realistic to assume that planning and measurement can be done only once a year for brands. Managing the Marketing Mix and planning for the upcoming year has shifted from an annual task to an on-demand capability, available now and in the near future. To succeed, you must be accurate and actionable, as well as flexible and agile. The mix of art and science is becoming a crucial part of Marketing Mix and measurement overall, with both becoming more interconnected to support the alignment of multiple overlapping metrics. How can brands use the latest technology and top-tier data and modeling to turn Marketing Mix from an annual project into a core capability?

Jonathan Dizney – Mix Director, Circana

Troy Noble – VP, Global Measurement Product Lead, Circana 

10:15 – 10:45am Predicting Purchase Intent in Customer Interactions with AI Assistants: Context-Specific Small Language Models vs. LLMs

This study explores how to predict purchase intent during consumer-initiated interactions with AI assistants, both within and outside the purchasing process. As AI assistant adoption rises, identifying purchase intent becomes essential for targeted advertising, especially since many interactions occur outside the traditional purchase funnel. Both methods accurately predicted purchase intent, but the graph-based model performed better for monetizable keywords, demonstrating that specialized “small language models” can outperform LLMs for niche tasks. These findings provide marketers with practical guidance on when and how to advertise via AI assistant channels.

Wendy Moe – Dean’s Professor of Marketing, University of Maryland and Amazon Scholar 

10:45 – 11:15am  Morning Break 

11:15 – 11:45am Leveraging Generative AI for Insight Discovery and Delivery

By analyzing what has worked before, advertisers can boost their product knowledge and discover fresh approaches. The tricky part? Getting helpful insights to advertisers exactly when they need them. This presentation demonstrates how Amazon Ads addresses this challenge by leveraging a combination of capabilities and Generative AI.

Pranav Pandit – Head of Insight Products, Amazon Ads

11:45am – 12:15pm Twin-2K-500: A Dataset for Building Digital Twins

The rise of large language models (LLMs) like GPT has sparked interdisciplinary interest in using “silicon samples” to emulate human responses, affecting both research and practical applications. If these simulations prove reliable, they could improve experimental design, theory development, and customer insights. An increasing number of companies now offer LLM-based tools for these purposes. A high-quality dataset is presented with over 500 questions and 2,000 respondents, demonstrating strong test-retest reliability and replication of known effects. The main focus is on building digital twins, which initially predict human behavior with up to 88% accuracy in test-retest measures. Although encouraging, there are limitations, such as issues with non-normative behavior, representational diversity, and the dataset’s U.S. focus. Still, this resource seeks to accelerate both academic and commercial use of LLM simulations.

Tianyi Peng – Assistant Professor, Columbia University

Olivier Toubia – Glaubinger Professor of Business, Columbia Business School 

12:15 – 12:45pm Predicting Behaviors with LLM-Powered Digital Twins of Consumers

Digital twins of consumers have emerged as a promising approach to simulate consumer thinking, feeling, and decision-making. Grounded in psychological theory that conceptualizes behavior as a function of both personal traits and contextual factors, this research proposes and validates a dual-component framework for constructing Large Language Model (LLM)-based consumer digital twins. Fine-tuning on consumer-specific data, including user-generated content, allows the model to internalize individual traits, preferences, and cognitive and behavioral patterns. At the same time, retrieval-augmented generation (RAG) dynamically incorporates information specific to the consumer's context at inference. By aligning LLM adaptation techniques with foundational psychological theories about behavior, this method enables psychologically grounded simulations of individual-level consumer behavior at scale. This research contributes to the literature on generative AI, synthetic agents, and digital twins in consumer research and, at the same time, offers a new methodology for theory-driven modeling and privacy-compliant personalization in practice.

Shane Wang – Professor of Marketing, Pamplin College of Business, Virginia Tech 

12:45 – 1:45pm Lunch 

1:45 – 2:20pm Panel Synthetic Data: Pros and Cons for Market Research 

Synthetic data enables researchers to quickly create large, privacy-safe datasets, simulate rare consumer behaviors, and test hypothetical product scenarios. It can also help balance samples and cut data collection costs. However, synthetic data might not fully reflect real-world consumer complexity, may introduce biases, and can be difficult to validate. Insights based solely on synthetic datasets may overfit or fail to generalize, making careful testing against real data essential. Hear a variety of perspectives about the benefits and drawbacks of using synthetic data.

Rajan Sambandam – President, TRC Insights

Ayelet Israeli – Marvin Bower Associate Professor, Harvard Business School

Neeraj Arora – Arthur C. Nielsen, Jr. Chair in Marketing Research and Education, University of Wisconsin-Madison

Rob Kaiser, Ph.D. – Chief Methodologist, PSB Insights

Moderator: Oded Netzer – Vice Dean of Research and Arthur J. Samberg Professor of Business, Columbia Business School

2:20 – 2:50pm Replicas & Realities: The Strategic Role of Consumer Twins

Digital consumer twins are unlocking new possibilities in innovation analytics, from spotting early winners to tailoring concepts for targeted segments. Drawing on Colgate-Palmolive’s CPG experience, this session explores where these models deliver the most value, where they fall short, and how they can shape a more effective learning strategy. 

Katy Qian – Data Scientist, Colgate-Palmolive 

Helen Wolf, Ph.D. – Senior Director, Global Consumer Experience Insights, Colgate-Palmolive

2:50 – 3:20pm Semantic Targeting

This research presents a methodology for real-time, personalized targeting by creating dynamic customer representations that integrate diverse behavioral data. The approach combines LLM-based item embeddings with attention mechanisms to address data sparsity and fragmentation. Evaluated on real-world digital advertising data, the attention-based model outperforms simple averaging, especially for long-term or abstract predictions, producing stable and generalizable customer embeddings. This work advances marketing research by adapting LLMs to customer data and provides practitioners with a scalable, flexible foundation for targeting systems that adapt to evolving consumer behavior.

Isamar Troncoso – Assistant Professor of Business Administration, Harvard Business School 

3:20 – 3:50pm Afternoon Break 

3:50 – 4:20pm Optimal Product Design Synthesis: Pairing Generative Models with Adaptive Preference Measurement

This research introduces a preference measurement framework that combines generative AI with adaptive survey design to guide product development based on individual consumer preferences. Using Stable Diffusion to generate realistic product images on-the-fly and high-dimensional Bayesian optimization to learn preferences efficiently, the method goes beyond traditional approaches like conjoint analysis. It captures nuanced aesthetic preferences—colors, textures, shapes—and enables automatic generation of designs aligned with consumer tastes. The framework can reduce design costs, accelerate development, and create products better matched to consumer preferences across industries such as fashion, automotive, and creative goods.

Ryan Dew – Assistant Professor of Marketing, The Wharton School, University of Pennsylvania 

4:20 – 4:50pm LLM Time Machines: Valuing Brands Over Time

This research explores using Large Language Models (LLMs) like ChatGPT to estimate brand value over time, especially for free digital services such as social media platforms. Traditional methods are expensive and static, failing to reflect dynamic changes in brand equity. By benchmarking LLM-generated valuations against annual data from incentive-compatible discrete choice experiments, LLMs can accurately mimic human choices and valuations. This approach allows for retrospective brand value estimation and provides predictive insights into future trends, highlighting shifts linked to major events and changes in consumer behavior. The method gives marketers a scalable, practical tool to monitor and forecast brand equity, improving strategic brand management. 

Felix Eggers – Professor, Copenhagen Business School 

4:50 – 5:00pm Closing Remarks

Scott McDonald, Ph.D. – President & CEO, ARF 

5:00 – 6:30pm Cocktail Reception 

 

TUESDAY, SEPTEMBER 30 

8:00 – 9:00am Registration & Breakfast 

9:00 – 9:15am Opening Remarks

Scott McDonald, Ph.D. – President & CEO, ARF 

Olivier Toubia – Glaubinger Professor of Business, Columbia Business School

9:15 – 9:45am From Noise to Signal: When Synthetic Data Adds Real Value

Information theory, decision science, and structural analysis offer a principled approach to determining where synthetic data provides the greatest value in market research. By quantifying tradeoffs across analytical accuracy, economic impact, representativeness, and privacy, Dynata can prioritize use cases where synthetic data meaningfully enhances insight, fills gaps, or supports inference. They will illustrate this framework with examples showing how entropy, mutual information, and scenario modeling guide the selective deployment of synthetic data for segmentation, rare-event simulation, and feasibility forecasting. 

Alan Briancon, Ph.D. – Vice President of Research & Data Science, Dynata 

9:45am – 10:15am Using LLMs for Market Research

This paper examines how LLMs can help researchers and practitioners understand consumer preferences. Focusing on the distributional nature of LLM responses, the authors generate multiple outputs per survey question and present two key findings. First, willingness-to-pay estimates derived from LLM responses are realistic and comparable to human studies. Second, fine-tuning LLMs with prior survey data improves alignment with human responses for existing and new product features, though not for entirely new categories or segment differences. These results highlight when and how LLMs can meaningfully support product development and market research.

Ayelet Israeli – Marvin Bower Associate Professor, Harvard Business School

10:15 – 10:45am Morning Break

10:45 – 11:15am Synthetic Data in Action: Ipsos’ Human-Centered Approach to AI Research

Synthetic data, especially through tools like PersonaBots and digital twins, is transforming market research by enabling faster project turnaround times, greater efficiency, and a powerful new way to access insights. In partnership with Stanford, Ipsos is building a ground-truth database to validate these AI-generated responses. While synthetic participants offer real-time interaction and can simulate hard-to-reach segments, a human-in-the-loop, ethically grounded approach is necessary to ensure accuracy and accountability. The potential is significant, but only with transparent methods for validation and thoughtful strategies to blend human and synthetic respondents.

Stephanie Bannos-Ryback – Head of Business Transformation & AI Change Management, Ipsos

Xufeng Wang, Ph.D. – Partner & Chief Data Scientist, Data Labs, Ipsos

11:15 – 11:45am Simulating Real Consumers: How Accurately Can LLMs Mimic Individual Respondents in Market Research?

This study explores whether Large Language Models (LLMs) can emulate specific individuals for market research. Unlike previous work that focused on group-level preferences or stylized personas, detailed socio-economic, psychographic, and personality data from real people are integrated into an LLM to create synthetic counterparts. The results demonstrate that synthetic respondents can replicate individual consumers with high accuracy. However, systematic biases still exist. The findings suggest that LLM-based synthetic personas cannot yet replace traditional market research but can serve as scalable, fast, and insightful tools when used carefully.

Leonard Kinzinger, M.Sc. – Doctoral Researcher at the Professorship of Digital Marketing, Technical University Munich (TUM)

11:45 – 11:50am Closing Remarks

Scott McDonald, Ph.D. – President & CEO, ARF

End of Conference 

Presentations, Recordings, and Summaries

Please visit HERE to access the conference resources.
