VP & Principal Analyst — Doug Henschen
Publication date: September 23, 2024
Analytics vendors are scrambling to harness GenAI. Leaders will deliver with trust and transparency.
Executive Summary
The analytics market is mature in many respects, but it’s rapidly evolving in the era of generative artificial intelligence (GenAI). Augmented capabilities, headlined by GenAI, promise to ease analysis, improve understanding, and bring advanced predictive capabilities into the mix. But adoption won’t happen unless organizations can be assured of data security, decision transparency, and the reliability of insights. This Constellation Research Trend Report explains which analytics capabilities are emerging to satisfy customer demands for easy conversational natural-language (NL) analysis and advanced analytical capabilities without the need to accept black box results or the risk of hallucination. It also includes Constellation’s independent analysis of startup vendor DataChat, which has developed its own NL approach to drive iterative conversational analysis and advanced analytical capabilities such as machine learning (ML). Early adopters should use this report to better understand trends in the analytics and business intelligence (BI) market and DataChat capabilities, including the company’s latest efforts to harness GenAI for improved NL interaction while ensuring data security, decision transparency, and trust.
Analytics: Market description and trends
Every organization wants to become more agile, digitally transformed, and data-driven. Analytics products can play a big role in meeting these objectives—but only if they include AI-, ML-, and, increasingly, GenAI-based augmented capabilities. Augmented features help organizations be more proactive, uncover insights, and eliminate manual tasks.
Here’s a closer look at what many analytics vendors are working on at the request of their most demanding customers.
Augmented options mature, with GenAI at the cutting edge
The next generation of analytics products is harnessing heuristics, ML, AI technologies, and GenAI to improve data access, suggest data sources and analyses, uncover latent insights, and predict future outcomes. Augmented features extend self-service to a broader base of users and speed data prep and analysis for advanced users.
On the cutting edge, analytics vendors are working to put augmented capabilities on steroids by using GenAI. As shown in Figure 1, these generative features will address new and existing use cases and may save immense amounts of time and effort. Constellation sees the following augmented capabilities maturing or emerging:
Augmented data prep, cataloging, and governance
These features start with automated data profiling, guided cleansing and formatting, data source recommendations, join recommendations, and assisted data segmentation. Data catalogs index data sources and assets such as models, reports, and dashboards and then use ML to map usage and query patterns to drive recommendations. Certification features steer users to reliable metrics, sources, and assets, and lineage and policy-based management features help with governance. GenAI has the potential to automate many data prep activities, including labor-intensive tasks such as data redaction/masking, segmentation, and enrichment. GenAI also has the potential to refine and further automate the development of catalogs, data models, and data recommendations.
Automated trending, forecasting, and prediction
These capabilities start with simple push-button trending and forecasting features that can be harnessed by data-savvy analysts. More-advanced analytics and BI products have introduced AutoML features and prebuilt algorithms that yield predictive results. GenAI has the potential to provide more types of predictions as well as predictions accompanied by rich explanations. Large language models (LLMs) can train on unstructured data and graphs of interactions to handle sentiment analysis, for example, more quickly and accurately than currently possible. This could dramatically improve the ability to identify unhappy customers and to proactively cross-sell, upsell, and prevent churn.
Natural-language interaction
Natural-language interaction includes both NL generation and NL query. NL generation adds detailed textual descriptions to key performance indicators (KPIs), charts, and dashboards to provide more context and improve human understanding. NL query turns typed (or speech-to-text) questions into SQL code that supports data exploration and analysis. NL is a sweet spot for GenAI, promising much more verbose and nuanced NL explanations while eliminating much of the manual setup and administrative work previously required to curate language. GenAI also can generate SQL code, but language models have limited understanding of structured data and math. Vendors must combine search and GenAI capabilities with their existing analytical prowess and capabilities, such as semantic layers with domain-specific language, to ensure that answers are accurate and trustworthy.
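The grounding step described above—combining GenAI with a semantic layer so that generated SQL stays accurate—can be sketched as follows. The schema, prompt format, and validator here are illustrative assumptions, not any vendor's actual implementation:

```python
import re

# Hypothetical semantic-layer metadata: the only tables and columns the
# SQL generator is allowed to reference (an assumption for illustration).
SEMANTIC_LAYER = {
    "sales": {"region", "amount", "order_date"},
    "customers": {"customer_id", "segment"},
}

def build_prompt(question: str) -> str:
    """Embed schema metadata in the prompt so the LLM grounds its SQL."""
    schema = "; ".join(
        f"{table}({', '.join(sorted(cols))})"
        for table, cols in SEMANTIC_LAYER.items()
    )
    return f"Schema: {schema}\nWrite SQL answering: {question}"

def validate_sql(sql: str) -> bool:
    """Reject generated SQL that references tables outside the semantic layer."""
    referenced = set(re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE))
    return referenced <= set(SEMANTIC_LAYER)
```

A query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` passes validation, while SQL that hallucinates a nonexistent `invoices` table is rejected before it ever runs.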
Constellation's Analysis:
Self-service data prep has become commonplace. Features that offer suggested or guided approaches to cleansing, formatting, and joining data and masking sensitive information are less common, as are data enrichment capabilities. GenAI has the potential to automate data cleansing and enrichment via NL requests.
Augmented discovery and analysis features that spot exceptions and do root cause analysis are now commonplace, as are trending and forecasting features. Predictive capabilities are becoming more plentiful, but many remain targeted at data scientists and data engineers and fewer are usable by analysts and power users, let alone domain experts and business users. NL query and explanations are must-have capabilities, and here’s where GenAI should be harnessed to eliminate labor-intensive language curation.
GenAI has the potential to transform many aspects of analytics products and related administrative and analysis tasks. Caution is advised, with data privacy and cost concerns needing to be addressed and the accuracy of AI-generated content demanding ongoing monitoring and human review.
Importance to Buyers
The analytics market is shifting from the era of self-service to the era of cloud services and augmented advances. Three years ago, many customers managed their own cloud deployments, but today most want software-as-a-service (SaaS) options. Three years ago, augmented features were in their infancy, but it’s now routine to see features that uncover exceptions and root causes and offer predictions. NL query features are easier to set up and administer, and NL explanations are more nuanced, yet big differences in usability and scalability remain.
A continued key challenge for buyers is understanding how capable they can expect their users to be and how receptive users with different skill levels will be to adopting various analytics tools. Fortunately, most organizations aren’t starting from scratch and have a frame of reference as to which types of tools and workflows are embraced by which types of users. Answering the following questions is a good starting point from which to consider cloud, embedded, and augmented options:
- What’s our organization’s level of cloud migration and maturity, and is it time to embrace analytics services available on one or more clouds?
- Can we embed or surface concise decision-supporting analytics into enterprise apps, custom apps, mobile apps, and/or productivity and collaboration platforms?
- If we have confidence in analytics and what it means for the business, can we use alerts to trigger or automate actions and kick off business processes without requiring any human analysis or intervention?
- Can we augment the skill of workers with GenAI, liberating staff from tasks that previously required administrative support and repetitive manual effort?
No matter how experienced organizations may be with analytics and BI, Constellation recommends hands-on review and testing by a broad base of would-be users whenever a significant new investment is being considered.
DataChat vendor profile
DataChat was launched in 2020 by cofounders Jignesh Patel and Rogers Jeffrey Leo John, based on research they published in 2017 at the University of Wisconsin-Madison. Their basic premise was to enable business users to analyze large datasets without needing a technical background or coding skills.
Overview
DataChat’s cofounders sought to move beyond the still-prevailing, not-so-aptly-named “self-service era” of analytics, in which complex data pipelines must be created to feed dashboards that provide answers to predetermined questions. The complexity of self-service analytics has been a barrier to adoption, and the predefined nature of dashboards has limited flexible exploration.
To promote broader data-driven decision-making, organizations naturally want to open up data access to more users and to give them greater freedom to analyze and ask new questions, but here too, the complexity of languages such as SQL and Python is an obstacle to broad adoption.
Enter DataChat, which uses SQL and Python to offer more than 50 analytics functions for common tasks such as data ingestion, wrangling, exploration, visualization, and ML-based discovery. These functions are surfaced in DataChat through the vendor’s own intermediary language, Guided English Language (GEL), which users can easily understand and apply. And, as detailed later, GenAI has been added to communicate with GEL and support interaction entirely through NL.
Analytics functions are invoked via DataChat’s no-code user interface, which offers two modes of interaction: point-and-click spreadsheet-style interaction (see Figure 2) and NL interaction via the Data Assistant (see Figure 3). Users can choose and switch between these modalities as they explore a dataset. Some users might prefer spreadsheet-style interaction, where they point and click to select data sources, apply analytics functions, and then review the results. Others might prefer an iterative, conversational approach in which they ask questions and make requests in NL, receive results, and ask follow-up questions.
Whether users choose spreadsheet-style or NL interactions, DataChat automatically documents each step in the analysis, which can be saved as a workflow (see Figure 4). Analysts can collaborate on their analyses in real time, and saved workflows can be shared and collaboratively reviewed and edited, ensuring transparency, trust, and repeatable execution.
GenAI was added to DataChat in early 2023 to enhance NL interactions and contextual understanding. The GenAI is trained on and communicates with GEL. Although GenAI now powers the NL interaction and understanding, it does not underpin DataChat’s skills, which are based on built-in SQL and Python functions and are not prone to hallucination. Further, DataChat does not send customer data to LLMs. DataChat customers have a choice of LLM options, including Anthropic Claude, Google Gemini 1.5 Pro, OpenAI GPT-4, and Meta’s open source Llama 3. Only metadata is shared with these LLMs to provide context for more accurate interpretation of NL questions and requests. Customer data stays where it lives and is not shared with LLMs, ensuring data privacy and security.
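The metadata-only pattern described above—sharing schema context with an external LLM while customer data stays put—can be sketched in a few lines. The table structure and function names below are illustrative assumptions, not DataChat's implementation:

```python
# Minimal sketch of a metadata-only LLM exchange, assuming a tabular
# dataset held as a list of dicts; names and structure are illustrative.

def extract_metadata(rows):
    """Derive schema context (column names and inferred types) from data.
    Only this metadata would be sent to the LLM -- never the rows."""
    if not rows:
        return {}
    sample = rows[0]
    return {col: type(val).__name__ for col, val in sample.items()}

def llm_context(table_name, rows):
    """Build the context string shared with an external LLM."""
    meta = extract_metadata(rows)
    cols = ", ".join(f"{c} {t}" for c, t in meta.items())
    return f"table {table_name}: {cols}"

orders = [
    {"order_id": 1, "customer": "Acme", "total": 125.50},
    {"order_id": 2, "customer": "Globex", "total": 80.00},
]
context = llm_context("orders", orders)
# context describes structure only; values such as "Acme" never leave.
```

The design point is separation of concerns: the LLM sees enough context to interpret an NL question accurately, but the row values needed to answer it are processed only on the customer's side.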
Capabilities
The sections below detail DataChat’s deployment options, integrations, and platform capabilities.
Deployment
DataChat is available as SaaS on the Amazon Web Services (AWS) and Google Cloud marketplaces. A container-based offering is also available for customers that wish to deploy and manage DataChat on the public cloud, private cloud, or on-premises data center of their choice.
Data sources
DataChat is data platform–agnostic but has BigQuery Ready certification from Google and is preintegrated with other data platforms such as Databricks, Microsoft SQL Server, MySQL, PostgreSQL, and Snowflake. Users can also import Microsoft Excel and .CSV files. DataChat is compliant with SOC 2 Type II, an important auditability standard that ensures secure data handling, and supports authentication via enterprise single sign-on, which ensures that the permissions and access controls of the underlying database are respected.
Data integration
Whether drawing on data warehouse tables, uploaded datasets, or both, users can interact with the DataChat Data Assistant in spreadsheet-style or NL mode to perform data wrangling and lightweight extract, transform, load (ETL) tasks such as formatting, joining, deduping, filling in missing values, and aggregating. The Data Assistant provides descriptive statistics and indicators of data quality. Users can also change column names to improve the Data Assistant’s understanding without impacting the underlying database. To ensure efficiency and performance, the computing workloads required for integration and cleansing steps are pushed down to the underlying data warehouse.
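The pushdown approach described above can be sketched with SQLite standing in for the data warehouse; the table and cleansing steps are illustrative assumptions. Rather than pulling raw rows into the client, the cleansing logic is expressed as SQL that the warehouse executes:

```python
import sqlite3

# SQLite stands in for the warehouse here; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("East", 100.0), ("West", None)],  # dupe + missing value
)

# Dedupe and fill missing values in a single statement that runs where the
# data lives; only the small cleaned result set returns to the client.
cleaned = conn.execute(
    "SELECT DISTINCT region, COALESCE(amount, 0) AS amount FROM sales"
).fetchall()
```

Pushing the `DISTINCT` and `COALESCE` work down keeps network transfer minimal and lets the warehouse's query engine do what it is optimized for.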
Data analysis
With data of interest loaded and prepared, users can use point-and-click or NL interaction to iteratively explore data and drill down from a given response for further investigation. Users can also create and publish pivot tables, charts, and customizable dashboard-like insight boards. Augmented analytical insights include outlier detection and the “Show Me Something Interesting” feature, which provides a list of suggested insights such as sales, unit, and profit analyses by product, brand, or business unit over specified time periods. Users can provide feedback to DataChat’s AI by upvoting or downvoting responses. This feedback mechanism helps the Data Assistant learn and provide better responses.
Machine learning and predictive analysis
DataChat goes beyond descriptive and diagnostic analysis with ML and predictive analysis. The platform’s Train Model function handles many of the complex aspects of ML, such as feature identification and removal (aka pruning) and data optimization. More than a dozen built-in models are available, including decision tree, k-nearest neighbor (kNN), linear and logistic regression, random forest, neural nets, and time series analysis.
By default, DataChat automatically explores multiple feature transformations and models and selects the optimal choice. Alternatively, data science–savvy users can choose and configure specific models suited to their requirements. All of the feature transformations, parameter choices, and evaluation metrics produced by DataChat’s AI are available to users as reports within the platform.
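The automated model-selection behavior described above—fit several candidates, score them on held-out data, keep the best—can be sketched with stdlib Python. The two toy models and the alternating train/test split are illustrative stand-ins for a platform's built-in algorithms:

```python
# Toy sketch of automated model selection over two candidate models.

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def select_model(xs, ys):
    """Fit each candidate on a split, score on holdout, return the winner."""
    train_x, test_x = xs[::2], xs[1::2]  # simple alternating split
    train_y, test_y = ys[::2], ys[1::2]
    best_name, best_err = None, float("inf")
    for name, fit in [("mean", fit_mean), ("linear", fit_linear)]:
        model = fit(train_x, train_y)
        err = sum((model(x) - y) ** 2 for x, y in zip(test_x, test_y))
        if err < best_err:
            best_name, best_err = name, err
    return best_name

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]  # perfectly linear data: y = 2x
```

On the linear data above, `select_model(xs, ys)` picks the linear model because its holdout error is lower; a real platform would sweep far more algorithms, transformations, and hyperparameters, and surface the evaluation metrics as reports.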
Collaboration
Data, sessions, charts, insight boards, models, and workflows can be shared for collaborative analysis. Workflows also can be shared, replayed, and edited, with the ability to add notes; undo and redo edits; and save, manage, and rename verified versions. Additionally, users can collaborate with other DataChat users in real time, just as in Google Docs.
Constellation's Analysis:
Given the conversational nature of DataChat, it’s tempting to assume that it’s a Johnny-come-lately jumping on the GenAI bandwagon, but the company’s cofounders were exploring NL-based data analysis capabilities long before GenAI became popular. DataChat uses GenAI for more natural and contextually accurate conversational interactions, but its core skills are based on SQL and Python data science functions. As a result, DataChat mitigates hallucinations and does not expose customer data to external language models.
Other DataChat strengths
Ease of use and explainability
The combination of conversational NL interactions and saved workflows with in-depth English language explanations ensures understanding, explainability, and repeatability.
Advanced data science capabilities
DataChat spans descriptive, diagnostic, and predictive analytics, with more than a dozen built-in algorithms and AutoML capabilities as well as testing and tuning capabilities.
Choice of interaction modes
In addition to no-code spreadsheet-style and NL interactions, power users and developers can invoke APIs and get hands-on with native SQL and Python code under the hood.
DataChat weaknesses
It’s a young company
As a startup innovator, DataChat does not have a deep roster of reference customers, but it has worked with several (unnamed) Fortune 100 customers and has developed assets supporting common use cases, including four common analysis scenarios in the financial services sector.
NL interaction goes only so far
DataChat’s NL interaction approach and shareable workflows make routine data analysis and data visualization tasks easy and understandable for business users. Deeper data integration, data cleansing, and ML model building, testing, and tuning tasks remain the domain of more experienced power users, analysts, and junior data scientists.
DataChat complements rather than replaces incumbent tools
DataChat is not intended to replace incumbent BI and analytics platforms with reporting and scheduling capabilities or deeper data science studios. It’s best seen as an innovative option for net-new deployments and a more broadly business-user-accessible complement to incumbent deployments.
Recommendations
Take advantage of augmented analytical capabilities
To drive deeper data analysis and understanding across the organization, consider emerging augmented capabilities for data prep, data discovery and analysis, NL interaction, and automated prediction. These advances promise not only to take some of the drudgery and doubt out of analytics but also to democratize more-advanced challenges such as prediction and proactive action.
The most tantalizing advance in augmented analytics over the past two years—and perhaps ever—has been the emergence of GenAI capabilities. Lots of vendors have announced features and have been talking about the possibilities, but far fewer companies have made those features generally available or introduced upgrades and refinements based on extensive user feedback. Look for capabilities that have been in development for years rather than months, and insist on robust privacy safeguards and feedback mechanisms that ensure GenAI accuracy.
Start with best practices
To embrace cloud, embedded, and augmented analytics offerings more effectively, follow these best practices:
Build a culture of data proficiency
As you consider how broadly to deploy an analytics platform, it’s a good time to promote a culture of data-driven decision-making. Analysts may have no problem building reports, data visualizations, and forecasts, but ensure that they collaborate with business leaders and business users to understand what insights would be most valuable. Good analytics guides executives and business users to next-best actions and data-driven decisions.
Win executive support
Given all the buzz about all things ML and AI, leaders may have little trouble attracting an executive sponsor for an augmented analytics pilot project, but choose the executive carefully. The best choice is someone who has influence and credibility in both the business sphere and IT.
Build a broad cross-functional team
Don’t just strive for the proverbial business/IT partnership; also ensure that there’s well-rounded representation of top executives, line-of-business leaders, analysts, and ordinary users on the business side and data scientists, data professionals, and developers on the IT side of the house. It’s crucial to get a clear understanding of the skill levels of the users you plan to support and, further, which user types will be expected to use which tools. Make sure the selected product aligns with and supports your deployment and consumption expectations.
Pick the right project
Whatever the augmented features or vendors being considered, be careful to choose the right project as the pilot test case. Apply the Goldilocks principle: Go for a project that is not too big, time-consuming, or risky yet not so small and inconsequential that it will go unnoticed. If possible, start with easy, quick wins that have notable payoffs. At the same time, make sure you’re not throwing softballs at the candidates. You’ll want to put the products through their paces and test the nuances of their augmented and other features.
Take an agile approach
Agile DevOps approaches are characterized by rapid, iterative development cycles; frequent reviews by cross-functional teams representing the business and IT; and the application of automation and monitoring wherever possible. This approach has been proven to deliver fast results that match the needs and expectations of the business.
Address trust and transparency
Organizations must address trust and transparency as they embrace augmented capabilities powered by ML, AI, and GenAI. Assume that change management practices and training will be required before implementation of these capabilities. People will more readily trust and embrace augmented features if they understand how they work. That’s where transparency comes in. Augmented features should be explainable, not magical.
The danger is that “black box” (nontransparent) predictions and recommendations may not be in the best interests of an organization or its customers. Business ethics and, in many cases, regulations in the United States and elsewhere hold organizations accountable for showing that variables such as age and race are not inappropriately used in decisions. The truly smart augmented features are those that can be understood and trusted.