451 Research: DataChat talks no-code data science with generative AI, spreadsheet-driven platform

451 Research-Analyst Briefing

Analysts – Krishna Roy
Publication date: Monday, February 12, 2024

Introduction

The data science and analytics market will likely exceed $72 billion in 2027, representing a compound annual growth rate of 13%, according to 451 Research’s latest market sizing forecast. The 134 vendors in this forecast (70 from the data science platform sector and 64 from the analytics platform segment) illustrate the market’s crowded nature. As a new market entrant, DataChat is touting no-code data science on a platform built around a conversational user experience and a spreadsheet modality, which users can switch between as required.

DataChat’s endgame is to enable data analysts and business individuals to perform tasks — wrangle data, explore and visualize it, create predictive models, and conduct outlier analysis — using an English-language and/or a spreadsheet-driven user experience, rather than code them in Python, SQL or JavaScript. DataChat is also seeking to make data science understandable, trustworthy and replicable to non-data scientists by automatically explaining the steps involved in English.

The Take

The skills barrier for data science and analytics continues to stymie organizations’ efforts to become more data-driven. This issue was identified as the third-biggest hurdle organizations encounter by respondents to 451 Research’s Voice of the Enterprise: Data & Analytics, Data Science & Decision Intelligence Platforms 2023 survey. Thirty-one percent (31.4%) of respondents said their organization lacked the requisite skilled resources and talent to become more data-driven. DataChat’s no-code data-science approach is therefore addressing a genuine pain point. However, the vendor is missing out on an opportunity by not supporting R because it is another popular language for data science. Nonetheless, DataChat is solid because it was conceived with natural language actions and explanations in mind, unlike some other offerings, which have delivered them as poorly integrated bolt-ons.

Context

DataChat has kicked off 2024 by activating its go-to-market strategy. The company is focusing on making advanced analysis possible for individuals who are not highly trained, as well as for data scientists with coding skills. About seven years ago, just as data science rose to prominence as a discipline, DataChat observed the struggles non-data scientists were facing. That rise was driven by organizations’ desire to make decisions based on data, rather than gut instinct and experience alone.

DataChat began life as a spinoff from the University of Wisconsin-Madison in 2017. The company was founded by Jignesh Patel and Rogers Jeffrey Leo John based on research they began at the university. Patel, a former chief scientist and entrepreneur, saw firsthand the struggles data teams faced in creating data science pipelines. Patel and John decided that natural language was the answer to making data science possible for the increasingly diverse set of users without programming skills who needed to do it. Their “aha moment” turned into the genesis of DataChat.

DataChat built an initial prototype in 2019, established a founding team in 2020 and raised series A funding in September 2021. Its platform came into existence in January 2022 and was subsequently reengineered to take advantage of OpenAI’s GPT-4. DataChat’s $25 million series A funding was led by Redline Capital and Anthos Capital. Celesta Capital and Nepenthe Capital, which led the previous $4 million seed funding round, also participated. DataChat is hiring and anticipates a headcount of 40 employees in February. The vendor’s staff members are spread across Madison, Wisconsin; Boston; and the Bay Area.

Strategy

DataChat is targeting data analysts and business individuals without Python, SQL or JavaScript skills; these languages, among others, are the lingua franca of data science and analysis. The vendor’s spreadsheet interface is intended to make data analysts and other spreadsheet creators feel comfortable using the platform. The ChatGPT-like user experience is aimed at individuals who want a conversational analytics interface. That said, data scientists could also use DataChat to perform tasks such as extract, transform and load (ETL) and data discovery, given that data management is an onerous and time-consuming aspect of their role.

The vendor is using a direct sales strategy as well as partners. Google is an early partner, which will result in DataChat becoming available through Google Cloud Marketplace. Amazon Web Services is another partner. DataChat operates as a SaaS platform running on AWS and GCP as well as on-premises.

To help users get up and running quickly, the vendor offers a free version for individual use that is hosted by DataChat. Although the free version provides access to all analysis capabilities, it caps the number of concurrent sessions at five and file storage at 100 MB. Datasets are also limited to 10 million cells.
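As a rough illustration of how these caps interact, the sketch below checks a workload against the three limits quoted above. It is not DataChat code: the class, field and function names are hypothetical, and it assumes a dataset's cell count is simply rows multiplied by columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FreeTierLimits:
    """Caps quoted in the text; the field names are hypothetical."""
    max_concurrent_sessions: int = 5
    max_storage_mb: float = 100.0
    max_cells: int = 10_000_000


def free_tier_violations(rows: int, columns: int, file_size_mb: float,
                         sessions: int = 1,
                         limits: FreeTierLimits = FreeTierLimits()) -> List[str]:
    """Return the caps a workload would exceed (empty list if it fits)."""
    violations = []
    # Assumption: a dataset's cell count is rows x columns.
    if rows * columns > limits.max_cells:
        violations.append("cells")
    if file_size_mb > limits.max_storage_mb:
        violations.append("storage")
    if sessions > limits.max_concurrent_sessions:
        violations.append("sessions")
    return violations


# A 500,000 x 12 table in a 40 MB file, then a larger workload.
print(free_tier_violations(rows=500_000, columns=12, file_size_mb=40.0))
print(free_tier_violations(rows=2_000_000, columns=12, file_size_mb=150.0))
```

Note that a wide table can hit the cell cap well before the storage cap, so both need checking.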

Charged-for DataChat Enterprise does not have these restrictions. It is available as a custom deployment for large enterprises that want it locked down within their private network. DataChat Enterprise also comes with a support package, authentication (single sign-on) and the ability to ask unlimited natural language questions.

Product

The first incarnation of DataChat did have a conversational user experience, but it relied solely on GEL (Guided English Language). GEL is the name for the proprietary, declarative language DataChat created to abstract away the complexities of Python, SQL and JavaScript. GEL is a subset of English. It was inspired in part by the NATO Phonetic Alphabet. While GEL made data science easier, it still required some learning; users could not easily converse in it from the beginning.

DataChat has sought to remedy this usability issue by adding a translation layer to GEL. This means individuals can conduct analysis in plain English through a familiar ChatGPT-like interface, and/or perform data management and analysis in the platform’s integrated spreadsheet-style environment. Underneath, DataChat uses OpenAI’s GPT-4 on Microsoft Azure to enable natural language query and generation. No customer data leaves DataChat because it sends only the schema to this large language model (LLM). Furthermore, there is no additional cost to clients because DataChat absorbs the cost of using OpenAI’s LLM on Azure.
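The schema-only idea can be sketched as follows. This is an illustrative guess at the general pattern, not DataChat’s implementation: the function name, prompt layout and sample table are all hypothetical. The point is that the prompt is assembled from column names and types, while row values stay local.

```python
from typing import Dict


def build_schema_prompt(table: str, columns: Dict[str, str], question: str) -> str:
    """Compose an LLM prompt from schema metadata only (no row values)."""
    column_lines = [f"- {name} ({dtype})" for name, dtype in columns.items()]
    return "\n".join([
        f"Table: {table}",
        "Columns:",
        *column_lines,
        f"Question: {question}",
    ])


# Hypothetical data: the rows are never passed to the prompt builder.
rows = [{"customer_id": 1017, "region": "Midwest", "revenue": 2450.0}]
columns = {"customer_id": "integer", "region": "text", "revenue": "float"}

prompt = build_schema_prompt("sales", columns,
                             "Which region has the highest revenue?")
print(prompt)
```

Because the function only receives the schema dictionary, no cell value can leak into the text sent to the model.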

The vendor also notes that it has built the platform to have an open architecture, so customers will be able to use any LLM they want. Furthermore, the vendor anticipates that its Google relationship will require it to support multi-modal Gemini, which can process text, audio and visuals, not just language. DataChat plans to incorporate Gemini so that the user experience is at parity with the current one for GPT-4.

DataChat’s data management capabilities enable users to connect to data in Snowflake, Google BigQuery, Presto, Databricks, Amazon Aurora and Postgres, as well as upload CSV files. Expanding connectivity, based on customer demand, is on the road map. Data exploration and transformation are also provided, either as a natural language or spreadsheet experience, with the ability to toggle between them as required. Furthermore, statistical data summaries of what happened in the data science workflow are surfaced in English, as data management and analysis tasks are undertaken, for explainability, traceability and reproducibility purposes.

DataChat’s analysis capabilities take the form of automated machine learning (AutoML), interactive and automatic data visualization, and automatic detection of outliers and their root causes. Individuals can visualize data themselves or let DataChat’s algorithms do it for them. AutoML means that, by default, DataChat explores several gradient-boosted models and chooses the best one for the user.

However, if these models do not work well, users can click a Default Model Exploration button, which will activate Random Forest and logistic/linear regression models. There is also a button for training clustering models, which divide data into groups of similar items, and the ability to auto-generate time series predictions. Model inferencing via an API is also provided, as well as the ability to publish and embed results elsewhere.
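The two-stage model exploration described above can be sketched in a few lines. This is a generic illustration, not DataChat’s AutoML. The model names and scores are made up, each callable stands in for "train this model and return a validation score", and it is assumed that the extra model families are added to the default pool rather than replacing it.

```python
from typing import Callable, Dict, Tuple

# Each scorer stands in for training a model and returning a validation
# score where higher is better. The names below are hypothetical.
Scorer = Callable[[], float]


def pick_best(candidates: Dict[str, Scorer]) -> Tuple[str, float]:
    """Score every candidate and return the (name, score) of the best one."""
    if not candidates:
        raise ValueError("no candidate models to explore")
    scores = {name: score() for name, score in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def explore(default: Dict[str, Scorer], extended: Dict[str, Scorer],
            use_extended: bool = False) -> Tuple[str, float]:
    """Explore the default pool, optionally widened with extra families."""
    pool = dict(default)
    if use_extended:
        # Assumption: the extra families join the pool rather than replace it.
        pool.update(extended)
    return pick_best(pool)


# Made-up scores for illustration only.
gradient_boosted = {"gbm_depth3": lambda: 0.81, "gbm_depth6": lambda: 0.78}
other_families = {"random_forest": lambda: 0.84, "logistic_regression": lambda: 0.74}

print(explore(gradient_boosted, other_families))
print(explore(gradient_boosted, other_families, use_extended=True))
```

In a real pipeline the scorers would wrap cross-validated training runs; the selection logic itself stays the same.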

Additionally, DataChat provides a collaborative environment for visualization and narratives using Insights Boards and Workflows. Every chart, ML model or other insight is backed by a Workflow, so that it has a history for data lineage, explanation and documentation purposes. Insights Boards present the findings from a Workflow (a visualization or a data table, for example) in a customizable layout that can be shared with others. Insights Boards can also be continuously updated if the connection to the data source is live.

Furthermore, DataChat provides automated semantics that require no manual setup. This is important because semantic definitions enable non-data scientists to understand their data, yet they can be onerous to set up by hand. Built-in cost optimization for cloud databases through data sampling and caching is also provided.

Competition

Arguably DataChat’s biggest competitor is Microsoft Corp. because of the ongoing popularity of Excel, the ubiquity of Power BI and Microsoft Azure Machine Learning’s use for data science, as well as Microsoft’s LLM-driven capabilities. However, Microsoft also has a complementary role because of the use of OpenAI GPT-4 on Azure. Google is another cloud provider that, while a partner, could also be a rival. Google provides data science and analysis using an assortment of offerings including Looker and Vertex AI, as well as LLM-driven capabilities, but DataChat is also using Google as a channel to market, illustrating the complementary nature of the relationship.

The large and loyal following among analysts that Salesforce-owned Tableau has gathered over the years makes Tableau a competitor for data analysts’ mindshare. Qlik and Sisense are competitors because they are also likely to be incumbent in many organizations DataChat is courting. Furthermore, all these vendors have embraced generative AI, to one degree or another, to reduce the skills barrier to data science and analysis.

Finally, Sigma Computing springs to mind as a rival because Sigma is also courting analysts and business users with Excel-driven visual analysis and collaboration. ThoughtSpot is another direct competitor because the company is courting the same types of users with a chatbot experience called Sage. No-code data science is also the focus for KNIME, Akkio, Obviously AI and Tellius, making them other competitors, albeit with approaches that differ from DataChat’s and from each other’s. However, the link that binds them is the use of generative AI for analytics.

SWOT Analysis-451 Research
Source: 451 Research