We asked DataChat’s co-founder, Rogers Jeffrey Leo John, to tell us about the origins of the company. What was missing from the marketplace? How has DataChat pushed the envelope?
Q: How did DataChat start?
The idea formed after Jignesh [Patel, our CEO and co-founder] was a visiting scientist at Pivotal Labs. He observed their data teams and the problems they were trying to solve. He noticed that most of the problems (and their solutions) followed similar patterns: when training a model, they ended up in a loop of loading the Python package, selecting features, training the model, and analyzing the results, then tweaking the features and retraining the model all over again.
With that observation in mind, we wrote our first paper. In that paper, we suggested an early prototype of Ava, our Conversational Intelligence assistant, that could abstract the model training loop into a Python template.
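To make that loop concrete, here is a minimal sketch of what a templated "select features, train, analyze, repeat" cycle could look like in plain Python. It is not the prototype from the paper or anything from DataChat's codebase: the function names, the toy nearest-centroid "model," and the sample rows are all assumptions made up for illustration.

```python
from statistics import mean

def select(rows, features):
    """Project each row (a dict) onto the chosen feature columns."""
    return [[row[f] for f in features] for row in rows]

def train_centroids(X, y):
    """Toy stand-in for a model: one mean vector (centroid) per class label."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, target in zip(X, y) if target == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == target for x, target in zip(X, y))
    return hits / len(y)

def training_loop(train_rows, test_rows, label, candidate_feature_sets):
    """Run select -> train -> analyze once per candidate feature set."""
    results = []
    for features in candidate_feature_sets:
        model = train_centroids(select(train_rows, features),
                                [r[label] for r in train_rows])
        score = accuracy(model, select(test_rows, features),
                         [r[label] for r in test_rows])
        results.append((features, score))
    # Best-scoring feature sets first.
    return sorted(results, key=lambda item: item[1], reverse=True)

# Made-up sample data: "height" separates the classes, "noise" does not.
train_rows = [
    {"height": 1.0, "noise": 4.0, "kind": "a"},
    {"height": 1.2, "noise": 2.0, "kind": "a"},
    {"height": 3.0, "noise": 5.0, "kind": "b"},
    {"height": 3.2, "noise": 2.0, "kind": "b"},
]
test_rows = [
    {"height": 1.1, "noise": 4.0, "kind": "a"},
    {"height": 2.9, "noise": 2.0, "kind": "b"},
]

results = training_loop(train_rows, test_rows, "kind",
                        [["noise"], ["height"], ["height", "noise"]])
for features, score in results:
    print(features, score)
```

The point of the template is that the "tweak the features and retrain" step becomes a list of candidate feature sets rather than a round of hand-edited code.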
Q: How did you improve the model training process?
We realized that, by leveraging controlled natural language (CNL), we could abstract away the programming languages (Python, R, SQL, etc.) from the user in favor of a subset of English. That was the genesis of DataChat’s Guided English Language© (GEL), which was inspired by the “language” used by aviators, such as the NATO phonetic alphabet. GEL allows the user to build data science workflows without needing to know Python, R, SQL, or any other traditional data science tool.
We spun our research out into DataChat and have been growing and evolving ever since.
Q: How has Ava evolved?
While developing Ava and GEL to make model training more intuitive, we’ve also expanded GEL to cover a wide array of data science tools and functions, including data ingestion, data wrangling, and visualization, along with machine learning and explainable artificial intelligence. This makes us a truly all-in-one platform that allows more business users to work with their own data to answer their own questions without needing to learn how to code or work with more complicated data science tools.
Q: What are some innovations in DataChat’s design?
One problem we’re solving is the reproducibility gap. A few years ago, the industry didn’t care about reproducibility; they were more concerned with model accuracy and less concerned about how they got there. We baked reproducibility into DataChat from the beginning.
By having conversations with Ava in GEL, we’re actually automating the documentation and commenting pieces of the data science process, too. Our workflows are built in English, which makes it very easy to look back and see exactly what happened and when. This not only makes it easy to understand the logic behind the pipeline but also improves governance and transparency across an organization.
Q: As the platform matured, what other problems have you focused on?
Most of the gains in DataChat can be attributed to how we’ve built industry best practices directly into the platform to help users avoid common pitfalls, such as accidentally including your label in your feature set. This helps build confidence for novice users and lessens the mental load for more experienced users.
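As an illustration of the label pitfall, the sketch below shows one simple way a tool could guard against the label leaking into the feature set. The helper names and column names are hypothetical, and this is not a description of how DataChat implements the check.

```python
def usable_features(columns, label):
    """Return every column except the label, so the target cannot leak into training."""
    if label not in columns:
        raise KeyError(f"label column {label!r} not found")
    return [c for c in columns if c != label]

def assert_no_leakage(features, label):
    """Raise if a hand-picked feature list still contains the label column."""
    if label in features:
        raise ValueError(f"label {label!r} must not be used as a feature")

# Hypothetical column names for a churn-prediction dataset.
columns = ["age", "income", "churned"]
features = usable_features(columns, "churned")
assert_no_leakage(features, "churned")
print(features)
```

A model trained with the label among its features would look nearly perfect in testing and then fail on new data, which is why a check like this is worth making automatic.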
We’ve also seen performance improvements when it comes to data wrangling. For example, in one of our test cases, 100 percent of the DataChat users were able to at least attempt every question in a set of data wrangling problems, compared to 73 percent of Python users.
Q: Who will use DataChat?
The biggest challenge we’re trying to solve is how we can democratize data science and bring those data science tools to more users. Everybody has data, but not everybody can pay a data scientist to work with it or has the time to learn the tools themselves.
Q: How can DataChat improve the data science field?
Obviously, making data science more accessible is a huge win for everybody. Business users feel empowered to answer their own questions, which gives time back to their data teams. Rather than chasing tickets, those teams can spend more time digging deeper into their data to find more novel insights for their organization.
When we were first developing the platform, we ran some user studies with users whose data science knowledge ranged from nothing at all to intermediate Python experience. We found that DataChat cut those users’ time to first model by roughly 10 times (~2 minutes in DataChat vs. ~20 minutes in Python).
We also saw that DataChat users were able to train more models (an average of six, compared to an average of two in Python), and 100 percent of the DataChat users were able to train at least one model (compared to 80 percent of Python users). Models trained in DataChat were also more consistently accurate (with F1 scores in the 0.7 to 0.8 range, compared to 0.3 to 0.8 in Python).
Q: Has the platform met your initial expectations since spinning out from UW-Madison?
Overall, we think we’ve already pushed the envelope considerably and are continuing to do so as we add more functionality and features to the platform. We’ve also made an impact for our customers from day one. They’re more productive, they’re finding insights they couldn’t before, and they’re unlocking tools that were out of their reach before.
DataChat is a cohesive analytics platform that uses natural language to make a broad range of data science tools, including data wrangling, preparation, exploration, visualization, and predictive modeling, accessible to everyone to improve business outcomes. Contact us or schedule a demo to learn more about how DataChat can help you improve your business outcomes.