Generative AI for databases

2024-07-10
Easier way to analyze complex tabular data
A new tool enables someone to perform complicated statistical analyses on tabular data using just a few keystrokes (credits: MIT News; iStock)

MIT researchers have developed a new tool that makes it easier for database users to perform complicated statistical analyses of tabular data, without the need to know what's going on behind the scenes.

GenSQL, a generative AI system for databases, could help users make predictions, detect anomalies, guess missing values, fix errors, or generate synthetic data with just a few keystrokes.

GenSQL combines a tabular dataset with a generative probabilistic AI model, which can account for uncertainty and adjust its decision-making based on new data.

GenSQL can also produce and analyze synthetic data that mimic the real data in a database. This is useful where sensitive data, such as patient health records, cannot be shared, or when real data are sparse.
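
To make the idea of pairing a table with a generative probabilistic model more concrete, here is a minimal Python sketch. It is not GenSQL syntax and not the researchers' implementation; it assumes a hypothetical two-column table (department, salary) and a very simple model (a categorical distribution over departments and one Gaussian over salary per department). All names in it are illustrative. It shows how a single fitted model can score how unusual a row is, guess a missing value, and generate synthetic rows.

```python
import math
import random
import statistics
from collections import defaultdict

# Hypothetical toy table: (department, salary in thousands). Not real data.
ROWS = [
    ("eng", 120.0), ("eng", 130.0), ("eng", 125.0), ("eng", 135.0),
    ("ops", 70.0), ("ops", 75.0), ("ops", 65.0), ("ops", 72.0),
]

def fit(rows):
    """Fit P(dept) and a Gaussian P(salary | dept) from the table."""
    by_dept = defaultdict(list)
    for dept, salary in rows:
        by_dept[dept].append(salary)
    model = {}
    for dept, salaries in by_dept.items():
        model[dept] = {
            "prior": len(salaries) / len(rows),
            "mean": statistics.mean(salaries),
            "std": statistics.stdev(salaries),
        }
    return model

def log_density(model, dept, salary):
    """Joint log density log p(dept, salary) under the fitted model."""
    p = model[dept]
    z = (salary - p["mean"]) / p["std"]
    log_gauss = -0.5 * z * z - math.log(p["std"] * math.sqrt(2 * math.pi))
    return math.log(p["prior"]) + log_gauss

def impute_dept(model, salary):
    """Guess a missing department: the one with the highest p(dept | salary)."""
    return max(model, key=lambda d: log_density(model, d, salary))

def sample_rows(model, n, rng):
    """Generate n synthetic rows: sample dept, then salary given dept."""
    depts = list(model)
    weights = [model[d]["prior"] for d in depts]
    out = []
    for _ in range(n):
        d = rng.choices(depts, weights=weights)[0]
        out.append((d, rng.gauss(model[d]["mean"], model[d]["std"])))
    return out

model = fit(ROWS)
typical = log_density(model, "ops", 71.0)    # ordinary ops row
unusual = log_density(model, "ops", 130.0)   # ops row with an eng-like salary
guessed = impute_dept(model, 128.0)          # salary known, department missing
synthetic = sample_rows(model, 5, random.Random(0))
```

In this sketch a low value from `log_density` flags a row as a possible anomaly, `impute_dept` fills a missing cell with its most probable value, and `sample_rows` produces rows that resemble the table without copying it. A real system would use a far richer model than two Gaussians.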

Extending SQL

This new tool is built on top of SQL, a programming language for database creation and manipulation that was introduced in the late 1970s and is used by millions of developers worldwide.

Compared with popular AI-based approaches for data analysis, GenSQL is faster and produces more accurate results, the researchers say. The generated models are also explainable, so users can read and edit them.

Next, the researchers want to apply GenSQL more broadly to conduct large-scale modeling of human populations. With GenSQL, they can generate synthetic data to draw inferences about things like health and salary while controlling what information is used in the analysis.

ChatGPT-like AI expert

In the long run, the researchers want to enable users to make natural language queries in GenSQL. Their goal is to develop a ChatGPT-like AI expert that one could talk to about any database and that grounds its answers in GenSQL queries.

The research was recently presented at the ACM Conference on Programming Language Design and Implementation. It is funded in part by the Defense Advanced Research Projects Agency (DARPA), Google, and the Siegel Family Foundation.

Citation: Mathieu Huot et al., 20 June 2024, Proceedings of the ACM on Programming Languages, Volume 8, Issue PLDI, https://doi.org/10.1145/3656409 (open access)


