March 28, 2023

Jan 09, 2023Ravie LakshmananDatabase Safety / PLM Framework

Text-to-SQL Model Vulnerabilities

A bunch of teachers has demonstrated novel assaults that leverage Textual content-to-SQL fashions to provide malicious code that might allow adversaries to glean delicate info and stage denial-of-service (DoS) assaults.

“To higher work together with customers, a variety of database functions make use of AI strategies that may translate human questions into SQL queries (particularly Text-to-SQL),” Xutan Peng, a researcher on the College of Sheffield, instructed The Hacker Information.

“We discovered that by asking some specifically designed questions, crackers can idiot Textual content-to-SQL fashions to provide malicious code. As such code is mechanically executed on the database, the consequence could be fairly extreme (e.g., knowledge breaches and DoS assaults).”

The findings, which had been validated in opposition to two business options BAIDU-UNIT and AI2sql, mark the primary empirical occasion the place pure language processing (NLP) fashions have been exploited as an assault vector within the wild.

The black field assaults are analogous to SQL injection faults whereby embedding a rogue payload within the enter query will get copied to the constructed SQL question, resulting in sudden outcomes.

The specifically crafted payloads, the examine found, may very well be weaponized to run malicious SQL queries that might allow an attacker to switch backend databases and perform DoS assaults in opposition to the server.

Moreover, a second class of assaults explored the potential of corrupting numerous pre-trained language fashions (PLMs) – fashions which were educated with a big dataset whereas remaining agnostic to the use circumstances they’re utilized on – to set off the technology of malicious instructions primarily based on sure triggers.

“There are numerous methods of planting backdoors in PLM-based frameworks by poisoning the coaching samples, similar to making phrase substitutions, designing particular prompts, and altering sentence kinds,” the researchers defined.

The backdoor assaults on 4 completely different open supply fashions (BART-BASE, BART-LARGE, T5-BASE, and T5-3B) utilizing a corpus poisoned with malicious samples achieved a 100% success price with little discernible influence on efficiency, making such points tough to detect in the true world.

As mitigations, the researchers recommend incorporating classifiers to verify for suspicious strings in inputs, assessing off-the-shelf fashions to stop provide chain threats, and adhering to good software program engineering practices.

Discovered this text attention-grabbing? Comply with us on Twitter and LinkedIn to learn extra unique content material we publish.