(Continuously updated...)
Recent update log:
0. UnifiedSKG, UniSAr
1. GNN works: LGESQL, ShadowGNN, SADGA, S²SQL (SOTA)
2. RatSQL + Pretraining (STRUG, GraPPa, GAP, GP) + NatSQL
3. PICARD, DT-Fixup, RaSaP
4. wikisql: SeaD, SeqGenSQL, BRIDGE^
The Resources for Natural Language to Logical Form Research, Focus on NL2SQL first.
A collection of research resources on "Natural Language to Logical Form". This stage focuses mainly on NL2SQL research and covers public evaluation datasets, related papers with partial code implementations, and related blog or WeChat articles.
NL2SQL
I. Main evaluation datasets (dataset)
II. Main methods, papers and code implementations (papers&code)
1. WikiSQL
2. Spider
3. UnifiedSKG
III. Extended related resources (extend-resources)
1. Related Works
1.1. Pre-training
1.2. Systems
1.3. Surveys
1.4. Blogs
1.5. Other Papers
1.6. Tools
2. SQL2Seq
3. Graph Neural Networks (GNN)
- Academic, Advising, ATIS, Geography, Restaurants, Scholar, IMDB, Yelp, etc.
Blog
http://jkk.name/text2sql-data/
GitHub
https://github.com/jkkummerfeld/text2sql-data
Paper
Improving Text-to-SQL Evaluation Methodology, Finegan-Dollak C, Kummerfeld J K, Zhang L, et al., ACL 2018
- WikiTableQuestions
- WikiSQL
Characteristics of the WikiSQL dataset:
- single-table, single-column queries;
- aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
- condition conjunction ('AND');
- condition comparison ('=', '>', '<')
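To make the structure above concrete, here is a minimal, made-up example in the WikiSQL annotation style ("sel"/"agg"/"conds" follow the public WikiSQL format, but the question, table and indices are illustrative only):

```python
# Hypothetical WikiSQL-style example: a single-table question and its structured
# query annotation ("sel" = selected column index, "agg" = aggregation id,
# "conds" = [column, operator, value] triples).
example = {
    "question": "What is the average age of players from club X?",
    "table_columns": ["Player", "Club", "Age"],
    "sql": {"sel": 2, "agg": 5, "conds": [[1, 0, "club X"]]},  # agg 5 = AVG, op 0 = '='
}

# The same query rendered as SQL text.
rendered = "SELECT AVG(Age) FROM table WHERE Club = 'club X'"
print(example["sql"], "->", rendered)
```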
GitHub
https://github.com/salesforce/WikiSQL
Paper
Seq2sql: Generating structured queries from natural language using reinforcement learning, Zhong V, Xiong C, Socher R. , 2017.
- Spider
Characteristics of the Spider dataset:
- Complex, Cross-domain and Zero-shot
- multi-table, multi-column queries and complex sub-queries;
- aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG', 'GROUP', 'HAVING', 'LIMIT');
- join clauses: ('join', 'on', 'as')
- where connectors: ('AND', 'OR');
- where operators: ('not', 'between', '=', '>', '<', '>=', '<=', '!=', 'in', 'like', 'is', 'exists')
- ordering operations: ('order by', 'desc', 'asc')
- set operations between SQL queries: ('Intersect', 'Union', 'Except')
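For illustration, a made-up Spider-style query showing the multi-table join, grouping and ordering constructs listed above (the singer/concert schema is only an example):

```python
# Hypothetical Spider-style SQL illustrating joins, GROUP BY/HAVING, ORDER BY and
# LIMIT; the schema (singer, concert) and the values are used for illustration only.
spider_style_sql = """
SELECT s.name, COUNT(*) AS n_concerts
FROM singer AS s JOIN concert AS c ON s.singer_id = c.singer_id
WHERE c.year >= 2015
GROUP BY s.name
HAVING COUNT(*) > 2
ORDER BY n_concerts DESC
LIMIT 5
"""
print(spider_style_sql.strip())
```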
Home
https://yale-lily.github.io/spider
GitHub
https://github.com/taoyds/spider
Paper
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task, Yu T, Zhang R, Yang K, et al., EMNLP 2018.
PPT
A statistical comparison of the spider/wikisql/tableQA datasets, by gibbsxiong
- SParC
Characteristics of the SParC dataset:
- Context-dependent and Multi-turn version of the Spider task.
A context-dependent, multi-turn task that inherits the characteristics of Spider.
Home
https://yale-lily.github.io/sparc
Paper
SParC: Cross-Domain Semantic Parsing in Context, Yu T, Zhang R, Yasunaga M, et al., ACL 2019.
- CoSQL
Characteristics of the CoSQL dataset:
- Cross-domain Conversational, the Dialogue version of the Spider and SParC tasks.
A multi-turn dialogue task that inherits the characteristics of Spider and involves intent clarification.
Home
https://yale-lily.github.io/cosql
Paper
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases, Yu T, Zhang R, Er H Y, et al., EMNLP-IJCNLP 2019.
- Chinese Spider
The Chinese version of Spider.
Home
https://taolusi.github.io/CSpider-explorer/
GitHub
https://github.com/taolusi/chisp
Paper
A Pilot Study for Chinese SQL Semantic Parsing, Qingkai Min, Yuefeng Shi and Yue Zhang, EMNLP-IJCNLP 2019.
- TableQA
Data characteristics of the first Chinese NL2SQL Challenge:
- A Chinese, enhanced version of WikiSQL, with data from finance and other general domains
- single-table, multi-column (two-column) queries
- aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
- condition conjunctions ('AND', 'OR');
- condition comparisons ('=', '>', '<', '!=')
Home
https://tianchi.aliyun.com/competition/entrance/231716/information
GitHub
https://github.com/ZhuiyiTechnology/nl2sql_baseline
Paper
TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation[J]. Sun N, Yang X, Liu Y. 2020.
Blog
RANK
Paper
- Zhang X, Yin F, Ma G, et al. M-SQL: Multi-Task Representation Learning for Single-Table Text2sql Generation[J]. IEEE Access, 2020, 8: 43156-43167. 🔥
- DuSQL
Baidu 2020 Language and Intelligence Challenge, semantic parsing track: a large-scale, open-domain, complex Chinese Text-to-SQL dataset. Data characteristics:
- 200 databases and about 23,000 corresponding (question, SQL query) pairs, of which 18,000 pairs are used for training, 2,000 for validation, and 3,000 for testing.
- The 200 databases come from Baidu Baike infoboxes, Baike tables, and tables found on the web. Each database contains several tables (2-11, 4.1 on average), and the links between tables (i.e., foreign keys) were constructed manually. To verify that the parsing algorithm is independent of the database and of the question, the databases of the training and test sets do not overlap.
- It contains complex multi-table join queries and nested queries, with complexity similar to Spider. The evaluation focuses on the exact match of every component and removes ordering effects, so the accuracy requirement on values is higher. The decomposition of the nested SQL structure into components is as follows:
```
# Keywords and nesting rules
select: [(agg_id, val_unit), (agg_id, val_unit), ...]
from: {'table_units': [table_unit, table_unit, ...], 'conds': condition}
where: condition
groupBy: [col_unit, ...]
orderBy: asc/desc, [(agg_id, val_unit), ...]
having: condition
limit: None/number
intersect: None/sql
except: None/sql
union: None/sql

# Component units
val: number(float)/string(str)/sql(dict)
col_unit: (agg_id, col_id)
val_unit: (calc_op, col_unit1, col_unit2)
table_type: 'table_unit'/'sql'
table_unit: (table_type, table_id/sql)
cond_unit: (agg_id, cond_op, val_unit, val1, val2)
condition: [cond_unit1, 'and'/'or', cond_unit2, ...]

# Operators
agg_id: (none, max, min, count, sum, avg)
calc_op: (none, -, +, *, /)
cond_op: (not_in, between, =, >, <, >=, <=, !=, in, like)
```
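For illustration only, here is how a simple query might be decomposed in the component structure above; the column/table ids are hypothetical and the layout only mirrors the listed units, not the official DuSQL JSON:

```python
# Illustrative parse of "SELECT name FROM singer WHERE age > 30" using the unit
# definitions listed above (agg_id 0 = none, calc_op 0 = none, cond_op 3 = '>').
# Column/table ids (singer = table 0, name = column 1, age = column 2) are made up.
parsed = {
    "select": [(0, (0, (0, 1), None))],                # (agg_id, val_unit) for column `name`
    "from": {"table_units": [("table_unit", 0)], "conds": []},
    "where": [(0, 3, (0, (0, 2), None), 30, None)],    # cond_unit: age > 30
    "groupBy": [], "orderBy": [], "having": [],
    "limit": None, "intersect": None, "except": None, "union": None,
}
print(parsed["select"], parsed["where"])
```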
Home
https://aistudio.baidu.com/aistudio/competition/detail/30?isFromCcf=true
GitHub
https://github.com/PaddlePaddle/Research/tree/master/NLP/DuSQL-Baseline
Video
Champion's sharing session: http://mbd.baidu.com/webpage?type=live&action=liveshow&source=h5pre&room_id=4008201814
Paper
- Wang L, Zhang A, Wu K, et al. DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 6923-6935.
Dataset
https://github.com/luge-ai/luge-ai/blob/master/semantic-parsing/semantic-parsing.md
Blog
- CCKS 2022: Financial NL2SQL Evaluation
Existing NL2SQL data and methods mainly target the "closed-world, pre-specified database/table" setting, which hardly meets the needs of a continuously evolving business scope. From the domain's perspective, financial data are mostly time series, including daily quotes, quarterly financial reports, annual GDP, and irregular stock pledge/release events, which undoubtedly makes translating questions into SQL harder.
Papers mainly use WikiSQL and Spider as evaluation data; see the task homepages for the corresponding leaderboards.
Representative methods are organized below and will be continuously updated...
Note: Exe_score denotes execution accuracy and is reported as | model | Dev accuracy | Test accuracy |.
Log_score denotes logical-form accuracy; for Spider it does not include value prediction.
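As a minimal sketch of the difference between the two metrics (a toy sqlite3 example with deliberately naive string normalization, not the official evaluation scripts):

```python
import sqlite3

# Toy database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, age INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 20), ("b", 30)])

gold = "SELECT name FROM t WHERE age > 25"
pred = "SELECT name FROM t WHERE age > 25.0"   # different surface form, same meaning

# Execution accuracy: do the two queries return the same result set?
exe_match = conn.execute(gold).fetchall() == conn.execute(pred).fetchall()

# Logical-form accuracy (naively sketched): do the normalized query strings match?
logical_match = " ".join(gold.lower().split()) == " ".join(pred.lower().split())

print(exe_match, logical_match)  # True, False
```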
- Weakly Supervised
Weakly supervised approaches, i.e., the SQL logical form is not used as the supervision signal.
Paper
- Li N, Keller B, Butler M, et al. SeqGenSQL--A Robust Sequence Generation Model for Structured Query Language[J]. 2020. 🔥
- Min S, Chen D, Hajishirzi H, et al. A discrete hard em approach for weakly supervised question answering[C]. EMNLP-IJCNLP 2019.
- Wang B, Titov I, Lapata M. Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs. 2019.
- Agarwal R, Liang C, Schuurmans D, et al. Learning to Generalize from Sparse and Underspecified Rewards. 2019.
- Liang C, Norouzi M, Berant J, et al. Memory augmented policy optimization for program synthesis and semantic parsing[C].NeurIPS, 2018: 9994-10006.
- Guo T, Gao H. Using Database Rule for Weak Supervised Text-to-SQL Generation[J]. 2019.
Code
- Hard-EM https://github.com/shmsw25/qa-hard-em 🔥
- LatentAlignment https://github.com/berlino/weaksp_em19
- MeRL / MAPO https://github.com/google-research/google-research/tree/master/meta_reward_learning
- Rule-SQL https://github.com/guotong1988/Rule-SQL
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| Hard-EM | 84.4 | 83.9 |
| LatentAlignment | 79.4 | 79.3 |
| MeRL | 74.9 | 74.8 |
| MAPO | 72.2 | 72.1 |
| Rule-SQL | 61.1 | 61.0 |
- Execution Guided (EG)
Execution Guided (EG) decoding uses execution errors at decoding time to correct the terms of the generated SQL, thereby filtering out SQL statements that cannot actually be executed. Three types of execution errors are handled: 1) parsing errors, i.e., the generated SQL is syntactically invalid; 2) execution failures, i.e., common run-time errors such as applying SUM() to, or comparing, string-typed data; 3) under the assumption that the execution result should be non-empty, an empty result indicates a wrong condition, e.g., the condition value does not actually occur in the predicted column, so beam search moves on to a column that does contain the condition value.
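A minimal sketch of this execution-guided filtering over beam candidates, using sqlite3; it only illustrates the idea above and is not the implementation used in the papers below:

```python
import sqlite3

def execution_guided_filter(candidates, conn):
    """Keep the highest-scoring candidate SQL that parses, executes without a
    run-time error, and returns a non-empty result (the three error types above)."""
    for sql, score in sorted(candidates, key=lambda x: -x[1]):
        try:
            rows = conn.execute(sql).fetchall()   # catches syntax and run-time errors
        except sqlite3.Error:
            continue
        if rows:                                  # skip empty-result queries
            return sql
    # Fall back to the best-scoring candidate if none survives the checks.
    return max(candidates, key=lambda x: x[1])[0]

# Tiny usage example with a toy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, age INT)")
conn.execute("INSERT INTO t VALUES ('a', 20)")
beam = [("SELECT nam FROM t", 0.9),                  # syntax / unknown-column error
        ("SELECT name FROM t WHERE age > 99", 0.8),  # empty result
        ("SELECT name FROM t WHERE age > 10", 0.7)]
print(execution_guided_filter(beam, conn))
```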
Paper
- Wang C, Huang P S, Polozov A, et al. Robust Text-to-SQL Generation with Execution-Guided Decoding[J]. 2018.
- Wang C, Brockschmidt M, Singh R. Pointing out SQL queries from text[J]. 2018.
- Dong L, Lapata M. Coarse-to-fine decoding for neural semantic parsing[J]. 2018.
- Huang P S, Wang C, Singh R, et al. Natural language to structured query generation via meta-learning[J]. 2018.
Code
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| Coarse2Fine + EG | 84.0 | 83.8 |
| Coarse2Fine | 79.0 | 78.5 |
| Pointer-SQL + EG | 78.4 | 78.3 |
| Pointer-SQL | 72.5 | 71.9 |
- SQLNet Framework
A sketch-based framework that conforms to SQL syntax is designed; within this framework, the model only needs to predict and fill the corresponding slots. The sketch is:
SELECT $AGG $COLUMN WHERE $COLUMN $OP $VALUE (AND $COLUMN $OP $VALUE)*
On top of this sketch, the following sub-tasks are predicted jointly as classifications (a minimal sketch of such classification heads follows this list):
- select-column, the selected column
- select-aggregation, the aggregation type
- where-number, the number of WHERE conditions
- where-column, the columns in the WHERE conditions
- where-operator, the WHERE condition operator ('<', '=', '>')
- where-value, the WHERE condition value
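A minimal PyTorch sketch of such joint classification heads over shared question/column encodings; the encoder, head designs and dimensions are simplifying assumptions, not the original SQLNet/SQLova/X-SQL architectures:

```python
import torch
import torch.nn as nn

class SlotFillingHeads(nn.Module):
    """Simplified joint slot classifiers over shared encodings.
    q: question encoding of size (d,); cols: per-column encodings of size (n_cols, d)."""
    def __init__(self, d, n_agg=6, n_op=4, max_where=4):
        super().__init__()
        self.sel_col = nn.Linear(d, 1)                 # score each column for SELECT
        self.sel_agg = nn.Linear(d, n_agg)             # aggregation type
        self.where_num = nn.Linear(d, max_where + 1)   # number of WHERE conditions
        self.where_col = nn.Linear(d, 1)               # score each column for WHERE
        self.where_op = nn.Linear(d, n_op)             # operator for each candidate column

    def forward(self, q, cols):
        # where-value is usually predicted with a pointer network over question
        # tokens and is omitted in this sketch.
        return {
            "select-column": self.sel_col(cols).squeeze(-1),   # (n_cols,)
            "select-aggregation": self.sel_agg(q),             # (n_agg,)
            "where-number": self.where_num(q),                 # (max_where + 1,)
            "where-column": self.where_col(cols).squeeze(-1),  # (n_cols,)
            "where-operator": self.where_op(cols),             # (n_cols, n_op)
        }

heads = SlotFillingHeads(d=8)
outputs = heads(torch.randn(8), torch.randn(3, 8))
print({name: tuple(t.shape) for name, t in outputs.items()})
```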
Paper
- Xu X, Liu C, Song D. SQLNet: Generating structured queries from natural language without reinforcement learning[J]. 2018.
- Hwang W, Yim J, Park S, et al. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization[J]. 2019.
- He P, Mao Y, Chakrabarti K, et al. X-SQL: reinforce schema representation with context[J]. 2019. 🔥
- Tong Guo, Huilin Gao. Content Enhanced BERT-based Text-to-SQL Generation .2019.
- Qin Lyu, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, Zheng Chen. Hybrid Ranking Network for Text-to-SQL. 2020 🔥
Code
- https://github.com/naver/sqlova
- https://github.com/xiaojunxu/SQLNet
- https://github.com/guotong1988/NL2SQL-BERT
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| RoBERTa-Large-HydraNet + EG | 92.4 | 92.2 |
| BERT-Large-HydraNet + EG | 92.2 | 91.8 |
| RoBERTa-Large-HydraNet | 89.1 | 89.2 |
| BERT-Large-HydraNet | 88.9 | 88.6 |
| BERT-XSQL-Attention + EG | 92.3 | 91.8 |
| (Tong) BERT-base-TableContent-used + EG | 91.1 | 90.1 |
| (Tong) BERT-base-TableContent-used | 90.3 | 89.2 |
| BERT-XSQL-Attention | 89.5 | 88.7 |
| BERT-SQLova-LSTM | 87.2 | 86.2 |
| BERT-SQLova-LSTM + EG | 90.2 | 89.6 |
| GloVe-SQLNet-BiLSTM | 69.8 | 68.0 |
- Schema aware Denoising (SeaD)
🔥🔥In the text-to-SQL task, seq2seq models often converge to local optima because of constraints in the architecture design. This paper proposes a simple yet effective approach: a transformer-based seq2seq model strengthened for text-to-SQL generation. The model is trained with schema-aware denoising (SeaD), which consists of two denoising objectives that train the model to recover the input, or to predict the output autoregressively, from erosion and shuffle noise, instead of imposing constraints on the encoder or reformulating the task as slot filling. These denoising objectives serve as auxiliary tasks for better modeling of structured data in seq2seq generation. In addition, the authors propose an improved, clause-sensitive Execution Guided (EG) decoding strategy to overcome the limitations of EG decoding for generative models.
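A rough, illustrative sketch of building auxiliary denoising training pairs; the concrete noise functions below (random column dropping/permutation and local token shuffling) are assumptions for illustration, not the paper's exact erosion/shuffle definitions:

```python
import random

def erosion_noise(columns, drop_prob=0.3):
    """Illustrative schema 'erosion': randomly drop and permute column names; the
    model is asked to reconstruct the original serialized schema (assumption)."""
    kept = [c for c in columns if random.random() > drop_prob]
    random.shuffle(kept)
    return kept

def shuffle_noise(tokens, span=3):
    """Illustrative local token-shuffle noise over the question, used as a second
    denoising target for the seq2seq model (assumption)."""
    tokens = list(tokens)
    i = random.randrange(max(1, len(tokens) - span))
    window = tokens[i:i + span]
    random.shuffle(window)
    tokens[i:i + span] = window
    return tokens

columns = ["name", "club", "age", "height"]
question = "how old is the tallest player".split()
# Auxiliary training pairs: (noised input -> original sequence), used alongside the
# usual (question + schema -> SQL) generation objective.
print(erosion_noise(columns), "->", columns)
print(shuffle_noise(question), "->", question)
```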
Paper
- [1] Xuan K , Wang Y , Wang Y , et al. SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising [J]. 2021.
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SeaD + EG | 92.9 | 93.0 |
| SeaD | 90.2 | 90.1 |
- Schema Dependency Guided
🔥🔥Multi-task learning that exploits the dependency relations between the question and the schema.
Paper
- Hui B, Shi X, Geng R, et al. Improving Text-to-SQL with Schema Dependency Learning[J]. arXiv preprint arXiv:2103.04399, 2021.
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SDSQL + EG | 92.5 | 92.4 |
| SDSQL | 88.7 | 88.8 |
- BRIDGE^ 🔥
Paper
- Lin X V, Socher R, Xiong C. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing[C]//EMNLP: Findings. 2020: 4870-4888.
Code
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| BRIDGE^ + EG | 92.6 | 91.9 |
| BRIDGE | 91.7 | 91.1 |
- T5 SeqGenSQL
🔥🔥Uses the T5 pretrained (text-generation) language model to translate questions directly into SQL statements. It also explores augmenting questions with table schema information to generate new (silver) training data.
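A minimal sketch of direct question-to-SQL generation with Hugging Face T5; the input serialization (question plus flattened schema) is an assumption, and `t5-base` is the generic pretrained checkpoint, not the fine-tuned SeqGenSQL model:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Generic pretrained checkpoint; SeqGenSQL fine-tunes T5 on (question + schema -> SQL) pairs.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Assumed input serialization: question followed by the flattened table schema.
question = "how many singers are there"
schema = "table: singer | columns: singer_id, name, age"
inputs = tokenizer(f"translate to SQL: {question} {schema}", return_tensors="pt")

# Beam-search decoding of the SQL string (the untuned t5-base will not emit real SQL).
output_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```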
Paper
- Li N , Keller B , Butler M , et al. SeqGenSQL -- A Robust Sequence Generation Model for Structured Query Language[J]. 2020.
- Youssef M, Abdelkader R, et al. SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules[J]. 2021
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SeqGenSQL + EG | 90.8 | 90.5 |
| SeqGenSQL (T5-base + 250K silver data) | 90.6 | 90.3 |
| T5-large&mT5-large + Association Rules * | 91.2 | 91.0 |
- Information Extraction Approach
An information-extraction approach: a unified BERT-based extraction model recognizes the slot types mentioned in the query, using sequence labeling, relation extraction, and text-matching-based linking.
Paper
- Ping An Life, AI Team. IE-SQL: Mention Extraction and Linking for SQL Query Generation 2020
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| BERT-IE-SQL + EG | 92.6 | 92.5 |
| BERT-IE-SQL | 88.7 | 88.8 |
- MRC Approach
🔥A machine reading comprehension approach: unlike traditional slot-filling methods, it casts NL2SQL as a QA problem and predicts the different slots with a unified MRC framework.
Paper
- Yan Z, Ma J, Zhang Y, et al. SQL Generation via Machine Reading Comprehension[C]//Proceedings of the 28th International Conference on Computational Linguistics. 2020: 350-356.
Code
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| BERT-MRC-SQL + STILTs training + AGG enhancement | 87.8 | 87.4 |
| BERT-MRC-SQL + STILTs training | 86.2 | 86.0 |
| BERT-MRC-SQL | 85.9 | 85.9 |
- Model Interactive
Semantic parsing with user interaction, oriented more toward practical deployment. After the SQL is generated, natural language generation is used to ask the user to clarify their intent, and the SQL is then corrected accordingly.
Blog
Paper
- Yao Z, Su Y, Sun H, et al. Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study[C]. EMNLP-IJCNLP 2019.
Code
- GNN Encoding Seq2Seq
🔥Schema-GNN:
Uses multi-table relational information to build a graph whose nodes are table and column names and whose edges are intra-table and inter-table relations. A GNN computes a hidden state for every node (table item). In the encoding stage of the seq2seq model, each query-word vector attends over the table-item hidden vectors, and the attention weights are used as that query word's graph representation. In the decoding stage, combined with grammar rules, if the output should be a table item, the output vector is scored against all table-item hidden vectors through a fully connected layer to measure their relatedness. (A minimal schema-graph construction sketch is given after this block's score table.)
LGESQL:
Previous graph constructions have two problems: 1) they ignore the rich semantic information that edges carry in the topology, and 2) they cannot distinguish local from non-local relations for each node. This work (Line Graph Enhanced Text-to-SQL) mines latent relational features without constructing meta-paths. With the line graph, messages propagate effectively both between connected nodes and along directed topological edges, and local and non-local relations are integrated during graph iteration. An auxiliary graph-pruning task is also designed to improve the discriminative ability of the encoder.
ShadowGNN:
In the cross-domain setting, conventional semantic parsing models struggle to adapt to unseen database schemas. To improve generalization to rare and unseen schemas, ShadowGNN processes schemas at both the abstract and the semantic level. Concretely, by ignoring the names of semantic items in the database, abstract schemas are encoded with a graph projection neural network to obtain delexicalized representations of the question and the schema. On top of these domain-independent representations, a relation-aware transformer further extracts the logical links between the question and the schema. Finally, an SQL decoder with a context-free grammar is applied.
SADGA:
Structure-Aware Dual Graph Aggregation Network: a graph-based aggregation method that learns the mapping between the question graph and the schema graph. The aggregation draws on global links, local links, and a dual-graph aggregation mechanism.
S²SQL:
Previous graph-based encoders do not model the syntactic structure of the question well. This work uses a syntactic parser to extract information from the question and injects the syntactic information into the question-schema graph encoder. A decoupling constraint is also used to induce diverse edge-relation embeddings, which further improves performance.
Paper
- Krishnamurthy J, Dasigi P, Gardner M. Neural semantic parsing with type constraints for semi-structured tables[C]. EMNLP 2017.
- Lin K, Bogin B, Neumann M, et al. Grammar-based Neural Text-to-SQL Generation. 2019.
- Bogin B, Gardner M, Berant J. Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing[C]. ACL 2019.
- Bogin B, Gardner M, Berant J. Global Reasoning over Database Structures for Text-to-SQL Parsing[C]. EMNLP-IJCNLP 2019.
- Shaw P, Massey P, Chen A, et al. Generating Logical Forms from Graph Representations of Text and Entities[C]. ACL 2019.
- Kelkar A, Relan R, Bhardwaj V, et al. Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker[J]. arXiv preprint arXiv:2002.00557, 2020.
- Cao R , Chen L , Chen Z , et al. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations[C]. ACL. 2021.
- Chen Z , Chen L , Zhao Y , et al. ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser[C]. NAACL. 2021.
- Cai R , Yuan J , Xu B , et al. SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL(https://arxiv.org/abs/2111.00653)[C]. NeurIPS 2021.
- Hui B , Geng R , Wang L , et al. S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers[C]. ACL Findings. 2022.
Code
- https://github.com/benbogin/spider-schema-gnn
- https://github.com/benbogin/spider-schema-gnn-global
- https://github.com/amolk/Bertrand-DR
- https://github.com/rhythmcao/text2sql-lgesql
- https://github.com/WowCZ/shadowgnn
- https://github.com/DMIRLAB-Group/SADGA
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| S²SQL + ELECTRA (DB content used) | 76.4 | 72.1 |
| SADGA + GAP (DB content used) | 73.1 | 70.1 |
| LGESQL + ELECTRA (DB content used) | 75.1 | 72.0 |
| LGESQL + BERT (DB content used) | 74.1 | 68.3 |
| LGESQL + Glove (DB content used) | 67.6 | 62.8 |
| ShadowGNN + RoBERTa (DB content used) | 72.3 | 66.1 |
| ShadowGNN (DB content used) | - | 64.8 |
| GNN + Bertrand-DR | 57.9 | 54.6 |
| Global-GNN | 52.7 | 47.4 |
| GNN | 40.7 | 39.4 |
| GNN w/edge vectors | 32.1 | - |
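As referenced in the Schema-GNN description above, a minimal sketch of schema-graph construction, with tables and columns as nodes and containment/foreign-key relations as typed edges; real encoders (RAT-SQL, LGESQL, etc.) use much richer relation sets and a relation-aware encoder:

```python
# Minimal schema graph: tables/columns as nodes, containment and foreign keys as
# typed edges. The toy schema (singer, concert) is illustrative only.
tables = {"singer": ["singer_id", "name"], "concert": ["concert_id", "singer_id"]}
foreign_keys = [("concert.singer_id", "singer.singer_id")]

nodes, edges = [], []
for table, columns in tables.items():
    nodes.append(("table", table))
    for column in columns:
        nodes.append(("column", f"{table}.{column}"))
        edges.append((table, f"{table}.{column}", "has_column"))  # table-column containment
for src, dst in foreign_keys:
    edges.append((src, dst, "foreign_key"))                       # cross-table link

print(len(nodes), "nodes")
for edge in edges:
    print(edge)
```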
- RATSQL related works 🔥🔥🔥
Paper
- [1] Wang B, Shin R, Liu X, et al. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers[C]. ACL 2020.
- [2] Deng X, Awadallah A H, Meek C, et al. Structure-Grounded Pretraining for Text-to-SQL[C]. NAACL, 2021.
- [3] Gan Y , Chen X , Xie J , et al. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications[C]. EMNLP Findings. 2021.
- [4] Yu T, Wu C S, Lin X V, et al. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing[C]. ICLR 2021.
- [5] Shi P , Ng P , Wang Z , et al. Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training[C]. AAAI 2021.
- [6] Zhao L, Cao H, Zhao Y. GP: Context-free Grammar Pre-training for Text-to-SQL Parsers[J]. arXiv preprint arXiv:2101.09901, 2021.
Code
- https://github.com/Microsoft/rat-sql
- https://github.com/ygan/NatSQL
- https://github.com/awslabs/gap-text2sql
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| [3] RATSQL + GAP + NatSQL (DB content used) | - | 73.3 |
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| RAT-SQL + GraPPa + Adv (DB content used) | 75.5 | 70.5 |
| RATSQL++ + ELECTRA (DB content used) | 75.7 | 70.3 |
| [6] RATSQL + GraPPa + GP (DB content used) | 72.8 | 69.8 |
| [5] RATSQL + GAP (DB content used) | 71.8 | 69.7 |
| [4] RATSQL + GraPPa (DB content used) | 73.4 | 69.6 |
| [3] RATSQL + GAP + NatSQL (DB content used) | - | 68.7 |
| [2] RAT-SQL + STRUG (DB content used) | 72.6 | 68.4 |
| [1] RATSQL v3 + BERT (DB content used) | 69.7 | 65.6 |
| [1] RATSQL v2 + BERT (DB content used) | 65.8 | 61.9 |
| [1] RATSQL v2 (DB content used) | 62.7 | 57.2 |
| [1] RATSQL + BERT | 60.8 | 55.7 |
| [1] RATSQL | 60.6 | 53.7 |
- MSRA: IRNet related works 🔥🔥
Blog & Video
- Intelligent data analysis technology that unlocks the new Excel "conversation" feature: Conversational Data Analysis
- Use Ideas in Excel to get Immediate answers with ONE Click
Paper
- Guo J, Zhan Z, Gao Y, et al. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation[C]. ACL 2019.
- Dong Z, Sun S, Liu H, et al. Data-Anonymous Encoding for Text-to-SQL Generation[C] EMNLP-IJCNLP 2019.
- Liu H, Fang L, Liu Q, et al. Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP-IJCNLP 2019.
- Liu Q, Chen B, Lou J G, et al. FANDA: A Novel Approach to Perform Follow-up Query Analysis[C]. AAAI 2019.
- Liu Q, Chen B, Liu H, et al. A Split-and-Recombine Approach for Follow-up Query Analysis[C]. EMNLP-IJCNLP 2019.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| IRNet++ + XLNet (DB content used) | 65.5 | 60.1 |
| IRNet-v2 + BERT | 63.9 | 55.0 |
| IRNet + BERT-Base | 61.9 | 54.7 |
| IRNet-v2 | 55.4 | 48.5 |
| IRNet | 53.2 | 46.7 |
- MSRA DKI Group's works 🔥🔥
Paper & Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| ETA + BERT (DB content used) | 70.8 | 65.3 |
- PICARD
Paper
- Scholak T , Schucher N , Bahdanau D . PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models[C]. EMNLP. 2021.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| PICARD + T5-3B (DB content used) | 75.5 | 71.9 |
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| PICARD + T5-3B (DB content used) | - | 75.1 |
- DT-Fixup SQL-SP
Paper
- Xu P , Kumar D , Yang W , et al. Optimizing Deeper Transformers on Small Datasets[C]. ACL. 2021.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| DT-Fixup SQL-SP + RoBERTa (DB content used) | 75.0 | 70.9 |
- RaSaP
Paper
- Huang J, Wang Y, Wang Y, et al. Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL[J]. 2021.
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| RaSaP + ELECTRA (DB content used) | 74.7 | 69.0 |
- EditSQL
Paper
- Zhang R, Yu T, Er H Y, et al. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions[C]. EMNLP-IJCNLP 2019.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| EditSQL + BERT | 57.6 | 53.4 |
| EditSQL | 36.4 | 32.9 |
- RYANSQL
Paper
- Choi D H, Shin M C, Kim E G, et al. RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases[J]. 2020.
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| RYANSQL v2 + BERT | 70.6 | 60.6 |
| RYANSQL + BERT | 66.6 | 58.2 |
| RYANSQL | 43.3 | - |
- SmBoP
Compared with top-down autoregressive parsing, a semi-autoregressive bottom-up parser has several advantages. First, since the subtrees at each decoding step are generated in parallel, the theoretical runtime is logarithmic rather than linear. Second, the bottom-up approach learns representations of semantic sub-programs at each step, rather than semantically vague partial trees. Finally, SmBoP's transformer-based layers contextualize subtrees against each other, so that, unlike conventional beam search, a tree is scored conditioned on the other trees explored so far.
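A toy sketch of the bottom-up beam idea: at each step new trees are built in parallel from existing beam items and the best K are kept; the scorer here is a stub, whereas SmBoP scores candidates with transformer layers that contextualize subtrees against each other:

```python
from itertools import product

# Toy bottom-up beam over SQL sub-programs. UNARY/BINARY stand in for the grammar
# operations; "keep", "AND" and "JOIN" are illustrative placeholders only.
UNARY = ["keep"]
BINARY = ["AND", "JOIN"]

def score(tree):
    # Stand-in scorer; the real model scores each candidate tree with a learned network.
    return -len(str(tree))

def step(beam, k=4):
    # All new trees for this step are enumerated (in parallel, conceptually),
    # then only the k best-scoring ones are kept.
    candidates = [(op, t) for op in UNARY for t in beam]
    candidates += [(op, l, r) for op in BINARY for l, r in product(beam, beam) if l != r]
    return sorted(candidates, key=score, reverse=True)[:k]

beam = ["age > 30", "country = 'US'"]   # leaves: predicted spans / columns
for _ in range(2):                      # tree height grows by one per step (log-depth claim)
    beam = step(beam)
print(beam[0])
```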
Paper
- Rubin O, Berant J. SmBoP: Semi-autoregressive Bottom-up Semantic Parsing[C]. NAACL, 2021.
Code
https://github.com/OhadRubin/SmBop
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SmBoP + GraPPa (DB content used) | 74.7 | 69.5 |
| SmBoP + BART | 66.0 | 60.5 |
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SmBoP + GraPPa (DB content used) | - | 71.1 |
- SLSQL
Schema linking is the crux of the current text-to-SQL task.
Paper
- Lei W, Wang W, Ma Z, et al. Re-examining the Role of Schema Linking in Text-to-SQL[C]. EMNLP 2020: 6943-6954.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| SLSQL + BERT + Data Annotation | 60.8 | 55.7 |
- BRIDGE
Paper
- Lin X V, Socher R, Xiong C. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing[C]//EMNLP: Findings. 2020: 4870-4888.
Code
Log_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| BRIDGE (k = 2) + BERT (DB content used) | 65.5 | 59.2 |
| BRIDGE (k = 1) + BERT (DB content used) | 65.3 | - |
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| BRIDGE v2 + BERT (ensemble) (DB content used) | - | 68.3 |
| BRIDGE v2 + BERT (DB content used) | - | 64.3 |
| BRIDGE (k = 2) + BERT (DB content used) | - | 59.9 |
- GAZP
GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified.
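A rough sketch of the cycle-consistency selection described above; `generate_utterance`, `parse_to_sql` and `execute` are placeholders for the trained backward generator, the forward parser and a database executor:

```python
# GAZP-style filtering sketch: synthesize (utterance, SQL) pairs in the new
# database with a backward generator, then keep only pairs that the forward
# parser maps back to a query with the same execution result (denotation).
def cycle_consistent_examples(sampled_queries, generate_utterance, parse_to_sql, execute):
    kept = []
    for sql in sampled_queries:
        utterance = generate_utterance(sql)      # backward model: SQL -> question
        reparsed = parse_to_sql(utterance)       # forward parser: question -> SQL
        if execute(reparsed) == execute(sql):    # input-output (denotation) check
            kept.append((utterance, sql))
    return kept

# Toy usage with trivial stand-ins for the three components.
queries = ["SELECT 1", "SELECT 2"]
demo = cycle_consistent_examples(
    queries,
    generate_utterance=lambda q: f"question for {q}",
    parse_to_sql=lambda u: u.replace("question for ", ""),
    execute=lambda q: q,
)
print(demo)
```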
Paper
- Zhong V, Lewis M, Wang S I, et al. Grounded adaptation for zero-shot executable semantic parsing[C]. EMNLP-2020.
Exe_score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| GAZP + BERT | - | 53.5 |
- SQLNet Framework
Paper
- Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP 2019
- Yu T, Yasunaga M, Yang K, et al. Syntaxsqlnet: Syntax tree networks for complex and cross-domaintext-to-sql task[C]. EMNLP 2018.
- Dongjun Lee. Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation[C]. EMNLP 2019.
- Lin K, Bogin B, Neumann M, et al. Grammar-based Neural Text-to-SQL Generation. 2019.
Code
Score
| model | Dev accuracy | Test accuracy |
| --- | --- | --- |
| GrammarSQL | 34.8 | 33.8 |
| SyntaxSQLNet + augment | 24.8 | 27.2 |
| RCSQL | 28.5 | 24.3 |
| SyntaxSQLNet | 18.9 | 19.7 |
| SQLNet | 10.9 | 12.4 |
Blog
Code
Paper
- Xie T , Wu C H , Shi P , et al. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models[J]. arXiv e-prints, 2022.
jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data.
- Shi P , Ng P , Wang Z , et al. GAP: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training[C]. AAAI 2021.
A novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus.
- Deng X, Awadallah A H, Meek C, et al. Structure-Grounded Pretraining for Text-to-SQL[C]. NAACL, 2021.
A new method for Text-to-SQL parsing, Grammar Pre-training (GP),is proposed to decode deep relations between question and database.
- Zhao L, Cao H, Zhao Y. GP: Context-free Grammar Pre-training for Text-to-SQL Parsers[J]. arXiv preprint arXiv:2101.09901, 2021.
An effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.
- Yu T, Wu C S, Lin X V, et al. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing[C]. ICLR 2021.
this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
- Qin B , Wang L , Hui B , et al. SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing[J]. 2021.
table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
- Liu Q, Chen B, Guo J, et al. TAPEX: Table Pre-training via Learning a Neural SQL Executor (https://arxiv.org/abs/2107.07653)[J]. 2021.
A pretrained language model that jointly learns representations for NL sentences and (semi-)structured tables.
- Pengcheng Yin, Graham Neubig, et al. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data[C]. ACL 2020.
this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
- Bowen Q, LiHan W, et al. Linking-Enhanced Pre-Training for Table Semantic Parsing. 2021.
Adapting a semantic parser trained on a single language.
- Tom Sherborne, Yumo Xu, Mirella Lapata. Bootstrapping a Crosslingual Semantic Parser. 2020.
- Zeng J, Lin X V, Xiong C, et al. Photon: A Robust Cross-Domain Text-to-SQL System[J]. 2020.
- Brunner U, Stockinger K. ValueNet: A Neural Text-to-SQL Architecture Incorporating Values[J]. 2020.
- Elgohary A, Hosseini S, Awadallah A H. Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback[J]. 2020.
- Jovan, Martina, Frosina. Recent Advances in SQL Query Generation: A Survey//Part of the 17th International Conference on Informatics and Information Technologies. Received best paper award. 2020.
- An overview of NL2SQL: understanding NL2SQL in one article
- HIT SCIR: understanding Text-to-SQL in one article
- Table QA, part 1: Introduction (by 朴素人工智能)
- Table QA, part 2: Models
- Table QA, final part: Practical applications
- A quick look at the table pre-training work at ACL 2020
- Dhamdhere K, McCurley K S, Nahmias R, et al. Analyza: Exploring data with conversation[C]//Proceedings of the 22nd International Conference on Intelligent User Interfaces. ACM, 2017.
- Chen S, San A, Liu X, et al. A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing[C]. COLING 2020.
- Dou L , Gao Y , Pan M , et al. UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL[J]. 2022.
- Test suite for text2sql code: https://github.com/taoyds/test-suite-sql-eval
- Test suite for text2sql paper: Zhong R, Yu T, Klein D. Semantic Evaluation for Text-to-SQL with Distilled Test Suites[C]. EMNLP2020.
- SQL Parser https://github.com/mozilla/moz-sql-parser
Paper
- Xu K, Wu L, Wang Z, et al. Graph2seq: Graph to sequence learning with attention-based neural networks.2018.
- Xu K, Wu L, Wang Z, et al. SQL-to-text generation with graph-to-sequence model[C]. EMNLP 2018.
Code
- https://github.com/IBM/SQL-to-Text
- https://github.com/IBM/Graph2Seq
- https://github.com/RandolphVI/Graph2Seq
Paper
- https://github.com/thunlp/GNNPapers
- https://github.com/IndexFziQ/GNN4NLP-Papers
- https://github.com/nnzhan/Awesome-Graph-Neural-Networks
- https://github.com/naganandy/graph-based-deep-learning-literature
- https://github.com/svjan5/GNNs-for-NLP
Code