Date of Award

5-2026

Document Type

Thesis

Degree Name

Master of Science

Department

Electrical Engineering

Abstract

Text-to-SQL aims to translate natural language questions into SQL queries that can be executed on databases, enabling non-expert users to retrieve information without learning formal query languages. Early Text-to-SQL systems relied on rule-based methods and semantic parsers, while recent advances in deep learning have achieved strong performance by jointly encoding user questions and database schemas. However, these approaches typically require large annotated datasets and task-specific model architectures. With the emergence of large language models (LLMs), such as GPT-4, Llama, and Gemma, Text-to-SQL systems can leverage powerful natural language understanding capabilities to generate SQL queries using zero-shot or few-shot prompting. Despite these advances, existing research has largely focused on conventional relational databases, with limited attention given to geospatial databases that involve specialized spatial data types and functions.

This thesis addressed this gap by investigating LLM-based Text-to-SQL for geospatial information retrieval. A new benchmark dataset was constructed with a PostGIS spatial database, containing natural language questions paired with SQL queries that incorporated spatial operations such as distance calculations, spatial joins, and geometric predicates. To expand the dataset and improve diversity, additional question-query pairs were generated through LLM-based data augmentation.

Building on this benchmark, a Text-to-SQL pipeline was developed that integrated multiple state-of-the-art LLMs to translate natural language questions into executable spatial SQL statements. The system incorporated database schema information within prompts to improve query generation. Experimental results demonstrated that the proposed pipeline can effectively retrieve geospatial information from natural language questions, achieving competitive performance in terms of Execution Accuracy and Valid Efficiency Score.

Index Terms: Geospatial database, information retrieval, large language models (LLMs), text-to-SQL.

Committee Chair/Advisor

Xishuang Dong

Committee Member

Xiangfang Li

Committee Member

Lijun Qian

Committee Member

Anthony Hill

Publisher

Prairie View A&M University

Rights

© 2021 Prairie View A&M University

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Date of Digitization

6/15/2026

Contributing Institution

John B Coleman Library

City of Publication

Prairie View

MIME Type

Application/PDF

