Date of Award
5-2026
Document Type
Thesis
Degree Name
Master of Science
Department
Electrical Engineering
Abstract
Text-to-SQL aims to translate natural language questions into SQL queries that can be executed on databases, enabling non-expert users to retrieve information without learning formal query languages. Early Text-to-SQL systems relied on rule-based methods and semantic parsers, while recent advances in deep learning have achieved strong performance by jointly encoding user questions and database schemas. However, these approaches typically require large, annotated datasets and specific model architectures. With the emergence of large language models (LLMs), such as GPT-4, Llama, and Gemma, Text-to-SQL systems can leverage powerful natural language understanding capabilities to generate SQL queries using zero-shot or few-shot prompting. Despite these advancements, existing research has largely focused on conventional relational databases, with limited attention given to geospatial databases that involve specialized spatial data types and functions.
This thesis addressed this gap by investigating LLM-based Text-to-SQL for geospatial information retrieval. A new benchmark dataset was constructed with a PostGIS spatial database, containing natural language questions paired with SQL queries that incorporated spatial operations such as distance calculations, spatial joins, and geometric predicates. To expand the dataset and improve diversity, additional question-query pairs were generated through LLM-based data augmentation.
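As an illustration of what one question-query pair in such a benchmark might look like, the following is a minimal sketch and not an item from the thesis dataset. The table and column names (`schools`, `parks`, `geom`) and the `BenchmarkItem` and `spatial_categories` helpers are hypothetical; only the PostGIS function names (`ST_DWithin`, `ST_Distance`, `ST_Contains`, and so on) are real.

```python
import re
from dataclasses import dataclass

# Real PostGIS function names, grouped by two of the operation types
# the abstract lists. The grouping itself is an assumption for illustration.
SPATIAL_CATEGORIES = {
    "distance": {"st_distance", "st_dwithin"},
    "predicate": {"st_contains", "st_within", "st_intersects", "st_touches"},
}


@dataclass
class BenchmarkItem:
    """One natural language question paired with its reference SQL (hypothetical layout)."""
    question: str
    sql: str


def spatial_categories(sql: str) -> set[str]:
    """Return the categories of spatial functions called in a SQL string."""
    called = {
        name.lower()
        for name in re.findall(r"\b(st_\w+)\s*\(", sql, flags=re.IGNORECASE)
    }
    return {cat for cat, funcs in SPATIAL_CATEGORIES.items() if called & funcs}


# Hypothetical item: a spatial join whose join condition is a distance test.
item = BenchmarkItem(
    question="Which schools are within 500 meters of a park?",
    sql=(
        "SELECT DISTINCT s.name "
        "FROM schools AS s JOIN parks AS p "
        "ON ST_DWithin(s.geom::geography, p.geom::geography, 500);"
    ),
)

categories = spatial_categories(item.sql)
```

Tagging each pair by the spatial functions it uses is one simple way to check that an augmented dataset still covers every operation type.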
Building on this benchmark, a Text-to-SQL pipeline was developed that integrated multiple state-of-the-art LLMs to translate natural language questions into executable spatial SQL statements. The system incorporated database schema information within prompts to improve query generation. Experimental results demonstrated that the proposed pipeline can effectively retrieve geospatial information from natural language questions, achieving competitive performance in terms of Execution Accuracy and Valid Efficiency Score.
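The two ideas in this paragraph, placing schema text in the prompt and scoring by Execution Accuracy, can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis pipeline: the prompt wording and the `build_prompt`, `run_query`, and `execution_accuracy` helpers are hypothetical, and an in-memory SQLite database stands in for PostGIS so the example runs without a spatial server. Execution Accuracy is taken here to mean the fraction of predicted queries whose result rows match those of the reference query.

```python
import sqlite3


def build_prompt(schema: str, question: str) -> str:
    """Place the schema ahead of the question in a zero-shot prompt (wording is an assumption)."""
    return (
        "You are given a PostGIS database with this schema:\n"
        f"{schema}\n\n"
        f"Question: {question}\n"
        "Return one SQL query that answers the question."
    )


def run_query(conn, sql):
    """Return the result rows in a canonical order, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result rows match."""
    hits = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        if gold_rows is not None and run_query(conn, predicted) == gold_rows:
            hits += 1
    return hits / len(pairs) if pairs else 0.0


# Stand-in database: plain SQLite, no spatial types (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (name TEXT, area_km2 REAL)")
conn.executemany("INSERT INTO parks VALUES (?, ?)", [("Oak", 1.5), ("Elm", 0.4)])

prompt = build_prompt("parks(name TEXT, area_km2 REAL)", "Which parks exceed 1 km2?")

pairs = [
    # Different text, same result rows: counted as correct.
    ("SELECT name FROM parks WHERE area_km2 > 1",
     "SELECT name FROM parks WHERE area_km2 > 1.0"),
    # Misspelled column: the query fails and is counted as wrong.
    ("SELECT nme FROM parks", "SELECT name FROM parks"),
]
score = execution_accuracy(conn, pairs)
```

Comparing result rows rather than SQL text means that two differently written queries count as equivalent when they return the same data, which is why a predicted query that fails to execute scores zero.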
Index Terms: Geospatial database, information retrieval, large language models (LLMs), text-to-SQL.
Committee Chair/Advisor
Xishuang Dong
Committee Member
Xiangfang Li
Committee Member
Lijun Qian
Committee Member
Anthony Hill
Publisher
Prairie View A&M University
Rights
© 2021 Prairie View A & M University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Date of Digitization
6/15/2026
Contributing Institution
John B Coleman Library
City of Publication
Prairie View
MIME Type
application/pdf
Recommended Citation
Ogunsusi, T. O. (2026). LLMs-Based Text-to-SQL for Geospatial Information Retrieval. Retrieved from https://digitalcommons.pvamu.edu/pvamu-theses/1673