LLM-powered Assistants for Advanced Geospatial Dataset Recommendations based on Geolocated Queries
Arnaud Le Carvennec (CS Group)
Earth 1
Integrating Earth observation, meteorological, and sensor data into domain-specific research often requires substantial prior knowledge. Large Language Models (LLMs) that interpret natural-language prompts offer a solution, letting scientists in fields such as climate science, biology, the humanities, and economics find relevant geospatial datasets intuitively. This paper presents a system that uses LLMs with Retrieval-Augmented Generation (RAG) to recommend advanced geospatial datasets in response to geolocated queries. Built with the LangChain framework, the system incorporates data from sources such as Destination Earth, Eurostat, and ECMWF, and dynamically expands its knowledge base with online data. It returns precise dataset recommendations together with bibliographic references, making it easier to integrate these data into research and applications. The paper details the architecture, workflow, and technical implementation, with emphasis on the system’s traceability, multi-LLM integration, and effectiveness, demonstrated by improved answer relevance and more relevant data collection recommendations.
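To make the retrieval step of such a RAG pipeline concrete, the following is a minimal, standard-library-only sketch and not the system's actual implementation. It assumes a small in-memory catalog, a point location given as longitude and latitude, and a simple keyword-overlap score in place of the embedding-based retrieval a LangChain pipeline would normally use. The catalog entries, the names `DatasetEntry`, `retrieve`, and `build_prompt`, and all bounding boxes are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog record. All names and values below are hypothetical."""
    name: str
    source: str
    description: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)


# Hypothetical catalog: invented entries, labeled with sources named in the text.
CATALOG = [
    DatasetEntry("example-temperature-reanalysis", "ECMWF",
                 "global hourly air temperature and precipitation reanalysis",
                 (-180.0, -90.0, 180.0, 90.0)),
    DatasetEntry("example-regional-population", "Eurostat",
                 "european regional population and economic statistics",
                 (-25.0, 34.0, 45.0, 72.0)),
    DatasetEntry("example-flood-simulation", "Destination Earth",
                 "european flood and extreme precipitation simulation",
                 (-25.0, 34.0, 45.0, 72.0)),
]


def covers(bbox, lon, lat):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def retrieve(query, lon, lat, catalog=CATALOG, k=2):
    """Keep entries covering the location, then rank by keyword overlap.

    Keyword overlap is a stand-in (an assumption) for semantic similarity.
    """
    terms = set(query.lower().split())
    scored = []
    for entry in catalog:
        if not covers(entry.bbox, lon, lat):
            continue  # spatial filter: dataset does not cover the location
        overlap = len(terms & set(entry.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, entry))
    # Stable sort: ties keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


def build_prompt(query, lon, lat, entries):
    """Assemble an LLM prompt whose numbered context lines can be cited."""
    context = "\n".join(
        f"[{i}] {e.name} ({e.source}): {e.description}"
        for i, e in enumerate(entries, start=1)
    )
    return (
        "Recommend datasets for the question below, citing sources by number.\n"
        f"Location: lon={lon}, lat={lat}\n"
        f"Question: {query}\n"
        f"Candidate datasets:\n{context}"
    )


if __name__ == "__main__":
    hits = retrieve("flood precipitation risk", 2.35, 48.85)
    print(build_prompt("flood precipitation risk", 2.35, 48.85, hits))
```

Numbering the retrieved entries in the prompt is one simple way to support traceability: the model's answer can point back to the catalog records it relied on. In a fuller pipeline, the resulting prompt would be passed to one or more LLMs.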