Posts by Collection

portfolio

publications

The Ethics of AI-Generated Maps: DALL·E 2 and AI’s Implications for Cartography (Short Paper)

Published in 12th International Conference on Geographic Information Science (GIScience 2023), Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 93:1-93:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023

This paper investigates the ethics of using artificial intelligence (AI) in cartography, focusing on the generation of maps using DALL·E 2. We created an open-source dataset of synthetic (AI-generated) and real-world (human-designed) maps, examined four ethical concerns associated with DALL·E 2-generated maps (inaccuracies, misleading information, unanticipated features, and irreproducibility), and developed a deep learning-based model to identify AI-generated maps. Our work emphasizes the importance of ethical considerations in AI-driven cartography and aims to raise public awareness and support the development of ethical guidelines for AI-generated maps.

Download Paper

FLEE-GNN: a federated learning system for edge-enhanced graph neural network in analyzing geospatial resilience of multicommodity food flows

Published in Proceedings of the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, 2023

We propose FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, to analyze the geospatial resilience of multicommodity food flow networks. FLEE-GNN addresses challenges in generalizability, scalability, and data privacy, combining the strengths of graph neural networks and federated learning for robust, privacy-preserving analysis of food supply network resilience across regions.

Download Paper

Automating Geospatial Analysis Workflows Using ChatGPT-4

Published in Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, 2024

This study investigates the use of ChatGPT-4 to automate geospatial analysis workflows in GIS by generating ArcPy functions from structured instructions. The approach achieves an 80.5% task success rate, demonstrating its effectiveness and accessibility for domain scientists seeking to automate GIS workflows.

Download Paper

ScienceAgentBench: Toward rigorous assessment of language agents for data-driven scientific discovery

Published in ICLR 2025, 2025

We present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery. ScienceAgentBench consists of 102 tasks extracted from 44 peer-reviewed publications across four disciplines, validated by nine subject matter experts. Each task requires generating a self-contained Python program, which is evaluated with multiple metrics covering program correctness, execution, and cost. We assess five open-weight and proprietary LLMs with three frameworks, finding that the best-performing agent solves only 32.4% of tasks independently and 34.3% with expert knowledge. Our results highlight the need for rigorous, task-level assessment before making claims about end-to-end scientific automation.

Download Paper

Mapping Urban Coyote Ecology in Los Angeles: Insights from Citizen Science and Human Mobility Data

Published in M.S. Thesis, University of Wisconsin–Madison, 2025

This thesis investigates the spatial and temporal distributions of urban coyotes in Los Angeles County by integrating citizen science data from iNaturalist with environmental, socioeconomic, and human mobility datasets. Using Random Forest, Geographically Weighted Regression, and structural equation modeling, the study reveals how ecological and anthropogenic factors, including real-time human mobility during the COVID-19 pandemic, shape coyote occurrence and visibility across neighborhoods.

Download Paper

GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation

Published in arXiv, under review at Transactions in GIS, 2025

We introduce GeoAnalystBench, a benchmark of 50 Python-based geoprocessing tasks for evaluating large language models (LLMs) in geospatial analysis and GIS workflow automation. Our results reveal a significant performance gap between proprietary and open-source models, highlighting both the promise and current limitations of LLMs for GeoAI.

Download Paper

AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

Published in EMNLP 2025, 2025

We present AutoSDT, an automatic pipeline for collecting high-quality coding tasks in real-world data-driven scientific discovery workflows. AutoSDT-5K, the resulting dataset, contains 5,404 coding tasks across four scientific disciplines and 756 unique Python packages, enabling the training of LLM-based co-scientists. Models trained on AutoSDT-5K, dubbed AutoSDT-Coder, achieve state-of-the-art results on ScienceAgentBench and DiscoveryBench, closing the gap with proprietary models.

Download Paper

talks

Automating Geospatial Analysis Workflows Using ChatGPT-4

Published:

This talk explores how ChatGPT-4 can be leveraged to automate and streamline geospatial analysis workflows. We discuss practical applications, integration strategies, and the potential for large language models to enhance productivity and reproducibility in spatial data science.

Scalable Inter-County Food Flow Prediction Using Graph Neural Network

Published:

This talk presents a scalable approach for predicting inter-county food flows using Graph Neural Networks (GNNs). We discuss model architecture, data integration strategies, and the implications for food supply chain optimization at regional and national scales.

teaching