Membership Testing for Semantic Regular Expressions (PLDI 2025 - PLDI Research Papers)

Who

Yifei Huang, Matin Amini, Alexis Le Glaunec, Konstantinos Mamouras, Mukund Raghothaman

Track

PLDI 2025 PLDI Research Papers

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

Use conference time zone: (GMT+09:00) SeoulSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 18 Jun 2025 17:00 - 17:20 at Grand Ball Room 2 - Big Red KATs Chair(s): Alex Potanin

Abstract

This paper is about semantic regular expressions (SemREs). This is a concept that was recently proposed by Smore (Chen et al. 2023) in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus in their paper was on automatically synthesizing semantic regular expressions from positive and negative examples.

In this paper, we study the \emph{membership testing problem}.

First, we present a two-pass NFA-based algorithm to determine whether a string $w$ matches a SemRE $r$ in $O(|r|^2 |w|^2 + |r| |w|^3)$ time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs in $O(|r|^2 |w|^2)$ time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline, and incurs a $\approx 2 \times$ overhead over the time needed for interaction with the oracle.

Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an $\Omega(|w|^2)$ lower bound on the number of oracle queries necessary to make this determination.

DOI

https://doi.org/10.1145/3729300

Yifei Huang

University of Southern California

United States

Matin Amini

University of Southern California

United States

Alexis Le Glaunec

Rice University

United States

Konstantinos Mamouras

Rice University

United States

Mukund Raghothaman

University of Southern California

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

Use conference time zone: (GMT+09:00) SeoulSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Wed 18 Jun
Displayed time zone: Seoul change

16:00 - 17:20	Big Red KATsPLDI Research Papers at Grand Ball Room 2 Chair(s): Alex Potanin Australian National University

16:00 20m Talk		Active Learning of Symbolic NetKAT Automata PLDI Research Papers Mark Moeller Cornell University, Tiago Ferreira University College London, Thomas Lu Cornell University, Nate Foster Cornell University; Jane Street, Alexandra Silva Cornell University DOI
16:20 20m Talk		StacKAT: Infinite State Network Verification PLDI Research Papers Jules Jacobs Cornell University, Nate Foster Cornell University; Jane Street, Tobias Kappé Leiden University, Dexter Kozen Cornell University, Lily Saada Cornell University, Alexandra Silva Cornell University, Jana Wagemaker Radboud University Nijmegen DOI
16:40 20m Talk		Probabilistic Kleene Algebra with Angelic Nondeterminism PLDI Research Papers Shawn Ong Cornell University, Stephanie Ma Cornell University, Dexter Kozen Cornell University DOI
17:00 20m Talk		Membership Testing for Semantic Regular Expressions PLDI Research Papers Yifei Huang University of Southern California, Matin Amini University of Southern California, Alexis Le Glaunec Rice University, Konstantinos Mamouras Rice University, Mukund Raghothaman University of Southern California DOI