Regular expression matching: DP with dot and star

Problem. Implement regex matching for . (any single character) and * (zero or more of the preceding element). The match must cover the entire string.

Examples:

  • s="aab", p="c*a*b"True — c* matches “”, a* matches “aa”, b matches “b”.
  • s="ab", p=".*"True — .* matches any string.
  • s="aa", p="a"False
  • s="", p="a*"True — a* matches zero 'a’s.
  • s="aaa", p="a*a"True — a* matches “aa”, a matches “a”.

Solution. Bottom-up DP on a (len(s)+1) × (len(p)+1) table.

def is_match(s, p):
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True
    # handle patterns like a*, a*b*, .* matching empty string
    for j in range(2, n + 1):
        if p[j - 1] == "*":
            dp[0][j] = dp[0][j - 2]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == "*":
                # zero occurrences of preceding element
                dp[i][j] = dp[i][j - 2]
                # one or more occurrences
                if p[j - 2] == "." or p[j - 2] == s[i - 1]:
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
            elif p[j - 1] == "." or p[j - 1] == s[i - 1]:
                dp[i][j] = dp[i - 1][j - 1]
    return dp[m][n]

O(m × n) time and space.

Gotchas.

  • * applies to the preceding character, not the preceding substring. "ab*" means “a” followed by zero or more "b"s.
  • Initialise column 0 carefully: dp[i][0] = False for all i > 0 (non-empty string can’t match empty pattern).
  • Patterns like "a*b*c*" can match the empty string — handle the first row by checking dp[0][j-2] whenever p[j-1] == '*'.
  • This is different from wildcard matching where * means “any sequence” — here * is a quantifier.

Run: regex matching

  • * applies to the preceding character, not the preceding substring. "ab*" means “a” followed by zero or more "b"s.
  • Initialise column 0 carefully: dp[i][0] = False for all i > 0 (non-empty string can’t match empty pattern).
  • Patterns like "a*b*c*" can match the empty string — handle the first row by checking dp[0][j-2] whenever p[j-1] == '*'.
  • This is different from wildcard matching where * means “any sequence” — here * is a quantifier.

Run: regex matching