[LeetCode] 010. Regular Expression Matching

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

  • s could be empty and contains only lowercase letters a-z.
  • p could be empty and contains only lowercase letters a-z, and characters like . or *.



Example 1:

Input:**
s = "aa"
p = "a"
Output:False
Explanation: "a" does not match the entire string "aa".



Example 2:

Input:
s = "aa"
p = "a*"
Output: True
Explanation: '*' means zero or more of the precedeng element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".



Example 3:

Input:
s = "ab"
p = ".*"
Output: True
Explanation: ".*" means "zero or more (*) of any character (.)".
<br></br>



Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: True
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".



Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false


解題邏輯與實作

這題是要實作一個簡易的Regular Expression的Parser,僅支援a-z、.*,題目也限定了輸入會是這幾個字元,所以可以先不用考慮其他字元的情況。

遞迴

這題我最先想到的是使用遞迴來解,只是判斷的情況有些瑣碎。實作步驟如下:

  1. 若pattern為空字串,則檢查string是否為空字串
    • 是,則回傳 True
    • 否,則回傳 False
  2. 檢查pattern的長度是否大於2且第二字元為 *
    • 是,判斷string是否為空字串
      • Case1:是,則呼叫遞迴傳入string與pattern[2:]
      • 否,則pattern的字首與string的字首是否相等〈相等情況包含pattern字首為.的情況〉
        • Case2:是,則呼叫遞分別傳入string[1:]與pattern、string[1:]與pattern[2:] 和string與pattern[2:]
        • Case3:否,則呼叫遞迴傳入string與pattern[2:]
    • 否,則判斷string的長度是否大於0且pattern的字首與string的字首是否相等
      • 是,則呼叫遞迴傳入string[1:]與pattern[1:]
      • 否,則回傳False



另外針對流程中幾個地回傳入的Case進行說明:

  • Case1: pattern第二字元為*且string是為空字串,則呼叫遞迴傳入string與pattern[2:]
    - *可代表pattern[0]出現0次,因此略過pattern字首向後檢查

  • Case2:是,則呼叫遞分別傳入string[1:]與pattern、string[1:]與pattern[2:] 和string與pattern[2:]
    - string[1:]與pattern:為*出現多次的情況,下個字元比對時仍與pattern字首比對
    - string[1:]與pattern[2:] :為*出現1次的情況,因此下一個字元比對時略過pattern字首
    - string與pattern[2:] :為*出現0次的情況,因此忽略pattern字首

  • Case3: pattern第二字元為*且pattern的字首與string的字首不相等,則呼叫遞迴傳入string與pattern[2:]
    - 在不相等的情況下,因為*的緣故可以假定pattern[0]為不存在〈出現0次〉的元素,因此略過pattern字首向後檢查



程式碼

class Solution:
	def __init__(self):
		self.memo = {}

	def isMatch(self, s, p):
		p_len = len(p)
		if p_len == 0:
			return len(s) == 0
	
		if (s,p) in self.memo:
			return self.memo.get((s,p))

		result = False
		if p_len >= 2 and p[1] == '*' :
			if len(s) > 0 and (p[0] == '.' or  p[0] == s[0]):
				result =  self.isMatch(s[1:] , p) or self.isMatch(s[1:] , p[2:]) or self.isMatch(s , p[2:])
			else:
				result =  self.isMatch(s,p[2:])

		elif len(s) > 0  and (p[0] == '.' or  p[0] == s[0]):
			result =  self.isMatch(s[1:],p[1:])

		self.memo[(s,p)] = result
		return result

不過單用遞迴會Time Limit Exceeded,因此多加了參數暫存比較結果,節省呼叫次數。果然效能好多了,Runtime: 52 ms, faster than 99.53% of Python3 online submissions forRegular Expression Matching.


Dynamic Programming

是說寫完地回後很高興要切下一題的時候,才後知後覺的想到,我剛剛應該是挑了DP的標籤要練習阿XDDD

但是,我真的對DP不太擅長的說,想破了頭還是毫無頭緒,只好去翻了下討論區的答案,解釋有點難懂還沒有啃透,晚點再回來補充好了…腦袋燒壞Orz

  1. P[i][j] = P[i - 1][j - 1], if p[j - 1] != '*' && (s[i - 1] == p[j - 1] || p[j - 1] == '.');
  2. P[i][j] = P[i][j - 2], if p[j - 1] == '*' and the pattern repeats for 0 times;
  3. P[i][j] = P[i - 1][j] && (s[i - 1] == p[j - 2] || p[j - 2] == '.'), if p[j - 1] == '*' and the pattern repeats for at least 1 times.



程式碼

class Solution:
	def isMatch(self, s, p):
		m = len(s)
		n = len(p)
		dp = [[False for i in range(n + 1)] for i in range(m + 1)]
		dp[0][0] = True
		for i in range(m+1):
			for j in range(1, n+1):
				if p[j-1] == '*':
					dp[i][j] = dp[i][j - 2] or (i > 0 and (s[i - 1] == p[j - 2] or p[j - 2] == '.') and dp[i - 1][j])
				else:
					dp[i][j] = i > 0 and dp[i - 1][j - 1] and (s[i - 1] == p[j - 1] or p[j - 1] == '.')
		return dp[m][n]

是說,我硬是將討論區的Code改成Python版本的,執行時間 60 ms,效能好像沒有比較好?


參考資料

  1. 9-lines 16ms C++ DP Solutions with Explanations



[LeetCode] 解題目錄 -> 這裡走

留言

這個網誌中的熱門文章

Genymotion 模擬器安裝篇:In ubuntu14.04 LET