Tuesday, April 03, 2007

A simple regular language recognizer

There are some programs that you just have to write in order to have the feeling that you know a bit of the programming language you are learning. For me such a program is a regular expression recognizer.

Both strings and regular expressions are represented by lists of Lisp symbols (a regexp description may be also a nested list)
Let me introduce my notation of regexp-s by example:
  • (* a b c d) is the same as (a b c d)* in the usual notation
  • (+ a b c d) is (a b c d)+
  • (or a b c d) is (a|b|c|d)


(defun recognize (regexp string)
"T if string matches to regexp, nil otherwise."
(let ((end (list (gensym))))
(if (null regexp)
;; the language {\sigma} must be a special case because nil is
;; the value returned by reg-eval on failure
(null string)
(equal end (reg-eval regexp (append string end))))))

(defun reg-eval (regexp string)
"If string has a prefix matching regexp, returns the part that doesn't match (nil otherwise)."
(when string
(cond ((null regexp)
string)
((not (listp (car regexp)))
(when (equal (car string) (car regexp))
(reg-eval (cdr regexp) (cdr string))))
(t (let ((prefix (caar regexp))
(body (cdar regexp)))
(case prefix
((*)
(reg-eval (cdr regexp) (let ((match (reg-eval body string)))
(if (not match)
string
(reg-eval (list (car regexp)) match)))))
((+)
(reg-eval (nconc (list (cons 'subexp body) (cons '* body)) (cdr regexp)) string))
((or)
(some #'(lambda (x) (reg-eval x string)) body))
((subexp)
(reg-eval (cdr regexp) (reg-eval body string)))))))))


(defun test ()
(let ((cases '((recognize '(a b c d) '(a b c d))
(recognize '((* a b c)) '(a b c a b c))
(recognize '((or ((+ 1) 2) (2 1))) '(1 1 1 2))
(recognize '((or (1 2) (2 1))) '(1 2)))))
(dolist (cas cases)
(format t "~&~s => ~s." cas (eval cas)))))