### Ruby, 641 characters

    H=Hash.new{|h,k|[]}
    D=[[0,0,[]]]
    O=['(']
    L=->{e2=D.pop;e1=D.pop
    O.pop=='.'&&(H[e1[1]]|=[e2[0]])||(H[e1[0]]|=[e2[0]];H[e1[1]]|=[e2[1]])
    D.push [e1[0],e2[1],[(e1[2]+e2[2])*"||"]]}
    i=0;gets.chop.chars.map{|c|c=='*'&&(e=D.pop;H[e[0]]|=[e[1]];H[e[1]]|=[e[0]];D.push e;next)
    L[]while/[|)]/=~c ?O[-1]!='(':O[-1]=='.'
    /[|(]/=~c&&(O.push c;D.push [i+=1,i,[]])||c==')'&&(O.pop;O.push'.')||(D.push [i,i+1,["s==#{i}&&c=='#{c}'&&#{i+=1}"]];O.push'.')}
    L[]while O.size>1
    H.map{H.map{|k,v|v.map{|v|H[k]|=H[v]}}}
    d=D[0]
    $><<"s=[#{d[0]}];#{l=H.map{|k,v|"s&[#{k}]!=[]&&s|=#{v}"}*";"};gets.chop.chars.map{|c|s=s.map{|s|#{d[2][0]}}-[!0];#{l}};p s&[#{d[1]}]!=[]"

This Ruby version became quite long because of several corner cases in the regex parser (maybe I should try a different approach). It expects the regular expression on STDIN and writes the corresponding Ruby code for the matcher to STDOUT.

The program directly generates code for an ε-NFA (an NFA with ε-transitions), which the matcher then simulates on its input.
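
The ε-moves have to be closed transitively before the matcher can apply them in a single pass, which appears to be what the `H.map{H.map{...}}` line in the program is for. Here is a minimal ungolfed sketch of that idea, not the golfed code itself; the method name `transitive_closure` and the edge table passed to it are hypothetical.

```ruby
# Sketch only: takes a table of ε-edges (state => states reachable by one
# ε-move) and returns a table of states reachable by any number of ε-moves.
# The method name and the sample table in the usage note are assumptions.
def transitive_closure(edges)
  closed = edges.transform_values(&:dup) # leave the caller's table untouched
  loop do
    changed = false
    closed.each_value do |targets|
      # Iterate over a copy, because targets grows while we scan it.
      targets.dup.each do |t|
        extra = closed.fetch(t, []) - targets
        next if extra.empty?
        targets.concat(extra)
        changed = true
      end
    end
    break unless changed # fixpoint reached: no state gained a new target
  end
  closed
end
```

Calling `transitive_closure({ 0 => [1], 1 => [2], 2 => [] })` gives state 0 the targets 1 and 2, so a later lookup for state 0 no longer has to chase the chain through state 1.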
Test case 1 (the output is shown with additional newlines and indentation):

    >>>
    s=[0];
    ;
    gets.chop.chars.map{|c|
    s=s.map{|s|}-[!0];
    };
    p s&[0]!=[]

Test case 2:

    >>> (b|)(ab)*(a|)
    s=[0];
    s&[1]!=[]&&s|=[1, 3, 4, 6, 7, 9];
    s&[2]!=[]&&s|=[3, 4, 6, 7, 9];
    s&[0]!=[]&&s|=[1, 3, 4, 6, 7, 9];
    s&[4]!=[]&&s|=[4, 6, 7, 9];
    s&[5]!=[]&&s|=[5];
    s&[6]!=[]&&s|=[4, 7, 6, 9];
    s&[3]!=[]&&s|=[4, 6, 7, 9];
    s&[7]!=[]&&s|=[7, 9];
    s&[8]!=[]&&s|=[9];
    gets.chop.chars.map{|c|
    s=s.map{|s|s==1&&c=='b'&&2||s==4&&c=='a'&&5||s==5&&c=='b'&&6||s==7&&c=='a'&&8}-[!0];
    s&[1]!=[]&&s|=[1, 3, 4, 6, 7, 9];
    s&[2]!=[]&&s|=[3, 4, 6, 7, 9];
    s&[0]!=[]&&s|=[1, 3, 4, 6, 7, 9];
    s&[4]!=[]&&s|=[4, 6, 7, 9];
    s&[5]!=[]&&s|=[5];
    s&[6]!=[]&&s|=[4, 7, 6, 9];
    s&[3]!=[]&&s|=[4, 6, 7, 9];
    s&[7]!=[]&&s|=[7, 9];
    s&[8]!=[]&&s|=[9]
    };
    p s&[9]!=[]

Another example:

    >>> a|bc
    s=[0];
    s&[0]!=[]&&s|=[0, 2];s&[2]!=[]&&s|=[2];s&[3]!=[]&&s|=[3];s&[1]!=[]&&s|=[4];
    gets.chop.chars.map{|c|
    s=s.map{|s|s==0&&c=='a'&&1||s==2&&c=='b'&&3||s==3&&c=='c'&&4}-[!0];
    s&[0]!=[]&&s|=[0, 2];s&[2]!=[]&&s|=[2];s&[3]!=[]&&s|=[3];s&[1]!=[]&&s|=[4]
    };
    p s&[4]!=[]
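
To make the generated matcher for `a|bc` easier to follow, here is an ungolfed sketch of the same state-set simulation. The two tables are copied from the output above; the constant and method names (`CLOSURE`, `MOVES`, `ACCEPT`, `expand`, `match_a_or_bc?`) are hypothetical and only serve as illustration.

```ruby
# Tables transcribed from the generated code for a|bc.
# All names here are assumptions made for readability.
CLOSURE = { 0 => [0, 2], 2 => [2], 3 => [3], 1 => [4] }.freeze      # ε-closure per state
MOVES   = { [0, 'a'] => 1, [2, 'b'] => 3, [3, 'c'] => 4 }.freeze   # character transitions
ACCEPT  = 4

# Mirrors the chain of `s&[k]!=[]&&s|=v` statements: whenever state k is
# in the set, add its precomputed ε-closure.
def expand(states)
  CLOSURE.reduce(states) { |set, (from, to)| set.include?(from) ? set | to : set }
end

# Mirrors the generated loop: start in state 0, consume one character at a
# time, drop states without a matching transition, then expand again.
def match_a_or_bc?(input)
  states = expand([0])
  input.each_char do |c|
    states = expand(states.map { |s| MOVES[[s, c]] }.compact)
  end
  states.include?(ACCEPT)
end
```

With these tables, `match_a_or_bc?('a')` and `match_a_or_bc?('bc')` return `true`, while `match_a_or_bc?('b')` returns `false` because the state set never reaches state 4.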