Friday, June 19, 2009

A (Subtle?) Difference in Regular Expressions Between Java and Perl

I wrote a lot of Perl code before Java finally got decent regular expressions. (I wrote lots of C++ before that, but C++ didn’t have built-in regular expressions either.)

On a number of occasions my Java regular expressions didn’t work right, and I never fully understood why.

David has an interesting post pointing out that in Java regexes, a carriage return (like any other line terminator) is not matched by .* by default.
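To make that point concrete, here is a minimal sketch of the default behavior next to the Pattern.DOTALL flag. It is only an illustration, not the example from David's post; the class name, method names, and sample strings are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample strings are invented for illustration.
class DotLineTerminatorDemo {
    // True when the whole input matches ".*" with the default flags.
    static boolean dotStarDefault(String input) {
        return Pattern.compile(".*").matcher(input).matches();
    }

    // Same pattern, but DOTALL lets '.' match line terminators as well.
    static boolean dotStarDotAll(String input) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "first line\r\nsecond line";
        System.out.println(dotStarDefault(twoLines));   // false: '.' stops at \r
        System.out.println(dotStarDotAll(twoLines));    // true
        System.out.println(dotStarDefault("one line")); // true
    }
}
```

So a pattern that leans on .* to swallow "everything" can quietly fail on multi-line input unless DOTALL (or the inline (?s) flag) is switched on.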

However, when I saw his example, I finally understood my confusion. In Perl, a regex match means “Does this pattern exist somewhere in my target string?”

So the following Perl code:

$str = "word in middle of line";
if ($str =~ /middle/) {
    print "match";
}

will print “match”.

You can force a regex in Perl to match only at the beginning of the line by putting an anchor such as ^ into your pattern. So the code:

$str = "word in middle of line";
if ($str =~ /^middle/) {
    print "match";
}

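won’t print “match”, because “middle” does not occur at the start of the string.

However, in Java, String.matches() has to match against the whole string, so you need .* on both ends if you want the Perl behavior (the alternative is an unanchored search with Matcher.find()). So the code: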

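String str = "word in middle of line";
System.out.println("First if");
// matches() must match the entire string, so this test fails
if (str.matches("middle")) {
    System.out.println("match");
}
System.out.println("Second if");
// .* on both sides lets the pattern cover the entire string
if (str.matches(".*middle.*")) {
    System.out.println("match");
}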

will print out:

First if
Second if
match
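
If you would rather keep the Perl meaning than pad the pattern with .*, java.util.regex also offers Matcher.find(), which searches for the pattern anywhere in the input. The following is only a rough sketch of the difference; the class and helper names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the helper names are invented for illustration.
class FindVersusMatches {
    // Perl-style test: does the pattern occur somewhere in the input?
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Java's matches() test: does the pattern cover the entire input?
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String str = "word in middle of line";
        System.out.println(containsMatch("middle", str));  // true, like Perl's =~ /middle/
        System.out.println(containsMatch("^middle", str)); // false, like Perl's =~ /^middle/
        System.out.println(wholeMatch("middle", str));     // false
        System.out.println(wholeMatch(".*middle.*", str)); // true
    }
}
```

With find(), the ^ anchor behaves just as it does in the Perl examples above, and there is no need to worry about .* failing to cross a line break.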