Friday, June 19, 2009

A (Subtle?) Difference in Regular Expressions Between Java and Perl

I wrote a bunch of Perl code before Java finally got decent regular expressions. (I wrote lots of C++ before that, but C++ didn’t have built-in regular expressions either.)

For some reason, on a number of occasions my Java regular expressions didn’t work right, and I never fully understood why.

David has an interesting post pointing out that in Java regexes, a carriage return (like any other line terminator) is not matched by .* by default.
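Here is a minimal sketch of that point, not David's actual example. The class name DotAndLineBreaks and the helper wholeMatch are made up for illustration; Pattern.DOTALL is the standard java.util.regex flag that lets the dot match line terminators.

```java
import java.util.regex.Pattern;

public class DotAndLineBreaks {
    // Hypothetical helper: true when the whole input matches the regex
    // under the given Pattern flags (0 means no flags).
    static boolean wholeMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "first line\r\nsecond line";
        // Default: '.' does not match \r or \n, so ".*" cannot span both lines.
        System.out.println(wholeMatch(".*", 0, twoLines));              // false
        // DOTALL: '.' matches line terminators too.
        System.out.println(wholeMatch(".*", Pattern.DOTALL, twoLines)); // true
    }
}
```

Putting (?s) at the start of the pattern should have the same effect as passing Pattern.DOTALL.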

However, when I saw his example, I finally understood my confusion. In Perl, a regex match asks: “Does this pattern exist somewhere in my target string?”

So the following Perl code:

$str = "word in middle of line";
if ($str =~ /middle/) {
    print "match";
}

will print “match”.

You can force a regex in Perl to match only at the beginning of the string by putting an anchor such as ^ into your pattern.

So the code:

$str = "word in middle of line";
if ($str =~ /^middle/) { print "match"}

won’t print out “match”.

However, in Java, String.matches() has to match against the whole string, and you need .* on both ends if you want the Perl behavior. So the code:

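String str = "word in middle of line";
System.out.println("First if");
if (str.matches("middle")) {
    System.out.println("match");  // not reached: "middle" is not the whole string
}
System.out.println("Second if");
if (str.matches(".*middle.*")) {
    System.out.println("match");  // reached: .* absorbs the text on both sides
}

will print out:

First if
Second if
match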
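For what it's worth, padding the pattern with .* is not the only way to get the Perl behavior. A rough sketch, assuming you are willing to use java.util.regex directly: Matcher.find() searches for the pattern anywhere in the input, much like Perl's =~. The class name FindVersusMatches and the helper containsMatch are invented for this example.

```java
import java.util.regex.Pattern;

public class FindVersusMatches {
    // Hypothetical helper: Perl-style search, true if the pattern
    // occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "word in middle of line";
        System.out.println(str.matches("middle"));         // false: whole string must match
        System.out.println(containsMatch("middle", str));  // true: found in the middle
        System.out.println(containsMatch("^middle", str)); // false: anchored to the start
    }
}
```

With find(), the ^ anchor behaves the way it does in the Perl example above: the unanchored pattern is found, the anchored one is not.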