Richard Searle's Blog

Thoughts about software

Surprising performance with Scala regexp parser combinators

Posted by eggsearle on August 21, 2012

The various examples of regexp parser combinators generally define their tokens in this form:

def identifier = """[_\p{L}][_\p{L}\p{Nd}]*""".r

This works fine, but because the body of a def is re-evaluated on every call, the underlying Java Pattern is recompiled on every reference.
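A minimal sketch of the effect, using only scala.util.matching.Regex from the standard library. The object and member names (DefVsVal, identifierDef, identifierVal) are made up for illustration and are not from the original code. It compares the java.util.regex.Pattern instances behind each form by reference:

```scala
import scala.util.matching.Regex

// Hypothetical names, for illustration only.
object DefVsVal {
  // def: the body runs on every reference, so .r builds a new Regex
  // (and compiles a new java.util.regex.Pattern) each time.
  def identifierDef: Regex = """[_\p{L}][_\p{L}\p{Nd}]*""".r

  // val: the body runs once, when DefVsVal is initialized.
  val identifierVal: Regex = """[_\p{L}][_\p{L}\p{Nd}]*""".r

  // Two references to the def yield two distinct Pattern instances.
  def defSharesPattern: Boolean = identifierDef.pattern eq identifierDef.pattern

  // Two references to the val yield the same Pattern instance.
  def valSharesPattern: Boolean = identifierVal.pattern eq identifierVal.pattern
}
```

In a parser, a token such as identifier is typically referenced many times while parsing a single input, so the def form pays for a compile on each of those references.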

The behavior came to light during an upgrade from Java 6 and Scala 2.7.7 to Java 7 and Scala 2.9.2, when a performance degradation of roughly 10% was noted.
Profiling showed an unexpectedly large number of calls to Pattern.compile.
The Java 7 implementation is evidently somewhat slower.

Changing the def to a val resolves the problem without changing the semantics.
In this case, the improvement was greater than 30%, which more than compensates for the degradation.
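The val form might look like the sketch below. To stay self-contained it uses a small hand-written object rather than the parser combinator library. Tokens and leadingIdentifier are hypothetical names, and findPrefixOf stands in for the prefix matching a regexp parser would perform:

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for a parser object; not taken from the original code.
object Tokens {
  // val: the pattern is compiled once, when Tokens is initialized,
  // and every later reference reuses the same Regex.
  val identifier: Regex = """[_\p{L}][_\p{L}\p{Nd}]*""".r

  // Returns the identifier at the start of the input, if there is one.
  def leadingIdentifier(input: String): Option[String] =
    identifier.findPrefixOf(input)
}
```

If the token lives in a class or trait instead of an object, a val is compiled once per instance rather than once overall, which is still far fewer compiles than once per reference.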

