Richard Searle's Blog

Thoughts about software

Archive for August, 2010

Scala XMLLoader as ContentHandler

Posted by eggsearle on August 29, 2010

Scala makes it very easy to handle XML but does not provide any means to interact with the standard Java XML tooling. XML can only be created using literals or by parsing a string representation.

The internal implementation of XML.load uses a ContentHandler. Exposing that ContentHandler would allow Scala XML to be created from a SAX pipeline, which is half of what is required for full integration.

The following class provides the necessary implementation:

import scala.xml.factory.XMLLoader
import scala.xml._
import org.xml.sax._
import org.xml.sax.helpers.DefaultHandler

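class Loader extends DefaultHandler with XMLLoader[Elem] {
  // The adapter that XML.load uses internally to build the node tree
  val newAdapter = adapter
  def value = newAdapter.rootElem.asInstanceOf[Elem]

  // The callback bodies are reconstructed on the assumption that each one simply
  // delegates to the adapter, mirroring what XMLLoader.loadXML does around a parse.
  override def characters(ch: Array[Char], start: Int, length: Int) { newAdapter.characters(ch, start, length) }
  override def endDocument() { newAdapter.scopeStack.pop }
  override def endElement(uri: String, localName: String, qName: String) { newAdapter.endElement(uri, localName, qName) }
  override def processingInstruction(target: String, data: String) { newAdapter.processingInstruction(target, data) }
  override def startDocument() { newAdapter.scopeStack push TopScope }
  override def startElement(uri: String, localName: String, qName: String, atts: Attributes) { newAdapter.startElement(uri, localName, qName, atts) }
}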

Its use is illustrated by an identity transform that feeds the Loader:

import java.io.StringReader
import javax.xml.transform._
import javax.xml.transform.sax.SAXResult
import javax.xml.transform.stream.StreamSource

val transformerFactory = TransformerFactory.newInstance
val xformer = transformerFactory.newTransformer  // no stylesheet, so an identity transform
val xl = new Loader
xformer.transform(new StreamSource(new StringReader("""<X><y/>fdgfd</X>""")), new SAXResult(xl))
// xl.value now holds the resulting Elem

Posted in Scala

Using akka with SBT and Scala 2.8.0

Posted by eggsearle on August 22, 2010

The akka documentation describes how to use SBT, but neither it nor the SBT documentation indicates how to specify the Scala version to get 2.8.0. Some digging into the repo indicates the correct value is: 2.8.0-SNAPSHOT

Hopefully all the Scala infrastructure will soon catch up to 2.8 and eliminate these missteps.

Posted in Akka, Scala

Notes document is primitive compared to CouchDB JSON document

Posted by eggsearle on August 2, 2010

A Notes document is limited to name/value pairs of primitives (number, string, date) and lists thereof.

Notes applications often required more complex structures than could be represented by these limited types. Those structures were encoded into strings (and lists thereof), traditionally using | as the separator. Much of the application code then consisted of logic to pack and unpack these representations.

JSON allows the CouchDB document to contain an arbitrary object tree. A wide variety of tools are available for parsing and data binding, further reducing the development effort.
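The packing and unpacking logic described above can be sketched roughly as follows. This is only an illustration written in Scala: the LineItem type, its field names and the PipeCodec object are hypothetical and do not come from any Notes API.

```scala
// Hypothetical line item; the field names are illustrative only.
case class LineItem(sku: String, quantity: Int, unitPrice: BigDecimal)

object PipeCodec {
  private val Separator = "|"

  // Pack one item into a single string value, e.g. "A-100|3|9.99".
  def pack(item: LineItem): String =
    Seq(item.sku, item.quantity.toString, item.unitPrice.toString).mkString(Separator)

  // Unpack a string value; returns None when the shape or a number is wrong.
  def unpack(value: String): Option[LineItem] =
    value.split('|') match {
      case Array(sku, qty, price) =>
        try Some(LineItem(sku, qty.toInt, BigDecimal(price)))
        catch { case _: NumberFormatException => None }
      case _ => None
    }
}
```

Every such structure needs its own pair of functions, and a value that itself contains | silently breaks the encoding. A JSON document would carry the same item as a nested object, with no hand-written codec.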

Posted in CouchDB

The invoice address quandary

Posted by eggsearle on August 1, 2010

The attendees at the Dallas TechFest presentation on CouchDB were concerned with the apparent lack of normalization in capturing the address as part of an invoice document. Their primary concern was how to update the address on existing invoices when it changes.

That concern appears to reflect an inappropriate approach to data modelling and the type of historical data required for audits and SOX compliance.

The address on the invoice reflects the value that applied at the time the invoice was generated. That address is part of a primary key that identifies the legal entity that was responsible for the invoice. That entity might cease to exist, move, be merged, acquired or divested. Even a move has non-trivial legal consequences since it might impact taxes, export regulations, etc. Maintaining that historical value can be critical.

Ideally the invoice would not indirectly reference the affected legal entity by name+address, but rather by some unique identifier. Some countries define unique identifiers as part of the company registration process, but these (AFAIK) apply to the organization and not to individual locations. A WalMart can impose its own identification scheme on its suppliers. There are organizations that provide identifiers (such as Dun & Bradstreet) but those are limited in scope and potentially costly to acquire and use.

Organizations are thus forced to maintain their own directory of entities (parties) and perform a mapping from name+address to their own identifier. That leaves open the possibility of error, which cannot be resolved if the source name+address was not captured as part of the invoice. This is one reason for the popularity of document scanning and off-site archival systems, so that information can be retrieved if necessary. The cost of storing the textual representation is very much smaller than that of the scanned image or the storage and retrieval of the original paper document.

Auditing concerns would prohibit the modification of the master document.

The perceived CouchDB concern is thus a feature rather than a defect.
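The modelling argument above can be sketched as follows, purely as an illustration. All of the types and names (Address, Party, Invoice, InvoiceExample) are hypothetical; the point is only that the invoice carries its own snapshot of the name and address, while the organization's party directory changes independently.

```scala
// All names here are hypothetical and only illustrate the modelling argument.
case class Address(street: String, city: String, postalCode: String)

// The organization's own directory entry for a party; this may change over time.
case class Party(id: String, name: String, currentAddress: Address)

// The invoice embeds the name and address as they were when it was issued.
// The party identifier is optional because the mapping may be missing or wrong.
case class Invoice(number: String, partyId: Option[String], billToName: String, billToAddress: Address)

object InvoiceExample {
  // Issuing an invoice copies the current name and address into the document.
  def issue(number: String, party: Party): Invoice =
    Invoice(number, Some(party.id), party.name, party.currentAddress)

  // A move updates the directory only: it returns a new Party and touches no invoice.
  def move(party: Party, to: Address): Party =
    party.copy(currentAddress = to)
}
```

After a move, invoices issued earlier still show the address that applied at the time, and the captured name+address remains available to correct a mistaken party mapping.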

Posted in CouchDB

Hello world!

Posted by eggsearle on August 1, 2010

I have moved from jroller to WordPress simply because this is the only blogging platform that supports syntax highlighting out of the box.

Posted in Uncategorized