Monday, March 19, 2012

Parsing XML with NSXMLParser

A while ago I helped a friend implement his XML parser. When checking the net, I found a lot of people asking the same question: how do you use NSXMLParser? It is actually not that hard, so in this blog post I will show you how it works so that you can use it to great effect.

In general, there are two kinds of XML parsers: SAX parsers and DOM parsers. A SAX parser reads the XML document as a stream and hands you bits and pieces of information about the document as it encounters them. A DOM parser reads the entire document into memory and builds a tree structure that you can query. Although a DOM parser seems more convenient conceptually, in practice I find the SAX parser much easier to work with. If you are proficient in XPath or XQuery, you might want to use DOM anyway. Cocoa provides a DOM parser class named NSXMLDocument (on the Mac only; it is not available on iOS). Honestly, I could not get NSXMLDocument to work in 30 minutes, so I say it is not easy to use. Luckily for us poor mortals, Cocoa also provides a SAX parser class named NSXMLParser. Let's see how to use it.
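
For readers who want to see the DOM side before moving on, here is a minimal sketch of what it might look like. It assumes a Mac target compiled with ARC, and the function name TitlesInXMLString and the <title> element are made up for illustration. The point is only the shape of it: the whole tree is built first, and then you query it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every <title> element in the
// given XML string, or nil if the document cannot be parsed.
// Mac only, because NSXMLDocument is not part of Foundation on iOS.
static NSArray *TitlesInXMLString(NSString *xml) {
    NSError *error = nil;
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];

    // DOM style: the entire document is parsed into a tree up front
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithData:data
                                                     options:0
                                                       error:&error];
    if (doc == nil) {
        NSLog(@"DOM parse error: %@", [error localizedDescription]);
        return nil;
    }

    // afterwards the tree can be queried, here with an XPath expression
    NSArray *nodes = [doc nodesForXPath:@"//title" error:&error];
    if (nodes == nil) {
        NSLog(@"XPath error: %@", [error localizedDescription]);
        return nil;
    }

    NSMutableArray *titles = [NSMutableArray array];
    for (NSXMLNode *node in nodes) {
        NSString *text = [node stringValue];
        [titles addObject:(text != nil ? text : @"")];
    }
    return titles;
}
```

Calling TitlesInXMLString(@"<books><book><title>A</title></book><book><title>B</title></book></books>") should give you an array holding "A" and "B". Note that nothing comes back until the whole document has been read, which is exactly the trade-off against SAX.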

The NSXMLParser class is a piece of machinery that parses XML. This machine generates events when you feed it an XML document. The events are rather simple, like "I've encountered an XML element" and "I'm done with this element". The parser delivers these events to its delegate. What is a delegate? (I always forget.) It is an object that receives the events on the parser's behalf. You write its class yourself, and that class should implement a certain protocol to be able to work with the machinery. What's cool about this is that you can have your custom class implement the NSXMLParserDelegate protocol, and then NSXMLParser effectively loads the XML data into your custom class.

Rather than giving the same old cheesy address book or music album example, here is a generic bit of code that shows what happens when NSXMLParser processes a document:

#import <Foundation/Foundation.h>

@interface MyData : NSObject <NSXMLParserDelegate> {
 // add any data members that you need here
}

-(BOOL)parseDocumentWithURL:(NSURL *)url;

@end

@implementation MyData

-(BOOL)parseDocumentWithURL:(NSURL *)url {
    if (url == nil)
        return NO;
   
    // this is the parsing machine
    NSXMLParser *xmlparser = [[NSXMLParser alloc] initWithContentsOfURL:url];
   
    // this class will handle the events
    [xmlparser setDelegate:self];
    [xmlparser setShouldResolveExternalEntities:NO];

    // now parse the document
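    BOOL ok = [xmlparser parse];
    if (ok == NO)
        NSLog(@"error: %@", [[xmlparser parserError] localizedDescription]);
    else
        NSLog(@"OK");
   
    // manual reference counting; remove this line when compiling with ARC
    [xmlparser release];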
    return ok;
}

-(void)parserDidStartDocument:(NSXMLParser *)parser {
    NSLog(@"didStartDocument");
}

-(void)parserDidEndDocument:(NSXMLParser *)parser {
    NSLog(@"didEndDocument");
}

-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    NSLog(@"didStartElement: %@", elementName);
   
    if (namespaceURI != nil)
        NSLog(@"namespace: %@", namespaceURI);
   
    if (qName != nil)
        NSLog(@"qualifiedName: %@", qName);
   
    // print all attributes for this element
    for (NSString *key in attributeDict) {
        NSString *value = [attributeDict objectForKey:key];
        NSLog(@"  attribute: %@ = %@", key, value);
    }

    // add code here to load any data members
    // that your custom class might have

}

-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    NSLog(@"didEndElement: %@", elementName);
}

// error handling
-(void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    NSLog(@"XMLParser error: %@", [parseError localizedDescription]);
}

-(void)parser:(NSXMLParser *)parser validationErrorOccurred:(NSError *)validationError {
    NSLog(@"XMLParser validation error: %@", [validationError localizedDescription]);
}

@end
Run this code on an XML document and see the light! All it does is print the events as they occur. You can use this code as a starting point for building your custom XML loader. As always, it is a good idea to start out simple and build up from there.

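
If you don't have an XML file at hand, a small in-memory document shows the same thing. The sketch below is a cut-down variant of the class above: instead of logging, it records the events in an array so you can inspect the order. The names EventRecorder and EventsForXMLString are mine, not Apple's, and I am assuming ARC here, so there are no release calls.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that records the event sequence instead of logging it
@interface EventRecorder : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *events;
@end

@implementation EventRecorder

- (instancetype)init {
    self = [super init];
    if (self) {
        _events = [NSMutableArray array];
    }
    return self;
}

- (void)parserDidStartDocument:(NSXMLParser *)parser {
    [self.events addObject:@"startDocument"];
}

- (void)parserDidEndDocument:(NSXMLParser *)parser {
    [self.events addObject:@"endDocument"];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    [self.events addObject:[NSString stringWithFormat:@"start %@", elementName]];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [self.events addObject:[NSString stringWithFormat:@"end %@", elementName]];
}

@end

// Hypothetical helper: parses an XML string and returns the recorded events
static NSArray *EventsForXMLString(NSString *xml) {
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    EventRecorder *recorder = [[EventRecorder alloc] init];

    // the parser does not retain its delegate; recorder stays alive
    // because it is still used after parsing
    [parser setDelegate:recorder];
    [parser parse];
    return recorder.events;
}
```

For the document <library><book id="1"/><book id="2"/></library> the recorded sequence is: startDocument, start library, start book, end book, start book, end book, end library, endDocument. Note that a self-closing element still produces both a start and an end event.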
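In general, you can get away with just using didStartElement, as long as the data you need lives in element names and attributes; text between the tags arrives through parser:foundCharacters: instead. The namespaceURI and qualifiedName arguments are nil unless you turn on namespace processing with setShouldProcessNamespaces:YES, which is off by default. If you get parser errors, a common cause is a character encoding problem. XML is read as UTF-8 (or UTF-16) unless the document declares another encoding, so a file saved as, say, Latin-1 without a matching encoding declaration fails on the first non-ASCII character. The parser is also picky about document errors in general and stops at the first thing that is not well-formed.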
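
Two things in that paragraph are easier to see in code: picking up the text between tags, and what the namespace arguments look like once namespace processing is switched on. Below is a rough sketch, again assuming ARC. TextCollector and CollectText are names I made up, and the collector only handles leaf elements (elements that contain text but no child elements). The thing to notice is that foundCharacters can be called several times for one element, so the text has to be accumulated in a buffer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the text of leaf elements and notes
// which elements were reported with a namespace URI
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableDictionary *textByElement;
@property (nonatomic, strong) NSMutableDictionary *namespaceByElement;
@property (nonatomic, strong) NSMutableString *buffer;
@end

@implementation TextCollector

- (instancetype)init {
    self = [super init];
    if (self) {
        _textByElement = [NSMutableDictionary dictionary];
        _namespaceByElement = [NSMutableDictionary dictionary];
        _buffer = [NSMutableString string];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    // start with an empty buffer for every element
    [self.buffer setString:@""];
    if ([namespaceURI length] > 0)
        self.namespaceByElement[elementName] = namespaceURI;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // may arrive in several chunks, so append rather than assign
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    NSString *text = [self.buffer stringByTrimmingCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([text length] > 0)
        self.textByElement[elementName] = text;
    [self.buffer setString:@""];
}

@end

// Hypothetical helper: parses a string with namespace processing on or off
static TextCollector *CollectText(NSString *xml, BOOL processNamespaces) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:
                           [xml dataUsingEncoding:NSUTF8StringEncoding]];
    TextCollector *collector = [[TextCollector alloc] init];
    [parser setDelegate:collector];
    [parser setShouldProcessNamespaces:processNamespaces];
    [parser parse];
    return collector;
}
```

Try it on a document such as <feed xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Hello &amp; goodbye</dc:title><count>3</count></feed>. With namespace processing on, the element is reported under its local name title together with its namespace URI. With it off, the same element shows up as dc:title and no namespace URI is recorded.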

By the way, Apple's programming guide on NSXMLParser is good, albeit a bit lengthy. Personally, I don't like XML at all, but its use is so widespread that you will run into it sooner or later. I like JSON much better, or even a raw binary format. A raw binary data file typically loads in a split second, while parsing XML takes considerable time when the document is large.