iOS - XML to NSString conversion

I'm using NSXMLParser to parse XML in my app and I'm having a problem with the encoding. For example, one of the incoming feeds looks similar to this:

\U2026Some random text from the xml feed\U2026

I am currently using the encoding type:

NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];

Which encoding am I supposed to use to convert \U2026 into an ellipsis (…)?

Answers


The answer here is you're screwed. They are using a non-standard escaping scheme for XML, but what if they really want the literal \U2026? Let's say you add a decoder to handle all \UXXXX and \uXXXX escapes. What happens when another feed wants the data to be the literal text \U2026?

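Your first choice and best bet is to get this feed fixed. If they need to escape characters, they should use XML numeric character references such as &#x2026; (named HTML entities like &hellip; are not predefined in XML and only work if a DTD declares them).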
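For reference, this is roughly what the feed producer would need to emit instead. It's only a minimal sketch in plain C (which also compiles inside an Objective-C file); the function name xml_char_reference is made up for illustration and is not part of any library.

```c
#include <stdio.h>

/* Writes the XML numeric character reference for code_point into buf,
   for example U+2026 becomes "&#x2026;". Returns buf so the call can be
   used directly as an argument. buf should hold at least 12 bytes. */
char *xml_char_reference(char *buf, size_t size, unsigned long code_point)
{
    snprintf(buf, size, "&#x%lX;", code_point);
    return buf;
}
```

Any conforming XML parser, NSXMLParser included, resolves a reference like this on its own, so no extra decoding is needed on the app side.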

As a fallback, I would isolate the decoder from the XML parser. Don't create a non-conforming XML parser just because you're getting non-conforming data. Have a post-processor that is only run on the offending feed.
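One way to keep that isolation is a small gate that decides, per feed, whether the post-processor runs at all. This is a rough sketch in plain C; the feed URL and the names kEscapedFeeds and feed_needs_unescaping are placeholders I made up, not anything from the question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of feeds known to send \uXXXX escapes as plain text. */
static const char *const kEscapedFeeds[] = {
    "https://example.com/broken-feed.xml",
};

/* Returns true only for feeds on the list, so every other feed keeps a
   literal "\U2026" exactly as it was sent. */
bool feed_needs_unescaping(const char *feed_url)
{
    size_t count = sizeof kEscapedFeeds / sizeof kEscapedFeeds[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(feed_url, kEscapedFeeds[i]) == 0)
            return true;
    }
    return false;
}
```

The parser itself stays untouched; you would call the decoder on the parsed strings only when this check passes.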


If you must have a decoder, then there is more bad news. There is no built-in decoder, so you will need to find a category online or write one yourself.


After some poking around, I think the question "Using Objective C/Cocoa to unescape unicode characters, ie \u1234" may work for you.


Alright, here's a snippet of code that should work for any four-digit \uXXXX or \UXXXX escape:

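#import <Foundation/Foundation.h>

// Returns the value of a hex digit, or -1 if c is not one.
static int hexValue(unichar c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces each \uXXXX or \UXXXX escape (exactly four hex digits) with the
// UTF-16 code unit it names; everything else is copied through unchanged.
NSString *stringByUnescapingUnicodeSymbols(NSString *input)
{
    NSUInteger length = [input length];
    NSMutableString *output = [NSMutableString stringWithCapacity:length];

    NSUInteger i = 0;
    while (i < length) {
        unichar c = [input characterAtIndex:i];

        // an escape is '\', then 'u' or 'U', then four hex digits
        if (c == '\\' && i + 6 <= length) {
            unichar marker = [input characterAtIndex:i + 1];
            BOOL valid = (marker == 'u' || marker == 'U');
            int value = 0;

            // read exactly 4 hex digits (Unicode escapes are base 16)
            for (NSUInteger j = i + 2; valid && j < i + 6; j++) {
                int digit = hexValue([input characterAtIndex:j]);
                if (digit < 0)
                    valid = NO;
                else
                    value = value * 16 + digit;
            }

            if (valid) {
                unichar unit = (unichar)value;
                [output appendString:[NSString stringWithCharacters:&unit length:1]];
                i += 6; // skip '\', 'u' and the 4 digits, nothing more
                continue;
            }
        }

        // not a complete escape: copy the character as-is
        [output appendString:[NSString stringWithCharacters:&c length:1]];
        i++;
    }

    return output;
}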
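If you'd rather do the decoding on raw bytes before they ever become an NSString, the same idea can be sketched in plain C. This version writes UTF-8 and also joins surrogate pairs written as two escapes, so it covers code points above U+FFFF. Treat it as a sketch under my own assumptions: the names (unescape_unicode and the helpers) are invented here, and replacing \u0000 and lone surrogates with U+FFFD is my choice, not something the feed format specifies.

```c
#include <stddef.h>
#include <string.h>

/* Returns the value of hex digit c, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "\uXXXX" or "\UXXXX" at s. Returns the 16-bit value, or -1 if s
   does not start with a complete escape. A NUL terminator is not a hex
   digit, so the loop stops before reading past the end of the string. */
static long parse_escape(const char *s)
{
    if (s[0] != '\\' || (s[1] != 'u' && s[1] != 'U')) return -1;
    long value = 0;
    for (int i = 2; i < 6; i++) {
        int h = hex_value(s[i]);
        if (h < 0) return -1;
        value = value * 16 + h;
    }
    return value;
}

/* Writes code point cp at out as UTF-8 and returns the new write position. */
static char *put_utf8(char *out, unsigned long cp)
{
    if (cp < 0x80) {
        *out++ = (char)cp;
    } else if (cp < 0x800) {
        *out++ = (char)(0xC0 | (cp >> 6));
        *out++ = (char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *out++ = (char)(0xE0 | (cp >> 12));
        *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
        *out++ = (char)(0x80 | (cp & 0x3F));
    } else {
        *out++ = (char)(0xF0 | (cp >> 18));
        *out++ = (char)(0x80 | ((cp >> 12) & 0x3F));
        *out++ = (char)(0x80 | ((cp >> 6) & 0x3F));
        *out++ = (char)(0x80 | (cp & 0x3F));
    }
    return out;
}

/* Copies src to dst, replacing escapes with UTF-8, and returns dst.
   dst must hold at least strlen(src) + 1 bytes: a 6-byte escape decodes
   to at most 3 bytes, a 12-byte surrogate pair to 4. */
char *unescape_unicode(char *dst, const char *src)
{
    char *out = dst;
    while (*src) {
        long unit = parse_escape(src);
        if (unit < 0) {              /* not an escape: copy the byte through */
            *out++ = *src++;
            continue;
        }
        src += 6;
        unsigned long cp = (unsigned long)unit;
        if (cp >= 0xD800 && cp <= 0xDBFF) {   /* high surrogate: look for its pair */
            long low = parse_escape(src);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + ((unsigned long)low - 0xDC00);
                src += 6;
            }
        }
        /* NUL would truncate the C string and a lone surrogate is not valid
           UTF-8, so both become U+FFFD (assumption, see above). */
        if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        out = put_utf8(out, cp);
    }
    *out = '\0';
    return dst;
}
```

The resulting buffer can then be handed to [NSString stringWithUTF8String:]. Incomplete escapes such as a trailing \u20 are left exactly as they were.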

You should simply replace the literal '\U2026' with the character it stands for, then encode the result to NSData with NSUTF8StringEncoding.

