XML vs YAML vs JSON
Assume I'm starting a project from scratch that doesn't depend on any other project. I would like to use a format such as XML to store feeds. Since XML is not the only format of its kind, I would like to know: why should I choose one over the rest?
I will be using Perl.
'Feed' is a description of a product (name, price, type, short description, up to 120 words).
We can't really answer that without knowing a lot more. Just because you're not currently dependent on any other projects, are you likely to interact with them at some point in the future? If so, what technologies do they prefer? At the BBC, we've made some projects "JSON-only", only to find out that Java developers who wanted to access our API were begging us to provide a simple XML API simply because they have so many tools built around XML. They didn't even care about namespaces, attributes, or anything else; they just wanted those angle-brackets.
As for "storing feeds", I'm also not sure what you mean there. You explain the data in the feed, but what are you then going to do with those feeds? Parse them? Cache and reserve them? Write them out to cuneiform tablets? :)
It sounds like what you actually want is a database: persist the data there and later make it serialisable as JSON/YAML/XML or whatever your desired format is. What I'd recommend is to be able to pull the data out into a Perl data structure and then have "formatters" which know how to serialise that data structure to the desired output. That way you can serialise to, say, JSON, and later if that's not good enough, easily switch to YAML or something else. In fact, if others need your data (one-way data tends not to be useful), they can ask for JSON, YAML, XML or whatever. You have more flexibility and aren't tied into a decision that you made up front.
That being said, I don't know your system, so it's tough to say what the right thing to do is. Also, note that JSON and YAML aren't exactly interchangeable with XML. Subtle differences can and will trip you up.
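To make the "formatters" idea concrete, here is a minimal sketch in Python (the same shape maps onto a Perl hash plus one sub per format). The record contents, the function names and the `FORMATTERS` registry are all made up for illustration; only the `json` and `xml.etree.ElementTree` calls are real standard-library APIs.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical product record; the fields follow the question's description.
feed = {
    "name": "Kettle",
    "price": 19.99,
    "type": "kitchen",
    "description": "A small electric kettle.",
}


def to_json(record):
    """Serialise the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record):
    """Serialise the record as a <feed> element with one child per field."""
    root = ET.Element("feed")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Registry of formatters: supporting another format means adding one entry.
FORMATTERS = {"json": to_json, "xml": to_xml}


def serialise(record, fmt):
    """Look up the formatter registered for `fmt` and apply it."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return formatter(record)
```

The data structure stays the single source of truth, and callers pick the output with `serialise(feed, "json")` or `serialise(feed, "xml")`. Note that the XML version turns the price into text, which is one of the subtle differences between the formats.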
Each will do the job.
XML has the advantage that more languages bundle the relevant libraries, and is useful for the storage you mention. So, it is valuable for passing around through different systems, both "in-motion" and "at-rest".
YAML has libraries for all languages, but is somewhat less commonly used, so you are more likely to have to find and introduce a library.
I think XML has been thoroughly explained by the others. However, YAML and JSON are both elegant languages, and they are not as similar as you might believe at first glance.
Some of the particularities about YAML
References
- person: &id001
    name: James
    age: 5.0
- person: *id001
The second person is an associative array equal to the first.
Casting data types
foobar: !!str 123
foobar is "123" (type string).
Uncommon data types not supported by every implementation
Particularly interesting ones [...] are sets, ordered maps, timestamps, and hexadecimal.
Therefore, I consider JSON a lot simpler.
An argument for JSON
Readable, even if the whitespace is optional
I think JSON is very readable once prettified, which is very easy to do. YAML is difficult to make compact, since it relies on whitespace. Granted, you should rely on compression for saving bandwidth. The references in YAML might save you a few bytes, but they add a lot of complexity. If you are really dealing with amounts of data that make it important to avoid duplication, I'd suggest solving that problem on a whole other level. Not even XML supports this kind of macro.
Choose XML if you need to interoperate with systems you don't control (XML Schema is invaluable here), if you will be transforming the data extensively into text, HTML, or XML (haters notwithstanding, XSLT is peerless), if your data includes a lot of text markup, if your data needs to be human-editable (haters notwithstanding, editable XML that's validated against a schema is a pretty good tool for a lot of jobs), and/or if you need to interoperate with any of the myriad tools and technologies that work with XML.
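As a rough illustration of how cheap it is to switch between compact and prettified JSON, here is a small Python sketch. The record is invented; `indent` and `separators` are real parameters of `json.dumps`.

```python
import json

# Hypothetical record, just for illustration.
feed = {"name": "Kettle", "price": 19.99, "type": "kitchen"}

# Compact form: no whitespace around separators, suited to the wire.
compact = json.dumps(feed, separators=(",", ":"))

# Prettified form: indented, one key per line, suited to reading.
pretty = json.dumps(feed, indent=2)

# Both forms decode to the same data; the whitespace carries no meaning.
assert json.loads(compact) == json.loads(pretty) == feed
```

The same document can therefore be stored compact and prettified only when a human needs to look at it, which is not an option for a format whose structure depends on indentation.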
Choose JSON if you really can't be bothered with any of the above.
Choose YAML if you're working in an environment that's got a lot of YAML support.
Depends on your needs. For small, lightweight apps I personally think XML is overkill: http://www.codinghorror.com/blog/2008/05/xml-the-angle-bracket-tax.html
If the data's not hierarchical and isn't going to have markup interspersed in the text (e.g., the description "This product is great for <targetDemo/> who love its <featureSet/>"), you may want to consider comma-separated values (CSV) or some other format such as tab-separated values.
It's old school but it gets the job done without weighing your file down with a bunch of describing text. I.e., in XML, you'd have the following non-value data for each feed.
<feed name="" price="" type="" description=""/>
...contrasted with CSV:
"", , "", ""
If you want, you can add a header row at the top for documentation purposes.
There's also plenty of tooling around CSV, from command line utilities like awk to GUIs such as Excel.
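For what it's worth, here is a minimal Python sketch of the CSV route with a header row, using the standard `csv` module. The product rows and the `FIELDS` list are invented for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Column names taken from the feed description in the question.
FIELDS = ["name", "price", "type", "description"]

# Hypothetical rows; CSV has no types, so every value is kept as a string.
rows = [
    {"name": "Kettle", "price": "19.99", "type": "kitchen",
     "description": "Small, fast electric kettle."},
    {"name": "Lamp", "price": "12.50", "type": "home",
     "description": "Desk lamp with a \"warm\" bulb."},
]

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# Read it back: DictReader takes its keys from the header row.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
```

The writer quotes the description that contains a comma and doubles the embedded quotes in the other one, which is the part that tends to go wrong when CSV is assembled by hand.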
Another alternative, if you don't really need the data to be editable via a text editor but don't want to deploy a more robust database service, would be SQLite which allows you to perform RDBMS-style CRUD operations on a flat binary file.
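A quick sketch of what that looks like, using Python's built-in `sqlite3` module. The table name, the columns and the sample product are assumptions made for the example, and the in-memory database is only there to keep it self-contained.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed ("
    " name TEXT PRIMARY KEY,"
    " price REAL NOT NULL,"
    " type TEXT NOT NULL,"
    " description TEXT NOT NULL)"
)

# Create
conn.execute(
    "INSERT INTO feed (name, price, type, description) VALUES (?, ?, ?, ?)",
    ("Kettle", 19.99, "kitchen", "A small electric kettle."),
)

# Update
conn.execute("UPDATE feed SET price = ? WHERE name = ?", (17.50, "Kettle"))

# Read
row = conn.execute(
    "SELECT name, price, type FROM feed WHERE name = ?", ("Kettle",)
).fetchone()

# Delete
conn.execute("DELETE FROM feed WHERE name = ?", ("Kettle",))
remaining = conn.execute("SELECT COUNT(*) FROM feed").fetchone()[0]
conn.close()
```

The `?` placeholders let the driver handle quoting, so product descriptions containing quotes or commas need no special treatment.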
In the absence of interoperability concerns, I don't think there's much in it. There are good libraries for all of them in all languages; some of them are built-in, some aren't. Your interface to those libraries will be narrow - just in data-access code - so if one has a painful API, even that doesn't matter much.
JSON is, for me, the most pleasant to edit by hand, which is a small plus.
YAML can handle non-tree data structures using the &/* notation. Neither XML nor JSON has a built-in way to do that. Your use doesn't need it, though.
I think XML is for big data, and JSON is for small, not-too-complex data that doesn't need multi-dimensional arrays. I might be wrong. ^^ And I have only seen YAML in Google App Engine, where it seems quite suitable for storing an application's preferences and data.
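To show what "non-tree" means on the JSON side, here is a small Python sketch using only the standard `json` module; the `james` record is invented. A shared object survives the round trip only as two separate copies, and a cyclic structure cannot be encoded at all.

```python
import json

james = {"name": "James", "age": 5.0}

# Two entries pointing at the very same object, as a YAML anchor/alias would.
people = [{"person": james}, {"person": james}]
assert people[0]["person"] is people[1]["person"]

# JSON has no reference syntax, so the shared object is written out twice...
text = json.dumps(people)

# ...and comes back as two equal but independent objects.
restored = json.loads(text)
same_value = restored[0]["person"] == restored[1]["person"]
same_object = restored[0]["person"] is restored[1]["person"]

# A genuinely cyclic structure is rejected by the encoder.
cycle = {}
cycle["self"] = cycle
try:
    json.dumps(cycle)
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```

After the round trip `same_value` is true and `same_object` is false, so any sharing has to be rebuilt by the application, for example with explicit IDs.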