NOTE: This page has been moved to http://www.socialtext.net/yaml/index.cgi?perl_state_of_the_yaml.
The state of YAML in the Perl world has been transitional (at best) and confusing for a long time. As the primary person behind YAML in the Perl world, I(ngy dot Net) will do my best to explain the current state of where things are and where I see things headed over the next 6 months or so.
The latest YAML specification can be found at http://yaml.org/spec/1.2/ and was last updated this summer. The spec is primarily maintained by Oren Ben-Kiki. Being a pedant, Oren never likes to release anything as final, so this is officially a working draft. It is, however, 99.99% stable and complete. All software and specs can have bugs, so I am of the opinion that the spec should probably be made "final".
There have been 3 major versions of the spec: 1.0, 1.1 and 1.2. The 1.0 spec was around 2003, 1.1 was around 2005, and 1.2 was around 2007. These are very rough dates, but they give some context. 1.1 completely revamped the tagging system, and 1.2 made JSON a complete subset of YAML. Other than those things, not much has changed in YAML since 1.0, especially the parts most commonly used in Perl.
YAML is the only cross-programming-language serialization language in existence today. It caters specifically to languages like Perl (Python, Ruby, etc) that use a hash/array/scalar/object data model.
I attended 4 Perl conferences this year. One thing I noticed is that YAML showed up in almost every talk. Not that anybody ever talked about YAML, but it was always in their slides and programs. Everybody is using YAML these days. It is becoming ubiquitous.
In response to this, I was embarrassed that the YAML tool chain was not as full strength as it should be.
I also noted that YAML was primarily used for configuration files and for dumping objects. YAML is great for these things but, as a serialization language, it goes so much deeper. I know that people don't yet depend on the deeper features because the implementations are lacking at this time.
Next, I'll name off the YAML implementations, describe their strengths and weaknesses and what the longer term plans for them are.
YAML.pm is the original and probably most popular Perl YAML Loader/Dumper module. It was mostly written in 2002, before YAML 1.0 was even stable. It has not been kept up to date. It only handles the commonly used subset of YAML and YAML idioms.
When a pure Perl YAML implementation becomes stable, it will likely be named "YAML.pm" and the old YAML.pm implementation will go away. YAML::XS may also be "rolled into" the YAML distribution, or at least be loaded by YAML.pm if available. Neither of these things can happen until the YAML implementations become stable.
It is worth noting that YAML.pm has extra features that go beyond the basic Dump/Load API that all YAML modules share. It has the ability to Dump keys in a specific order and to apply certain styles to certain nodes, for example. These features will either be made standard or deprecated.
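For readers who have not used it, here is a minimal sketch of the basic Dump/Load API referred to above, using YAML.pm. The data structure and variable names are made up for illustration; only `Dump` and `Load` come from the module.

```perl
use strict;
use warnings;
use YAML qw(Dump Load);

# A plain Perl data structure (hypothetical example data):
# a hash holding a scalar and an array.
my $config = {
    name  => 'example',
    ports => [ 8080, 8081 ],
};

# Dump serializes the structure into a YAML string.
my $yaml = Dump($config);
print $yaml;

# Load parses a YAML string back into Perl data.
my $copy = Load($yaml);
print "first port: $copy->{ports}[0]\n";
```

Since the other modules described on this page expose the same two functions, replacing `use YAML` with `use YAML::XS` should be the only change needed for a simple case like this one.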
YAML::XS has the distro name YAML-LibYAML. It is a binding of Kirill Simonov's libyaml. libyaml is a bug-free and complete implementation of the YAML 1.2 spec as of Summer 2006. (YAML 1.2 has changed slightly since then.) YAML::XS provides a very fast Dump/Load YAML API.
YAML::XS is currently, by far, the best YAML implementation in Perl. Unless you need a pure Perl module, please always use this one. There are no known bugs in libyaml to date; it's a fantastic piece of software.
The YAML::XS binding is pretty good too. It currently has some ambiguity in the concept of Boolean values that needs to be dealt with.
The next big move for YAML::XS would be to expose a streaming API. Doing this with libyaml would be fairly trivial, and this could be very helpful for Perl YAML in general. Before I do this I want to produce a design document for the official full YAML API of Perl modules.
This module does not yet fully exist, and that is likely not its final name. It represents the code that is being written for my TPF Grant Project.
This module needs to be finished by December 25th 2008. I promised the TPF that I would do that.
The result will be an extremely solid YAML implementation in pure Perl, with a really nice API. The only possible downside is that it could be quite slow. Hopefully YAML::XS can be used when speed is a critical issue.
YAML::Syck is a binding to libsyck, which is the YAML library that ships with Ruby. Syck was the first YAML C library and was written roughly to the 1.1 spec. Like YAML::XS and YAML.pm, it supports the Dump/Load API.
The bad news is that libsyck is old, out of date, not well maintained and buggy. The good news is that YAML::Syck is maintained by Audrey Tang, and she actually makes changes to libsyck itself, when required.
Hopefully YAML::Syck becomes deprecated in Perl and fully replaced by YAML::XS. There is no advantage to using YAML::Syck. I have heard that there are legacy problems in switching. Hopefully they can be dealt with.
YAML::Tiny is Adam Kennedy's module to provide a Dump/Load API for a subset of YAML. Specifically the subset that people typically use in config files and metadata files.
I have no current plans to change YAML::Tiny, only to run it against the common test suite and make sure there are no blatant bugs.
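As a rough illustration of the config and metadata use case, here is a small sketch that reads a metadata-style document with YAML::Tiny's `read_string` method. The document contents are invented for the example.

```perl
use strict;
use warnings;
use YAML::Tiny;

# Hypothetical metadata document of the kind YAML::Tiny targets.
my $text = <<'END';
---
name: MyApp
version: 1.2
requires:
  perl: 5.006
END

# read_string returns an object that acts as an array of documents.
my $docs = YAML::Tiny->read_string($text);
my $meta = $docs->[0];

# YAML::Tiny keeps scalars as plain strings, so 5.006 stays "5.006".
print "$meta->{name} needs perl $meta->{requires}{perl}\n";
```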
Once YAML::Perl::Parser is done and passing all tests, I will start this module, which will be functionally the same but much faster. It will also comprise a full grammar of YAML expressed as an object of regexps. It can be easily shown in YAML!
This module, written by Ricardo Signes, provides a schema mechanism for validating JSON and YAML documents.
I need to evaluate it a bit more, but I would like to make it a full-fledged member of the YAML Perl toolchain.
There are things that need to happen in YAML. I've decided that this is probably the right time for me to concentrate on the Perl YAML toolchain. This section outlines my plans.
The plans may seem ambitious, but if I'm going to pour my effort into something, I'd like to do it right. Honestly, I think these are just the basics that need to get done.
I want to make sure that the entire YAML development process is open and transparent. I want to encourage the YAML community to contribute as much as possible. These are the things I want to do to facilitate that:
All YAML modules support the Dump/Load API, but there is much more that can and should be available in a good YAML module. This design document will be a guideline for someone (like me) who is going to produce a YAML module. It will fully take into account the PyYAML APIs, which are excellent.
I want to create a test suite that defines what a valid Perl implementation should do. This test suite will be made to run against all the current implementations.
This should provide a crystal clear picture of where the deficiencies lie and what needs to be fixed.
I need to complete the TPF grant deliverables by Christmas. This will be my coding priority for the rest of the year.
YAML is at a crossroads. The spec is stabilized and adoption is fairly widespread for shallow use cases. However, the implementations are lacking. If the toolchain can be made solid, YAML will be a huge asset to programmers for years to come.
Now is a great time to solidify the infrastructure. It will be a great service to both YAML and Perl.