Elements vs. attributes

Saturday 18 December 2004

The classic XML design question came up the other day: whether to use elements or attributes. During the discussion I became somewhat heated, for a few reasons:

First, we weren’t debating whether to create a design using elements or attributes; we were talking about changing an existing design that used attributes to a new one using elements. To my mind, the reasons for changing an existing system had better be pretty good.

Second, I had designed the system in question, and I thought the attribute decision was a sound one. The values were all simple datatypes, were unordered, and could each appear only once. In that case attributes are perfectly reasonable, and they let you avoid the overhead of end tags.
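For illustration, here is a hypothetical record in both forms (the account element and its attribute names are invented for this sketch, not taken from the system in question). Each value is a simple type, order doesn't matter, and each appears once, so the attribute form carries the same information with less markup:

```xml
<!-- Attribute form: simple, unordered, single-valued data. -->
<account id="1234" status="active" currency="USD"/>

<!-- Element form: the same three values, each needing a start and an end tag. -->
<account>
  <id>1234</id>
  <status>active</status>
  <currency>USD</currency>
</account>
```

The two snippets are alternatives; a real document would contain only one of them.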

Third, I sensed faulty reasoning, or worse, dogma, approaching.

The X12 Reference Model for XML Design was offered as guidance for the decision. This is a long and detailed document that describes many things I don’t understand. Like many documents of its ilk, it generalizes concepts and terms to the point that I no longer know what they refer to.

But section 7.2.5 (Elements vs. Attributes) applied, and it consists mostly of a clear explanation of the pros and cons of elements and attributes. Here it is (used without permission, mea culpa):

7.2.5 Elements vs. Attributes

Description: Often it is possible to model a data item as a child element or an attribute.

Benefits of Using Elements

  • They are more extensible because attributes can later be added to them without affecting a processing application.
  • They can contain other elements. For example, if you want to express a textual description using XHTML tags, this is not possible if description is an attribute.
  • They can be repeated. An element may only appear once now, but later you may wish to extend it to appear multiple times.
  • You have more control over the rules of their appearance. For example, you can say that a product can either have a number or a productCode child. This is not possible for attributes.
  • Their order is significant if specified as part of a sequence, while the order of attributes is not. Obviously, this is only an advantage if you care about the order.
  • When the values are lengthy, elements tend to be more readable than attributes.

Disadvantages of Using Elements

  • Elements require start and end tags, and are therefore more verbose. (NOTE: not all elements require a start and end tag; an element with no content can be written as a single empty-element tag.)

Benefits of Using Attributes

  • They are less verbose.
  • Attributes can be added to the instance by specifying default values. Elements cannot (they must appear to receive a default value).
  • Attributes are atomic and cannot be extended, and their existence should serve to remove any possible ambiguity about the element they describe. They are “adjectives” to the element “noun”.

Disadvantages of Using Attributes

  • Attributes may not be extended by adding children, whereas a complex element may be extended by adding additional child elements or attributes.
  • If attributes are to be used in addition to elements for conveying business data, rules are required for specifying when a specific data item shall be an element or an attribute.
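Several of the quoted points are easiest to see in a schema. The following is a minimal, hypothetical XML Schema sketch, not taken from the X12 document (the product, color and currency names are invented, apart from number and productCode, which come from the quoted example). It shows a choice between two child elements, a repeatable element, and an attribute with a default value:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <!-- Choice: exactly one of number or productCode.
             Plain XSD 1.0 has no equivalent for a choice between attributes. -->
        <xs:choice>
          <xs:element name="number" type="xs:integer"/>
          <xs:element name="productCode" type="xs:string"/>
        </xs:choice>
        <!-- Repetition: zero or more color elements, in document order. -->
        <xs:element name="color" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Default: a schema-aware processor supplies currency="USD"
           when the attribute is omitted from the instance. -->
      <xs:attribute name="currency" type="xs:string" default="USD"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance such as <product><productCode>AB-100</productCode><color>red</color><color>blue</color></product> is valid against this sketch, and schema-aware processing reports a currency of USD even though the instance never mentions it. Element defaults in XML Schema, by contrast, only apply when the element is present and empty, which is the asymmetry the quoted text describes.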

All is well and good. These are the pros and cons based on the XML semantics of elements and attributes. But then it continues with this recommendation:

Recommendation: Use elements for data that will be produced or consumed by a business application, and attributes for metadata.

What?! How does that relate to all the pros and cons? This recommendation is a commonly repeated mantra about elements and attributes, and is nearly meaningless. What if I have “metadata” that has order significance or needs to be repeated, or is itself structured? And what do they mean by “metadata” anyway? One man’s data is another man’s metadata. It’s impossible to separate the two without specifying the audience for the information.
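As a hypothetical example (the element and attribute names are invented), a revision history is metadata by most definitions, yet it repeats and its order matters, so it can't live in attributes on the document element:

```xml
<document>
  <!-- Revision history: "metadata", but repeated and ordered, so it needs elements. -->
  <revision date="2004-12-01" by="alice"/>
  <revision date="2004-12-18" by="bob"/>
  <body>The actual content of the document.</body>
</document>
```

Each revision's own date and author are simple, single-valued and unordered, so they sit comfortably in attributes. The deciding factor is the shape of the data, not whether someone calls it metadata.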

In the HTML world, there’s a handy rule of thumb: element content gets put on the screen, and attributes do not. That’s basically the metadata rule, but it only works in this case because HTML has a very clear consumer (a browser) with a very clear processing model (render the HTML for display). Most other XML dialects don’t have such clear processing models. The particular case I’m dealing with is data served by an API, with dozens of potential consumers, doing a dozen different things with the data. The metadata rule is useless to me.
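A small XHTML fragment shows that rule of thumb at work (the URL is a placeholder). The words inside the elements are displayed, while the href attribute tells the browser where the link goes but never appears as page text:

```xml
<!-- Rendered text: "Read the specification first."
     The href value controls behavior and is not shown as content. -->
<p>Read the <a href="http://example.com/spec">specification</a> first.</p>
```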

I say: Use attributes unless you truly need elements. You need elements for a thing if the thing can be repeated, or is itself structured, or has semantics based on its order among its peers.
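Here is a hypothetical record that applies this rule (all names are invented for the sketch): simple, single-valued properties go in attributes, and elements are used only where something repeats, is ordered among its peers, or has structure of its own.

```xml
<book id="b-42" year="2004" lang="en">
  <!-- Repeated: a book can have several authors. -->
  <author>First Author</author>
  <author>Second Author</author>
  <!-- Ordered: the sequence of chapters is meaningful. -->
  <chapter title="Introduction"/>
  <chapter title="Conclusion"/>
  <!-- Structured: the publisher has parts of its own. -->
  <publisher>
    <name>Example Press</name>
    <city>Boston</city>
  </publisher>
</book>
```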

Comments

[gravatar]
Completely agree. I think those are basically the rules I applied when I was designing XML schemas. But this became a heated debate? Did someone mention the New York Stock Exchange or something? ;-)
[gravatar]
>You need elements for a thing if the thing can be repeated, or is itself structured, or has semantics based on its order among its peers.

This would be a much more compelling entry if some small "pro and con" XML examples were shown side-by-side with your logic. Don't tell us, show us.
[gravatar]
If you need a real-world sample XML schema to use for illustration, I am currently working on a single-file drop-in 404 handler/rewriter that has some configuration:

<Custom404>
 <Debug>False</Debug>
 <UseFirstMatch>True</UseFirstMatch>
 <RewriteRules>
  <Rule>
   <Name>HTML to HTM</Name>
   <Pattern>(?&lt;url&gt;.*)\.html$</Pattern>
   <Replace>${url}.htm</Replace>
  </Rule>
  <Rule>
   <Name>ASP to ASPX</Name>
   <Pattern>(?&lt;url&gt;.*)\.asp$</Pattern>
   <Replace>${url}.aspx</Replace>
  </Rule>
  <Rule>
   <Name>Transfer Test</Name>
   <Pattern>.*transferme$</Pattern>
   <Replace>webform1.aspx</Replace>
   <RedirectType>Transfer</RedirectType>
  </Rule>
 </RewriteRules>
</Custom404>

I think you can see that I chose the elements approach for no better reason than that I felt it was simpler to write code to parse and simpler for humans to read (I do prefer vertical layouts to horizontal ones, so there's less line wrapping, scrolling, etc.). Some of these things could clearly be attributes...
[gravatar]
I can understand the logic of choosing elements for everything: it keeps things very uniform, and allows for maximum flexibility in the future.

And I espouse the logic in the post: use attributes or elements based on whether the semantics of your data more closely matches the semantics of elements or attributes.

It's the squishy belief that data vs. metadata distinctions will somehow decide the matter for you that gets my hackles up.
[gravatar]
You say: Use attributes unless you truly need elements.

The problem with this rule is that you don't necessarily know, at the time you make the decision, whether you need elements. I do a lot of incremental development where the final needs are not clearly known up front. Data that seems atomic may grow requirements for attributes, structure, repetition, etc. later on. So I tend toward the rule, "Use elements unless you are really sure that you will never need them. Be cautious and conservative in using attributes." Of course, that is still a judgement call...
[gravatar]
Data versus metadata is a difficult distinction to make in most data-oriented uses of XML. However, in markup-oriented applications where you are annotating text, the distinction is much clearer.

See my post http://jtauber.com/blog/2004/12/21/xml_elements_versus_attributes
[gravatar]
annotating text! how quaint! ;)
[gravatar]
http://www-106.ibm.com/developerworks/xml/library/x-eleatt.html
[gravatar]
One more problem with attributes: their values are normalized, so that line ends become spaces. If you need multi-line text, definitely don't use attributes.
[gravatar]
Very interesting post :) Thank you.
