value-class-pattern-brainstorming

From Microformats Wiki
Jump to navigation Jump to search

value excerption pattern brainstorming

The value-excerption-pattern is derived from value-excerpting in hCard. The precise parsing behavior is not yet finalized, so the pattern should be used only with extreme caution.

This brainstorming page is for exploring ideas related to specifying the value-excerption-pattern in more detail and ideas for special case handling of the value-excerption-pattern in combination with specific semantic HTML elements per those elements' particular semantics.

These are merely explorations for now, and should NOT be used in actual content publishing, nor implemented in any production code.

Editor
Tantek Çelik


details for handling specific elements

object param handling

2008-08-23 Ben Ward and Tantek Çelik brainstormed the following possible special case markup handling for the use of the value-excerption-pattern with the <object> element. Modified 2008-08-26.

The following markup example documents one way the hCard tel property's type subproperty could be specified with the enumerated value of "cell" while providing the UK English "mobile" as the human visible object text contents:

<object class="type" lang="en-GB">
 <param name="value" value="cell" />
 mobile
</object>

summary

  • object element special case handling of value excerption. When a microformat (sub)property class name is specified on an object element, then value excerption handling is modified as follows:
  • first param with name attribute value. if the first child of the object is a param element, and that param element has name attribute value of "value", then use the value attribute value for the value for the microformat (sub)property class name specified on the object.
  • continue. if not, continue with existing value excerption handling, and microformat (sub)property parsing rules as currently best specified by hcard-parsing.

notes

Note that the param element does not have a 'class' attribute and thus its 'name' attribute (which has a compatible semantic) is used instead to invoke the value excerption pattern.

advantages
  • Greater semantic re-use. The use of the param element to specify a value for its object is in line with the param element's semantics. The semantic association between the object and the param element is defined in the HTML4 specification.
  • Less invention. This use of object param is superior to the use of a nested empty span element. The association of an empty span with its parent is a new semantic not previously defined in the HTML4 specification. Thus this use of object param markup better follows the principle of minimum invention as compared to nested empty span markup.
neutral
  • Similar violation of DRY to nested empty span.
disadvantages
  • Less human visible than abbr DRY violation. The contents/values of param elements are not exposed to the user of a browser, unlike the title attribute of abbr which, since it is commonly available as a hover tooltip, is more human visible, thus verifiable, than param.
  • DRY violation content divergence risk greater than abbr. With abbr, one element is used to express both a human visible string and the property value, thus tying these values closer together (thus reducing risk of divergence). With object param, two elements are used, and thus risk of divergence may be greater than the use of abbr. Possible mitigating techniques that would help keep the property value and the equivalent human visible string closer to each other, perhaps as close in the code as they are when using abbr:
    1. require param be first child of object
    2. require use of only one param child (allow other child elements)
    3. require exclusive use of object for value excerption i.e. no using the same object for an actual replaced object and a value excerption
    4. require "value" attribute be the last attribute specified on the param element
    5. require equivalent human visible text be placed immediately (allowing for whitespace) following the param
criticisms

to do

  • Browser testing. This code sample must be tested in various browsers to determine how they process and handle pages with such code
    1. determine which browsers to test (based on popularity, deployment, etc.)
    2. write a full sample test case using the above object param markup pattern and a complete hCard
    3. write a more complex sample test case with multiple uses of the object param markup pattern
    4. test do browsers properly display the UK English text "mobile"?
    5. test do browsers generate multiple browser (e.g. Webkit, Trident etc.) controls as they would for embedded frames (nested HTML objects)?
    6. determine any other tests
  • Parser implementability. Determine approximately how much work it would be to implement this special case object param support.
    • I've just added support for this. It took 19 bytes of code. TobyInk 01:24, 25 Aug 2008 (PDT)
  • Document in more detail. Assuming browser tests of a simple example pass (proper visible text displayed, page efficiency not compromised by additional control creation), document how to handle/parse this pattern in more detail. Iterate.

Browser Testing

Using the following simple, HTML4 hcard:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<title>&lt;object> value excerption pattern: hCard Telephone Type Test Case</title>

<body class="vcard">
    <h1 class="fn"><a class="url" href="http://ben-ward.co.uk">Ben Ward</a></h1>
    <p class="tel">
        <object class="type">
            <param name="value" value="cell">
            Mobile:
        </object>
        <span class="value">415-123-567</span>
    </p>
</body>
Results

A pass is to display a heading level one ‘Ben Ward’ with hyperlink, followed by a paragraph displaying the text ‘Mobile: 415-123-567’ Browsers selected based on YUI Graded Browser Support (August 2008), plus some others.

  • Opera 9.5 - Pass
  • Firefox 2, 3 - Pass
  • Microsoft Internet Explorer 5.2 (Mac) - Pass
  • Microsoft Internet Explorer 6 - Partial Pass†
  • Microsoft Internet Explorer 7 - Partial Pass†
  • Microsoft Internet Explorer 8 (beta) - Partial Pass†
  • Safari 3 - Pass
  • Safari 2 - *Fail* ††
  • † Internet Explorer 6–8 on Windows XP renders the correct text, but triggers an ActiveX security warning bar on the page load.
    • This is an error on behalf of IE/Windows. As the object has no type nor data attributes, it has nothing that would bind it to a specific ActiveX control, and therefore should not trigger a security warning bar. This bug should be reported, and the respective bug number referenced here.
  • †† Safari 2 renders a default-sized white box (as if embedding an external control). It breaks layout and does not display the desired content.
Safari 2 Tweak

The example is tweaked as follows to affect Safari 2 rendering:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<title>&lt;object> value excerption pattern: hCard Telephone Type Test Case</title>

<body class="vcard">
    <h1 class="fn"><a class="url" href="http://ben-ward.co.uk">Ben Ward</a></h1>
    <p class="tel">
        <object data="data://" class="type">
            <param name="value" value="cell">
            Mobile:
        </object>
        <span class="value">415-123-567</span>
    </p>
</body>

A data="data://" URL attribute is added to the object element.

Safari 2 Result
  • Safari 2 - Partial Pass†

† Safari 2 renders the object correctly on first page load, *however*, upon using the browser ‘Refresh’ function, the object element reverts to the broken rendering described in the original test.

Current Conclusion
  • Safari 2 does not pass the test acceptably for this to be adopted as the only solution.
  • Internet Explorer's security warnings are irritating, but justifiably unacceptable.

--BenWard 20:17, 26 Aug 2008 (PDT)

  • I concur. And while we can report a bug against IE/Windows in the hopes that it is fixed eventually (perhaps even in IE8 before it ships), as this problem has been fixed in Safari 3, it is doubtful that a bug report against Safari 2 would be fixed in an intermediate version.

--Tantek 03:07, 27 Aug 2008 (PDT)

misconceptions

misunderstanding of authoring unfriendliness
  • not very hand-authoring friendly, compared to other proposals like: Machine data in class: <span class="type data-cell">Mobile:</span>, and data prefix in titles: <span class="type" title="data:cell">Mobile</span> TobyInk
    • It is even more hand-authoring unfriendly to introduce a new syntax, as "Machine data in class" does, and to some extent as "data prefix in titles does". Additional (especially new) syntax introduces far greater cognitive load to the author than a little bit more markup. Tantek

previous iterations

2008-08-23
<object class="type" lang="en-GB">
 <param class="value" name="value" value="cell" />
 mobile
</object>
disadvantages
  • Invalid (X)HTML - although this pattern does make sense, it is worth noting that <param> is one of just a handful of HTML elements for which the class attribute is not defined. Use of this pattern will break validation unless a custom DTD is employed.


details for handling specific property types

date and time separation

summary

By specifying a more precise parsing of the use of "value" excerption inside all datetime properties (e.g. dtstart, dtend, published, updated etc.), dates and times can be marked up separately, thus reducing/minimizing (and potentially eliminating) the readability issues that come with compound ISO8601 datetimes.

introductory example

The sentence:

 The weekly dinner is tonight at 6:30pm.

would be marked up as:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.

advantages

  • re-uses the readable abbr-date-pattern
  • identifies a similarly readable abbr-time-pattern.
  • minimizes DRY violation distance, keeps machine data on exactly the same element as the respective human data
    • even better than abbr-datetime-pattern does, which, in practice from experience often required specifying the date in machine readable form on the human readable time (separate from the human readable date).
  • introduces no new class names - principle of minimal invention
  • introduces no new use of the class attribute - principle of minimal invention again
  • introduces no new syntax (see above about any publishing method that requires the author to think like a programmer being a non-starter, and introducing new syntax almost always requires authors to think like programmers).
  • and most importantly, introduces no dark data.

issues

Some potential issues were raised in IRC, and it helps to document/resolve them so that they are not brought up repeatedly.

  • [Does this sufficiently address the concerns raised with the current use of abbr-pattern?]
    1. The abbr-date-pattern, as documented and explained by Jeremy Keith is just fine (in contrast to the abbr-datetime-pattern).
    2. Similar to the abbr-date-pattern, this proposal implies/introduces the abbr-time-pattern, which is similarly acceptable.
    3. In addition, as long there is incremental improvement, we are making progress. It is more important to take small steps that we know will help some things, rather than try to take a big step that is more risky in the attempt to help more but may not actually do so (as most big changes don't), therefore "sufficiently" is a flawed way of evaluating incremental fixes.
  • Exposes data through tooltips. Separating into 2008-06-07 and 18:03 improves the ability for humans to consume the data, but still exposes data through tooltips and speech in formats that the publisher did not choose to use. --BenWard 04:52, 25 Jun 2008 (PDT)
    1. This is a feature, not a bug. By making the duplicated data at least *somewhat* visible (rather than fully invisible), effective data quality is increased due to the fact that the probability of the ISO8601 and locale-specific data getting out-of-sync is reduced because of the increased visibility (and therefore the increased inspectability and more eye-balls looking at/for problems effect).
    2. Workaround: if a site publisher wishes to customize the presentation of tooltips, they can do so with a nested span with title.
      • That proposes extraneous mark-up maintain some publisher's wish not to have a tool-tip in the first place. I object to a microformat pattern requiring an immediate work-around to meet publisher's desires. It goes against ‘Humans first…’. --BenWard 09:09, 30 Jun 2008 (PDT)
        1. Additional markup has nothing to do with "Humans first".
        2. Additional markup to work-around minor issues (e.g. CSS, cross-browser compatibility, etc.) is a well accepted modern web design practice. It's not ideal, but it is both accepted and widely practiced. With the use of <span> and <div> elements, it's also semantically neutral, therefore not a problem from that perspective either.
        3. Finally, it should not be our goal to try to satisfy *every* publisher, for that would make every microformat beholden to every publisher and contort the design of microformats in really poor ways. We must accept that not all publishers will adopt all microformats and that is ok. Our goal to incrementally increase the number of publishers that adopt microformats, not to try to satisfy each and every one.
      • You *have* to have a tooltip though. It’s not possible to *not* have a tooltip. Not great.--Julian Stahnke 12:58, 4 Sep 2008 (BST)
  • Semantic misuses of ABBR. That ‘tonight’ is ever a textual, human abbreviation of ‘2008-06-24’ is not accepted.
    1. Semantic stretch not misuse. It is a semantic abbreviation rather than a purely syntactical (character shortening) abbreviation, but it is an abbreviation in context nonetheless. Though this may stretch what may be commonly expected as an "abbreviation", the HTML4 spec does seem to allow some flexibility here (HTML 4.01 9.2.1 Phrase elements).
  • Maintaining proper sentences with the expanded form. It is not always possible to use this mark-up and maintain proper sentences with the expanded form. e.g. it's my <abbr class="bday" title="2005-06-20">birthday today</abbr>! becomes ‘it's my 2005-06-20!’. And thus audio rendition of such titles can be nonsensical - "The weekly dinner is two thousand and eight dash zero six dash twenty four at eighteen thirty."
    1. This can and should be addressed by improving authoring examples so that practices improve with experience.
  • Publishing practices and desires show us that authors are not willing to compromise the semantics of abbr. Phae 04:30, 27 Jun 2008 (PDT)
    1. Without specific citations of which authors and what specific issues they have, we are unable to address their issues.
    2. See also above - not our goal to satisfy *every* publisher, but rather to incrementally satisfy more and more. We must accept that there may be some authors we are unable to satisfy in the immediate/short-term.
  • [That's getting pretty complicated]
    • Much less complicated than inventing yet another syntax ( " { ... } " ???? ) that web authors would have to learn.
      • But it's all in one place, rather than spreading it out.
        • The spreading it out is what current content publishing practices do already! It is much more important to map the machine data as close to the existing publishing practice as possible, than to try to "put it all in one place". The "put it all in one place" way of thinking is why people ended up sticking so much invisible metadata in the head of the document, which we know fails.

content requirements

Some requirements which enhance both human readability, and machine parsability (best of both) :

  • date value excerpts MUST use hyphen separators. E.g. 2008-06-24. Not ok:20080624.
  • time value excerpts MUST use colon separators (seconds optional, implied :00 if absent). E.g. 18:30 or 18:30:00. Not ok:183000.
  • timezone value excerpts MUST use leading plus or minus and NO colon separator. E.g. -0700. Not ok:-07:00.

derivation

It's important to document the derivation/background of a brainstorm/proposal as it allows others to see some of the thinking that went into it, and avoid having to rediscuss alternatives already considered, and helps provide understanding as to why aspects of the design are as they are.

example with datetime

Here is a short code example:

 the weekly dinner is tonight at <span class="dtstart">2008-06-24T18:30</span>
example with abbr datetime

However that's not the easiest to read, nor do most people publish that as human visible text, so per the abbr-datetime pattern:

 the weekly dinner is tonight at <abbr class="dtstart" title="2008-06-24T18:30">6:30pm</abbr>

which has raised two issues:

  1. When "2008-06-24T18:30" is inspected by a human reading a tooltip, or spoken by a screen reader, it's not the most understandable thing (precise citation needed, perhaps an mp3 with screen reader used version info).
  2. There is a non-local violation of DRY (which IMHO is a worse problem, as it leads to worse data quality -Tantek). That is, the "date" information is now not only in the text twice (as it was before), but those two instances of the date information are not on the same element, which makes it worse. That is, "tonight" is in the prose, outside of the element with the precise date "2008-06-24".

In analysis of examples of event information on the web, the date and time are often published in separate elements, often for display purposes.

Thus it is this existing content publishing practice which leads to this brainstorm proposal, to essentially to introduce a date and time value excerption longhand.

(Initially Tantek's idea that he bounced off Jeremy Keith (similar idea conceived by Drew independently) was to introduce new classes "datevalue", "timevalue" and "tzvalue" for this purpose, but Bob Jonkman pointed out that HTML5's time parsing algorithm enables a single <time> element to contain dates or times (with or without timezone) without having to explicitly say whether the value contains dates or times (with or without timezone). Bob then proposed that thus all was needed was a single new "datetime" class name. This was the key realization that allowed minimal invention. Tantek pointed out that since from the type of property we already know it is a datetime, there was no need for even one new class name, that we could simply re-use "value" excerption, and simply more precisely specify the semantics/parsing in the case of datetime properties.)

example with new date and time value excerpts

Thus we markup the date and time separately, as value excerpts, using the abbr-date-pattern and an implied parallel abbr-time-pattern:

 The weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr> 
 at <abbr class="value" title="18:30">6:30pm</abbr></span>.
separate subtrees

The proposal also allows setting the date and time in separate element subtrees as well, which may be necessary for some document structures:

 the weekly dinner is <span class="dtstart"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 at <span class="dtstart"><abbr class="value" title="18:30">6:30pm</abbr></span>.

Note the two instances of dtstart, one of which sets the date for the dtstart, and the other of which sets the time.

The idea being, when a parser sees a datetime property (e.g. dtstart) with a value excerpt, that it only "set" the component of its full value that is specified by the value excerpt (e.g. the date), and that if lacking a complete datetime, it continue to parse additional instances of that datetime property for the remaining component(s) (e.g. the time).

Of course this only works for singular properties, but fortunately all instances of datetime properties so far are singular, so this works.

  • hCard's rev is plural. TobyInk
    • can someone give a reference to this being the case? The RFC says "The value distinguishes the current revision of the information in this vCard for other renditions of the information." Does it make sense to have multiple REV dates in a single vCard?
      • The RFC is ambiguous as usual, but a contact card could conceivably have had several changes made to it, with a rev for each. ("Change logs" are fairly common on the web.) The hCard spec is fairly specific about which properties are singular and which are not, and rev is not included in the list of singular properties.
reusing date data for multiple datetime properties

This also provides a *very* convenient way to re-use the same date information for start and end, e.g. expanding the example:

 the weekly dinner is <span class="dtstart dtend"><abbr class="value" title="2008-06-24">tonight</abbr></span> 
 from <span class="dtstart"><abbr class="value" title="18:30">6:30</abbr></span> - 
 <span class="dtend"><abbr class="value" title="20:30">8:30pm</abbr></span>.

Note what just happened. we just eliminated another duplication of date information by reusing the start *date* information for the end *date* information and *only* specifying the end *time* information separately for the two properties.

Reducing the duplication (or triplication) of such data helps to reduce the chances of (even inadvertent) data corruption/drift/divergence among any duplicates.

time zones

There are a few choices for timezones.

  1. Simply include the time zone information as part of the time "value".
    E.g. <abbr class="value" title="18:30-0700">6:30pm</abbr>
  2. Or use another value excerpt for the timezone (was: introduce the class name "tzvalue")
    E.g. <abbr class="value" title="18:30">6:30pm</abbr> <abbr class="value" title="-0700">PDT</abbr>
  3. Or allow both and let web authors decide. This is the current leaning.
    • if web authors want to specify timezone as part of the time (first example above), they can,
    • or if web authors visibly publish the timezone separately (second example above), then they can mark that up.
    • or if web authors wish to omit timezone information, they can do so as well, as most do today. In practice this works fine, as it creates a "floating" time which works fine in far more than the 80/20.

(more to come, documenting from IRC logs)

discussion

Opening up a discussion section even though documentation from IRC logs is still in progress. :)

  • regarding the advantage of "and most importantly, introduces no dark data."
    • "Dark data" is sometimes what publishers *want* to publish. To use the example of TV schedules which kick started the renewed discussion in this area, publishers will often not want to display the date. For instance, if a page entitled "Tomorrow's TV" and containing 300 different programmes marked up with dtstart, it is superfluous to explicitly display the date for each one. With this proposed solution the include pattern could be used to include the date into each vevent, but a visible link to the date on each programme would simply be confusing. Sometimes it just makes sense to hide some of the information you're publishing as a microformat - because the information you want to make explicit to parsers can be inferred from context by humans, or is more appropriately displayed at a different level of granularity for machines and humans. TobyInk 14:26, 24 Jun 2008 (PDT)
      • It doesn't matter whether publishers *want* to publish dark data or not. Invisible data always leads to poorer quality data. Publishers publish all kinds of invisible metadata in the heads of documents etc. because they want to, but their desire doesn't stop the data from becoming obsolete, diverging from the actual visible data etc. The quality of the data matters more than any publishers wish(es) of publishing in a specific format, or in a hidden way. In the example you gave, using the include pattern in that way would not result in any visible links, but merely empty include anchors. It never makes any sense to actually hide "some of the information you're publishing as a microformat", because historically that always results in some loss of data quality over time and thus the microformats principle of visible data instead of invisible metadata. Tantek 14:32, 24 Jun 2008 (PDT)
        • All microformats hide some data. In the example <span class="tel">01632 960123</span>, the information that the long string of numbers represents a telephone number is invisible. And making it visible (Tel: <span class="tel">01632 960123</span>) violates DRY. It's just a matter of where to draw the line.
          • That statement makes the mistake of conflating *type* data and *content* data. "tel" is not content data, just as <p> is not content data. It's markup, indicating the type of the data. Markup (type data) being invisible to the user has worked just fine. Content (content data) being invisible to the user is the problem of dark data. Or rather, if you think that everything is data, then you really should be spending time developing in a system that is built on that assumption, e.g. RDF, rather than microformats, which are built on HTML, and the clear separation of type of data (HTML elements, microformats properties) and content data (inner text, text attribute values).
            • My point is that there isn't a distinction between the two, but a continuum. The choice of where to draw the line is never a clear one and always somewhat arbitrary. The vCard standard could quite easily have ended up with separate "TEL", "FAX" and "CELL" properties, in which case hCard would have ended up with <foo class="tel">, <bar class="fax"> and <baz class="cell">. Going the other way, they could have stored e-mail addresses as mailto: URLs, and then hCard would have <a class="url" href="mailto:quux@example.com">. They chose the way they did, and as a result in hCard the distinction between a mailto: URI and an http: URI is largely invisible (in most circumstances only obvious by looking at the status bar when hovering), but the distinction between a telephone number and a fax number is visible. But that wasn't the only possible (nor the only reasonable) outcome.

enabling more use of title attributes

valuetitle

Numerous proposals over the years have advocated expanding the use of the title attribute beyond the abbr tag for storing microformat property values. One simple mechanism for doing so would be to introduce a new value excerption class name and rule.

valuetitle: before "normal" value excerption handling, first look (in the same manner as value-excerption) for the class name "valuetitle", if it is found, use the value of the title attribute on that element and do no further value excerption or other parsing for that property value.

E.g.

<span class="type"> <span class="valuetitle" title="cell">mobile</span> </span>

In addition to first looking for "valuetitle" where a parser would look for "value", it seems reasonable to also allow "valuetitle" on the property element itself in order to minimize the markup necessary, e.g.:

<span class="type" class="valuetitle" title="cell">mobile</span>

Naming reasoning/methodology: by using the prefix "value-" it is clear that this is part of the value excerption pattern. By using the suffix "-title", it is clear that the "title" attribute is involved. Thus the name "valuetitle" is a good mnemonic for its functionality. See related naming-principles.

"valuetitle" was suggested on 2008-08-30 by Tantek in a discussion with Ben Ward.

previous similar proposals

I believe there may have been a proposal for "usetitle"(link+citation needed) in the past that would function similarly. I think "valuetitle" is better than "usetitle" as "valuetitle" is more *descriptive*, i.e. meaning "the title is the value", as opposed to "usetitle", which is more *prescriptive*, i.e. "use the title". Tantek 08:13, 1 Sep 2008 (PDT)


related pages