Genealogy Formats

From Microformats Wiki

Per the microformats process, and towards the development of a genealogy microformat, this page documents previous and existing genealogy-related formats.

GEDCOM

GEDCOM has become the de facto standard for sharing data between genealogy systems. It is hierarchical and link-based, much like HTML, but it encodes family structure (which is a general graph) outside of this structural hierarchy.

GEDCOM was developed (...) to provide a flexible, uniform format for exchanging computerized genealogical data.[1]
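To make the "hierarchical and link-based" description concrete, here is a minimal Python sketch that reads GEDCOM-style lines into a tree and lists the pointers that connect records. The sample data is invented for illustration, and the parser assumes the simple `level [@xref@] TAG [value]` line shape only; it is not a full GEDCOM reader.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Illustrative sample only; the names and IDs are made up.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
2 DATE 4 JUL 1776
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME Ann /Smith/
1 SEX F
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_gedcom(text: str) -> List[Node]:
    """Build the level-number hierarchy; returns the level-0 records."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] is the most recent node seen at level i
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        if parts[1].startswith("@"):
            # Record definition: level, xref, tag, optional value.
            xref = parts[1]
            rest = parts[2].split(" ", 1)
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
        else:
            xref = None
            tag, value = parts[1], (parts[2] if len(parts) > 2 else "")
        node = Node(level, tag, value, xref)
        del stack[level:]  # drop deeper nodes left over from the previous branch
        if level == 0:
            roots.append(node)
        else:
            stack[level - 1].children.append(node)
        stack.append(node)
    return roots


def index_records(roots: List[Node]) -> Dict[str, Node]:
    """Map each xref (e.g. '@F1@') to its record so pointers can be followed."""
    return {node.xref: node for node in roots if node.xref}


def pointers(node: Node) -> List[Tuple[str, str]]:
    """Return (tag, xref) pairs for children whose value is an @...@ pointer."""
    return [(c.tag, c.value) for c in node.children
            if c.value.startswith("@") and c.value.endswith("@")]


records = index_records(parse_gedcom(SAMPLE))
print(pointers(records["@F1@"]))
```

The nesting (BIRT containing DATE) lives in the level numbers, while the family graph lives entirely in the `@...@` pointers between top-level records, which is why it does not fit a strict tree.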

  • I'm not sure whether it makes sense to do GEDCOM as its own format. The FAM structure and the need to present different reports suggest to me that we need some kind of post-GEDCOM markup. To see how direct use of GEDCOM might pan out, I hacked up this GEDCOM Worked example. To me, the main issue seems to revolve around the FAM structure. I think the Jay Askren approach might be a better starting point than the Gene Stark work.
  • Had a look at some examples of what GEDCOM creates [2]. Basically, it seems to be XFN relationships (siblings, spouses etc.) and hCard information (could genealogy be inferred from existing XFNs regardless of an hGED format?). The only additional information we do not currently hold in a format is gender: GEDCOM specifies male or female for each individual. Creating something using these formats would be quite straightforward, but I'm not sure its takeup would be good unless someone was interested in creating a hGEDCOM2GEDCOM. -- Frances Berriman
  • GEDCOM is basically a set of INDIvidual records related by FAMily nodes; the family nodes contain the HUSBand, WIFE and CHILd. The INDI records are quite similar to hCard records and might be replaced by them, but the graph structure is a little harder to capture: families aren't strict trees, so a direct mapping to XML doesn't really work. Publishing a GEDCOM database directly to the web might not be the most logical thing to do.
  • Genealogical information includes date-of-death, which is missing from the hCard format (although hCard does have date-of-birth). Much genealogical information is event-based: date of birth, date of death, dates of marriages and divorces, and many other significant events such as religious observances (Baptisms, Bar/Bat Mitzvahs) and migrations ("Moved to Canada from the Netherlands"). This all translates wonderfully to hCalendar. Additionally, a properly researched family tree will cite sources for all the data listed, and so could use hCite. The biggest problem I see in using hCalendar is that genealogical data allows approximate dates, specifically "ABT 4 July 1776", "BEF 25 Dec 1903", "AFT 11 Nov 1918". It also allows ambiguous dates: "July 1867", or just "1886", or even "4 July". And these occur in combination (Approximately ambiguous dates? Ambiguously approximate dates?), e.g. "BEF Feb 2007", "AFT 1945". The most ambiguous entries I've seen for dates are "DECEASED" when date-of-death is unknown, and "NOT MARRIED" for couples who have not had a wedding ceremony. (Info from Guidelines for event dates in the PAF Help File).
The only relationship links in GEDCOM are HUSBand, WIFE and CHILd. All other relationships (brother, sister, grandparents, grandchildren, uncles, aunts, nieces, nephews, cousins) can be inferred by traversing family records. This does mean that any collection of genealogical pages needs some way for the pages to cross-reference each other. This isn't a problem for pages on a single Web site, which can use RIN (Record Identifier) or REFN (User Reference Number). However, different Web pages maintained by different genealogists may have conflicting RINs and REFNs. There is a globally-unique AFN (Ancestral File Number) issued by the Church of Jesus Christ of Latter-Day Saints (LDS), but I don't know how they're issued and most genealogical sites don't use them anyway.
The GEDCOM format contains much other data specific to the LDS, but I don't know how widespread it is, nor how appropriate it would be to code it into a microformat intended to reach well beyond the LDS.
Regardless of whether an hGED microformat is developed, it would still be valuable to mark up genealogical information with microformats on Web pages for the semantic value.
Bob Jonkman 07:58, 9 Feb 2007 (PST)
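Two points from the discussion above can be sketched in Python: inferring relationships by traversing family records, and reading the approximate or partial event dates quoted above. This is a rough illustration, not a GEDCOM implementation. The `FAMILIES` table, its IDs, and the function names are all hypothetical, and the date reader only handles the forms listed in the discussion (an optional ABT/BEF/AFT qualifier followed by any of day, month, year).

```python
from typing import Dict, Optional, Set

# Hypothetical, already-parsed FAM records; the IDs are made up for illustration.
FAMILIES: Dict[str, dict] = {
    "F1": {"HUSB": "I1", "WIFE": "I2", "CHIL": ["I3", "I4"]},
    "F2": {"HUSB": "I3", "WIFE": "I5", "CHIL": ["I6"]},
}


def parents_of(person: str) -> Set[str]:
    """HUSB and WIFE of every family in which the person is a CHIL."""
    found: Set[str] = set()
    for fam in FAMILIES.values():
        if person in fam["CHIL"]:
            found.update(p for p in (fam.get("HUSB"), fam.get("WIFE")) if p)
    return found


def siblings_of(person: str) -> Set[str]:
    """Other CHIL entries of the families in which the person is a CHIL."""
    found: Set[str] = set()
    for fam in FAMILIES.values():
        if person in fam["CHIL"]:
            found.update(fam["CHIL"])
    found.discard(person)
    return found


def grandparents_of(person: str) -> Set[str]:
    """Two hops up: parents of parents."""
    return {g for p in parents_of(person) for g in parents_of(p)}


QUALIFIERS = ("ABT", "BEF", "AFT")
MONTH_NAMES = ("JANUARY FEBRUARY MARCH APRIL MAY JUNE JULY AUGUST "
               "SEPTEMBER OCTOBER NOVEMBER DECEMBER").split()
MONTHS: Dict[str, int] = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name] = number       # full name, e.g. JULY
    MONTHS[name[:3]] = number   # three-letter form, e.g. JUL


def parse_event_date(text: str) -> Dict[str, Optional[object]]:
    """Split an event date into qualifier, day, month, year; else keep it as a note."""
    result = {"qualifier": None, "day": None, "month": None, "year": None, "note": None}
    tokens = text.upper().split()
    if tokens and tokens[0] in QUALIFIERS:
        result["qualifier"] = tokens.pop(0)
    if tokens and tokens[-1].isdigit() and len(tokens[-1]) >= 3:
        result["year"] = int(tokens.pop())
    if tokens and tokens[-1] in MONTHS:
        result["month"] = MONTHS[tokens.pop()]
    if tokens and tokens[-1].isdigit() and len(tokens[-1]) <= 2:
        result["day"] = int(tokens.pop())
    if tokens:
        # Unrecognised input such as "DECEASED" or "NOT MARRIED" stays free text.
        return {"qualifier": None, "day": None, "month": None, "year": None, "note": text}
    return result


print(sorted(grandparents_of("I6")))
print(parse_event_date("BEF Feb 2007"))
```

Keeping every field optional is the point of the date sketch: a value such as "4 July" has no year, so it cannot be stored as a single complete calendar date, which is the mismatch with hCalendar raised above.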

GEDCOM Replacement Efforts

There are currently two major efforts to develop a replacement for the largely out-of-date GEDCOM format (last updated in 1999).

GEDCOM X

One effort is GEDCOM X [3], by FamilySearch, the original creator of GEDCOM. While the format is openly published on GitHub and the development is fairly transparent, it is completely controlled by FamilySearch (a division of the Mormon church). It includes JSON and XML serialization formats, as well as a file format that bundles many files into a single zip file.

FHISO

The other effort is the Family History Information Standards Organization (FHISO) [4], which is gathering member companies into a consortium to develop a replacement format. Part of the goal of FHISO is specifically to take genealogy standards out of the control of a single organization. FHISO was spawned out of a grass-roots effort to replace GEDCOM called BetterGEDCOM [5].

Wikipedia Persondata

Wikipedia's Persondata aligns very closely with hCard, but has additional fields for date and place of birth and death. Andy Mabbett 13:04, 28 Jan 2007 (PST)

vCard birth death extensions

http://tools.ietf.org/html/draft-li-vcarddav-vcard-id-property-extensions

This vCard extension draft proposes new properties related to birth location, death date, and death location.

  • BIRTHPLACE
  • DEATHPLACE
  • DEATHDATE
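As a rough illustration of how the three properties listed above might sit alongside ordinary vCard fields, the following Python sketch assembles a card as text. It assumes plain vCard 4.0 content-line syntax, text values for the two place properties, and a basic-format date string for DEATHDATE. The helper names and the sample person are hypothetical, and parameters and line folding are not handled.

```python
from typing import Optional


def escape_text(value: str) -> str:
    """Escape backslash, comma, semicolon and newline for a vCard text value."""
    return (value.replace("\\", "\\\\")
                 .replace(",", "\\,")
                 .replace(";", "\\;")
                 .replace("\n", "\\n"))


def build_vcard(full_name: str,
                birthplace: Optional[str] = None,
                deathplace: Optional[str] = None,
                deathdate: Optional[str] = None) -> str:
    """Return a vCard string; the extension properties are added only when given."""
    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + escape_text(full_name)]
    if birthplace:
        lines.append("BIRTHPLACE:" + escape_text(birthplace))
    if deathplace:
        lines.append("DEATHPLACE:" + escape_text(deathplace))
    if deathdate:
        # Assumed to already be a valid date string such as 19860704; not validated here.
        lines.append("DEATHDATE:" + deathdate)
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


# Hypothetical person, for illustration only.
card = build_vcard("Jane Example",
                   birthplace="Utrecht, Netherlands",
                   deathplace="Toronto, Canada",
                   deathdate="19860704")
print(card)
```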

External Links

See also