Showing posts with label Component. Show all posts

Sunday, August 19, 2012

Introducing the ACD Tool [Original Work]

It is with satisfaction that I announce the release of my first ever population genetics spreadsheet for fellow researchers. The Ancestral Component Dissection (ACD) Tool is a piece of freeware I have developed to give those with a similar knack for fiddling with ADMIXTURE, Y-SNP and mtDNA frequency data a better means of fleshing out inter-population differences.


ACDTool (v1.0)
How Does The ACD Tool Work?

The ACD Tool relies on the frequencies of "ancestral components", a general catch-all term for uniparental markers (Y-SNPs, mtDNA) and autosomal DNA (auDNA). These form the mainstay of much of the work that has been done in population genetics for the past few decades. The advent of "genome blogger" projects has brought the immediacy of these techniques to those who have tested with personal genetics companies, such as Family Tree DNA (FTDNA) and 23andMe. The ACD Tool should therefore be considered a supplementary item by those interested in these results, as well as in data procured from current literature.

The level of commonality that occurs between many populations and ethnic groups poses a problem for those interested in investigating what differences arise between them.

To solve this, the ACD Tool works by removing mutually shared component frequencies between sample averages within a region. The idea is to lessen the amount of regional similarity and intentionally exaggerate those differences that exist between neighbours.

This is achieved by removing congruent component values across all populations (using the lowest value as a benchmark), leaving only the differences behind.
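The subtraction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet's actual macro code: the function name `dissect`, the population labels and the percentages are all hypothetical, and the sketch assumes the remaining values are left as they are rather than rescaled, since the post does not say whether the tool renormalises.

```python
def dissect(populations):
    """Subtract, per component, the lowest value found across all populations.

    `populations` maps a population label to {component: frequency}.
    Illustrative sketch only; not taken from the ACD Tool's macros.
    """
    components = sorted({c for freqs in populations.values() for c in freqs})
    # The lowest value of each component is the benchmark shared by everyone.
    # A component missing from a population counts as 0 (assumption).
    floor = {
        c: min(freqs.get(c, 0.0) for freqs in populations.values())
        for c in components
    }
    return {
        name: {c: round(freqs.get(c, 0.0) - floor[c], 6) for c in components}
        for name, freqs in populations.items()
    }


# Hypothetical component percentages for three neighbouring populations.
sample = {
    "Pop_A": {"North": 60.0, "South": 30.0, "East": 10.0},
    "Pop_B": {"North": 50.0, "South": 45.0, "East": 5.0},
    "Pop_C": {"North": 70.0, "South": 25.0, "East": 5.0},
}
result = dissect(sample)
for name, freqs in result.items():
    print(name, freqs)
```

In this made-up case the shared 50% of "North", 25% of "South" and 5% of "East" are stripped from every population, so only the excess above the regional minimum is left to chart.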


What Experiments Are Ideal?

As the ACD Tool is intended for finer inter-population analysis, it is best applied in a regional context. It serves the purpose of better revealing genetic differences which may account for linguistic or micro-regional trends.

Example #1: Northeast Europeans (Dodecad)

Once the Polish, Russian and Finnish Dodecad cohort averages were run through the ACD Tool, I simply used Excel to create the charts. The "Before-After" feature is used to highlight that the tool has achieved its desired goal of amplifying the genetic differences between them:


NE European auDNA (Dodecad) through the ACD Tool



Example #2: West Asians (Harappa)
Using the Harappa Ancestry Project this time, I ran the data of Armenians, Assyrians, Kurds and Iranians (mostly from the Harappa cohort) through the ACD Tool once more and presented the differences as above:

W Asian auDNA (Harappa) through the ACD Tool


Example #3: South-Central Asians (Eurogenes)
A final example pits Pathans, Jatts, the Burusho, Balochis and Brahuis against one another:

SC Asian auDNA (Eurogenes) through the ACD Tool



Are There Any Drawbacks?
The efficacy of the ACD Tool depends on the number of populations, cohort size and cohort specificity. As the examples above show, the level of inter-population component sharing may decrease greatly if groups that are from more genetically diverse regions are compared.

In addition, using the ACD Tool on populations that are too different (e.g. Han Chinese and Yoruba) will not work, given that the genetic overlap through ADMIXTURE, Y-SNPs or mtDNA is negligible. Such a comparison would, of course, defeat the point of the tool in the first place.
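One way to see this drawback is to total the per-component minima, i.e. the amount the subtraction would actually remove. The sketch below is an assumption-laden illustration: `shared_total`, the labels and the percentages are hypothetical and are not part of the spreadsheet.

```python
def shared_total(populations):
    """Sum of the per-component minima, i.e. what the subtraction removes.

    Hypothetical helper for illustration; not part of the ACD Tool itself.
    """
    components = {c for freqs in populations.values() for c in freqs}
    return sum(
        min(freqs.get(c, 0.0) for freqs in populations.values())
        for c in components
    )


# Made-up percentages: two close neighbours versus two very distant groups.
neighbours = {
    "Pop_A": {"X": 55.0, "Y": 45.0},
    "Pop_B": {"X": 50.0, "Y": 50.0},
}
distant = {
    "Pop_C": {"X": 99.0, "Y": 1.0},
    "Pop_D": {"X": 0.5, "Y": 99.5},
}
print(shared_total(neighbours))  # large overlap: plenty to strip away
print(shared_total(distant))     # negligible overlap: output barely changes
```

When the total is close to zero, the "after" chart is essentially the "before" chart, which is why very distant populations make poor inputs.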

Lastly, the tool requires Macros to be enabled for the instructions to work.


Disclaimer

The ACD Tool is an open-source, free-to-use spreadsheet. Those wishing to modify the spreadsheet for their personal use are welcome to do so. However, anyone who modifies the ACD Tool with the intent of subsequent redistribution is kindly asked to contact the creator (myself) before doing so, out of common courtesy.

Please also note the ACD Tool is a first attempt at giving back to the genealogy world I have been a part of for several years. Though functional (as shown above), it is not without bugs. In light of this, I am not responsible for any loss of data that may occur from its use.

Finally, I hope the genealogy world finds some use for this nifty piece of kit.



Acknowledgements

To the Dodecad Ancestry Project, Harappa Ancestry Project and Eurogenes Genetic Ancestry Project (auDNA used in Examples).

Addendum I [20/08/2012]: ACDTool v1.1 replaces v1.0. Macros smoothed and instructions refined. Eurogenes South-Central Asian example also added.

Tuesday, June 26, 2012

Worldwide Distribution of Dodecad K10a Components [Review]

Numerous ADMIXTURE runs have been completed by the Dodecad Ancestry Project since its inception approximately two years ago. The status of certain components remained tenuous despite subsequent runs, whilst others provided fairly stable values for the bulk of the project's participants.

With the completion of the latest K10a run, I have composed a series of geographically accurate frequency maps with the intention of effectively presenting the trends that can be seen through the raw data.


Method

Data; values from over 130 groups obtained through the Dodecad K10a Spreadsheet. Only groups with at least 5 participants were considered. Composites of populations were taken where appropriate and denoted with _cmp. Labels shown are otherwise identical to the source. The O_Italian_D group was excluded because no information on its origins was found online.
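The selection rule above (a minimum of 5 participants, with O_Italian_D left out) could be expressed roughly as follows. This is a sketch under assumptions: the group names other than O_Italian_D and all participant counts are invented, and the actual filtering was presumably done by hand in the spreadsheet.

```python
MIN_PARTICIPANTS = 5          # threshold stated in the Method
EXCLUDED = {"O_Italian_D"}    # group dropped for lack of origin information


def eligible_groups(group_sizes):
    """Return the labels of groups that pass the size and exclusion rules."""
    return sorted(
        name
        for name, n in group_sizes.items()
        if n >= MIN_PARTICIPANTS and name not in EXCLUDED
    )


# Hypothetical participant counts, for illustration only.
sizes = {"Group_A": 12, "Group_B": 4, "Group_C": 5, "O_Italian_D": 20}
kept = eligible_groups(sizes)
print(kept)
```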

Mapping; Dodecad participant populations allocated to national capitals. Exact location of reference populations obtained where possible (see Citations); however, some allowances were made regarding those accompanied by scant information. Refer to the Data Sink for the population list, coordinates and commentary made during the mapping process. No numerical data, aside from those for certain populations, were displayed, both to minimise clutter and to remain faithful to the intention of this entry.

Population depiction; I deemed it necessary to separately consider the genetic structure of Jewish, Indian and expatriate/New World populations and exclude them from the rest of Europe, Asia or Africa. Including Jewish minorities with their gentile compatriots would render the maps uninformative. The complexity of India's demographics, particularly because of the caste system, makes frequency maps an improper choice for revealing inter-group genetic differences. 


Results





















Acknowledgement

The raw values used in this investigation are attributed to Dienekes Pontikos, author of the Dodecad Ancestry Project.


Addendum I [04/07/2012]: Inclusion of All Components Colourised map, shown below:




Citations
http://www.uvm.edu/~rsingle/stat295/F05/papers/Cavalli-Sforza-NRG-2005_Ceph-HGDP-CDP.pdf
http://www.1000genomes.org/about 

Saturday, March 31, 2012

North European Component Variation within the Eurasian Heartland [Original Work]

As studies of DNA variation across Asia have progressed over the years (Wells et al., Xing et al., teaser mtDNA results from Burger et al.'s upcoming analysis of prehistoric Eurasian steppe remains), the theme of ancestral markers with origins in Europe has remained a frequent one, particularly with regard to the expansion of Bronze Age semi-pastoral nomads from the Pontic-Caspian steppe bearing the Indo-European languages.

David W. of the Eurogenes Genetic Ancestry Project has recently posted data online from a new Intra-European run using ADMIXTURE (K=12) with the intention of breaking up the North European component that often arises through the program. Spreadsheet results here.

This brief investigation seeks to identify the North European-derived component patterns within Asia by first mapping out the frequencies and then correlating with Eurogenes' release notes on each.

Method
As many samples as possible from immediately identifiable populations were obtained from the spreadsheet results (link above). No sample restrictions were implemented. Averages of each population were calculated, except where n=1. No modifications were made to population labels except for Eurogenes population averages, denoted by the addition of a _Eg suffix. Populations were then allocated into arbitrary regional groups, allowing results to be displayed more coherently.
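The averaging step can be sketched as below. It is a minimal illustration rather than the procedure actually used (which was carried out in a spreadsheet): `population_averages`, the labels and the values are hypothetical, and a component absent from a sample is assumed to count as 0.

```python
from collections import defaultdict


def population_averages(samples):
    """Average component values per population label.

    `samples` is a list of (population_label, {component: value}) pairs.
    A population with a single sample (n=1) is kept as-is, not averaged.
    """
    grouped = defaultdict(list)
    for label, freqs in samples:
        grouped[label].append(freqs)

    averages = {}
    for label, rows in grouped.items():
        if len(rows) == 1:
            averages[label] = dict(rows[0])  # n=1: nothing to average
            continue
        components = {c for row in rows for c in row}
        averages[label] = {
            c: sum(row.get(c, 0.0) for row in rows) / len(rows)
            for c in components
        }
    return averages


# Hypothetical individual results, for illustration only.
rows = [
    ("Pop_A", {"North Sea": 10.0, "South Baltic": 8.0}),
    ("Pop_A", {"North Sea": 20.0, "South Baltic": 12.0}),
    ("Pop_B", {"North Sea": 3.0, "South Baltic": 0.0}),
]
means = population_averages(rows)
print(means)
```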

Results
Tabulated results can be found in the Data Sink. Autosomal variation per Regional Group can be found below:











The North European-derived components, despite their exceptionally close Fst distances relative to the other components, do seem to reveal a few interesting trends:

  • Northeast European appears to be (at least partially) the result of allele sharing with populations further east, as evidenced by its predominance in East-Central Asian groups and its extension even further eastwards into the Siberian Selkup (n=1). This component has a circumstantial correlation with the craniometric and ancient mtDNA evidence suggestive of a "migration corridor" between Eastern Europe and Siberia (Malyarchuk et al.'s On the Origin of Mongoloid Component in the Mitochondrial Gene Pool of Slavs, Newton's Ancient Mitochondrial DNA From Pre-historic Southeastern Europe: The Presence of East Eurasian Haplogroups Provides Evidence of Interactions with South Siberians Across the Central Asian Steppe Belt). While this also explains the component's abundance in North Caucasian populations (which lie en route between Ukraine and Siberia), the same cannot be said with absolute certainty of South-Central Asia. That said, the 0.021 Fst distance with West European, despite the markedly different distributions, suggests both are the result of prehistoric (possibly Paleolithic?) hunter-gatherer migration paths across large swathes of Eurasia.
  • West European has a sporadic appearance across Asia, peaking in the North Caucasus. This implies that, staying true to its assigned label, it is a generic West Eurasian component that has reached a maximum in Western Europe, with the North Caucasus representing the closest point of reference to there. Indeed, this inference is made independently by Eurogenes, albeit using different parameters:
"I used samples of Scottish, Irish and Western English ancestry to create this cluster. Not surprisingly, it peaks in individuals of Western Irish descent. However, it also peaks in Basques and many Iberians, which is fascinating, because that makes it the autosomal equivalent of Y-chromosome haplgroup R1b in Europe."
  • North Sea and South Baltic accompany one another at similar frequencies across much of Asia, especially in populations with an Indo-Iranian-speaking heritage (observe the ~0.8-1:1 ratio among Kurds, Iranians, the Turkmen, Uzbeks, Tajiks, Brahmins, Kshatriyas and Kyrgyz as examples of this). It is interesting to note that, of the two, only the North Sea component is readily present in East-Central Asians. The only other likely migration path along this trajectory is that of the proto-Tocharians, who (under the Eurasian steppe theory) split off from the Proto-Indo-European homeland several millennia prior to the Proto-Indo-Iranians that eventually formed the Andronovo archaeological horizon from Sintashta/Pit Grave (E Kuz'mina, The Origin of the Indo-Iranians, pg.451). Perhaps this near-solitary North Sea component within the Altaians, Mongolians and Uyghurs is attributable to early speakers of Tocharian? Perhaps the elevated presence of the North Sea component in South-Central Asia (Jatts, Pathans, Kyrgyz) is a relic of the Kushans, nomads supposedly a part of the Yuezhi confederacy, who may have been Tocharian speakers themselves?
  • One curious phenomenon is the similar West European-North Sea-Northeast European component proportions across the Turkmen, Uzbeks, Kyrgyz, Pathans, Uttar Pradesh Brahmins, Altaians and the Uyghur. Whether this can be substantiated in any way, or whether it is simply an anomalous association produced by non-uniform and varying sample sizes, remains unclear, which prevents a firm conclusion from being made.
  • North European-derived frequencies among Southwest Asian Semitic-speaking groups shown here seldom exceed 1% apiece and are either the result of recent, inconsistent small-scale admixture events or are simply background noise generated by ADMIXTURE.
Summary
The Northeast European and West European components appear to have a distribution independent of any significant migration events since the Neolithic, instead being associated with either the "migration corridor" across Eurasia or simply being the result of mutual West Eurasian heritage. North Sea and South Baltic, on the other hand, do seem to correlate with one another and support (rather than contradict) the eastward movement of Bronze Age semi-pastoral nomads speaking early dialects of Proto-Indo-European.

Edit I [31/03/2012]: Correction of erroneous Brahmin results due to Google Spreadsheet lag.