Showing posts with label ADMIXTOOLS. Show all posts

August 08, 2013

Major admixture in India took place ~4.2-1.9 thousand years ago (Moorjani et al. 2013)

A new paper on the topic of Indian population history has just appeared in the American Journal of Human Genetics. In previous work it was determined that Indians trace their ancestry to two major groups, Ancestral North Indians (ANI) (= West Eurasians of some kind), and Ancestral South Indians (ASI) (= distant relatives of Andaman Islanders, existing today only in admixed form). The new paper demonstrates that admixture between these two groups took place ~4.2-1.9 thousand years ago.

The authors caution about this evidence of admixture:
It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years [3] (although it is possible that with further sampling and new methods such relatedness might be detected). An alternative possibility that is also consistent with our data is that the ANI and ASI were both living in or near South Asia for a substantial period prior to their mixture. Such a pattern has been documented elsewhere; for example, ancient DNA studies of northern Europeans have shown that Neolithic farmers originating in Western Asia migrated to Europe about 7,500 years BP but did not mix with local hunter gatherers until thousands of years later to form the present-day populations of northern Europe [15, 16, 44, 45].
This is of course true, because admixture postdates migration and it is conceivable that the West Eurasian groups might not have admixed with ASI populations immediately after their arrival into South Asia. On the other hand, a long period of co-existence without admixture would be against much of human history (e.g., the reverse movement of the Roma into Europe, who picked up European admixture despite strong social pressure against it by both European and Roma communities, or the absorption of most Native Americans by incoming European, and later African, populations in post-Columbian times). It is difficult to imagine really long reproductive isolation between neighboring peoples.

Such reproductive isolation would require two cultural shifts: from a long period of endogamy (ANI migration, followed by ANI/ASI co-existence without admixture) to exogamy ~4.2-1.9kya (to explain the thoroughness of blending that left no group untouched), and then back to fairly strict endogamy (within the modern caste system). It might be simpler to postulate only one cultural shift (migration with admixture soon thereafter, with later introduction of endogamy which greatly diminished the admixture).

The authors cite the evidence from neolithic Sweden which does, indeed, suggest that the neolithic farmers this far north were "southern European" genetically and had not (yet) mixed with contemporary hunter-gatherers, as they must have done eventually. But, perhaps farmers and hunters could avoid each other during first contact, when Europe was sparsely populated. It is not clear whether the same could be said for India ~4 thousand years ago with the Indus Valley Civilization providing evidence for a large indigenous population that any intrusive group would have encountered. In any case, the problem of when the West Eurasian element arrived in India will probably be solved by relating it to events elsewhere in Eurasia, and, in particular, to the ultimate source of the "Ancestral North Indians".

It is also possible that some of the ANI-ASI admixture might actually pre-date migration. At present it's anyone's guess where the original limes between the west Eurasian and ASI worlds were. There is some mtDNA haplogroup M in Iran and Central Asia, which is otherwise rare in west Eurasia, so it is not inconceivable that ASI may have once extended outside the Indian subcontinent: the fact that it is concentrated today in southern India (hence its name) may indicate only the area of this element's maximum survival, rather than the extent of its original distribution. In any case, all mixture must have taken place somewhere in the vicinity of India.

A second interesting finding of the paper is that admixture dates in Indo-European groups are later than in Dravidian groups. This is demonstrated quite clearly in the rolloff figure on the left. Moreover, it does not seem that the admixture times for Indo-Europeans coincide with the appearance of the Indo-Aryans, presumably during the 2nd millennium BC: they are much later. I believe that this is fairly convincing evidence that north India has been affected by subsequent population movements from central Asia of "Indo-Scythian"-related populations, for which there is ample historical evidence. So, the difference in dates might be explained by secondary (later) admixture with other West Eurasians after the arrival of Indo-Aryans. Interestingly, the paper does not reject simple ANI-ASI admixture "often from tribal and traditionally lower-caste groups," while finding evidence for multiple layers of ANI ancestry  in several other populations.

My own analysis of Dodecad Project South Indian Brahmins arrived at a date of 4.1ky, and of North Indian Brahmins, a date of 2.3ky, which seems to be in good agreement with these results.

The authors also report that "we find that Georgians along with other Caucasus groups are consistent with sharing the most genetic drift with ANI". I had made a post on the differential relationship of ANI to Caucasus populations which seems to agree with this, and, of course, in various ADMIXTURE analyses, the component which I've labeled "West Asian" tends to be the major west Eurasian element in south Asia.

Here are the estimated admixture proportions/times from the paper:


Sadly, the warm and moist climate of India, and the adoption of cremation have probably destroyed any hope of studying much of its recent history with ancient DNA. On the other hand, the caste system has probably "fossilized" old socio-linguistic groups, allowing us to tell much by studying their differences and correlating them with groups outside India.

Coverage elsewhere: Gene Expression, HarappaDNA
Related podcast on BBC.

AJHG doi:10.1016/j.ajhg.2013.07.006

Genetic Evidence for Recent Population Mixture in India

Priya Moorjani et al.

Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.


June 08, 2013

Friendly rejoinder to genetiker

genetiker calls me "dumber than he thought" and responds to my criticism of his model. As always, I will disregard the name calling and deal with the (much more interesting) facts.

First, he writes:
In his post Dienekes takes the phylogeny I used for running the F4 ratio estimation program and shows that it won’t work for f3 statistics. 
No kidding.
No kidding indeed. Either genetiker believes in his phylogeny or he doesn't. The fact that the F4 ratio estimation program requires a phylogeny with that structure is meaningless: as I have shown, that phylogeny is wrong because it makes a prediction that is falsified by the data. Garbage in-garbage out, so the estimates obtained by genetiker with the wrong phylogeny are of course... wrong.

Second, he presents an even more elaborate phylogeny, where "V is Veddoids, C is Caucasoids, M is Mediterraneans, N is Nordics, G is Mongoloids, S is Sardinians, E is Europeans, and A is Amerindians."


This phylogeny is of course also wrong, for at least two reasons:

  • It ignores post-admixture drift in Europeans, i.e., the drift that has accumulated after E was formed by M+N. This drift is always traversed in the same direction from E to S and to A, so it contributes a constant positive term in the value of F3(E; S,A)
  • It proposes instantaneous formation of S, E, and A, e.g., the "Nordic" component in Europeans is symmetrically related to the "Nordic" component in Sardinians and Amerindians. genetiker clearly does not believe this, since he argues in his site (i) for I-M26 bearing "White Gods" coming to the Americas via the Canary islands, (ii) that mtDNA haplogroup X in the Americas is Caucasoid and so is (iii) Y-haplogroup C, which although "originally Veddoid" was carried by "Caucasoids" into the Americas. Now there's zero evidence that any of this has anything to do with Caucasoids, let alone Nordics in the Americas, but in any case it would be nice if genetiker harmonized his convoluted model of "Nordic" migrations with his phylogeny. In other words, his mental model of what happened isn't only inconsistent with the data, it's also inconsistent with itself. 
Finally, genetiker attempts to work out the mathematical details of his model, arriving at the conclusion that:
There are four paths from Europeans to Sardinians and four paths from Europeans to Amerindians, so there are sixteen path combinations.
This is of course wrong, because these paths are not independent; one actually needs to sum over 8 (=2^3) different trees for the different combinations of α, β and γ in the model; genetiker is therefore using wrong math applied to a wrong model. I believe his confusion stems from conflating admixture edges with drift edges.

It is not clear what he has aimed to accomplish with this "model", but let's analyze it properly: 

  • If α,β or 1-α,1-β, then because of the instantaneous derivation of S and E from M and N there is no drift in the degenerate length-0 "path" E-to-S, and hence F3(E; S,A) = 0. So, we only have to consider the cases α,1-β and 1-α,β:
  • If 1-γ, then for α,1-β we have drift overlap MC, and for 1-α,β we have drift overlap CN
  • If γ, then for α,1-β we have drift overlap MC+CN, and for 1-α,β we have drift overlap 0

So, in total we have a positive F3(E; S, A) statistic again, since we are summing over positive or zero drifts. If we also added the post-admixture drift in E, that statistic would be even higher, although this is not really necessary to falsify genetiker's model.

In any case, I still applaud genetiker for engaging with the data, and I'm happy to contribute to his continuing education!

June 05, 2013

Amerindian-like admixture in northern Europe is real

genetiker, a new genome blogger, questions the existence of Amerindian-like admixture in Europe. I am generally well-disposed to anyone who tries their hand at analysis of genetic data. On the other hand, if one accuses me of writing a series of posts "chock-full of stupidity", then there's a good chance I might respond. This should also be useful for anyone wishing to understand the evidence for this admixture.

genetiker proposes that the "Amerindian-like" admixture in North Europeans is misunderstood and can be in fact explained by the existence of "North European-like" admixture in Amerindians. In support of this, he presents the results of an F4 Ratio estimation analysis which suggests that there is "Nordic admixture of the Amerindian populations in the 10 to 20 percent range."

F4 Ratio estimation produces admixture estimates but does not prove the existence of such admixture. The admixture estimates are as good as the relationship proposed for a particular set of populations. If the relationship is nonsensical, so will be the admixture estimates.

According to genetiker, the following relationship holds, with A=Sardinian, B=Orcadian, C=Dai, and O=Yoruba, with X=Amerindians.


But, is the above consistent with the data? The existence of Amerindian-like admixture was argued by Patterson et al. (2012) on the basis of the following F3 statistic (right):

F3(European; Sardinian, Amerindian)

which is significantly negative for North Europeans. Now, consider the value of this statistic for genetiker's phylogeny.

F3(B = North European; A = Sardinian, X = Amerindian)


In the above figure I color-coded the path from B=North European to A=Sardinian (red) and from B=North European to X=Karitiana (green, if it goes via the supposed "North European" admixture, or blue, if it goes via the "Amerindian" admixture). The value of the F3 statistic is then the weighted sum of the overlap of the red/green and red/blue paths:

F3(B; A, X) = αBZ+(1-α)(BZ+ZW)

where BZ and ZW are drifts along the paths indicated in the figure. This statistic is then always positive, since the common segments in the graph are traversed in the same direction.
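As a rough numerical check of this argument, here is a minimal Python sketch of the path-overlap formula. The function name and the drift values are illustrative assumptions, not estimates from any dataset; only the formula F3(B; A, X) = αBZ+(1-α)(BZ+ZW) is taken from the text above.

```python
def f3_under_phylogeny(alpha, bz, zw):
    """Expected F3(B; A, X) as a weighted sum of path overlaps.

    alpha:  weight of the route from B to X whose overlap with the
            B-to-A path is BZ only
    bz, zw: drift along the segments BZ and ZW (non-negative lengths)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if bz < 0 or zw < 0:
        raise ValueError("drift lengths cannot be negative")
    return alpha * bz + (1.0 - alpha) * (bz + zw)


# Hypothetical drift lengths, scanned over a grid of admixture weights.
values = [f3_under_phylogeny(a / 10, bz=0.02, zw=0.01) for a in range(11)]
print(min(values) > 0)
```

Whatever weight is chosen, the result stays positive as long as BZ is positive, which is the point of the argument: no value of α can turn this phylogeny's prediction negative.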

genetiker's model is thus falsified by the data: it predicts a positive f3(North European; Sardinian, Amerindian) statistic, but we in fact observe negative ones.

February 22, 2013

ADMIXTOOLS 1.1 released

A new 1.1 version of ADMIXTOOLS has been released. From the description:
ADMIXTOOLS (Patterson et al. 2012) is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates. It can be downloaded for LINUX (see documentation). The software package also includes Affymetrix Human Origins Curated Dataset. Write to Arti Tandon if you have questions about the software and for scientific questions write to Nick Patterson. The new release fixes a serious bug in qpDstat. 
I've used this software before and posted some D-statistics from it on the blog, so if you find any that look strange, feel free to leave a comment. In any case, I'll be using the new version of qpDstat from now on.

UPDATE (Feb 22): Nick Patterson asked me to post the following for users of ADMIXTOOLS:
Choongwon Jeong of the University of Chicago found a serious bug in qpDstat (computes D-statistics) that sometimes returns D with an incorrect sign.  If you use the program please download ADMIXTOOLS version 1.1 from  the Reich lab web page.  http://genetics.med.harvard.edu/reich/Reich_Lab/Software.html

November 03, 2012

rolloff and ALDER analysis of Turks

I carried out rolloff analysis of the Behar et al. (2010) sample of Turks together with the sample of Uzbeks from the same, and the Yunusbayev et al. (2011) sample of Armenians. A --geno 0.03 flag was applied for merging and SNPs available in the Rutgers recombination map for Illumina chips were used.

The exponential decay can be seen below:

The signal of admixture seems pretty clear and extends up to several cM. Of course, as always, this does not mean that exactly these two populations mixed to form the Turks sample, but it does mean that they are reasonable stand-ins.

The jackknife gives an admixture time estimate of 27.622 +/- 5.348 generations or 800 +/- 160 years, which of course makes perfect historical sense as it is a date between the first arrival of the Seljuks in Anatolia and the final consolidation of power by the Ottomans. Note also that this probably applies principally to this particular sample (which I believe is from Cappadocia) and there were perhaps different admixture dynamics elsewhere.
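The conversion from generations to years can be sketched as follows. The generation length of 29 years is an assumption inferred from the figures quoted above (27.622 generations reported as about 800 years); the function name is hypothetical.

```python
YEARS_PER_GENERATION = 29  # assumption: inferred from 27.622 generations ~ 800 years


def generations_to_years(generations, std_err,
                         years_per_generation=YEARS_PER_GENERATION):
    """Scale an estimate and its standard error from generations to years."""
    return generations * years_per_generation, std_err * years_per_generation


years, err = generations_to_years(27.622, 5.348)
print(f"{years:.0f} +/- {err:.0f} years")
```

Rounded to the nearest ten years, this reproduces the 800 +/- 160 quoted in the text.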

I had started this analysis before the announcement of ALDER, but since it is very fast, I decided to give it a go as well. Below is the raw output:




                    *** Admixture test summary ***

Weighted LD curves are fit starting at 1.45 cM

Pre-test: Does Turks have a 1-ref weighted LD curve with Armenians_Y?
   1-ref decay z-score:    0.09
   1-ref amp_exp z-score: -0.01
                                  NO: curve is not significant

Pre-test: Does Turks have a 1-ref weighted LD curve with Uzbeks?
   1-ref decay z-score:    6.56
   1-ref amp_exp z-score:  5.02
                                  YES: curve is significant

Does Turks have a 2-ref weighted LD curve with Armenians_Y and Uzbeks?
   2-ref decay z-score:    5.61
   2-ref amp_exp z-score:  5.58
                                  YES: curve is significant

Do 2-ref and 1-ref curves have consistent decay rates?
   1-ref Armenians_Y - 2-ref z-score:                  0.01   ( 13%)
   1-ref Uzbeks - 2-ref z-score:                       0.69   ( 11%)
   1-ref Uzbeks - 1-ref Armenians_Y z-score:          -0.00   ( -1%)
                                  YES: decay rates are consistent

Test FAILS (z=5.58, p=2.4e-08) for Turks with {Armenians_Y, Uzbeks} weights

DATA: failure 2.4e-08 Turks Armenians_Y Uzbeks 5.58 -0.01 5.02 13% 23.92 +/- 4.26 0.00002930 +/- 0.00000525 27.18 +/- 302.36 -0.00000082 +/- 0.00013129 26.84 +/- 4.09 0.00002316 +/- 0.00000461

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B



The age estimate appears to be very similar, and most curves appear to be significant, except the one with Armenians_Y. This makes good sense. From Loh et al. (2012):
Also, if a reference A' shares some of the same admixture history as C or is simply very closely related to C, the pre-test will typically identify long-range correlated LD and deem A' an unsuitable reference to use for testing admixture.
In our case, A'=Armenians and C=Turks. We can be fairly sure that Armenians lack the same admixture history as Turks (because they were not affected by Central Asian Turkic invasions), but we can try a 1-ref analysis of Armenians with Uzbeks to substantiate it. The admixture lower bound estimate has a huge interval (7.6 +/- 88.2) and the jackknife is unable to estimate the admixture time. Thus, more plausibly, the second explanation applies: because Armenians_Y are very closely related to Turks, they are deemed an inappropriate reference for testing admixture.
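For readers who want to pull numbers out of the raw output, here is a minimal sketch of a parser for the "DATA:" result line. The column names are copied from the header line in the output above; the function names are hypothetical, and the sketch assumes the columns are whitespace-separated as they appear in this post (the program's actual delimiter may differ).

```python
import re

# Column names copied from the "DATA: test status p-value ..." header line.
FIELDS = [
    "p-value", "test pop", "ref A", "ref B",
    "2-ref z-score", "1-ref z-score A", "1-ref z-score B",
    "max decay diff %", "2-ref decay", "2-ref amp_exp",
    "1-ref decay A", "1-ref amp_exp A", "1-ref decay B", "1-ref amp_exp B",
]


def _is_float(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_alder_data_line(line):
    """Map one 'DATA:' result line onto the header's column names."""
    body = line.strip()
    if body.startswith("DATA:"):
        body = body[len("DATA:"):]
    # Glue "x +/- y" into a single token so that each column is one token.
    body = re.sub(r"\s+\+/-\s+", "±", body)
    tokens = body.split()
    # The status may contain spaces, so it runs up to the first numeric
    # token, which is taken to be the p-value.
    first_num = next(i for i, t in enumerate(tokens) if _is_float(t))
    record = {"test status": " ".join(tokens[:first_num])}
    values = tokens[first_num:]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of columns")
    for name, tok in zip(FIELDS, values):
        if "±" in tok:
            est, err = tok.split("±")
            record[name] = (float(est), float(err))
        else:
            record[name] = tok
    return record


line = ("DATA: failure 2.4e-08 Turks Armenians_Y Uzbeks 5.58 -0.01 5.02 13% "
        "23.92 +/- 4.26 0.00002930 +/- 0.00000525 27.18 +/- 302.36 "
        "-0.00000082 +/- 0.00013129 26.84 +/- 4.09 0.00002316 +/- 0.00000461")
rec = parse_alder_data_line(line)
print(rec["1-ref decay A"], rec["1-ref decay B"])
```

Read this way, the problem with the Armenian reference is easy to see: its 1-ref decay carries an error of 302.36 generations, against 4.09 for the Uzbek reference.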

Finally, the lower bound of the admixture fraction for Turks with an Uzbek reference is estimated as:

Mixture fraction % lower bound (assuming admixture): 29.8 +/- 4.0

This is a very interesting number. We can be fairly sure that Central Asian Turkic people who invaded Anatolia carried with them an East Eurasian component, but in what proportion to their West Eurasian one? The East Eurasian element in Turks has been rather consistently estimated at ~5-7% with various methods, so perhaps this formed the minority element in the Turkic people who arrived in Anatolia. 

On the other hand, this case is rather muddled by the occurrence of bidirectional gene flow: Uzbeks may have West Eurasian ancestry of ultimate West Asian origin, just as Turks have Central Asian ancestry. And, indeed, when we estimate the admixture fraction of Uzbeks with the Turks as a reference, we obtain:

Mixture fraction % lower bound (assuming admixture): 46.7 +/- 2.4

The age estimate for this is ~16 +/- 2 generations = 460 +/- 60 years. Very similar time estimates appear when Armenians are used as a West Eurasian reference. So, this might indicate that the Uzbek population was formed by admixture after the Anatolian Turks were so formed.

I see no easy way to solve the problem of estimating admixture proportions when both extant populations have been both donors and recipients of gene flow, but in any case, these numbers are something to think about.

Analysis of Turks with a variety of Turkic and East Asian populations

I subsequently formed a new dataset by merging the sample of Turks with a variety of Turkic and East Asian populations (same procedure for SNP choice).


For the calendar year calculation, I arbitrarily set the birthdate of the modern sampled individuals at 1980; I have no idea on the age profile of the individuals comprising the Behar et al. sample of Turks. I have also used a mindis=0.5cM which facilitated the convenient automated extraction of the dates from the ALDER output and also gave a level playing field for all the reference populations. The age picked by ALDER using its own adaptive threshold did not usually differ from the reported one by more than a few generations.
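The calendar-year calculation described above can be sketched like this. The 1980 birth year is from the text; the 29-year generation length is an assumption inferred from the generation-to-year conversions elsewhere in this post, and the function name is hypothetical.

```python
def admixture_calendar_year(generations, birth_year=1980,
                            years_per_generation=29):
    """Calendar year of an admixture event dated `generations` before the
    birth of the sampled individuals.

    Negative results denote years BC (ignoring the missing year zero).
    """
    return birth_year - generations * years_per_generation


# Example: the rolloff estimate of 27.622 generations reported earlier.
print(round(admixture_calendar_year(27.622)))
```

Shifting the assumed birth year by a decade or two moves every date by the same amount, so it does not affect the comparison between reference populations.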

The results indicate two things:

  • The % of admixture depends on the choice of reference population, with the highest amount obtained using Uzbeks as a reference, and the lowest using the far Asian populations from China. This indicates our uncertainty regarding the East/West Eurasian-ness of the people who settled in Anatolia.
  • Admixture times, on the other hand appear to be fairly constant and appear to frame an important watershed moment of Anatolian history, the Battle of Manzikert which paved the way for the eventual Turkification of the peninsula. The Turkmen sample appears as an outlier in this respect, which might indicate that limited migration of Turkmen tribes may have occurred at a later date.

Admixture in the Chuvash and the Uygur

I took the Behar et al. (2010) sample of Chuvash, excluding GSM536731 which has atypical ancestry, and merged it with the Li et al. HGDP French_Basque and Dai. The latter two populations don't show evidence of admixture according to either the f3-statistic or ALDER (Loh et al. 2012). (I used a --geno 0.03 flag in PLINK and extracted a subset of SNPs included in the Rutgers recombination map for Illumina chips.)

The f3-statistic f3(Chuvashs_16; French_Basque, Dai) was equal to -0.011311 (Z=-31.308), indicative of admixture.

I then ran an ALDER analysis:


Test SUCCEEDS (z=4.85, p=1.2e-06) for Chuvashs_16 with {French_Basque, Dai} weights

DATA: success (warning: decay rates inconsistent) 1.2e-06 Chuvashs_16 French_Basque Dai 4.85 3.78 5.18 50% 40.27 +/- 5.80 0.00032377 +/- 0.00006676 28.21 +/- 7.47 0.00004231 +/- 0.00000962 47.08 +/- 4.53 0.00016628 +/- 0.00003212

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

This indicates that the Chuvash can be seen as admixed, but with inconsistent decays: the one with the French Basque (=28.21) is younger than the one with the Dai (=47.08). I think this makes fairly good sense, because the Chuvash are descended from people who came to Europe during the 1st millennium AD and must have later mixed with Europeans, perhaps with eastern Slavs as these made their way eastward during the 2nd millennium AD.

I then carried out similar analyses on the HGDP Uygur. As expected f3(Uygur; French_Basque, Dai) = -0.023917 (Z = -60.362), indicative of admixture. The ALDER analysis:


Test SUCCEEDS (z=6.85, p=7.4e-12) for Uygur with {French_Basque, Dai} weights

DATA: success 7.4e-12 Uygur French_Basque Dai 6.85 4.47 7.39 15% 20.56 +/- 3.00 0.00036760 +/- 0.00003660 22.59 +/- 5.06 0.00010920 +/- 0.00002025 19.46 +/- 2.64 0.00007864 +/- 0.00000710

DATA: test status p-value test pop ref A ref B 2-ref z-score 1-ref z-score A 1-ref z-score B max decay diff % 2-ref decay 2-ref amp_exp 1-ref decay A 1-ref amp_exp A 1-ref decay B 1-ref amp_exp B

suggests a very recent admixture on both the European and East Asian side. It seems fairly clear that whatever admixture was taking place in Central Asia, perhaps for thousands of years, the present-day Uygur were formed, at least in part, by a fairly recent, perhaps post-Mongol admixture event.

October 27, 2012

Inter-relationships between 'world' components

In a previous post I calculated f3-statistics between my K=7 and K=12 ancestral components. The basic idea is to discover which component A can be seen as a mixture of two other components, B and C, in which case (assuming A does not have excessive drift), we expect a negative f3(A; B, C) statistic.
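To illustrate why a mixture produces a negative value, here is a minimal sketch that computes a plain f3 from allele frequencies. The frequencies and the mixing weight are hypothetical, and the sketch omits the finite-sample corrections that real software applies.

```python
def f3(target, ref1, ref2):
    """Plain f3(target; ref1, ref2): the mean over SNPs of (t - a) * (t - b).

    Inputs are lists of population allele frequencies at the same SNPs.
    No finite-sample bias correction is applied in this sketch.
    """
    if not (len(target) == len(ref1) == len(ref2)):
        raise ValueError("frequency lists must have equal length")
    total = sum((t - a) * (t - b) for t, a, b in zip(target, ref1, ref2))
    return total / len(target)


# Hypothetical allele frequencies for two source components B and C.
freq_b = [0.10, 0.80, 0.35, 0.60, 0.05]
freq_c = [0.70, 0.20, 0.55, 0.10, 0.90]
alpha = 0.3  # hypothetical share of B in the mixed component A
freq_a = [alpha * b + (1 - alpha) * c for b, c in zip(freq_b, freq_c)]

print(f3(freq_a, freq_b, freq_c) < 0)  # mixed component as the target
print(f3(freq_b, freq_a, freq_c) > 0)  # a source component as the target
```

With no drift after mixing, each SNP contributes -α(1-α)(b-c)² to f3(A; B, C), so the statistic cannot be positive. Drift accumulated in A after the mixture adds a positive term, which is why the text notes that A must not have excessive drift.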

As part of my analysis of the world dataset, I calculated f3-statistics for each of the K=3 to K=12, that is, for some K, I tried to see if one of the K inferred components could be seen as a mixture of the remaining K-1. It turns out that no negative f3 statistics appeared at all, and this suggests that the components inferred by ADMIXTURE at each K tend to form an "orthogonal" set that are not mixtures of each other.

More generally, we can calculate f3 statistics where A, B, and C are components inferred from any of the K=3 to K=12 runs. There is a total of 75 such components, and hence 75*(74 choose 2) = 202,575 such f3 statistics. Since calculating these would take a while (and would become intractable as K increases further), I decided to calculate pairwise f3 statistics, i.e., statistics where A, B, and C are constrained to be from successive K, K+1 runs. The significant results can be seen in the spreadsheet.
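The count above is easy to verify; a short sketch using only the Python standard library:

```python
from math import comb

# Components pooled over the K=3 ... K=12 runs: 3 + 4 + ... + 12.
components = sum(range(3, 13))

# Pick the target component, then an unordered pair of sources from the rest.
n_f3 = components * comb(components - 1, 2)

print(components, n_f3)
```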

It might be worthwhile to develop an automated way of using these statistics to guide us in the interpretation of ADMIXTURE components. But, they are useful, in any case, as a source of information.

For example, consider the following (the third column represents the mixed population):

Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z -0.013911 0.000084 -166.457

This means that the European component at K=5 can be seen as a mix of the Atlantic_Baltic and Near_East components at K=6. So, this suggests that the European component can be seen as "secondary", the product of admixture. But:

European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z -0.003964 0.000175 -22.588

This indicates conversely that the Atlantic_Baltic component at K=6 can be seen as a mix of the European and Amerindian components at K=5.
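The rows quoted above can be read programmatically with a sketch like the following. Interpreting the last three columns as the f3 value, its standard error, and the Z-score is my assumption (the text only states that the third column is the mixed population); the function name is hypothetical.

```python
def parse_f3_row(row):
    """Split a row 'source1 source2 target f3 std_err Z' into typed fields."""
    src1, src2, target, f3_value, std_err, z = row.split()
    return {
        "sources": (src1, src2),
        "target": target,
        "f3": float(f3_value),
        "se": float(std_err),
        "z": float(z),
    }


rows = [
    "Atlantic_Baltic_6/globe6_Z Near_East_6/globe6_Z European_5/globe5_Z "
    "-0.013911 0.000084 -166.457",
    "European_5/globe5_Z Amerindian_5/globe5_Z Atlantic_Baltic_6/globe6_Z "
    "-0.003964 0.000175 -22.588",
]
for row in rows:
    rec = parse_f3_row(row)
    # Z should be close to f3 / SE, up to rounding of the printed SE.
    print(rec["target"], round(rec["f3"] / rec["se"], 1), rec["z"])
```

The recomputed ratios differ from the printed Z-scores by well under 1%, which is consistent with the standard errors being printed to only six decimal places.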

It would be very interesting to use f-statistics to guide one in the choice of an "orthogonal" set of ancestral populations, or to summarize the relationships between them in tree or network form. One could potentially use my ADMIXTURE to TreeMix script to do something like this, although as K increases, there is a combinatorial explosion in the total number of components with a probable runtime slowdown/memory usage blowup which might render this approach unusable, at least for large K.

October 18, 2012

rolloff analysis of Lezgins as Sardinian+Burusho

I have carried out rolloff analysis of Lezgins, a Northeast Caucasian population that is of particular interest due to it being modal for the "Dagestan" component whose long-distance relationships with Western Europe and South Asia have triggered a great deal of followup investigation on my part.

The Lezgins are also interesting for other reasons: they may be one of the populations related to the Kura-Araxes culture; they possess a high frequency of Y-haplogroup R1b, so they may be related to the migration that brought this haplogroup into Europe from West Asia.

In my previous analysis of the French using the same reference populations, I speculated that their signal of admixture may involve admixture between a Sardinian-like and a West Asian population in Asia itself circa 7,000 years ago, followed by a later expansion into Europe. And, in my analysis of Lithuanians and Ukrainians, I discovered a somewhat less "old" signal of admixture involving South Asian+North European references with a mean value of 5.5-6.3ky for the various population pairs.

The exponential fit for the Lezgins can be seen below:

The admixture time estimate is 198.773 +/- 70.649 generations or 5,760 +/- 2,050 years. This is not very precise, but seems consistent with the two phenomena described above. It also seems to contrast with the much younger signal for Armenians.

ADMIXTURE tracks Amerindian-like admixture in northern Europe

I have recently assembled a new "world" dataset of 4,280 individuals that I am currently incrementally analyzing with ADMIXTURE. But, I noticed an interesting pattern at K=4 that I wanted to share right away.

4 ancestral populations emerge at this level of resolution, which I have named: European, Asian, African, Amerindian. The names aren't important, and you can replace them with whatever you prefer. 

The interesting thing about this K=4 analysis is that European populations show evidence of Amerindian admixture, consistent with the pattern inferred using f-statistics, where European populations show admixture between Sardinians and a Karitiana-like population.

This pattern may have emerged at previous ADMIXTURE analyses at this level of resolution, but thanks to the f3 evidence presented in previous posts, it is now clear that it is no quirk of ADMIXTURE, but indicative of a real (albeit still rather mysterious) pattern of gene flow that differentially affected European populations.

For example, the Irish_D population has 7.6% of the Amerindian component, and so do HGDP Orcadians. HGDP Sardinians have only 1.7% of it, which appears to be the minimum in Europe, with French_Basque having more at 4.6%.

Another interesting observation is that West Eurasian populations that show an excess of East Eurasian-like admixture appear to be doing so for two separate reasons. For example, HGDP Russians have 11.7% of Amerindian component, but also 4.5% of "Asian", and 1000 Genomes Finns have 3.3% Asian and 12% Amerindian. Behar et al. (2010) Turks, on the other hand, have 9.9% Asian and 2.2% Amerindian. All these populations are East Eurasian-shifted relative to Sardinians, a pattern which can also be observed by looking at the K=3 analysis, but for apparently different reasons.

The pattern for Near Eastern populations is also interesting. For example, Yunusbayev et al. (2011) Armenians have 0% of the Amerindian component, and 5.7% of the Asian, and all three HGDP Arab populations (Druze, Palestinian, Bedouin) also have 0% of the Amerindian component, with variable levels of the Asian.

It would appear that whatever process contributed Amerindian-like admixture in Europeans, minimally affected Near Eastern populations, with Sardinians being demonstrably related to Neolithic Europeans (thanks to ancient DNA evidence), tilting towards the Near Eastern pattern. On the other hand, Near Eastern populations show evidence of Asian admixture, which probably involves unresolved East Asian/ASI ancestry, and will be resolved at higher K. Sardinians appear to be at the end of three clines: (i) Amerindian-like cline of Europe-Siberia-Americas, (ii) East Asian-like cline of Europe-Central Asia/Siberia-East Asia, (iii) ASI-like cline of Europe-Near East-South Asia. These are separate, but not independent phenomena.

To confirm that the signal picked up by ADMIXTURE tracks the signal picked up by ADMIXTOOLS formal tests, I calculated the following D-statistic:

D(Sardinian, European, Karitiana, San)

where European is any population with a sample size of at least 10 whose ancestry falls at least 99% within the European+Amerindian components:


And, here is a scatterplot:
The correlation is clear, and the Pearson coefficient is -0.96. This means that populations with higher % Amerindian, as estimated by ADMIXTURE, also show higher D-statistic evidence for admixture.
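To make the sign convention concrete, here is a minimal sketch of a D-statistic computed from allele frequencies, using the ratio form as I recall it from Patterson et al. (2012). The frequencies are hypothetical toy values, the population labels are only for illustration, and the sketch omits the block-jackknife standard errors that qpDstat reports.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from population allele frequencies at the same SNPs.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    """
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Hypothetical frequencies: "european" is "sardinian" plus 10% of "karitiana".
sardinian = [0.1, 0.5, 0.9, 0.3]
karitiana = [0.8, 0.2, 0.4, 0.7]
san = [0.5, 0.5, 0.5, 0.5]
european = [0.9 * s + 0.1 * k for s, k in zip(sardinian, karitiana)]

print(d_statistic(sardinian, european, karitiana, san) < 0)
```

In this toy setup, adding Karitiana-like ancestry to the second population pushes D(Sardinian, European, Karitiana, San) below zero, which matches the negative correlation with the ADMIXTURE estimates reported above.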

What of the actual estimates of admixture produced by ADMIXTURE? Using the F4 ratio test, I recently showed that African admixture in Sardinians confounds estimates of Amerindian-like admixture in northern Europeans and vice versa (Amerindian-like admixture in northern Europeans confounds African admixture in Sardinians).

In that experiment, I "scrubbed" Sardinians to remove segments of African ancestry, and showed that estimates of Amerindian-like admixture in the CEU population diminished from 13.9% to 8.8%. The latter seems reasonably close to the 7.1% inferred by ADMIXTURE.

On balance, I would say that ADMIXTURE at K=4 provides a good proxy for the effect described in Patterson et al. (2012). Its results are more difficult to interpret, because its underlying model does not take into account evolutionary relationships between populations. On the other hand, it has the advantage of being able to handle multiple ancestral populations, and has consistently proven able to generate useful data that correlate well with those from other techniques of population genetics.

October 17, 2012

The tangled web of humanity

Indian populations are composed of two ancestral components: Ancestral North Indians (ANI) and Ancestral South Indians (ASI), discovered by Reich et al. (2009). In that paper, it was also shown that ASI forms a clade with East Eurasians, while ANI does so with West Eurasians.

Patterson et al. (2012) published a different pattern: non-Sardinian Europeans have North Eurasian-like ancestry that links them to Amerindian populations. It is thus possible that ASI and the East Eurasian-like admixture in North Europeans may share a common evolutionary history:


Now, consider a hypothetical population of the Indian Cline. A European population is related to it both via its ANI/West Eurasian ancestry, but also via its ASI ancestry, because the East_Eurasian component in Europeans shares a portion of ancestry (indicated by the red arrow) with ASI.

Sardinians lack (or have less of) this "red arrow" portion of ancestry. 

It is also possible that ANI itself may have some East_Eurasian ancestry, like Europeans do; this is not indicated in the figure. More on this later.

Consider the following D-statistic:

D(European, Sardinian, Indian, San)

As we shall see, this takes positive values, consistent with the idea of gene flow between Europeans and Indians at the exclusion of Sardinians. However, this gene flow may involve either the West Eurasian component in the ancestry of Indians (i.e., this component is more related to Europeans than to Sardinians), or to the ASI component (which is related to Europeans via the common "red arrow" portions of ancestry).
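To make the form of this statistic concrete, here is a minimal sketch in Python, assuming the allele-frequency definition of D given in Patterson et al. (2012). The function name and the toy frequencies are hypothetical illustrations, not ADMIXTOOLS code; qpDstat also estimates standard errors and Z-scores with a block jackknife, which is omitted here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (lists of equal length).

    Numerator and denominator are each summed over SNPs before dividing,
    following the frequency-based definition in Patterson et al. (2012).
    """
    num = 0.0
    den = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        num += (pw - px) * (py - pz)
        den += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    return num / den


# Hypothetical toy frequencies for four SNPs (not real data).
european = [0.60, 0.30, 0.80, 0.45]
sardinian = [0.50, 0.20, 0.90, 0.40]
indian = [0.70, 0.40, 0.60, 0.50]
san = [0.20, 0.10, 0.90, 0.30]

d = d_statistic(european, sardinian, indian, san)
print(round(d, 4))
```

In this toy example the first population tracks the third more closely than the second does, so D is positive; swapping the first two populations flips the sign.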

We can figure out what is going on by trying different Indian populations along the Indian Cline, and seeing whether the D-statistic is inflated/deflated in populations of greater ANI/ASI ancestry.

Here are the results:


                Russian Orcadian French Lithuanians   ANI
Mala             0.0153   0.0120 0.0088      0.0131 38.86
Madiga           0.0153   0.0122 0.0091      0.0111 40.66
Chenchu          0.0157   0.0108 0.0088      0.0115 40.76
Bhil             0.0149   0.0115 0.0086      0.0124 42.96
Satnami          0.0166   0.0125 0.0091      0.0126 43.06
Kurumba          0.0156   0.0117 0.0095      0.0121 43.26
Kamsali          0.0139   0.0105 0.0088      0.0098 44.56
Vysya            0.0130   0.0099 0.0083      0.0102 46.26
Lodi             0.0143   0.0124 0.0092      0.0125 49.96
Naidu            0.0138   0.0104 0.0092      0.0108 50.16
Tharu            0.0150   0.0112 0.0095      0.0118 51.06
Velama           0.0126   0.0107 0.0083      0.0095 54.76
Srivastava       0.0144   0.0124 0.0091      0.0116 56.46
Meghawal         0.0131   0.0107 0.0088      0.0117 60.36
Vaish            0.0143   0.0144 0.0099      0.0128 62.66
Kashmiri_Pandit  0.0119   0.0116 0.0090      0.0116 70.66
Sindhi           0.0106   0.0112 0.0095      0.0111 73.76
Pathan           0.0098   0.0114 0.0087      0.0106 76.96

For each Indian Cline population, I list the ANI percentage, as estimated by Reich et al. (2009) in the last column, and the D-statistic of the above given form for different pairs of Indian and European populations.

We can plot the D-statistic vs. ANI for each of our European populations:




The correlation coefficients confirm the visual impression, that for the HGDP Russians there is a significantly negative relationship between ANI admixture in an Indian Cline population and the D-statistic:

   Russian   Orcadian    French Lithuanians
-0.8631118 0.08670188 0.1870127  -0.1889908

In other words, the evidence for gene flow between Russians and Indians is maximized when south Indian (ASI-rich) populations are used.
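The Russian coefficient can be recomputed directly from the table. The following is a minimal sketch in plain Python; the values are copied from the ANI and Russian columns above, and the helper name `pearson` is my own.

```python
import math

# ANI percentages and Russian-column D-statistics, copied from the table above.
ani = [38.86, 40.66, 40.76, 42.96, 43.06, 43.26, 44.56, 46.26, 49.96,
       50.16, 51.06, 54.76, 56.46, 60.36, 62.66, 70.66, 73.76, 76.96]
d_russian = [0.0153, 0.0153, 0.0157, 0.0149, 0.0166, 0.0156, 0.0139, 0.0130,
             0.0143, 0.0138, 0.0150, 0.0126, 0.0144, 0.0131, 0.0143, 0.0119,
             0.0106, 0.0098]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


r = pearson(ani, d_russian)
print(round(r, 3))  # approximately -0.863, in line with the value reported above
```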

The lack of a clear pattern in the other three populations is itself interesting. One possible explanation involves East Eurasian-like admixture in the ANI, a conjecture which would make sense, given that all non-Sardinian continental West Eurasians seem to possess it.

If that is true, then as we go "south" along the Indian Cline, ASI related admixture inflates the D-statistic by increasing the "red arrow" overlap with the East Eurasian-like admixture in Europeans. As we go "north" along this cline, then the D-statistic decreases, due to ASI-reduction, but also increases, due to East Eurasian-like admixture in ANI, with an end result of no clear pattern in the superposition of processes.

In any case, this is an interesting example of a crisscrossing type of admixture where unrelated processes (east Eurasian-like admixture in Russians and ASI admixture in Indians) combine to present an unusual effect.

October 14, 2012

Differential relationship of ANI to Caucasus populations

The observation in Reich et al. (2009) that Ancestral North Indians (ANI) and CEU (HapMap White Utahns) form a clade to the exclusion of Adygei (a NW Caucasian HGDP population) has always puzzled me, because in my ADMIXTURE experiments, the dominant West Eurasian component in South Asia has always been one centered in the Caucasus rather than Europe, an observation also confirmed by Metspalu et al. (2011).

I have now used the qpDstat program of ADMIXTOOLS to calculate some D-statistics using a wide variety of West Asian populations that have appeared in the literature since 2009 (mainly Behar et al. 2010, and Yunusbayev et al. 2011), in addition to the Adygei. This analysis is based on 87,925 SNPs. I have kept SNPs included in the Rutgers map for Illumina chips, since most of the datasets merged with the Reich et al. (2009) dataset were genotyped on such chips, and applied a --geno 0.01 flag after merging the various datasets.

The following populations were considered:
North_Kannadi, Sindhi, Pathan, Kashmiri_Pandit, Brahmins_from_Uttar_Pradesh_M, Iyer_D, Iyengar_D, CEU30, Onge, Adygei, Lezgins, Georgians, Ukranians_Y, Abhkasians_Y, Chechens_Y, North_Ossetians_Y, Armenians_Y, Kurds_Y, Iranians_19, Romanians_14, Bulgarians_Y, Greek_D
I calculated D-statistics of the form:

D(CEU30, non-CEU West Eurasian; South Asian, Onge)

I report, for each South Asian population, the score for non-CEU West Eurasian being Adygei, and the most negative Z-score:


It is clear that, while CEU are more related to Indian cline populations than Adygei are, they are less related to them than Georgians are, at least in the case of the Pathans. The Georgian population is one of the modal populations of the West Asian autosomal component.

The full set of results can be found here. It appears that North Ossetians (who are also from the NW Caucasus) follow the Adygei pattern, while Abkhazians, Lezgins, and Armenians appear more related to ANI than CEU are, similar to the Georgian pattern.

Interestingly, D(CEU, Iranian; South Asian, Onge) appears positive, and this is probably not because CEU are more related to ANI than Iranians are, but because Iranians also have ASI admixture.

Ukrainians do not appear more closely related to ANI than CEU are, rather the opposite. This is consistent with the recent f3-statistics analysis of South Indian Brahmins, in which the strongest signals of admixture involved populations from Western Europe, the Balkans, and West Asia, but not from eastern Europe.

All the available evidence suggests that ANI is most related to populations of the South and NE Caucasus, and not to those of the NW Caucasus like Adygei. To confirm this, I calculated some additional D-statistics (also included in the spreadsheet):


All in all, this seems to be very consistent with my working model of Eurasian prehistory. It is also in agreement with proposals for a genetic relationship between Indo-European and NE Caucasian/Hurrian and/or early contacts between it and Kartvelian. No such relationship, as far as I can tell, has been seriously advanced with respect to NW Caucasian languages.

A valuable lesson from this analysis is that now that multiple West Asian populations have been genotyped, caution must be exercised when using the HGDP Adygei, because they are clearly not representative of the different language families (NE/S Caucasian and Indo-European) present in West Asia. Surprises may lurk even at the sub-1000km scale in a region as diverse as the Caucasus.

October 13, 2012

An estimate of the admixture time for Finns

Using a similar procedure as in my recent post on the Baltic (Update II), I used 15 FIN individuals from the 1000 Genomes together with 12 Nganasans from Rasmussen et al. (2010) as reference populations, and 15 other FIN individuals to estimate admixture LD in a rolloff analysis. Three outlier Nganasan individuals (GSM558800, GSM558802, GSM558807) were removed.
The estimated time of admixture is 86.095 +/- 10.187 generations, or 2500 +/- 300 years. It corresponds rather well to the beginning of the Iron Age in northern Europe.

As I mention in my previous post, there is evidence for intrusive cultures (Battle Axe and Seima Turbino) converging on the area from different directions during the preceding Bronze Age. If the above date is accurate, it will suggest a rather late admixture event between the Europeoid and Siberian elements of Finns. The former may have included both the descendants of Mesolithic European hunter-gatherers and intruders from Central Europe (Corded Ware/Battle Axe); the latter may have included both Comb Ceramic and the descendants of the Seima Turbino metallurgists.

October 10, 2012

The Indo-European invasion of the Baltic

In some recent posts, I showed that South Asian populations (North Indian Brahmins, South Indian Brahmins) can be seen as mixtures of West Eurasian and South Indian populations, but also that West Eurasians (Bulgarians, Greeks, Armenians, and French) can be seen as mixtures of South Asian and Sardinian populations.

This may seem strange, but can be explained if we understand how f3-statistics and rolloff actually work. These methods do not require pure or unadmixed ancestral populations, but exploit allele frequency differences in the reference populations together with either (i) allele frequencies in the mixed population, in the case of f3-statistics, or (ii) admixture linkage disequilibrium in the mixed population, in the case of rolloff.

If a and b are allele frequencies in two ancestral populations A and B that mix, then:

  • The frequency of a will shift towards b if A experiences gene flow from B
  • The frequency of a will randomly shift if A experiences gene flow from an "outgroup" population
  • The frequency of a will shift towards b if A experiences gene flow from a third population that is geographically and genetically intermediate between A and B
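The way an f3-statistic picks up such frequency shifts can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses the plain population-frequency form f3(C; A, B) = mean of (c - a)(c - b), ignores the finite-sample correction and jackknife that qp3Pop applies, and uses hypothetical toy frequencies.

```python
def f3(target, source1, source2):
    """Mean of (c - a)(c - b) over SNPs for target c and sources a, b."""
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at five SNPs (illustration only).
pop_a = [0.10, 0.80, 0.35, 0.60, 0.05]
pop_b = [0.70, 0.20, 0.65, 0.10, 0.55]

# A target formed as a 30/70 mixture of A and B, with no drift afterwards.
alpha = 0.3
mixed = [alpha * a + (1 - alpha) * b for a, b in zip(pop_a, pop_b)]

print(f3(mixed, pop_a, pop_b) < 0)  # admixed target: negative f3
print(f3(pop_a, mixed, pop_b) > 0)  # end-of-cline population: positive f3
```

The intermediate population gives a negative value, while a population at the end of the cline does not, whether or not it is truly unadmixed.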

An application to the Europe-South Asia cline

I took the following set of populations, and calculated all 1,365 possible f3-statistics:
"FIN30"         "Lithuanians"   "Russian"       "Pathan"        "Balochi"       "North_Kannadi" "Polish_D"      "Russian_D"     "Mixed_Slav_D"  "Bulgarian_D"   "Serb_D"        "Ukrainian_D"   "Belorussian"   "Bulgarians_Y"  "Ukranians_Y"
In the following table, I report the lowest Z-score for each target population (third column). So, for example, Polish_D can be seen as a mixture of Lithuanians and Balochi. Only negative scores are indicative of admixture; scores with Z less than -3 are considered significant.


Source 1 Source 2 Target f_3 std. err Z SNPs
Lithuanians North_Kannadi FIN30 0.001606 0.000259 6.193 280043
Ukrainian_D Belorussian Lithuanians 0.00078 0.000299 2.614 268493
Lithuanians North_Kannadi Russian -0.002738 0.000248 -11.045 279965
North_Kannadi Polish_D Pathan -0.006959 0.000229 -30.344 280220
North_Kannadi Bulgarians_Y Balochi -0.003636 0.000246 -14.781 281604
Pathan Ukrainian_D North_Kannadi 0.033802 0.000623 54.237 271858
Lithuanians Balochi Polish_D -0.001171 0.000178 -6.581 279519
Lithuanians Pathan Russian_D -0.001829 0.000166 -11.026 280658
Lithuanians Pathan Mixed_Slav_D -0.001715 2e-04 -8.594 277635
Lithuanians Balochi Bulgarian_D -0.001247 0.000313 -3.979 272342
Lithuanians Balochi Serb_D -0.00091 0.000377 -2.416 270807
Lithuanians Balochi Ukrainian_D -0.002222 0.000358 -6.211 270399
Lithuanians Balochi Belorussian -0.000897 0.00027 -3.325 273076
Balochi Polish_D Bulgarians_Y -0.001198 0.000185 -6.481 279632
Lithuanians Balochi Ukranians_Y -0.001727 0.000187 -9.236 278677
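The significance threshold can be applied mechanically to the rows above. Here is a minimal sketch in Python; it assumes the column order is source 1, source 2, target, f3, standard error, Z, number of SNPs, which is inferred from the qp3Pop output shown elsewhere on this page.

```python
rows = """\
Lithuanians North_Kannadi FIN30 0.001606 0.000259 6.193 280043
Ukrainian_D Belorussian Lithuanians 0.00078 0.000299 2.614 268493
Lithuanians North_Kannadi Russian -0.002738 0.000248 -11.045 279965
North_Kannadi Polish_D Pathan -0.006959 0.000229 -30.344 280220
North_Kannadi Bulgarians_Y Balochi -0.003636 0.000246 -14.781 281604
Pathan Ukrainian_D North_Kannadi 0.033802 0.000623 54.237 271858
Lithuanians Balochi Polish_D -0.001171 0.000178 -6.581 279519
Lithuanians Pathan Russian_D -0.001829 0.000166 -11.026 280658
Lithuanians Pathan Mixed_Slav_D -0.001715 2e-04 -8.594 277635
Lithuanians Balochi Bulgarian_D -0.001247 0.000313 -3.979 272342
Lithuanians Balochi Serb_D -0.00091 0.000377 -2.416 270807
Lithuanians Balochi Ukrainian_D -0.002222 0.000358 -6.211 270399
Lithuanians Balochi Belorussian -0.000897 0.00027 -3.325 273076
Balochi Polish_D Bulgarians_Y -0.001198 0.000185 -6.481 279632
Lithuanians Balochi Ukranians_Y -0.001727 0.000187 -9.236 278677
""".strip().splitlines()

significant, not_significant = [], []
for row in rows:
    # Assumed column order: source1, source2, target, f3, std. err, Z, SNPs.
    source1, source2, target, f3_value, std_err, z, n_snps = row.split()
    (significant if float(z) < -3 else not_significant).append(target)

print(sorted(not_significant))
# ['FIN30', 'Lithuanians', 'North_Kannadi', 'Serb_D']
```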

It is clear that what I have described holds here: European populations appear like mixtures of Lithuanians and South Asians; conversely, South Asian populations appear like mixtures of Europeans and North Kannadi.

This does not mean that the populations that appear unadmixed (FIN30, Lithuanians, North_Kannadi, and Serbs) are in fact so, for at least two reasons:
  1. The f3 statistic confirms, but does not reject the presence of admixture; in particular, it fails to find real admixture in highly drifted populations
  2. The f3 statistic exploits allele frequency correlations between populations: but the North Kannadi and Lithuanians/Finns occupy opposite ends of the studied cline, so their lack of signal of admixture may be due to the non-existence of populations that are even more unadmixed than themselves.
In the case of South Indians, we are completely sure that this is the case. Reich et al. (2009) managed to show this not because there are any unadmixed Ancestral South Indians (ASI) left, but because they exploited the existence of the Onge, an isolated group from the Andaman Islands that was a sister group to the ASI. So, we can be fairly sure that southern Indians themselves have West Eurasian-like admixture, even the ones that are at the end of the West Eurasia-South India cline on its southern end.

The problem is: there is no isolated group of unadmixed Europeans left in existence that might serve a similar proxy function as the Onge did for South Asians.

Enter Pickrell et al. (2012) to the rescue. In that paper, the authors studied admixture in the Khoe-San of southern Africa. Now, many of the Khoe-San sub-groups appeared to be admixed, but the "Ju|'hoan North" population appeared to be at the "end of the cline": it's impossible to detect admixture in them using allele frequency differences, because, quite simply, there are no populations that are less admixed than them: they're as pure descendants of "Ancestral Bushman" as exist on the earth today.

But the clever thing is that we don't have to detect admixture using only allele frequency differences; we can also use admixture LD, i.e., exploit the correlation between linkage disequilibrium (the co-inheritance of physically separated markers on a chromosome) and allele frequency differences between populations. Pickrell et al. were able to do this not by conjuring up a more unadmixed population than the "Ju|'hoan North" one available to them, but by splitting up that population, and using one half to find allele frequency differences, and the other half to detect admixture LD.
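The dating side of this idea rests on the expectation that admixture LD decays roughly exponentially with genetic distance, at a rate equal to the number of generations since admixture. Below is a minimal sketch in Python of recovering that rate from a decay curve. It is a simplification under stated assumptions: the curve is synthetic and noise-free, there is no affine (constant) term, and a log-linear least-squares fit stands in for whatever fitting procedure rolloff actually uses. All names are hypothetical.

```python
import math


def fit_decay_rate(distances_morgans, ld_values):
    """Least-squares slope of log(LD) on distance; returns generations n
    under the model LD(d) = A * exp(-n * d), with d in Morgans."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances_morgans)
    mx = sum(distances_morgans) / n
    my = sum(logs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_morgans, logs))
    sxx = sum((x - mx) ** 2 for x in distances_morgans)
    return -sxy / sxx


# Synthetic, noise-free curve with a known admixture time (illustration only).
true_generations = 100.0
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [0.02 * math.exp(-true_generations * d) for d in distances]

estimate = fit_decay_rate(distances, curve)
print(round(estimate, 3))
```

With real data the curve is noisy, which is why the standard errors on the dates in this post are so wide when few SNPs or individuals are available.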

Admixture LD signal in Lithuanians

Using the aforementioned idea, I set out to see whether Lithuanians, who occupy the European end of the Europe-South Asia cline, present such a signal of admixture LD. I used the Lithuanian_D sample from the Dodecad Project and the Balochi HGDP sample as reference populations (to calculate allele frequency differences), and the Behar et al. (2010) Lithuanians for admixture LD. There were only ~300k SNPs usable in this set, but that was sufficient to detect the signal of admixture LD:
The admixture time estimate is 200.350 +/- 61.608 generations, or 5,810 +/- 1790 years. This is not very precise, probably because of the small number of SNPs and individuals used, but it certainly points to the Neolithic-to-Bronze Age for the occurrence of this admixture. The date is certainly reminiscent of the expansion of the Kurgan culture out of eastern Europe, or, the later Corded Ware culture of northern Europe.
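The conversion from generations to years can be reproduced with a short Python sketch. The factor of 29 years per generation is an assumption inferred from the figures quoted in these posts (it is not stated explicitly), as is the rounding to the nearest ten years; the function name is hypothetical.

```python
YEARS_PER_GENERATION = 29  # assumption inferred from the figures in these posts


def to_years(generations, std_err):
    """Convert an estimate in generations to years, rounded to the nearest 10."""
    years = round(generations * YEARS_PER_GENERATION, -1)
    err = round(std_err * YEARS_PER_GENERATION, -1)
    return int(years), int(err)


print(to_years(200.350, 61.608))  # (5810, 1790)
```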

So, it may well appear that at least some of the people participating in these groups of cultures, were indeed influenced by the Indo-Europeans as they expanded from their West Asian homeland. These intruders mixed with eastern Europeans who vacillated during the late Neolithic between a northern Europeoid pole akin to Mesolithic hunter gatherers from Gotland and Iberia, and a widely dispersed Sardinian-like population that is in evidence at least in the Sweden-Italian Alps-Bulgaria triangle. The gradual appearance of non-mtDNA U related lineages in Siberia and Ukraine is most likely related to this phenomenon.

It would seem that the Proto-Indo-Europeans mixed with different substrata in the four directions of their expansion: Sardinian-like people in southern Europe, Lithuanian-like people in northern Europe, South Indian-like people in South Asia, and East Eurasians in Siberia and east central Asia. Extant groups are descendants of divergent Neolithic population groups, brought closer together (genetically) because of variable admixture with the PIE population and its early offshoots.

Conclusion

There are mutual signals of admixture across a Europe-South Asia cline: Europeans appear to be mixed with South Asians, and South Asians appear to be mixed with Europeans. The simplest explanation for this pattern involves expansion of a third, geographically and genetically intermediate population that affected both Europe and South Asia. We can use the signal of admixture LD to prove that this expansion affected some of the most unadmixed populations in Europe (e.g., Lithuanians), just as it did the most unadmixed populations of India (e.g., Dravidians).

It will be interesting to use these techniques to study signals of admixture in other "end of the line" populations such as Sardinians, South Indians, etc.

UPDATE I (rolloff analysis of Poles):

I have carried out rolloff analysis of my 25-strong Polish_D sample using Lithuanians and Pathans as references:
The signal is fairly distinct, and corresponds to 149.296 +/- 38.783 generations or 4330 +/- 1120 years. I am guessing that either the different reference population (Pathans vs. Balochi) or, more likely, the increased number of target individuals (25 vs. 10) has contributed to the narrowing down of the uncertainty. It will be interesting to explore this signal further with more population pairs.

UPDATE II (rolloff analysis of Finns):

I have also used the 1000 Genomes Finnish sample (FIN) in a similar manner as Lithuanians, using 15 individuals to estimate allele frequency differences, and 15 ones for admixture LD, and using the Pathans as a South Asian reference population. There is a clear signal of admixture:
This dates to 104.967 +/- 14.797 generations, or 3,040 +/- 430 years. Finland came under the influence of both Europeans (and likely Indo-Europeans) during the Bronze Age period (a mixture of Battle Axe with local Comb Ceramic seems to have occurred), as well as likely non-European (and likely Uralic) intrusions during the same time frame, as part of the Seima-Turbino phenomenon. It will be interesting to repeat this analysis with an East Eurasian reference population to isolate potential signals of admixture dating to either the Comb Ceramic or Seima-Turbino episodes of migration.

(Note, added Oct 14): I carried out rolloff analysis using Nganasans, as suggested in the above paragraph, here.

UPDATE III (rolloff analysis of Ukrainians):

I have used the Yunusbayev et al. sample of Ukrainians, and estimated its admixture time using Lithuanians and Balochi as reference populations:
The admixture time estimate is 191.078 +/- 35.079 generations, or 5,540 +/- 1,020 years. It seems very similar to that in Lithuanians, with a smaller standard error, perhaps on account of either the larger number of SNPs or larger number of individuals.

It is tempting to associate this admixture signal with the Maikop culture which appeared at around this time. Assuming that North_European/West_Asian (or Lithuanian-like and Balochi-like) gene pools existed north and south of the Pontic-Caspian-Caucasus set of geographical barriers, then the Maikop culture which shows links to both the early Transcaucasian culture and those of Eastern Europe would have been an ideal candidate region for the admixture picked up by rolloff to have taken place. There are, of course, other possibilities.

UPDATE IV (rolloff analysis of Lithuanians with Pathan reference):

I repeated the first analysis of this post, but this time, I used Pathans, rather than Balochi as a reference population:
The admixture time estimate of 217.501 +/- 51.170 generations, or 6,310 +/- 1,480 years, appears to be similar to the original estimate of 5,810 +/- 1,790 years, so it does not appear that the choice of Balochi or Pathan as a reference population much affects this result.
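One rough way to quantify "similar" is to express the difference between the two estimates in units of their combined standard error. The Python sketch below does this under a loud assumption: it treats the two estimates as independent, which is not strictly true here because both analyses use the same target sample. The function name is hypothetical.

```python
import math


def z_difference(est1, se1, est2, se2):
    """Z-score of the difference, treating the two estimates as independent."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Lithuanians with Pathan reference vs. with Balochi reference (in generations).
z = z_difference(217.501, 51.170, 200.350, 61.608)
print(round(z, 2))  # 0.21
```

A value this far below 1 means the two dates are well within each other's uncertainty.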

October 08, 2012

rolloff analysis of North Indian Brahmins as Orcadian+North Kannadi

In a previous experiment, I tested the Dodecad Project South Indian Brahmin sample (Iyer and Iyengar) using Orcadians and North Kannadi as reference populations. In the current one, I use the same references to investigate admixture in the Uttar Pradesh Brahmins included in the Metspalu et al. (2011) dataset. A total of 473,837 SNPs are used in this experiment.

I first verified that f3(Brahmins_from_Uttar_Pradesh_M; Orcadian, North_Kannadi) is negative, using qp3Pop:
 Source 1 Source 2 Target f_3 std. err Z SNPs
 result: Orcadian North_Kannadi Brahmins_from_Uttar_Pradesh_M -0.007882 0.000359 -21.951 463297

The exponential fit can be seen below:
The estimated age is 79.706 +/- 9.197 generations, or 2,310 +/- 270 years.

This is about a thousand years younger than the signal observed for the South Indian Brahmin group. A possible explanation has to do with the fact that South Indian Brahmins migrated to South India, and hence did not intermarry with successive waves of invaders into India in historical times. Uttar Pradesh, on the other hand, received multiple invasions from the direction of Central Asia:
Most of the invaders of North India passed through the Gangetic plains of what is today Uttar Pradesh. Control over this region was of vital importance to the power and stability of all of India's major empires, including the Maurya (320–200 BC), Kushan (100–250 CE), Gupta (350–600 CE), and Gurjara-Pratihara (650–1036 CE) empires. Following the Huns invasions that broke the Gupta empire, the Ganges-Yamuna Doab saw the rise of Kannauj. During the reign of Harshavardhana (590–647 CE), the Kannauj empire reached its zenith.
It will be interesting to see whether a young admixture signal also exists in my 5-strong sample of Jatts, since that population has traditions of "Scythian" origins.

October 07, 2012

rolloff analysis of Bulgarians as Sardinian+Pathan

Continuing my rolloff experiments, I have taken the Yunusbayev et al. sample of Bulgarians. This is interesting because of the recent evidence of a Sardinian-like individual from Iron Age Bulgaria, and also as a complement to a similar analysis on the Greeks. Bulgarians are Slavic speaking, but their ethnogenesis owes a great deal to the Bulgars, adding another potential element of complication. However, the paucity of East Eurasian admixture in Bulgarians, together with their Slavic language, probably suggests that this element represented a small elite that did not have a substantial role in the genetic formation of the Bulgarian population.

The top f3 statistics can be seen below:

Source 1 Source 2 Target f_3 std. err Z SNPs
Kshatriya_M Sardinian Bulgarians_Y -0.003813 0.000295 -12.918 237507
Velamas_M Sardinian Bulgarians_Y -0.003783 0.000285 -13.287 238276
Piramalai_Kallars_M Sardinian Bulgarians_Y -0.003693 0.000306 -12.061 238106
Kanjars_M Sardinian Bulgarians_Y -0.003643 0.000298 -12.227 237838
GIH30 Sardinian Bulgarians_Y -0.003638 0.000259 -14.028 240548
North_Kannadi Sardinian Bulgarians_Y -0.00355 0.000317 -11.187 237882
Muslim_M Sardinian Bulgarians_Y -0.003542 0.000333 -10.632 236964
Chamar_M Sardinian Bulgarians_Y -0.003505 0.000303 -11.585 238882
INS30 Sardinian Bulgarians_Y -0.003467 0.000264 -13.153 240279
Dharkars_M Sardinian Bulgarians_Y -0.003452 0.000309 -11.155 238211
Brahmins_from_Uttar_Pradesh_M Sardinian Bulgarians_Y -0.003448 0.000278 -12.42 238041
Indian_D Sardinian Bulgarians_Y -0.003411 0.000256 -13.308 241225
Iyer_D Sardinian Bulgarians_Y -0.003364 0.000291 -11.568 237509
Jatt_D Sardinian Bulgarians_Y -0.003327 0.000289 -11.513 236735
Pathan Sardinian Bulgarians_Y -0.003212 0.000239 -13.444 240969
Iyengar_D Sardinian Bulgarians_Y -0.003209 0.000308 -10.416 236840
Dusadh_M Sardinian Bulgarians_Y -0.003181 0.000313 -10.172 237512
Sindhi Sardinian Bulgarians_Y -0.003094 0.000239 -12.919 241268
Balochi Sardinian Bulgarians_Y -0.002804 0.00024 -11.686 240924


To maximize the number of SNPs and number of individuals, I used the Sardinian+Pathan pair as reference populations. 509,395 SNPs were used for this experiment. The exponential fit can be seen below:
There was a technical issue with the jackknife which I am currently investigating, but the mean time of the admixture was estimated at 126.83004 generations, or 3,680 years. This is similar to the value of 3,850 years I obtained on the Greek sample.

If this date is accepted, then the interesting issue is why an individual from Bulgaria was Sardinian-like during the Iron Age. Possibly, either this individual was Sardinian-like in the broad sense, despite having minority West Asian admixture, or a few centuries after the admixture event, there was still an uneven distribution of the constituent elements, with most individuals still predominantly Sardinian-like. The indigenous element was probably the most numerous, so only part of it would have had the opportunity to admix with the intrusive West Asian-like population, and this influence would have spread to the population at large over time.

In any case, this evidence, such as it is, appears consistent with my idea about a Bronze Age invasion of Europe from Asia.

Naturally, only a broad sampling of ancient DNA variation from the Balkans, perhaps targeting different sites, cultures, times, social status, and physical types will be sufficient to track the early appearance of an intrusive population.

October 05, 2012

D-statistics reveal contrast between Yoruba and San in "Neandertal ancestry"

I have been exploring the HGDP version released by Patterson et al. (2012) in order to see whether patterns of "archaic Eurasian" admixture could be detected in living Africans. In a previous experiment, I looked into a surprising link between Denisovans and Africans. Now, I want to investigate possible differences in Neandertal ancestry within Africa itself. An ASHG 2012 abstract suggests that both Neandertal and Denisovan ancestry may be relevant to the African story.

Previous research has concluded that living African groups do not appear to have substantial differences in their apportionment of archaic Eurasian ancestry. This has led to the reasonable idea that the signal of Neandertal admixture in non-Africans was driven by the encounter of Out-of-Africans with a Neandertal population in Asia, perhaps in the Near East, during their early steps outside Africa, involving a single or limited episodes of admixture, although more complex models may be needed as of late.

I have long suspected that part of this signal is due to population structure in Africa itself, and the possibility of archaic admixture in that continent, a hypothesis that is feasible a priori due to the geographical and ecological diversity of Africa and its large surface area, and which has also found support on the basis of recent palaeoanthropological and genetic research. In my opinion, the well-known abundance of polymorphism in Africans vis a vis non-Africans is not only due to the Out-of-Africa bottleneck, but may also be due to an addition of polymorphism via admixture with divergent native African hominin groups.

Advancing a good case for this admixture is rendered difficult by two factors:

  1. The inability of methods relying on linkage disequilibrium to operate on old admixture events, due to the exponential decay of LD over time, which renders archaic-introgressed segments pitifully small at long time scales.
  2. The high temperatures prevalent in sub-Saharan Africa which render DNA preservation problematic, although, to be honest, I have not even seen many attempts to test this hypothesis on whatever prehistoric skeletal remains there do exist from the region.
Why do African groups appear so little different in terms of possible "Neandertal admixture"? I conjecture that the answer lies in the idea that archaic African admixture will tend to even out the signal of Neandertal admixture. To use a geographical analogy, there is little distance difference (in relative terms) between Tokyo and Beijing from the vantage point of New York, but quite a lot from the vantage point of Hong Kong. Tests of archaic admixture rely on relative allele sharing between individuals or populations; consequently, the signal may be muddied by the occurrence of archaic admixture in Africans which, to use our geographical analogy, transposes them from Hong Kong to New York.

Now, consider the Z scores of the D-statistic of the form D(African1, African2, Neander, Outgroup) calculated using different panels and Outgroup being Chimp, Gorilla, or Orang. The raw numbers can be found in this spreadsheet.

Look at the Pearson correlations between the different panels:


While the Z-scores in most of the panels are strongly correlated with each other, the San panel #4 is strongly anti-correlated. An inspection of the raw numbers shows why this is the case. For example:


Surprisingly, the San appear more Neandertal-admixed than the Yoruba using all Eurasian and the Yoruba ascertainment, and less so, using the San ascertainment!

A possible explanation for this pattern involves Eurasian back-migration into Africa combined with differential archaic African admixture.

The San may possess Eurasian ancestry, consistent with the positive D(San, Yoruba, Neander, Chimp) statistics for all panels except their own; the negative statistic for their own panel is due to their archaic African ancestry, which makes them less like Neandertals.

I conjecture that different archaic populations have contributed polymorphism to different African populations.

This question can be addressed empirically on the basis of whole genome sequence data. The Out-of-Africa bottleneck hypothesis suggests that reduced polymorphism in non-Africans is due to loss of variation as a limited number of founders exited Africa, carrying a subset of African variation. If Africans are descended primarily from the modern human groups left behind, then they will all carry the same "missing variation" set not found in Eurasians.

On the other hand, if, as I suggest, modern human groups encountered and admixed with different divergent African hominins, then different African populations will carry substantially disjoint sets of variants, reflecting deep population structure within Africa itself. Time will tell whether this prediction will prove to be true.