Showing posts with label Ethiopia. Show all posts

May 30, 2015

Out of Egypt or Out of Ethiopia?

I am skeptical that once you remove non-African ancestry from Egyptians (even if you were able to do so perfectly), what you are left with is indigenous Northeastern Africans, the direct descendants of people who left Africa tens of thousands of years ago.

For one thing, Egypt has experienced gene flow not only from Europe and the Middle East, but also from elsewhere in Africa, more recently through the arrival of enslaved black Africans.

For another, even if you perfectly identified and removed both Eurasian and African non-native influences on the Egyptian population, you would be left with some kind of indigenous northeastern African. But did such a population exist in Egypt with long-term continuity since Out-of-Africa? The Eurasian experience (where ancient DNA falsifies continuity left and right even over a tenth of the OOA time scale) makes me doubt this. The Nile may have facilitated gene flow in a north-south direction, and the relatively recent emergence of the Sahara desert may very well have pumped populations into Egypt.


AJHG DOI: http://dx.doi.org/10.1016/j.ajhg.2015.04.019

Tracing the Route of Modern Humans out of Africa by Using 225 Human Genome Sequences from Ethiopians and Egyptians

Luca Pagani et al.

The predominantly African origin of all modern human populations is well established, but the route taken out of Africa is still unclear. Two alternative routes, via Egypt and Sinai or across the Bab el Mandeb strait into Arabia, have traditionally been proposed as feasible gateways in light of geographic, paleoclimatic, archaeological, and genetic evidence. Distinguishing among these alternatives has been difficult. We generated 225 whole-genome sequences (225 at 8× depth, of which 8 were increased to 30×; Illumina HiSeq 2000) from six modern Northeast African populations (100 Egyptians and five Ethiopian populations each represented by 25 individuals). West Eurasian components were masked out, and the remaining African haplotypes were compared with a panel of sub-Saharan African and non-African genomes. We showed that masked Northeast African haplotypes overall were more similar to non-African haplotypes and more frequently present outside Africa than were any sets of haplotypes derived from a West African population. Furthermore, the masked Egyptian haplotypes showed these properties more markedly than the masked Ethiopian haplotypes, pointing to Egypt as the more likely gateway in the exodus to the rest of the world. Using five Ethiopian and three Egyptian high-coverage masked genomes and the multiple sequentially Markovian coalescent (MSMC) approach, we estimated the genetic split times of Egyptians and Ethiopians from non-African populations at 55,000 and 65,000 years ago, respectively, whereas that of West Africans was estimated to be 75,000 years ago. Both the haplotype and MSMC analyses thus suggest a predominant northern route out of Africa via Egypt.


June 13, 2014

An older layer of Eurasian admixture in Eastern Africa

The authors propose that a genetic component found in Horn of Africa populations back-migrated to Africa from Eurasia ~23 thousand years ago. I have not read this carefully yet, but it seems reasonably plausible. The migration uncovered by Pagani et al. and Pickrell et al. (~3kya) is probably not the whole story of Eurasian back-migration into Africa; that episode probably involved only Semitic speakers (it's hard to imagine any other language being carried from the Middle East in that time frame). But the Eurasian affiliation of East Africans extends well beyond Semitic speakers. In North Africa this is even clearer, as the native (pre-Arab) population is definitely broadly West Eurasian and this must have come about by back-migration.

The back-migration of M1 and U6 into Africa seems to be at a similar time depth as the one proposed by these authors (post-initial UP, but pre-Neolithic). I have proposed that haplogroup E may represent an even earlier layer of Eurasian back-migration, while the signals identified by Pagani et al. and Pickrell et al. in East Africa, Pickrell et al. for Southern Africa, and the limited (but present) Neandertal ancestry in Yoruba and Pygmies documented by Pruefer et al. represent later events. The arrival of Arabs and Europeans are later events still.

For a time, there was a taboo against imagining back-migration into Africa; in a sense this was reasonable on parsimony grounds: Africans have the most autosomal genetic diversity and the basal clades of mtDNA and Y-chromosomes; a model with Out-of-Africa is simpler than one with both Out-of and Into-Africa. However, we now know that pretty much all Africans have Eurasian ancestry, ranging from at least traces in the Yoruba and Pygmies (to account for the Neandertal admixture) to intermediate values in East Africans, to quite a lot in North Africans.

Eurasian admixture in Africa seems to be general, variable, and to have occurred at different time scales. It's still the best hypothesis that modern humans originated in Africa initially and migrated into Eurasia. However, it is no longer clear that Africa was always the pump and never the destination of human migrations.

PLoS Genet 10(6): e1004393. doi:10.1371/journal.pgen.1004393

Early Back-to-Africa Migration into the Horn of Africa

Jason A. Hodgson et al.

Genetic studies have identified substantial non-African admixture in the Horn of Africa (HOA). In the most recent genomic studies, this non-African ancestry has been attributed to admixture with Middle Eastern populations during the last few thousand years. However, mitochondrial and Y chromosome data are suggestive of earlier episodes of admixture. To investigate this further, we generated new genome-wide SNP data for a Yemeni population sample and merged these new data with published genome-wide genetic data from the HOA and a broad selection of surrounding populations. We used multidimensional scaling and ADMIXTURE methods in an exploratory data analysis to develop hypotheses on admixture and population structure in HOA populations. These analyses suggested that there might be distinct, differentiated African and non-African ancestries in the HOA. After partitioning the SNP data into African and non-African origin chromosome segments, we found support for a distinct African (Ethiopic) ancestry and a distinct non-African (Ethio-Somali) ancestry in HOA populations. The African Ethiopic ancestry is tightly restricted to HOA populations and likely represents an autochthonous HOA population. The non-African ancestry in the HOA, which is primarily attributed to a novel Ethio-Somali inferred ancestry component, is significantly differentiated from all neighboring non-African ancestries in North Africa, the Levant, and Arabia. The Ethio-Somali ancestry is found in all admixed HOA ethnic groups, shows little inter-individual variance within these ethnic groups, is estimated to have diverged from all other non-African ancestries by at least 23 ka, and does not carry the unique Arabian lactase persistence allele that arose about 4 ka. Taking into account published mitochondrial, Y chromosome, paleoclimate, and archaeological data, we find that the time of the Ethio-Somali back-to-Africa migration is most likely pre-agricultural.


February 03, 2014

West Eurasian ancestry in eastern and southern Africa (Pickrell et al. 2014)

I had mentioned this when it was in preprint form and now it has appeared in PNAS. The great advantage of preprints (and why I'm all for them) is that they allow us to look at research much earlier (about half a year in this case) and thus help accelerate the pace of information dissemination. One disadvantage is that it is sometimes hard to keep track of how papers change between the preprint stage (and there may be multiple versions) and the final published stage; perhaps we need a diff for scientific papers.

PNAS doi: 10.1073/pnas.1313787111

Ancient west Eurasian ancestry in southern and eastern Africa

Joseph K. Pickrell et al.

The history of southern Africa involved interactions between indigenous hunter–gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter–gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger–Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to ∼900–1,800 y ago and show that it had the largest demographic impact in Khoisan populations that speak Khoe–Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to ∼2,700–3,300 y ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.


July 31, 2013

West Eurasian admixture in Khoe-San via East Africa

A new paper on the arXiv quantifies and dates the West Eurasian admixture in East Africa, and uncovers the presence of such admixture even in the Khoe-San of southern Africa. It appears that the admixture first occurred in East Africa about 3ky ago, and reached southern Africa about 1.5ky ago.


It is quite remarkable that different waves of migration converged on southern Africa from different directions: west African farmers and west Eurasian-admixed east African pastoralists. We should count ourselves lucky that the Khoe-San were discovered when they were: a few centuries more, and they too might have followed the fate of other populations finding themselves on the losing side of a technology differential, their culture lost, and their DNA preserved only as fragments in the gene pools of the more successful groups.



arXiv:1307.8014 [q-bio.PE]

Ancient west Eurasian ancestry in southern and eastern Africa

Joseph K. Pickrell et al.

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700 - 3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.


February 03, 2013

"In Africa" project

The new 5-year "In Africa" project headed by Marta Mirazon Lahr has a wonderful website filled with information. From the Aims section:


"The project hopes to achieve five main goals:

  1. to increase significantly the number of human and other mammalian fossils in East Africa dating to the last 250,000 years;
  2. to map changes in human morphology, behaviour and occupation in different basins of East Africa in the period before and after the main modern human dispersals across and out of Africa;
  3. to map the character and timing of the Middle to Later Stone Age transition in the Central Rift Valley;
  4. to integrate the human prehistoric record with local palaeoenvironmental data to explore the role climate change and its expression in the African tropics may have played in our recent evolutionary history;
  5. to increase the scientific and public awareness of how important it is to understand what happened in Africa in order to understand why Homo sapiens and its diversity evolved."

An example of the information that can be found on this site is this list of Middle Pleistocene Sub-Saharan African fossils (pdf). Please note that some of the given dates (such as that of Broken Hill/Kabwe) are controversial. The e-library also holds a large number of papers and is a very useful resource.

January 07, 2013

mtDNA variation in East Africa (Boattini et al. 2013)

From the paper:
Language diversity in EA fits well with its complicated genetic history. In Fleming words, ‘‘Ethiopia by itself has more languages than all of Europe, even counting all the so-called dialects of the Romance family’’ (Fleming, 2006). All African linguistic phyla are found in EA: Afro-Asiatic (AA), Nilo-Saharan, Niger-Congo and Khoisan (however, the genealogical unit of Khoisan is no longer generally accepted). Among them, AA is the most differentiated, being represented by three (Omotic, Cushitic, Semitic) of its six major clades (the others being Chadic, Berber and Egyptian). Omotic and Cushitic are considered the deepest clades of AA, and both are found almost exclusively in the Horn of Africa, along with the linguistic relict Ongota that is traditionally assigned to the Cushitic family but whose classification is still widely debated (Fleming, 2006). These observations are in agreement with a North-Eastern African origin of the AA languages, most probably in pre-Neolithic times (Ehret, 1979, 1995; Kitchen et al., 2009).
and:

This study confirms the central role of EA and the Horn of Africa in the genetic and linguistic history of a wide area spanning from Central and Northern Africa to the Levant. Our results confirm high mtDNA diversity and strong genetic structuring in EA. We were indeed able to identify three population clusters (A, B1, B2) that are related both to geography and linguistics, and signaling different population events in the history of the region. The Horn of Africa (cluster A), in accordance with its role as a major gateway between sub-Saharan Africa and the Levant, shows widespread contacts with populations from CA (AA-Chadic speakers), the Arabian peninsula and the Nile Valley. Southwards, Kenya, and Tanzania (clusters B1 and B2), despite being both heavily involved in Bantu and Nilo-Saharan pastoralist expansions, reveal traces of a more ancient genetic stratum associated with Cushitic-speaking groups (cluster B2). Conversely, Berber- and Semitic-speaking populations of NA and the Levant show only marginal traces of admixture with sub-Saharan groups, as well as a different mtDNA genetic background, making the hypothesis of a Levantine origin of AA unlikely. In conclusion, EA genetic structure configures itself as a complicated palimpsest in which more ancient strata (AA-Cushitic-speaking groups) are largely overridden by recent different migration events. Further explorations of AA-Cushitic-speaking populations – both in terms of sampled groups and typed genetic markers – will be of great importance for the reconstruction of the genetic history of EA and AA-speakers.

The African origin of Afroasiatic would agree with its linguistic separateness from Eurasian languages, and the fact that a single branch of the family (Semitic) is likely to have originated in Asia, and fairly recently at that.




Am J Phys Anthropol DOI: 10.1002/ajpa.22212

mtDNA variation in East Africa unravels the history of afro-asiatic groups

Alessio Boattini et al.

East Africa (EA) has witnessed pivotal steps in the history of human evolution. Due to its high environmental and cultural variability, and to the long-term human presence there, the genetic structure of modern EA populations is one of the most complicated puzzles in human diversity worldwide. Similarly, the widespread Afro-Asiatic (AA) linguistic phylum reaches its highest levels of internal differentiation in EA. To disentangle this complex ethno-linguistic pattern, we studied mtDNA variability in 1,671 individuals (452 of which were newly typed) from 30 EA populations and compared our data with those from 40 populations (2970 individuals) from Central and Northern Africa and the Levant, affiliated to the AA phylum. The genetic structure of the studied populations—explored using spatial Principal Component Analysis and Model-based clustering—turned out to be composed of four clusters, each with different geographic distribution and/or linguistic affiliation, and signaling different population events in the history of the region. One cluster is widespread in Ethiopia, where it is associated with different AA-speaking populations, and shows shared ancestry with Semitic-speaking groups from Yemen and Egypt and AA-Chadic-speaking groups from Central Africa. Two clusters included populations from Southern Ethiopia, Kenya and Tanzania. Despite high and recent gene-flow (Bantu, Nilo-Saharan pastoralists), one of them is associated with a more ancient AA-Cushitic stratum. Most North-African and Levantine populations (AA-Berber, AA-Semitic) were grouped in a fourth and more differentiated cluster. We therefore conclude that EA genetic variability, although heavily influenced by migration processes, conserves traces of more ancient strata.


November 14, 2012

High altitude adaptation in Ethiopia

The anthropometric characteristics on pp. 49-50 may also be of interest. It seems that, among males, Amhara highlanders are shorter, thinner, and lighter than their co-ethnic lowlanders, whereas Oromo highlanders appear to be heavier and less thin.

arXiv:1211.3053 [q-bio.PE]

The genetic architecture of adaptations to high altitude in Ethiopia

Gorka Alkorta-Aranburu, Cynthia M. Beall, David B. Witonsky, Amha Gebremedhin, Jonathan K. Pritchard, Anna Di Rienzo

Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which showed that polymorphisms in candidate genes show signatures of natural selection as well as well-replicated association signals for variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control, DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo.


September 18, 2012

Out-of-Asia and Into-Africa (?)

The publication of version 2 of the Pickrell et al. paper on South Africa is as good an opportunity as any to discuss anew something that I've been hinting at for some time now.

First things first: Pickrell et al. find West Eurasian admixture in the Hadza and Sandawe:
Both of these are consistent with west Eurasian (either European or, more likely, Arabian), gene flow into these populations. To further examine this, we turned to ROLLOFF. We used Dinka and French as representatives of the mixing populations (since date estimates are robust to improperly specified reference populations). The results are shown in Supplementary Figure S22. Both populations show a detectable curve, though the signal is much stronger in the Sandawe than in the Hadza. The implied dates are 89 generations (~2500 years) ago for the Hadza and 66 generations (~2000 years) ago for the Sandawe. These are qualitatively similar signals to those seen by Pagani et al. [65] in Ethiopian populations.
The presence of West Eurasian ancestry in the Hadza and Sandawe was anticipated in my world9 calculator, where both these populations were shown to possess Caucasoid admixture entirely of the "Southern" component. This component peaks in Arabia, and it is really only there that it occurs unaccompanied by any other type of Caucasoid element. So, it is very likely that there was indeed such a migration into East Africa. What Pickrell et al. have added to our knowledge is that this migration is fairly recent.
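
To make the dating logic behind such generation counts concrete, here is a minimal sketch. It assumes the usual single-pulse model in which admixture linkage disequilibrium decays as exp(-n·d) with genetic distance d (in Morgans) after n generations. The function name, the amplitude, and the synthetic noise-free curve are my own illustrative assumptions; this is not ROLLOFF itself, which works on weighted SNP correlations and real data.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Estimate n in ld = A * exp(-n * d) by least squares on log(ld).

    Taking logs turns the exponential into a straight line,
    log(ld) = log(A) - n * d, so n is minus the fitted slope.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)

# Synthetic curve (hypothetical values) generated with n = 89 generations,
# the figure quoted above for the Hadza.
true_n = 89
amplitude = 0.02  # arbitrary starting level of admixture LD
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [amplitude * math.exp(-true_n * d) for d in distances]

estimated_n = fit_generations(distances, curve)
print(round(estimated_n, 3))
```

With real data the curve is noisy, so the estimate comes with a standard error, and the choice of reference populations and of the distance range fitted can matter.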

Razib repeats one of his favorite analogies about events taking place in Africa after the pyramids were rising in Egypt. I will use a Greek epic analogy, by pointing out that at the time that Memnon the Ethiopian led his contingent to the aid of Troy, these events had not yet taken place.

Depictions of Memnon changed during classical antiquity, from a Caucasoid norm, as in the red-figure kylix on the left, to a more stereotypically African form by Roman times. This is sometimes taken as simply a consequence of the fact that the ancient Greeks were unfamiliar with African phenotypes, and changed their portraiture of Ethiopians as they became more familiar with them during Hellenistic and Roman times.

But, the very name of Aithiopes first attested in Homer (8th c. BC) attests to the fact that the Greeks were aware of what Ethiopians looked like, at least in terms of their dark pigmentation. And, there are depictions of Africans in classical art, as well as a famous quote in Herodotus which makes abundantly clear that he was aware of the physical characteristics of what we would call "Sub-Saharan Africans".

We don't only need to look at Ethiopia for evidence of the strange events that were taking place in Africa during classical antiquity. A great punch-in-the-face reminder of these events comes from the much later Greek author Pausanias who records that a statue of Athena he observed in Attica had blue eyes which he ascribed to the Libyan origin of her myth. How strange it seems to us that one would look to Africa for an explanation for the blue-eyed goddess.

Libya was, of course, the ancient name for Africa, and especially Africa west of Egypt, what we might call Berber-land. Egypt was often reckoned by the ancients as part of Asia. In any case, Pausanias' strange assertion finds support in the Egyptian monuments that really do depict the ancient Libyans (=Berbers) as Caucasoid, and often lighter than Middle Eastern people. This would also accord with Coon's famous discovery of "Irish-like" Berbers among the Riffians; I often dismissed such assertions, but in a landscape of human prehistory that is getting stranger by the month, it is worth digging for gold nuggets in old texts.

A recent study claimed that there was back-to-Africa gene flow from Eurasia more than 12,000 years ago. On the other hand, both HAPMIX and StepPCO estimate the admixture in Mozabite Berbers as taking place ~120 generations ago, or about 3.5kya assuming a generation length of 29 years as Patterson et al. (2012) do. I have observed that rolloff produces generally lower dates than these two methods, so I would not be surprised if that is the case here as well.
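
The conversion from generations to calendar years is just a multiplication, but the result is sensitive to the assumed generation length. A small sketch follows; the helper name and the alternative generation lengths of 25 and 30 years are my own illustrative choices, while the 29-year figure and the generation counts come from the text above.

```python
def generations_to_years(generations, years_per_generation=29):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation

# Generation counts quoted in this post.
estimates = {
    "Mozabite (HAPMIX / StepPCO)": 120,
    "Hadza (ROLLOFF)": 89,
    "Sandawe (ROLLOFF)": 66,
}

for label, gens in estimates.items():
    # Show how much the date moves across plausible generation lengths.
    years = [generations_to_years(gens, g) for g in (25, 29, 30)]
    print(f"{label}: {gens} generations -> {years} years (25, 29, 30 y/gen)")
```

At 29 years per generation, 120 generations come to 3,480 years, which is the "about 3.5kya" above. The rounded ROLLOFF figures quoted earlier (89 generations for ~2500 years, 66 for ~2000) imply generation lengths of roughly 28 to 30 years.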

It seems that as recently as a few thousand years ago, West Eurasian populations were moving into Africa from both north and east. As Pickrell et al. have discovered, their eastern branch also contributed to South Africans, tagging along the dispersal of pastoralists from East-to-South Africa.

The big question is: did West and Central Africa escape this population movement?

I seriously suspect that it did not. I base that assertion on several arguments, of varying strength:

  1. Why would they? If they inundated East and North Africa, why would they not venture further?
  2. Living Sub-Saharan African farmers are not symmetrically related to West and East Eurasians: they are closer to the former. West Eurasian back-migration would explain this phenomenon.
  3. The Great Event in Sub-Saharan Africa was doubtlessly the Bantu explosion, and it is a curious coincidence that this took place close to the time of these events.
  4. The Iwo Eleru crania from Nigeria are of late Pleistocene age, archaic in character, and unlike modern West Africans. Something did happen in West Africa over the course of, say, the last 10,000 years.
And, I always try to remind myself of the Kiffians and Tenerians. I have not seen any follow-up work on them, but if anyone has an ancient DNA lab, I'd think they would be prime candidates for a study.

Speaking of ancient DNA, this unexpected archaeogenetic study from the University of Khartoum hints at important changes in Africa:

The area known today as Sudan may have been the scene of pivotal human evolutionary events, both as a corridor for ancient and modern migrations, as well as the venue of crucial past cultural evolution. Several questions pertaining to the pattern of succession of the different groups in early Sudan have been raised. To shed light on these aspects, ancient DNA (aDNA) and present DNA collection were made and studied using Y-chromosome markers for aDNA, and Y-chromosome and mtDNA markers for present DNA. Bone samples from different skeletal elements of burial sites from Neolithic, Meroitic, Post-Meroitic and Christian periods in Sudan were collected from Sudan National Museum. aDNA extraction was successful in 35 out of 76 samples, PCR was performed for sex determination using Amelogenin marker. Fourteen samples were females and 19 were males. To generate Y-chromosome specific haplogroups A-M13, B-M60, F-M89 and Y Alu Polymorphism (YAP) markers, which define the deep ancestral haplotypes in the phylogenetic tree of Y-chromosome were used. Haplogroups A-M13 was found at high frequencies among Neolithic samples. Haplogroup F-M89 and YAP appeared to be more frequent among Meroitic, Post-Meroitic and Christian periods. Haplogroup B-M60 was not observed in the sample analyzed.
I was reminded of it recently when this curious abstract came up, which I still believe is missing a zero somewhere, but these days you never know.


Evidence that Sub-Saharan Africans too have experienced gene flow from West Eurasians occasionally comes up, but formal tests of admixture, e.g., f3(Yoruba; San, French) usually do not achieve significance. But, we must be cautious: South Africans do appear admixed between San and East Africans, but this is a consequence of the fact that admixture is recent, leaving a trail of populations of varying East African ancestry, and the San still exist and can serve as one pole in a comparison of admixture.
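
For readers unfamiliar with the notation, f3(C; A, B) is the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in the three populations; a significantly negative value is evidence that C is admixed between populations related to A and B. Below is a minimal sketch with made-up allele frequencies. The population names and numbers are hypothetical, and the sketch omits the finite-sample bias correction and the block-jackknife standard errors that real analyses use to judge significance.

```python
def f3(target, source_a, source_b):
    """Plain f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Inputs are equal-length lists of allele frequencies. No bias
    correction is applied, so this is only suitable as an illustration.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all populations must cover the same SNPs")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at five SNPs.
pop_a = [0.10, 0.80, 0.35, 0.60, 0.05]
pop_b = [0.70, 0.20, 0.90, 0.15, 0.50]

# A population built as a 40/60 mixture of A and B.
mixed = [0.4 * a + 0.6 * b for a, b in zip(pop_a, pop_b)]

print(f3(mixed, pop_a, pop_b))   # negative: consistent with admixture
print(f3(pop_a, mixed, pop_b))   # positive: A is not a mixture of the others
```

For an exact mixture with proportion α the statistic equals -α(1-α) times the mean squared frequency difference between A and B. Drift in the target population since admixture adds a positive term, which is one way a real admixture signal can fail to reach significance.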

David Reich has hinted at dual origins for West Africans. I am looking forward to learning what he means by it, but I would not be surprised if it involves admixture between a Eurasian-like population and a Palaeoafrican population of indigenous West African hunter-gatherers.

In any case, ex Africa semper aliquid novi even today. But, interdum, aliquid novum in Africam.

UPDATE: Pickrell and co-authors discuss their paper here.

June 24, 2012

Clusters Galore analysis of East Africans

I have included the new data from Pagani et al. (2012) together with various other East African datasets available to me, including various East African Dodecad Project participants.

The first four PCA dimensions can be seen below:


Project participants can find their co-ordinates in the first four dimensions below:

I have also run MCLUST over the first 4 dimensions, which inferred 12 clusters:


All Project participants fall in the expected clusters, so there is no need to report any individual clustering results.

June 22, 2012

Assessing East Africans of Pagani et al. (2012) using 'weac2'

Thanks to the publication of new data from Pagani et al. (2012), we now have dense genotype data for 235 more individuals from East Africa, mainly Ethiopians, but also Somalis and South Sudanese.

Naturally, I wanted to make sure that everything was in order, so I applied the 'weac2' calculator on the new data. Here are the normalized median admixture proportions:
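
As a rough illustration of what "normalized median" means here, the sketch below takes the median of each ancestral component across the individuals of a population and then rescales the medians so that they sum to 1. This is one plausible reading of the term, stated as an assumption; the component labels and the proportions are invented for the example and are not 'weac2' output.

```python
from statistics import median

def normalized_median_proportions(individuals):
    """Median of each component across individuals, rescaled to sum to 1.

    `individuals` is a list of dicts mapping component name -> proportion.
    Medians of different components need not sum to 1, hence the rescaling.
    """
    components = sorted(individuals[0])
    medians = {c: median(ind[c] for ind in individuals) for c in components}
    total = sum(medians.values())
    if total == 0:
        raise ValueError("all component medians are zero")
    return {c: medians[c] / total for c in components}

# Hypothetical admixture proportions for three individuals.
sample = [
    {"A": 0.50, "B": 0.40, "C": 0.10},
    {"A": 0.60, "B": 0.30, "C": 0.10},
    {"A": 0.40, "B": 0.35, "C": 0.25},
]

result = normalized_median_proportions(sample)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The median is less sensitive than the mean to a few atypical individuals, which is useful when a population sample may include outliers.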



I have also created population portraits for the 12 different populations, which appear to show rather homogeneous samples.

Here are the descriptions of the data from the original paper:

The populations sampled (numbers) were the Semitic-speaking Amhara (26) and Tigray (21); the Cushitic-speaking Oromo (21), Ethiopian Somali (17), and Afar (12); the Omotic-speaking Ari Cultivators (24), Ari Blacksmiths (17), and Wolayta (8); and the Nilotic-speaking Gumuz (19) and Anuak (23). In addition to these groups, we also generated South Sudanese data from mixed populations (24) and Somali data from Somali populations (23).
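
As a quick sanity check, the per-population counts in the quoted description can be tallied against the stated total of 235 individuals in 12 populations. The dictionary below simply transcribes the numbers from the quote; the short population labels are mine.

```python
# Sample sizes transcribed from the description quoted above.
sample_sizes = {
    "Amhara": 26,
    "Tigray": 21,
    "Oromo": 21,
    "Ethiopian Somali": 17,
    "Afar": 12,
    "Ari Cultivators": 24,
    "Ari Blacksmiths": 17,
    "Wolayta": 8,
    "Gumuz": 19,
    "Anuak": 23,
    "South Sudanese": 24,
    "Somali": 23,
}

n_populations = len(sample_sizes)
n_individuals = sum(sample_sizes.values())
print(n_populations, n_individuals)  # 12 235
```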


Newer versions of the Dodecad tools will of course take into account the new samples, which ought to help better define the "East_African" component that often arises at higher levels of detail.

And, of course kudos to all researchers who make their data publicly available and hence provide genome bloggers such as myself with much appreciated "fuel" for their inquiries.

June 21, 2012

Ethiopian origins (Pagani et al. 2012)

The study attempts to answer four questions:
Our current study is motivated by four questions. First, where do the Ethiopians stand in the African genetic landscape? Second, what is the extent of recent gene flow from outside Africa into Ethiopia, when did it occur, and is there evidence of selection effects? Third, do genomic data support a route for out-of-Africa migration of modern humans across the mouth of the Red Sea? Fourth, assuming temporal stability of current populations, what are the estimated ages of Ethiopian populations relative to other African groups?
Link to press release. Link to the supplemental data.

The authors reiterate that modern humans left Africa 50-70kya, a hypothesis that seems to me pretty much dead in the light of recent archaeological evidence.

The lack of antiquity in the Ethiopian population, even in only the African component thereof, argues against that population being ancestral to modern humans. Note that if the Out-of-East Africa hypothesis is correct, then skulls like Omo I represent ancestral modern humans and they are followed much later by modern humans anywhere else. So, while anatomical modernity may have emerged in East Africa --or maybe not; let's not forget that we have early modern skulls from the region in part because of the excellent preservation conditions and excess of scholarly interest-- there is no evidence that they spread from there.

I have little doubt that my own theory about substantial back-migration of Eurasians into Africa will eventually win the day. Of course, I am not referring to the recent (in the last 3,000 years) admixture with West Eurasians that the Ethiopian population has undergone, but rather to the more ancient migration that was probably associated with Y-haplogroup DE-YAP.

The fact that the African component of diverse African populations is more closely related to West than to East Eurasians is one piece of evidence among many for that scenario. Hopefully, it can be tested soon using whole genome data which may have enough density to detect much older admixture events.

UPDATE I: Since the dates in the paper are based on ROLLOFF, a piece of software that is not publicly available more than a year after its announcement, and which contradicts other software released by the same authors, I will take the Queen of Sheba stories circulated in the media with a huge grain of salt.

The American Journal of Human Genetics, 21 June 2012 doi:10.1016/j.ajhg.2012.05.015

Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool

Luca Pagani et al.

Humans and their ancestors have traversed the Ethiopian landscape for millions of years, and present-day Ethiopians show great cultural, linguistic, and historical diversity, which makes them essential for understanding African variability and human origins. We genotyped 235 individuals from ten Ethiopian and two neighboring (South Sudanese and Somali) populations on an Illumina Omni 1M chip. Genotypes were compared with published data from several African and non-African populations. Principal-component and STRUCTURE-like analyses confirmed substantial genetic diversity both within and between populations, and revealed a match between genetic data and linguistic affiliation. Using comparisons with African and non-African reference samples in 40-SNP genomic windows, we identified “African” and “non-African” haplotypic components for each Ethiopian individual. The non-African component, which includes the SLC24A5 allele associated with light skin pigmentation in Europeans, may represent gene flow into Africa, which we estimate to have occurred ∼3 thousand years ago (kya). The African component was found to be more similar to populations inhabiting the Levant rather than the Arabian Peninsula, but the principal route for the expansion out of Africa ∼60 kya remains unresolved. Linkage-disequilibrium decay with genomic distance was less rapid in both the whole genome and the African component than in southern African samples, suggesting a less ancient history for Ethiopian populations.

Link

February 17, 2012

Adaptation to high altitude in Ethiopian highlands

There was another paper on high-altitude adaptation in Tibetans.
There was also another article on Tibetan and Andean high-altitude adaptation, and another one by Bigham et al.


Genome Biology 2012, 13:R1 doi:10.1186/gb-2012-13-1-r1

Genetic adaptation to high altitude in the Ethiopian highlands

Laura B Scheinfeldt et al.

Abstract (provisional)


Background

Genomic analysis of high-altitude populations residing in the Andes and Tibet has revealed several candidate loci for involvement in high-altitude adaptation, a subset of which have also been shown to be associated with hemoglobin levels, including EPAS1, EGLN1, and PPARA, which play a role in the HIF-1 pathway. Here, we have extended this work to high and low altitude populations living in Ethiopia for which we have measured hemoglobin levels. We genotyped the Illumina 1M SNP array and employed several genome-wide scans for selection and targeted association with hemoglobin levels to identify genes that play a role in adaptation to high altitude.

Results

We have identified a set of candidate genes for positive selection in our high-altitude population sample, demonstrated significantly different hemoglobin levels between high and low altitude Ethiopians and have identified a subset of candidate genes for selection, several of which also show suggestive associations with hemoglobin levels.

Conclusions

We highlight several candidate genes for involvement in high-altitude adaptation in Ethiopia, including CBARA1, VAV3, ARNT2 and THRB. Although most of these genes have not been identified in previous studies of high-altitude Tibetan or Andean population samples, two of these genes (THRB and ARNT2) play a role in the HIF-1 pathway, a pathway implicated in previous work reported in Tibetan and Andean studies. These combined results suggest that adaptation to high altitude arose independently due to convergent evolution in high-altitude Amhara populations in Ethiopia.

Link

October 25, 2010

More detailed analysis of Eurasian populations (K=10)

I have removed some populations from the previous run (such as Moroccan Jews and Samaritans) that tended to generate mini-clusters due to the presence of close relatives and/or inbreeding in the sample. I have removed some redundant populations to even out the dataset, and I have also added North Kannadi and Gujarati, which helped reveal the gradient of ancestry in South Asia.

ADMIXTURE results:

Admixture proportions:


Some interesting observations:
  • The occurrence of 3.8% South Asian in Romanians may signify its Roma population. Indeed, almost all of this comes from a 25% South Asian individual, almost certainly a Roma.
  • The small African component in Spaniards which was revealed in a previous K=8 run turns out to be East African (0.5%) rather than West African (0.1%). If this holds up in larger sets then it might signify that its origin is from East African admixed populations from the east, rather than Sub-Saharan Africans.
  • The multiplicity of ancestries of the Uygur is made evident, in agreement with the extensive craniometric and genetic data on prehistoric and extant populations from the area.
  • The proportion of the two East Eurasian components in Turkic populations is interesting. It seems that the earliest departures from the Turkic homeland (such as the Chuvash and Yakut) have a predominance of the NE Asian component, the Anatolian Turks are intermediate, and the Uygurs, the only ones to have stayed close to the homeland, have experienced an increase in the E Asian component.
  • The absence of the West African component in Ethiopians is striking. Here are the individual results for Ethiopians, illustrating the variability of the Southwest Asian vs. East African components. The Ethiopian sample consists of a number of different ethnic groups of the country, some of which (like the Amharas) are of Western Eurasian linguistic origin.

I am currently running K=11 and K=12 on the exact same data to see how the LogLikelihood and Bayes Information Criterion will move and whether new mini-clusters will appear, or if the mega-components (such as the "West Asian", "South European", and "North European") will split informatively. I will update this post with information on what actually happened, and with additional plots -- if I get robust results.

October 20, 2010

The shape of things to come

(Last Update: Oct 22)

Here is a teaser from my ongoing ADMIXTURE experiments. I have assembled a nice set of Eurasian populations, and this time I will continue increasing K until the Bayes Information Criterion maxes out.

I had previously used BIC to successfully choose K in my craniometric analysis of world populations. I am now at K=7 and it keeps on rising! Either it will stop or my computer will burn out doing Quasi-Newton iterations.
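For readers who want to try the same stopping rule, here is a minimal sketch of picking K by BIC. It is not the exact calculation used for these runs: the "higher is better" sign convention (lnL minus a penalty) is chosen to match the "maxes out" wording above, and the parameter count, the number of observations, and the log-likelihood values in the usage note are all illustrative assumptions.

```python
import math


def bic_score(log_likelihood, n_params, n_obs):
    """BIC in the 'higher is better' convention: lnL - (k / 2) * ln(n)."""
    return log_likelihood - 0.5 * n_params * math.log(n_obs)


def admixture_param_count(n_individuals, n_snps, k):
    # Assumed count: (K - 1) free ancestry fractions per individual
    # plus K allele frequencies per SNP.
    return n_individuals * (k - 1) + n_snps * k


def best_k(log_likelihoods, n_individuals, n_snps):
    """Return the K with the highest BIC, plus the score for every K tried."""
    # Assumption: each genotype call counts as one observation.
    n_obs = n_individuals * n_snps
    scores = {
        k: bic_score(ll, admixture_param_count(n_individuals, n_snps, k), n_obs)
        for k, ll in log_likelihoods.items()
    }
    return max(scores, key=scores.get), scores
```

Usage would look like `best_k({2: -5000.0, 3: -3000.0, 4: -2900.0}, n_individuals=10, n_snps=100)`, where the dictionary holds the final log-likelihood reported for each K (made-up numbers here). The score rises while the extra component buys a large likelihood gain and falls once the penalty term outweighs it.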

On to the teaser: I'll only say that "People" are the 12 Genomes Unzipped individuals; I will leave their individual proportions a mystery - for now.

In color:
Sampled populations, # of individuals, ancestral components:


UPDATE (Oct 21):

Here are the K=8 results. I've added some more populations. The next step is to integrate HapMap data with HGDP-CEPH and the Behar et al. dataset. Stay tuned.

I've decided to present the data as population averages, as this is much more readable, especially as the number of individuals grows.

In this run, one of the clusters (purple) became associated with north Mongoloids (Yakuts, and partially Mongols and Daur), whereas Han are mostly in the green component. Notice that Yakut, Uzbeks, Chuvash, and Turks (all Turkic populations) are predominantly in the north cluster as far as their Mongoloid component is concerned, as expected.

UPDATE II (Oct 22):

Here are results for K=9. The addition of Maasai (MKK) has revealed the east African component which they share with Ethiopians; the latter also have a significant West Asian component.

For lack of a better term, I decided to call the lime component "Sardinian", as it is dominant in that population, but it is clearly reflective of something much broader.

July 14, 2010

mtDNA of Yemeni and Ethiopian Jews

From the paper:
Mitochondrial DNA analysis also revealed a high diversity of sub-Saharan African and Eurasian haplotypes in both the Yemenite and Ethiopian Jewish populations (see Fig. 2). Specifically, common haplotypes (haplotypes present at >5%) in Yemenite Jews include the African haplogroup L3x1 and Eurasian haplogroups R0a [renamed from (preHV)1 (Torroni et al., 2006)], HV1, J2a1a [renamed from J1b (Palanichamy et al., 2004)], K, R2, U, and U1, and in Ethiopian Jews include African haplogroups L2a1b2 and L5a1 and Eurasian haplogroups R0a and M1a1 (see Fig. 2). Overall, sub-Saharan African L haplotypes [hereafter referred to as L(xM,N), i.e., all African haplotypes except M and N, following the nomenclature of Behar et al. (2008)], comprise a large proportion of the genetic variation in both Jewish populations, representing 20% in the Yemenite Jews and 50% in Ethiopian Jews. This high frequency contrasts with other Jewish populations, such as Near Eastern and Ashkenazi Jews, who almost entirely lack L(xM,N) haplogroups (Thomas et al., 2002; Richards et al., 2003).
I think that the authors' conclusion that Yemenite Jews are partially descended from Israeli exiles is premature. Sure, they can exclude large-scale introgression of Yemeni mtDNA, but the universe of possibilities is not limited to either Israeli or Yemenite.

The way I see it, only a large-scale study of all global Jewish populations may uncover verified ancient Jewish lineages for both Y-chromosomes and mtDNA. The recent studies on Jews have uncovered several genetic sub-clusters of Jews, and only lineages that occur in 2 or more of these clusters, and preferably geographically separated ones have a strong claim of representing original Jewish lineages. There is a limit on what can be uncovered about the past from the study of living populations.

American Journal of Physical Anthropology doi: 10.1002/ajpa.21360

Mitochondrial DNA reveals distinct evolutionary histories for Jewish populations in Yemen and Ethiopia

Amy L. Non et al.

Abstract

Southern Arabia and the Horn of Africa are important geographic centers for the study of human population history because a great deal of migration has characterized these regions since the first emergence of humans out of Africa. Analysis of Jewish groups provides a unique opportunity to investigate more recent population histories in this area. Mitochondrial DNA is used to investigate the maternal evolutionary history and can be combined with historical and linguistic data to test various population histories. In this study, we assay mitochondrial control region DNA sequence and diagnostic coding variants in Yemenite (n = 45) and Ethiopian (n = 41) Jewish populations, as well as in neighboring non-Jewish Yemeni (n = 50) and Ethiopian (previously published Semitic speakers) populations. We investigate their population histories through a comparison of haplogroup distributions and phylogenetic networks. A high frequency of sub-Saharan African L haplogroups was found in both Jewish populations, indicating a significant African maternal contribution unlike other Jewish Diaspora populations. However, no identical haplotypes were shared between the Yemenite and Ethiopian Jewish populations, suggesting very little gene flow between the populations and potentially distinct maternal population histories. These new data are also used to investigate alternate population histories in the context of historical and linguistic data. Specifically, Yemenite Jewish mitochondrial diversity reflects potential descent from ancient Israeli exiles and shared African and Middle Eastern ancestry with little evidence for large-scale conversion of local Yemeni. In contrast, the Ethiopian Jewish population appears to be a subset of the larger Ethiopian population suggesting descent primarily through conversion of local women.

Link

September 05, 2009

More ASHG 2009 abstracts

For the first part, see here.

See Part I for another study on Ashkenazi Jews. We will have to look at the details of the study when it comes out, but the fact that Ashkenazi Jews are (a) between southern Europeans and Near Easterners, and (b) form a distinct cluster of their own at K=3 seems to support my theory that most of the European ancestry in Jews is of ancient origin in southern Europe rather than due to recent admixture with Central/Eastern Europeans: at K=2 the ancestral components are identified, but these components mixed a relatively long time ago, so that after a subsequent period of relative isolation, a distinctive pattern was formed out of the mixture which is identified at K=3.

Genome-wide SNP analysis of Ashkenazi Jews reveals unique population substructure
The Ashkenazi Jews (AJ) are a genetic isolate that has been widely utilized in genetic studies of both mendelian and complex disorders. However, the genetic variation and population structure of the AJ have been previously investigated with relatively few individuals and few genetic markers. We have now genotyped a large AJ cohort with the Affymetrix 6.0 genome-wide SNP array. After strict quality control filters, genotype data at 775K SNPs in 466 unrelated AJ individuals were available for analysis. To investigate the genetic structure of the AJ relative to other populations we used principal components analysis (PCA) as well as the frappe clustering algorithm. When merged with the worldwide Human Genome Diversity Project dataset, PCA shows the AJ are distinct from all other groups, including both European and Middle-Eastern populations. Further PCA using AJ genotypes combined with a large European dataset again validates the separation of AJ from European populations. Interestingly, principal component one seems to largely separate European and Middle-Eastern populations geographically according to latitude with the AJ fitting South of Europe and North of the Middle-East. Additional analysis using the frappe population clustering algorithm is consistent with a unique population signature for the AJ. Limiting the frappe clustering to only two population groups, specifying k=2, reveals that AJ cluster more closely to Europeans than Middle-Eastern populations but when allowing three populations, k=3, AJ form a group distinct from both the Middle-East and Europe. Compared to European populations, AJ also show an increase in genome-wide linkage disequilibrium, consistent with possible founder effects. These findings will aid in the design and use of AJ in case-control and association studies and clearly demonstrate the genetic separation of AJ from other populations.
Another paper on the topic, albeit one which uses HLA haplotypes to infer admixture and is limited to Jewish/Central European admixture.

Admixture between Ashkenazi Jews and Central Europeans
When distinct populations inhabit the same geographic space, culture often acts to restrict random mating in our species, while at the same time preventing complete genetic privacy. The residency across Central Europe by the Ashkenazi Jews over the last thousand years is such a case. HLA typing from bone marrow donor registries in Israel, Poland and Germany was utilized to measure admixture between central European host populations and Ashkenazim. Inferred high resolution HLA A-B-DRB1 haplotype frequencies were generated from each population. A total of 1,676 Polish-origin Ashkenazim and 13,556 Polish haplotypes were analyzed, along with a similar sample of ~5 million German haplotypes. The informativeness of HLA haplotypes is shown by the A-B-DRB1 haplotype 0101-0801-0301, the most common haplotype found in northern Europe. HLA B*0801 bearing haplotypes are present in the Near East, but those B*0801 haplotypes carry the HLA C allele Cw*0702 instead of the Cw*0701 found in 0101-0801-0301. The 100 most common haplotypes constituted 53% of the total Ashkenazi, and 45% of the Polish, and 43% of the German samples, reflecting the sizeable total fraction of very rare haplotypes familiar in population samples of the diverse HLA system. The most common Ashkenazi haplotype had a frequency of 6.14% (n = 102.9) and the 100th haplotype was present at 0.29% (n = 4.86). Comparable values for the Polish sample were 5.83% (n = 790.3) and 0.13% (n = 17.6), respectively. Haplotypes from one population compared to those haplotypes in a second could be classified into three categories: less frequent, statistically identical or more frequent. In the graph of the ordered 100 Polish haplotypes, the less frequent Ashkenazi haplotypes supply a possible signature of admixture from the Poles into the Polish Ashkenazim, while the haplotypes more frequent in Ashkenazim than Poles are candidates for movement of genes from the Ashkenazim to the Poles.
The averaged frequency differences between these categories give an indication of population admixture. The analysis showed that 1.8% of Polish haplotypes may be of Ashkenazi origin and 0.6% of Ashkenazi of Polish origin. The sample from Germany, in which the initial generations of Polish-Ashkenazi history were spent, was useful in demonstrating consistency of haplotype frequencies by rank order. The results show clear evidence of admixture occurring in both directions between two largely HLA-distinct populations.
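The fractional "n" values quoted in the abstract look odd at first, but they appear to be expected haplotype counts: the inferred frequency multiplied by the number of haplotypes in each sample. A quick check, assuming that 1,676 and 13,556 are the haplotype totals for the Ashkenazi and Polish samples (the function name is mine):

```python
def expected_count(frequency_percent, n_haplotypes):
    """Expected number of copies of a haplotype given its inferred frequency."""
    return frequency_percent / 100.0 * n_haplotypes


ASHKENAZI_HAPLOTYPES = 1676   # assumed to be a haplotype total, per the abstract
POLISH_HAPLOTYPES = 13556

checks = [
    ("Ashkenazi, most common", 6.14, ASHKENAZI_HAPLOTYPES),
    ("Ashkenazi, 100th", 0.29, ASHKENAZI_HAPLOTYPES),
    ("Polish, most common", 5.83, POLISH_HAPLOTYPES),
    ("Polish, 100th", 0.13, POLISH_HAPLOTYPES),
]

for label, percent, total in checks:
    print(f"{label}: {percent}% of {total} = {expected_count(percent, total):.2f}")
```

The four products reproduce the reported n = 102.9, 4.86, 790.3 and 17.6 after rounding, so the counts and percentages in the abstract are at least internally consistent.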

The following study demonstrates a point I have argued several times before with Afrocentrists, namely the intermediate genetic position of Ethiopians between Caucasoids and Sub-Saharan Africans. It also underscores the difference between social and biological classifications: Ethiopians are undoubtedly "socially" black in most other societies, but intermediate between Negroids and Caucasoids anthropologically. This reality was recognized even by early anthropologists who coined the term of Ethiopids to describe them as a separate intermediate category between Caucasoids and Negroids.

The distribution of sex-specific human genetic variation in Ethiopia.
Ethiopia has been proposed as a candidate location for the emergence of anatomically modern humans, and the source region for the expansion out of Africa. It is also a region of substantial cultural diversity as expressed in languages (Nilo-Saharan, Cushitic, Semitic, and Omotic language families), religions (Christians, Jews, Moslems and Animists), ethnic identities (over 80 groups) as well as many marginalised groups socially excluded on grounds of caste-like occupation, supposed origin, or both. The demographic history of Ethiopia over the past several thousand years has involved both sustained migration of Semitic speakers from the Arabian Peninsula as well as internal conquests of lands in the south. To investigate the demographic histories of ethnic groups we analysed a battery of SNPs and microsatellites on the non-recombining portion of the Y chromosome (NRY) and sequence variation in the Hypervariable Segment 1 (HVS1) of mtDNA (5756 samples from 45 ethnic groups). Commonly used summary statistics (gene diversity h, genetic distance Fst) were analysed within the context of non Ethiopian data e.g. West Africa (Igbo, Nigeria) and Europeans. We present preliminary results reporting a wide range of genetic diversity values within ethnic groups (h: NRY = 0.743 - 0.972, HVS1 = 0.962 - 0.996) and pairwise genetic distance values between groups (Fst: NRY = 0.000 - 0.294, HVS1 = 0.000 - 0.035). A clustering of Ethiopian groups was observed when using principal coordinate analyses with genetic distances, appearing midway between a West African Niger-Congo speaking group (Igbo of Nigeria) and an Indo-European speaking group (Greek Cypriots). Some south-western groups (e.g. Anuak) showed greater similarity to West-Africans while the culturally influential Amhara were more similar to Europeans.
Gene flow between dominant Dawuro agriculturalists and excluded members of the Manja was sex-biased, with many more NRY haplotypes common to the two groups than mtDNA haplotypes, relative to the distribution of the two systems across all the ethnic groups. The marginalised group had a particularly low level of mtDNA HVS1 diversity (h = 0.705). Of particular interest is the extensive sharing of discriminating NRY and mtDNA haplotypes across many ethnic groups, suggesting either a) the creation or preservation of cultural diversity despite substantial inter-group gene flow or b) recent ethnogenesis of the currently extant groups.
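For those unfamiliar with the "gene diversity h" statistic quoted above, here is a minimal sketch. The abstract does not say which estimator was used; I assume the common unbiased form h = n/(n-1) * (1 - sum of squared haplotype frequencies), and the haplotype labels in the usage note are made up.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

For example, `gene_diversity(["A", "A", "B", "B"])` gives about 0.667, a sample in which every haplotype is different gives 1, and a sample with a single haplotype gives 0. Values near 1, like the HVS1 range in the abstract, mean that two randomly drawn individuals almost never share a haplotype.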
Yet another study of differences between ancient and modern mtDNA gene pools. I hope the 2012 crowd doesn't follow up on this for its own bizarre purposes...

Genetic Diversity of the Ancient People in Mesoamerica
DNAs were extracted from the human remains buried in the Moon Pyramid at archaeological Teotihuacan site in Mexico. Nucleotide sequences of their mitochondrial D-loop and SNP sites were determined by the PCR-direct sequencing. To reveal the genealogy of mitochondrial DNA sequences of the individuals buried in the Moon Pyramid and assess their positions among Native Americans, we first constructed a network of the mitochondrial DNA from the contemporary Native Americans; the northern Native Americans (Haida, Bella Coola, and Nuu Chah Nulth), the central Native Americans (Huetar, Kuna, and Ngöbé), and the southern Native Americans (Yanomami, Zoro, Gavião, and Xavante), and compared them with those of the individuals from the Moon Pyramid. All of the mitochondrial DNA types from the Moon Pyramid individuals were unique, and clear genetic affinities were not observed between the Moon Pyramid individuals and any of the 10 Native American populations. To investigate genetic diversity among the contemporary central Native American populations, we constructed a phylogenetic tree of their mitochondrial DNA sequences using the neighbor-joining method. There was a major mitochondrial DNA sequence common to these three central Native American populations. However, there were a relatively small number of mitochondrial DNA types in each population, most of which were, moreover, unique to each Native American population. Next we compared the mitochondrial DNA sequences of the Moon Pyramid individuals with those of the ancient Mesoamerican people, ancient Maya people from the classic Copán site. We also used Huetar people as a reference for the contemporary central Native Americans. The distribution of the mitochondrial DNA types found in the ancient Native Americans is greatly different from that found in the contemporary Native Americans.
These results show that genetic diversity in the ancient Native Americans was not as low as that in the contemporary Native Americans, suggesting an occurrence of bottleneck in the past.
This will be of great interest to Y chromosome enthusiasts.

Improved resolution of the human Y-chromosomal phylogeny using targeted next-generation sequencing

The non-recombining part of the Y chromosome provides unique insights into male-specific aspects of human genetics and history. We are using next-generation Illumina sequencing to fully re-sequence targeted regions of the Y and resolve the Y-chromosomal phylogeny by characterization of additional single nucleotide polymorphisms (SNPs) on lineages of interest. Initially ~6 Mb of Y sequence (NCBI36:Y-chromosome: 12,308,579-18,230,132) is being generated for an African haplogroup A male. The strategy involves sequence enrichment by long template PCR of genomic DNA (10-20 ng/reaction) using overlapping fragments of 5.5 - 6.5 kbp. Currently ~70% of primer pairs work using a standard touchdown PCR protocol. Fragments obtained from a single individual are pooled and used for library preparation and Illumina sequencing. Re-sequencing generates accurate high coverage data; SNP calling and their subsequent validation will be presented. Most SNPs are expected to be rare but some are likely to resolve deep divisions within African populations. Subsequently, we aim to (1) determine the time depth of the human Y phylogeny, (2) resolve multifurcations in the major lineages by discovering additional SNPs on the relevant and (3) discover SNPs that mark any lineage of particular interest. In addition, we will be able to provide a subset of all primers that work well with this protocol to investigators who are interested in Y-chromosomal phylogenies so that comparable standard datasets can be generated for use by the community.
Female to male breeding ratio in the history of modern humans
Was the genetic contribution of men and women to successive generations the same? As a population, did we have fewer fathers than mothers? Was polygyny present among hominid lineages to influence relative divergence rates of autosomes and sex chromosomes? Students of genetic variation of the uniparentally inherited mitochondrial and Y-chromosome DNA confronted these questions; fewer addressed them by looking at the DNA diversity of autosomes and sex chromosomes (Hammer et al. 2009, Keinan et al. 2009) with equivocal results. Our approach is different: we analyzed the ratio of the population recombination rate, ρ, between autosomes and the X chromosome. The chromosome X recombines only in the female meiosis whereas autosomes undergo cross-overs in both male and female germ lines such that their relative ρ reflects changes in the breeding ratio, β. The estimate of β is calculated from the observed chromosomal ρ’s, obtained by InfRec (Lefebvre and Labuda 2008), after their calibration with the average chromosomal recombination rates known from pedigree data. We have tested our approach using coalescent simulations under different input parameters’ values and various demographic scenarios. For the HapMap populations we obtained β of 1.4 in Yoruba from West Africa, 1.2 in European and 1.0 in East Asian samples. This suggests that in the history of modern humans the reproductive variance between men and women did not drastically differ, thus consistent with the prevalence of monogamy or mild polygyny in the human lineage. Known incidences of polygyny may be of recent origin, related to rise of agriculture and shift from hunter-gathering to food producing economies, and therefore not sufficiently common to leave a strong genetic signature in the recombinational record. (Supported by GenomeQuebec/Genome Canada and Canadian Institutes of Health Research).


Accurate inference of individual ancestry geographic coordinates within Europe using small panels of genetic markers

The study of genomewide datasets of thousands of individuals of European ancestry supports the close correspondence between genetic distances and geographic coordinates within Europe, especially when information from hundreds of thousands of genetic markers is used. In fact, Principal Components Analysis (PCA), summarizing genetic variation over the top two principal components (PCs), results in plots that are surprisingly reminiscent of geographic maps of Europe. We set out to discover those markers that are most closely correlated with geographic origin, seeking to predict individual ancestry at a fine level, and even for closely spaced populations. To this end we analyzed a previously described subset of the Population Reference Sample (POPRES). We focused on 12 populations and 1224 individuals for which geographic coordinates (longitude and latitude) of individual origin are given for at least 20 individuals per population. First, we performed a complete leave-one-out crossvalidation experiment using 447,212 SNPs, and a simple nearest neighbors approach to infer geographic coordinates. This resulted in extremely high accuracy, placing individuals within an average longitudinal error of 2.2 degrees, and an average latitudinal error of 0.88 degrees. Next, we applied an algorithm that we have previously described to select the top 5,000 SNPs that correlate well with population structure as captured by PCA. We then filtered highly correlated SNPs using standard linear algebraic algorithms for the column subset selection problem. We thus selected 500 maximally uncorrelated markers, which have a Pearson correlation coefficient of 0.92 with PC 1, and 0.83 with PC 2. We extensively validated the effectiveness of such SNP panels for genetic ancestry testing by once more performing a complete leave-one-out crossvalidation experiment on the 1224 studied individuals (approx. two weeks of CPU time in commodity hardware). 
Using 500 carefully selected SNPs we can place individuals within a few hundred kilometers of their reported origin (average longitudinal and latitudinal error of 4.7 and 1.9 degrees respectively). Finally, we crossvalidated our best panel of 500 SNPs on the HapMap CEPH European individuals, placing them accurately on the Northwestern corner of Europe. Not surprisingly, our SNP panel includes markers that are either within genes reported to be under selective pressure in Europeans, or in high LD with such genes.
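The leave-one-out nearest neighbors experiment described in the abstract can be sketched in a few lines. This is only a toy version under my own assumptions: the abstract does not give the distance measure or the number of neighbors, so I use squared Euclidean distance on genotype vectors and a single nearest neighbor, and the function names are hypothetical.

```python
def loo_nearest_neighbor(genotypes, coords):
    """Predict each individual's (longitude, latitude) from its closest other individual."""
    predictions = []
    for i, genotype in enumerate(genotypes):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(genotypes):
            if j == i:
                continue  # leave the test individual out of its own reference set
            # Assumed metric: squared Euclidean distance between genotype vectors.
            dist = sum((a - b) ** 2 for a, b in zip(genotype, other))
            if dist < best_dist:
                best_j, best_dist = j, dist
        predictions.append(coords[best_j])
    return predictions


def mean_abs_errors(predictions, coords):
    """Average absolute error in longitude and latitude, in degrees."""
    n = len(coords)
    lon_err = sum(abs(p[0] - c[0]) for p, c in zip(predictions, coords)) / n
    lat_err = sum(abs(p[1] - c[1]) for p, c in zip(predictions, coords)) / n
    return lon_err, lat_err
```

Here `genotypes` is a list of per-individual allele-count vectors and `coords` the matching list of (longitude, latitude) pairs. The pair returned by `mean_abs_errors` corresponds to the average longitudinal and latitudinal errors quoted in the abstract; at the scale of 1224 individuals and hundreds of thousands of SNPs, a vectorized implementation would be needed.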


Genetic relationships among the ancient Chinese populations viewed from discrete cranial traits

The discrete cranial traits are informative in revealing the genetic relationship of human populations. Given little available knowledge on these traits, especially their underlying genetic determinants, the primary aim of this study is to select a small number of traits that are sufficiently informative to represent genetic differentiation among East Asian populations. We studied overall 51 traits for 1,578 skulls from 19 necropolises, and found that 5 traits could capture the largest variation in East Asian populations studied. They are accessory mandibular foramen, palatine torus, mandibular torus, mastoid foramen extra-sutural, and infraorbital suture. The analysis on these 5 traits resulted in similar population relationships to that using all 51 traits. The study on discrete cranial traits could not only facilitate exploration of the genetic relationship of populations, but could also allow identification of the genes underlying these anthropological traits.

Admixed ancestry and stratification of regional gene pools of Quebec
In Quebec, studies of different molecular polymorphisms have shown that the French Canadian gene pool is as diverse as its source European populations and, contrary to what was previously anticipated, does not display more homogeneity. To better understand the genetic structure of the contemporary population, we analyzed the origins and contribution of 7,798 immigrant founders identified in the genealogical ascendance of a sample of 2,221 subjects representative of the French Canadian population of Quebec. As expected, French founders are the most important in number (n=5,326) in all Quebec regions. They contribute for about 90% of the regional gene pools, except for regions located in the easternmost part of the province (76%), which are characterized by more diverse origins. Although this study supports the French founders’ importance, it also puts in the balance arguments in favor of the heterogeneity of the founding pool. The majority of immigrants landed as single member of their family, originating from all the regions of France. In addition, nearly all subjects have mixed origins, including French and non-French. Taken together, these results put into perspective the idea of the homogeneity of the origins of the French Canadians and of a pan-Quebec founder effect. The differential descent and genetic contribution of immigrant founders across regions points to the stratification of the French Canadian population of Quebec, showing an east-west gradient of diversity. These results will contribute to optimize study design in gene mapping studies relying on the founder effect in the French Canadian population of Quebec.

A nonsynonymous SNP in EDAR is associated with tooth shoveling
Teeth display variations among individuals in the size and the shape of cusps, ridges, grooves, and roots. In addition, there are certain dental characteristics which are predominant in certain human groups, such as tooth shoveling of upper incisors that is common in Asian populations but rare or absent in African and European populations. The common characteristics of dental morphology are thought to be determined mainly by genetic factors. However, genetic polymorphisms associated with dental morphology have not been elucidated yet. In humans, the ectodysplasin A receptor gene (EDAR) as well as the ectodysplasin A gene (EDA) is known to be responsible for hypohidrotic ectodermal dysplasia, a genetic disorder causing abnormal morphogenesis of teeth, hair, and eccrine sweat glands. Human genome diversity data have revealed that the derived allele of a nonsynonymous single nucleotide polymorphism (SNP), rs3827760 that is also called EDAR T1540C, is predominant in East Asian populations but absent in populations of African and European origins. It has recently been reported that the 1540C allele is associated with Asian-specific hair thickness. The aim of this study is to clarify whether the nonsynonymous polymorphism in EDAR is also associated with dental morphology in humans or not. For this purpose, we measured crown diameters and tooth shoveling grades, genotyped EDAR T1540C, and analyzed the correlations between them in Japanese populations. To comprehend individual patterns of dental morphology, we applied a principal component analysis (PCA) to individual-level metric data, the result of which implies that multiple types of factors affect the tooth size. This study clearly demonstrated that the number of the Asian-specific EDAR 1540C allele is strongly correlated with the tooth shoveling grade. The SNP significantly affected PC1 and PC2 in PCA, which denote overall tooth size and the ratio of mesiodistal diameter to buccolingual diameter, respectively.
Our study revealed a main genetic determinant of tooth shoveling that has classically received great attention from dental anthropologists. Further studies using powerful DNA technology will lead to clearer understanding about genetic factors for phenotypic variations in tooth morphology such as Carabelli’s tubercle, the numbers of cusps and roots, and the size balances shown in metric measurements.

Direct estimation of the microsatellite mutation rate
Characterizing the behavior of mutations is fundamental to our understanding of genetic variation. Attempts to directly observe DNA mutations arising from germline transmissions are confronted by two challenges: the large amount of DNA sequence that needs to be collected in order to observe a mutation (since the mutation rate in humans is estimated to be ~2×10⁻⁸ per generation), and a poor signal-to-noise ratio, due to the fact that any modern genotyping technology has an error rate far exceeding the mutation rate. Using deCODE Genetics’ database of over 95,000 Icelanders genotyped at over 3,000 microsatellite loci, we directly observed mutations in germline transmissions from pedigrees. Microsatellites are thought to have mutation rates as high as 10⁻³ per locus per generation. To overcome the genotyping error rate, which was estimated in this data set to be ≤10⁻² per allele call after appropriate filtering, we carried out two independent analyses: (1) We restricted our analysis to mother-father-child trios, and required the mutated allele to be genotyped at least twice in both the child and in the transmitting parent to confirm mutant transmissions. This identified 2,124 mutant events from 5.62 million instances of parent-child transmissions, yielding a mutation rate estimate of 3.78×10⁻⁴ averaged across the markers that we analyzed. (2) We traced the haplotype affected by the mutation through local pedigrees, requiring that the mutant haplotype is observed in the affected proband’s children, and simultaneously, that the wildtype haplotype is observed in the affected proband’s siblings. This identified 788 mutant events from 1.59 million instances of parent-child transmissions, yielding a mutation rate of 4.96×10⁻⁴. Our collection of mutant events is significantly larger than previous studies. This allows for categorical analyses of microsatellite mutation rates partitioned based on the gender and age of the individual transmitting the allele, as well as the repeat type and cytogenetic position.
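
The two rate estimates quoted in this abstract are simply the number of confirmed mutant events divided by the number of parent-child transmissions examined. A minimal Python sketch of that arithmetic follows; the function name `mutation_rate` is my own label for illustration, not anything from the study, and the counts are the rounded figures given in the abstract.

```python
def mutation_rate(mutant_events: int, transmissions: float) -> float:
    """Per-locus, per-generation rate: observed mutations / transmissions examined."""
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    return mutant_events / transmissions


# Counts as quoted in the abstract (transmission totals are rounded there).
trio_rate = mutation_rate(2124, 5.62e6)       # analysis (1): trios
haplotype_rate = mutation_rate(788, 1.59e6)   # analysis (2): haplotype tracing

print(f"trio-based estimate:      {trio_rate:.2e}")
print(f"haplotype-based estimate: {haplotype_rate:.2e}")
```

Both values fall a little below the 10⁻³ upper bound mentioned for microsatellites, and several orders of magnitude above the ~2×10⁻⁸ figure quoted for the general human mutation rate.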

August 30, 2009

mtDNA and ethnic differentiation in East Africa

From the paper:
The pattern observed in East Africa (with the exception of the Khoisan-related Hadza and Sandawe populations), which combines a high level of within-population diversity with strong genetic structure among populations, suggests the occurrence of periodical episodes of admixture in these populations, separated by periods of isolation and genetic drift. Indeed, the observation of high levels of diversity within populations could be due to long-term large effective population sizes maintained in East Africa. In this case, however, little genetic structure between populations should be expected, since there would be little opportunity for genetic drift to act. Alternatively, gene flow can produce high within population diversity, and in the present case, it could also account for the extensive sharing of haplotypes and haplogroups observed between the Nyangatom and the Daasanach, as well as with other populations.
This seems like a very clever observation: substantial gene flow and a large effective population size would be inconsistent with population structure, as the different populations would be homogenized and drift would not be able to differentiate them. Long-term lack of gene flow, on the other hand, would not explain the sharing of haplotypes between populations, as each population would develop its own distinctive genetic signatures over time. Thus, the simplest explanation for the observed pattern is that gene flow has indeed occurred (accounting for the sharing of haplotypes), but that it was not continuous (accounting for the fact that populations are, after all, substantially differentiated).
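
One way to see why continuous gene flow and large population size leave little room for structure is Wright's classic island-model approximation, which for a maternally inherited haploid marker such as mtDNA gives an equilibrium F_ST of roughly 1 / (1 + 2·N_f·m). This is a textbook idealization that I am bringing in for illustration, not the method used in the paper; the function name and the parameter values below (female effective size `n_females`, per-generation migration rate `migration_rate`) are hypothetical.

```python
def island_model_fst(n_females: float, migration_rate: float) -> float:
    """Equilibrium F_ST under Wright's island model for a haploid,
    maternally inherited locus: F_ST ~ 1 / (1 + 2 * N_f * m).

    Assumes constant population size and constant, continuous migration,
    which is precisely the scenario the post argues does NOT fit East Africa.
    """
    if n_females <= 0 or migration_rate < 0:
        raise ValueError("n_females must be > 0 and migration_rate >= 0")
    return 1.0 / (1.0 + 2.0 * n_females * migration_rate)


# Hypothetical scenarios, chosen only to show the direction of the effect.
scenarios = [
    ("small, nearly isolated", 500, 0.0001),
    ("small, steady gene flow", 500, 0.01),
    ("large, steady gene flow", 5000, 0.01),
]
for label, n_f, m in scenarios:
    print(f"{label:25s} F_ST ~ {island_model_fst(n_f, m):.3f}")
```

Under this idealization, steady gene flow into large populations drives differentiation toward zero, so strong structure together with shared haplotypes points to gene flow that came in episodes rather than continuously.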

From the paper:
The intermediate linkage disequilibrium (LD) found in East Africa (Tishkoff et al., 1996) in contrast with Europe (high LD) and Sub-Saharan Africa (low LD, Tishkoff & Kidd, 2004; Conrad et al., 2006), could be due to such admixture events, more frequently occurring in this region compared to other Sub-Saharan populations. Substantial levels of gene flow among Nilo-Saharan, Afro-Asiatic and Niger-Congo populations from Tanzania have already been inferred by Tishkoff et al. (2007a) and our results suggest that these gene flows could have occurred in a larger region extending up to Southern Ethiopia.
Indeed, in the absence of recent admixture, the East African populations would exhibit similar levels of LD to Sub-Saharan Africans, or even lower, as the indigenous East Africans are arguably older than those of the interior of the continent. The fact that they exhibit higher LD (intermediate between Europe and Sub-Saharan Africa) can be explained by admixture, i.e., the fact that they have inherited long stretches of DNA from the parental populations in each admixture event, and that time since that event has not been sufficiently long to cause the decay of these chunks into smaller pieces.
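
The decay of admixture-generated LD can be illustrated with the standard population-genetic result that, under random mating, the disequilibrium D between two loci shrinks by a factor of (1 − r) each generation, where r is the recombination fraction between them: D_t = D_0·(1 − r)^t. The sketch below is only an illustration of that formula; the function name and the example values (r = 0.01, roughly 1 cM apart) are my own assumptions, not figures from the paper.

```python
def ld_after(d0: float, recomb_fraction: float, generations: int) -> float:
    """LD remaining after `generations` of random mating: D_t = D_0 * (1 - r)**t."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - recomb_fraction) ** generations


# Hypothetical pair of loci about 1 cM apart (r = 0.01), starting from D_0 = 1
# (i.e., results are expressed as the fraction of the initial LD remaining).
for t in (20, 200, 2000):
    print(f"{t:5d} generations since admixture: {ld_after(1.0, 0.01, t):.3g} of initial LD")
```

The point of the example is qualitative: LD created by a recent admixture event is still largely intact, while LD from an ancient event has mostly decayed, which is why intermediate LD is compatible with relatively recent mixing.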

And, from the conclusions of the paper:
The high diversity in East Africa was interpreted as a sign of an ancient origin. However, our results might indicate that this high diversity could also come from a particular history of recent migrations and admixture promoted by the pastoralist societies that dominate in the region.
Note that an East African origin of mankind is still the best hypothesis on palaeoanthropological and simply geographical grounds. However, the high genetic diversity found in East Africa does not necessarily reflect the antiquity of that population; it may instead reflect its history of repeated admixture by peoples of different origins.

There are two alternative hypotheses for why East Africans accumulated so much genetic diversity:
  1. They are the oldest population, and have been accumulating genetic diversity for the longest period of time
  2. They are substantially admixed with very divergent components (e.g., Semites, Nilo-Saharans, Cushitic speakers, and so on)
A reasonable analogy is another well-known population source, Anatolia, from where multiple waves of humans entered Europe in Paleolithic and Neolithic times. Many would agree that such movements took place, but it would be incorrect to see the population of Anatolia as a little-altered descendant of its earliest inhabitants, as the current genetic diversity observed there is, at least in part, the result of the settlement of the region by peoples from the Balkans, Central Asia, the Levant, and even Western Europe.

Ann Hum Genet. 2009 Aug 25. [Epub ahead of print]

Genetic Evidence for Complexity in Ethnic Differentiation and History in East Africa.

Poloni ES, Naciri Y, Bucho R, Niba R, Kervaire B, Excoffier L, Langaney A, Sanchez-Mazas A.

Summary

The Afro-Asiatic and Nilo-Saharan language families come into contact in Western Ethiopia. Ethnic diversity is particularly high in the South, where the Nilo-Saharan Nyangatom and the Afro-Asiatic Daasanach dwell. Despite their linguistic differentiation, both populations rely on a similar agripastoralist mode of subsistence. Analysis of mitochondrial DNA extracted from Nyangatom and Daasanach archival sera revealed high levels of diversity, with most sequences belonging to the L haplogroups, the basal branches of the mitochondrial phylogeny. However, in sharp contrast with other Ethiopian populations, only 5% of the Nyangatom and Daasanach sequences belong to haplogroups M and N. The Nyangatom and Daasanach were found to be significantly differentiated, while each of them displays close affinities with some Tanzanian populations. The strong genetic structure found over East Africa was neither associated with geography nor with language, a result confirmed by the analysis of 6711 HVS-I sequences of 136 populations mainly from Africa. Processes of migration, language shift and group absorption are documented by linguists and ethnographers for the Nyangatom and Daasanach, thus pointing to the probably transient and plastic nature of these ethnic groups. These processes, associated with periods of isolation, could explain the high diversity and strong genetic structure found in East Africa.


August 26, 2009

Bronze Age origin of Semitic languages

Bayesian phylogenetic methods, originally developed for biology, have been increasingly -and successfully- applied to linguistic data in recent years (e.g., for Indo-Europeans, Melanesians, and Austronesian speakers from the Pacific).

The current paper proposes a Bronze Age origin for Semitic languages, ~3 thousand years after the split of European from Anatolian Indo-European speakers. I don't find this particularly surprising, as Semitic has been, until relatively recently, much more geographically constrained than Indo-European, and, due to the early literacy of the populations of the Near East, its post-Neolithic arrival can be observed in the archaeological record itself.

It also explains a facet of Y-chromosome distribution, that I have commented on before, namely the fact that the common Near Eastern haplogroup J2 extends from Europe to South Asia in a "horizontal zone" accompanied with little of its sister clade J1, but in the Near East itself, there is a "vertical zone" from the Black and Caspian seas to Arabia of high J1 frequency. As I have explained recently, the mixed J2/J1 frequency in the central Near East is due to an enrichment with J1 lineages of a population that had (in pre-Semitic times) a high J2/J1 ratio like those of Europe, Asia Minor, and Iran. J1 should not be seen as exclusively Semitic, but it can't be denied that the major factor affecting its current spread has been the arrival of Semites from the South, the latest episode of which involved the spread of Arab Muslims.

The current study also demonstrates that linguistic Bayesian phylogenetics (LBP) has no inherent bias to produce older dates for language dispersals: the origin of the Indo-European (IE) language family has been dated to the early European Neolithic, and now that of Semitic to ~6,000 years ago, the spread of Melanesian languages to Pleistocene times, and the Austronesian settlement of the Pacific to ~5,000 years ago. The congruence between LBP and traditional archaeology in all these cases should force IE exceptionalists who cling to the old theory of "steppe horse riders" to explain why LBP should have failed only in the case of the IE dispersal.

The paper also has free supplementary data, including a multistate phylogeny (pdf) of Semitic languages (reproduced top left of this post).

(More details to follow after I thoroughly read the paper)

UPDATE (Aug 27):

From the paper:
Furthermore, Eblaite (no Eblaite wordlists were available for our study), the closest relative of Akkadian and the only other member of East Semitic, was spoken in the Levant (specifically the northeast Levant or present-day Syria; Gordon 1997), which is also where some of the oldest West Semitic languages were spoken (Ugaritic, Aramaic and ancient Hebrew). The presence of ancient members of the two oldest Semitic groups (East and West Semitic) in the same region of the Levant, combined with a possible long interval (100–3000 years) between the origin of Semitic and the appearance of Akkadian in Sumer, suggests a Semitic origin in the northeast Levant and a later movement of Akkadian eastward into Mesopotamia and Sumer (see figure 1 for a map of our proposed Semitic dispersals).
An origin of Semitic in the northeast Levant (Syria) would be consistent with the observed east-west cline of decreasing J1 frequency in the Levant; the authors do, however, mention that unknown extinct Semitic languages could shift both the estimated age of the family and its place of origin.
Lacking closely related non-Semitic languages to serve as out-groups in our phylogeny, we cannot estimate when or where the ancestor of all Semitic languages diverged from Afroasiatic. Furthermore, it is likely that some early Semitic languages became extinct and left no record of their existence. This is especially probable if early Semitic societies were pastoralist in nature (Blench 2006), as pastoralists are less likely to leave epigraphic and archaeological evidence of their languages.
A pastoralist association of Semitic languages is also consistent with the observed correlation of haplogroup J1 with herders and J2 with settled farmers in the Near East.


Proc. R. Soc. B 7 August 2009 vol. 276 no. 1668 2703-2710

Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East

Andrew Kitchen et al.

Abstract

The evolution of languages provides a unique opportunity to study human population history. The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which gave rise to some of the world's first major religious and literary traditions. In this study, we employ Bayesian computational phylogenetic techniques recently developed in evolutionary biology to analyse Semitic lexical data by modelling language evolution and explicitly testing alternative hypotheses of Semitic history. We implement a relaxed linguistic clock to date language divergences and use epigraphic evidence for the sampling dates of extinct Semitic languages to calibrate the rate of language evolution. Our statistical tests of alternative Semitic histories support an initial divergence of Akkadian from ancestral Semitic over competing hypotheses (e.g. an African origin of Semitic). We estimate an Early Bronze Age origin for Semitic approximately 5750 years ago in the Levant, and further propose that contemporary Ethiosemitic languages of Africa reflect a single introduction of early Ethiosemitic from southern Arabia approximately 2800 years ago.
