Showing posts with label HLA. Show all posts

August 16, 2012

Neandertal STAT2 haplotype in Eurasians

Two recent papers have argued that African population structure or late Middle Paleolithic/Upper Paleolithic Neandertal admixture has contributed to the finding that non-Africans appear to be a few percent more similar to Neandertals than Africans are across the genome. I would add that modern human admixture in the Vindija individual remains a distinct possibility.

What percentage of the ~3% Eurasian excess can be accounted for by each of these three processes? The jury is out, and we won't find out until someone decides to tackle the problem comprehensively and/or new ancient DNA samples become available to inform the discussion. African population structure cannot be discounted, and intriguing new evidence may appear thanks to ancient DNA analysis.

But, there is a different approach to detecting Neandertal admixture that zeroes in on specific genomic locations and dissects them in great detail. This single-region approach provides evidence for admixture, without necessarily arguing about how extensive it was.

The single-region dissection was previously used in the Hammer lab to identify the first very convincing evidence for archaic admixture in Africans and Melanesians. In a new paper, Mendez et al. identify a small region in chromosome 12 that shows evidence for archaic introgression from Neandertals, or a species closely related to them.

But, it is worthwhile to begin with a list of other Neandertal introgression candidates from the literature:

Thus far, only a handful of loci have been hypothesized to have entered the human gene pool through archaic admixture and positive selection, including MAPT (MIM 157140),5 MCPH1 (MIM 607117),3 and particular alleles at the HLA locus (MIM 142800, 142830, 142840).6 However, analysis of the Neanderthal genome failed to provide evidence of introgressive alleles at the former two loci.1 Because of its role in fighting pathogens, HLA presents an instance where it is relatively easy to conceive of an a priori reason that acquisition of an archaic Eurasian HLA allele would benefit human ancestors, especially as they expanded into new habitats.7 However, the fact that HLA haplotypes are known to exhibit transspecific polymorphism and show evidence of strong balancing selection 8,9 increases the probability that similarities between modern and archaic haplotypes are due to ancestral shared polymorphism (i.e., as opposed to archaic admixture). In addition, the SNPs tagging the main HLA haplotype that was said to have introgressed were not observed in the Denisova or Neanderthal draft genomes. 
So, what lines of evidence support the notion that the new STAT2 haplotype is the "real deal"?
First, N matches the Neanderthal sequence at all 18 sites that fall within the resequenced 8.6 kb STAT2 region and have Neanderthal sequence coverage (Table 1). Second, N lineages are broadly distributed at relatively low frequencies in Eurasian populations (Figure 3) and are not observed in sub-Saharan African populations (Table S6). Third, the N haplotype extends for ~130 kb in West Eurasians and up to ~260 kb in some East Asians and Melanesians, producing much stronger LD than that observed in sub-Saharan Africans.


Given that the N lineage and the reference sequence diverged ~600 kya, these results suggest that population structure has influenced the recent evolution of this locus. Balancing selection alone is not expected to maintain this extent of LD and consequently is not sufficient to explain these patterns. Moreover, although a strong bottleneck could generate extended LD similar to the levels we observe near STAT2 in non-Africans, it would not explain why the N lineage went extinct in Africa (i.e., why the SNPs associated with the N lineage in non-Africans were not observed in sub-Saharan Africans that are part of our WGS or public SNP panels).


We point out that although a recent common ancestry between a human lineage and Neanderthal sequences might indicate gene flow between Neanderthals and modern humans, this information alone does not inform us about the direction of gene flow. With the additional evidence of the observed extent of LD in modern human sequences, it is possible to infer that the N lineage introgressed into modern humans (either from Neanderthals or another archaic source that contributed to both Neanderthals and AMH).
Actually, the N haplotype is observed in North Africa, but this might be due to relatively recent back-migration. One might also argue that a recent bottleneck in a Eurasian population generated the high degree of LD, and the N haplotype was lost in a back-to-Africa migration, or North-to-Sub-Saharan Africa migration. But, that would not seem to explain how the deeply divergent lineage persisted in the North African population of proto-modern humans for such a long time; the evidence for recent common ancestry of N with the Neandertal haplotype would argue against incomplete lineage sorting (=inheritance of related forms of the haplotype from before the modern-Neandertal divergence).
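
Since so much of the argument rests on the extent of LD, it may help to see how pairwise LD is usually quantified. The following is a minimal sketch, assuming phased haplotypes coded as 0/1 alleles; the function name `r_squared` and the toy haplotypes are hypothetical and are not taken from Mendez et al.

```python
def r_squared(haplotypes, i, j):
    """Return r^2 (squared allelic correlation) between biallelic sites i and j.

    haplotypes: list of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at site i
    p_j = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at site j
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j                             # disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return float("nan")  # undefined when either site is monomorphic
    return d * d / denom


# Toy data (made up): sites 0 and 1 always travel together, site 2 is independent.
toy_haplotypes = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (0, 0, 1),
]

print(r_squared(toy_haplotypes, 0, 1))  # complete LD
print(r_squared(toy_haplotypes, 0, 2))  # no LD
```

In a real analysis one would track how quickly such values decay with physical distance from the core SNPs; a haplotype that keeps high LD over ~130-260 kb is what the quoted passage treats as a sign of recent introduction.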

All in all, this probably represents the best evidence for Neandertal-to-modern introgression to date. As full genomes of different human groups become available, it will be possible to automate this analysis and pick off other such strong signals. This may not indicate the level of admixture, but it might provide strong evidence against the idea of reproductive isolation between modern humans and Neandertals.

It is also noteworthy that this is barely consistent with the coastal migration theory with respect to the origin of Australo-Melanesians, because humans trekking along the coast would not have the opportunity to admix with Neandertals who are completely unattested there in either their physical, or archaeological (Mousterian) form.

But, it is consistent with my Out-of-Arabia theory. Australo-Melanesian Y chromosomes belong to the CF clade of the phylogeny. I have speculated that the post-70ka climate crisis in Arabia spurred some human groups to escape north (CF), and others to remain south (DE). The latter eventually gave rise to the major African lineage, heading west (E), as well as a relic Asian lineage heading east (D) that was later inundated by the descendants of CF. If Australo-Melanesians are descended from the CF folk who went north out of Arabia, then they too would have had the opportunity to admix with Neandertals in the Near East.

The American Journal of Human Genetics, Volume 91, Issue 2, 265-274, 10 August 2012

A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea

Fernando L. Mendez, Joseph C. Watkins and Michael F. Hammer

Signals of archaic admixture have been identified through comparisons of the draft Neanderthal and Denisova genomes with those of living humans. Studies of individual loci contributing to these genome-wide average signals are required for characterization of the introgression process and investigation of whether archaic variants conferred an adaptive advantage to the ancestors of contemporary human populations. However, no definitive case of adaptive introgression has yet been described. Here we provide a DNA sequence analysis of the innate immune gene STAT2 and show that a haplotype carried by many Eurasians (but not sub-Saharan Africans) has a sequence that closely matches that of the Neanderthal STAT2. This haplotype, referred to as N, was discovered through a resequencing survey of the entire coding region of STAT2 in a global sample of 90 individuals. Analyses of publicly available complete genome sequence data show that haplotype N shares a recent common ancestor with the Neanderthal sequence (∼80 thousand years ago) and is found throughout Eurasia at an average frequency of ∼5%. Interestingly, N is found in Melanesian populations at ∼10-fold higher frequency (∼54%) than in Eurasian populations. A neutrality test that controls for demography rejects the hypothesis that a variant of N rose to high frequency in Melanesia by genetic drift alone. Although we are not able to pinpoint the precise target of positive selection, we identify nonsynonymous mutations in ERBB3, ESYT1, and STAT2—all of which are part of the same 250 kb introgressive haplotype—as good candidates.


September 27, 2008

More ASHG 2008 abstracts

The previous batch is here.

Analysis of East Asia Genetic Substructure: Population Differentiation and PCA Clusters Correlate with Geographic Distribution
Accounting for genetic substructure within European populations has been important in reducing type 1 errors in genetic studies of complex disease. As efforts to understand complex genetic disease are expanded to other continental populations, an understanding of genetic substructure within these continents will be useful in the design and execution of association tests. In this study, population differentiation (Fst) and Principal Components Analyses (PCA) are examined using >200K genotypes from multiple populations of East Asian ancestry (total 298 subjects). The population groups included those from the Human Genome Diversity Panel [Cambodian (CAMB), Yi, Daur, Mongolian (MGL), Lahu, Dai, Hezhen, Miaozu, Naxi, Oroqen, She, Tu, Tujia, Naxi, and Xibo], HapMap (CHB and JPT), and East Asian or East Asian American subjects of Vietnamese (VIET), Korean (KOR), Filipino (FIL) and Chinese ancestry. Paired Fst (Weir and Cockerham) showed close relationships between CHB and several large East Asian population groups (CHB/KOR, 0.0019; CHB/JPT, 0.00651; CHB/VIET, 0.0065) with larger separation with FIL (CHB/FIL, 0.014). Low levels of differentiation were also observed between DAI and VIET (0.0045) and between VIET and CAMB (0.0062). Similarly, small Fsts were observed among different presumed Han Chinese populations originating in different regions of mainland China and Taiwan. For PCA, the first two PCs showed a pattern of relationships that closely followed the geographic distribution of the different East Asian populations. For example, the four corner groups were JPT, FIL, CAMB and MGL, with the CHB forming the center group, and KOR was between CHB and JPT. Other small ethnic groups were also in rough geographic correlation with their putative origins.
These studies have also enabled the selection of a subset of East Asian substructure ancestry informative markers(EASTASAIMS) that may be useful for future genetic association studies in reducing type 1 errors and in identifying homogeneous groups.
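
For readers unfamiliar with Fst, here is a rough sketch of what a pairwise value measures. Note the assumptions: this is the simple heterozygosity-based (Wright-style) definition computed from known allele frequencies with equal population weights, not the Weir and Cockerham estimator used in the abstract, which additionally corrects for sample size. The function name and the frequencies are made up for illustration.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Heterozygosity-based Fst between two populations.

    freqs_a, freqs_b: allele frequencies of the same SNPs in populations A and B.
    Returns a ratio of sums over SNPs: sum(H_T - H_S) / sum(H_T).
    """
    num = 0.0
    den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2                              # pooled allele frequency
        h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population value
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else float("nan")


# Hypothetical frequencies, not data from the abstract.
print(fst_two_pops([0.5], [0.5]))  # identical populations: no differentiation
print(fst_two_pops([0.6], [0.4]))  # about 0.04
print(fst_two_pops([1.0], [0.0]))  # fixed differences: complete differentiation
```

Seen this way, values such as 0.0019 or 0.0065 correspond to very small allele frequency differences averaged over many SNPs.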

Worldwide Population Structure using SNP Microarray Genotyping
We genotyped 348 individuals sampled from 24 populations world-wide using the Affymetrix 250k NspI microarray chip. For context, we added matching genotypes from 210 HapMap individuals for a total of 250,823 loci genotyped in 543 individuals from 28 populations. We included populations from India and Daghestan to provide detail between the genetic poles of Western Europe, East Asia, and sub-Sahara Africa. With so many markers, principal components analyses reveal genetic differentiation between almost all identified populations in our sample. Northern and southern European populations (FST = 0.004, p <0.01) are statistically distinguishable, as are upper and lower caste groups in India (FST = 0.005, p <0.01). All individuals are accurately classified into continental groups, and even between closely-related populations, genetic- and self-classifications conflict for only a minority of individuals (e.g. ~2% between upper and lower Indian castes; k-means clustering.) As expected, the HapMap CHB+JPT, CEU, and YRI samples are most similar to our east Asian, west European, and African samples, respectively. The HapMap CEU samples and our northern European ancestry samples were both collected from Utah. Although individual samples cannot be reliably classified into their collection of origin, the groups are statistically distinguishable despite their high similarity (FST = 0.0005, n.s.). Our Japanese group is also statistically distinguishable from the HapMap JPT group (FST = 0.006, p <0.01), and in this comparison, most samples can be correctly classified. With such large numbers of genotypes, significant differences can be found even between very similar population samplings. Our results provide guidelines for researchers in selecting suitable control populations for case-control studies.

Frequency distribution and selection in 4 pigmentation genes in Europe
Pigmentation is one of the more obvious forms of variation in humans, particularly in Europeans where one sees more within group variation in hair and eye pigmentation than in the rest of the world. We studied 4 genes (SLC24A5, SLC45A2, OCA2 and MC1R) that are believed to contribute to the pigment phenotypes in Europeans. SLC24A5 has a single functional variant that leads to lighter skin pigmentation. Data on 83 populations worldwide (including 55 from our lab) show the variant (at rs1426654) has almost reached fixation in Europe, Southwest Asia, and North Africa, has moderate to high frequencies (.2-.9) throughout Central Asia, and has frequencies of .1-.3 in East and South Africa. The variant is essentially absent elsewhere. SLC45A2 also has a single functional variant (at rs16891982) associated with light skin pigmentation in Europe. Data on 84 populations worldwide show the light skin allele is nearly fixed in Northern Europe but has lower frequencies in Southern Europe, the Middle East and Northern Africa. In Central Asia the frequency of the SLC45A2 variant declines more quickly than the SLC24A5 variant. It is absent in both East and South Africa. In OCA2 we typed 4 SNPs (rs4778138, rs4778241, rs7495174, rs12913832) with a haplotype associated with blue eyes in Europeans. This haplotype shows a Southeastern to Northwestern pattern in Europe with frequencies of .25 (.05 homozygous) in the Adygei to .85 (.75 homozygous) in the Danes. In MC1R we typed 5 SNPs (rs3212345, rs3212357, rs3212363, C_25958294_10, rs7191944) that cover the entire MC1R gene and found a predominantly European haplotype that ranges in frequency from .35 to .65 in Europe, reaching its highest levels in Southwest Asia and Northwestern Europe. Extended Haplotype Homozygosity (EHH) and normalized Haplosimilarity (nHS) show evidence of selection at SLC24A5 in not only our European and Southwest Asian populations but also our East African populations.
Neither SLC45A2 nor OCA2 showed evidence of selection in either test. MC1R did not show evidence of selection for our European specific haplotype but we did see some evidence both upstream and downstream in our nHS test in Europe.
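
As a rough illustration of the EHH statistic mentioned in this abstract, the sketch below computes extended haplotype homozygosity in its standard form: among chromosomes carrying a given core allele, the probability that two chromosomes drawn at random are identical over the whole interval from the core site out to some distance. The function name, the 0/1 coding, and the toy haplotypes are assumptions for illustration only; the authors' actual pipeline and their nHS statistic are not reproduced here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele, end):
    """EHH for carriers of core_allele, from site `core` out to site `end` (inclusive).

    haplotypes: list of phased haplotypes, each a sequence of alleles.
    """
    lo, hi = sorted((core, end))
    # Keep only chromosomes with the core allele, trimmed to the interval of interest.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # need at least two chromosomes to form a pair
    counts = Counter(carriers)
    # Fraction of chromosome pairs that are identical across the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (made up): four carriers of allele 1 at core site 0, one non-carrier.
toy_haplotypes = [
    (1, 0, 0, 1),
    (1, 0, 0, 0),
    (1, 0, 1, 1),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
]

for end in (1, 2, 3):
    print(end, ehh(toy_haplotypes, core=0, core_allele=1, end=end))
```

EHH starts at 1 next to the core and decays with distance as recombination and mutation break up the haplotype; a selected allele that rose quickly in frequency is expected to show unusually slow decay for its frequency.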

Using principal components analysis to identify candidate genes for natural selection.
Genetic markers that differentiate populations are excellent candidates for natural selection due to local adaptation, and may shed light into physiological pathways that underlie disorders with varying frequencies around the world. Principal Components Analysis (PCA) has emerged as a powerful tool for the characterization and analysis of the structure of genomewide datasets. In prior work, we described an algorithm that can be used to select small subsets of genetic markers (SNPs) that correlate well with population structure, as captured by PCA. Our method can be used to detect SNPs that differentiate individuals from different geographic regions, or even neighboring subpopulations. We set out to explore the nature and properties of the genes where population-differentiating SNPs reside, by analyzing the publicly available Human Genome Diversity Panel dataset (650,000 SNPs for 1,043 individuals, 51 populations). Applying our SNP selection algorithms, we chose small subsets of SNPs that almost perfectly reproduce worldwide population structure as identified by PCA. We determined SNP panels both for population differentiation within seven geographic regions, as well as around the globe. We then explored the hypothesis that the selected SNPs attained their current worldwide allele frequency patterns as a response to the pressure of natural selection. Comparing our lists to recently published reports, we found a significant overlap with other genomewide scans for selection, thus validating our hypothesis. For example, EDAR (involved in the development of hair follicles) harbors the most differentiating SNPs in our world-wide panels. SNPs located in genes that are involved in skin and eye pigmentation (OCA2, MYO5C, HERC1, HERC2) are also among the top population differentiating markers. 
In East Asia, SNPs residing at the ADH cluster appear among the most important SNPs for population structure, while, in Europe, the same is true for genes that are involved in immune response to pathogens (CR1, DUOX2, TLR, and HLA). Finally, a comprehensive gene ontology analysis is presented.
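
The general idea of picking SNPs that track PCA structure can be sketched as follows. This is only a simplified illustration under stated assumptions, not the authors' algorithm: it uses a single principal component, found by plain power iteration, and scores each SNP by its squared loading on that component. The function name and the toy genotype matrix are hypothetical.

```python
def pc1_snp_scores(genotypes, iterations=200):
    """Score each SNP by its squared loading on the first principal component.

    genotypes: list of individuals, each a list of genotype counts (0, 1 or 2) per SNP.
    Returns one score per SNP; the scores sum to 1 when any variation is present.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]  # column-centered

    # Power iteration on X^T X. The start vector is arbitrary; it must not be
    # orthogonal to the leading component, otherwise the iteration cannot find it.
    v = [1.0 + 0.01 * j for j in range(m)]
    for _ in range(iterations):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]                # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]      # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            return [0.0] * m  # no variation in the data
        v = [c / norm for c in w]
    return [c * c for c in v]


# Toy data (made up): SNP 0 separates two groups, SNP 1 is invariant, SNP 2 is noise.
toy_genotypes = [
    [0, 1, 0],
    [0, 1, 1],
    [2, 1, 0],
    [2, 1, 1],
]

scores = pc1_snp_scores(toy_genotypes)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking[0])  # index of the SNP that contributes most to the first PC
```

With real data one would use several components and a proper SVD routine, then keep the top-scoring SNPs as a small ancestry-informative panel.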