
Paper 2



 “But the Sasquatch Mitochondrial
DNA is all Human…”  Really?

By Haskell V. Hart
 
 
 Copyright(c) Haskell V. Hart all rights reserved.  May be reproduced for personal, noncommercial use only.
 
Abstract
 
While the nuclear DNA of three samples in the Ketchum et al. sasquatch study has been shown to be from a bear, a human, and a dog, many supporters of this study point to the result that the mtDNA is human, which they take to indicate a human hybrid sasquatch.  In this paper the published mutations of all 29 mtDNA samples in the Ketchum study were examined in detail and compared to the Poisson Distribution of mutations.  Although most samples were within the normal range of number of private (extra) mutations (≤6) from their respective haplogroups (and had probabilities ≥ 2.3%), eight of the 18 samples with complete mtDNA sequences exceeded six mutations (<1% probability).  Of the other 11 samples with HVR-1 only results, eight had an extra mutation.  Possible reasons for this are discussed.  Blind samples of cat, dog, and horse (plus two human controls) were submitted to a reputable commercial laboratory; none of the nonhuman samples could be sequenced, which shows that nonhuman samples cannot produce human-like mtDNA results.
 
Introduction
 
In February, 2013, Ketchum et al.[1] published a paper that reported sequencing three samples (S26, S31, and S140) of nuDNA and 29 samples of mtDNA, with the conclusion that these were from a hybrid of a previously unknown primate male and a modern human female.  Subsequently the three nuDNA samples were shown to be from a black bear, a human, and a dog, respectively.[2,3(Samples 25104, 25106),4]   The Ketchum claim also relied on the results of the mtDNA sequencing, which for all 29 samples were reported to be human.  Because of this puzzling contradiction for S26 and S140, all the reported mtDNA mutations in Ketchum et al.[1] “Supplementary Data  2” have been examined in detail for anomalies.  There were many.
 
Methods and Materials
 
1.  Computer
The mtDNA mutations in the Ketchum et al.[1] Supplementary Data 2 were found to be based on rCRS (revised Cambridge Reference Sequence), which is haplogroup H2a2a1.  There is no explanation in text or caption to indicate this, or that 11 samples were HVR-1 (a.k.a. HVS-1 or HV-1) only and 18 samples were from complete mtDNA sequencing.  Throughout the table, there were some mutations without a suffix (A, G, T, or C).  These were counted as having the appropriate suffix for a transition based on the accepted human mtDNA Phylotree Build 16 (February 19, 2014) prefix,[5] and assuming only A→G, G→A, T→C, C→T (no transversions).  The 18 sets of complete mtDNA mutations were entered as input to the program FASTmtDNA to convert them from rCRS to RSRS (Reconstructed Sapiens Reference Sequence).[6]  This system references all mutations from the MRCA (Most Recent Common Ancestor) of all humans, “Mitochondrial Eve”.  While either reference system could have been used, the choice of RSRS allows the use of the program mtDNAble, which uses the output of FASTmtDNA to determine the haplogroup and the RSRS based mutations (output in a Microsoft Excel™ file).  Both programs are available free through mtDNA Community, an outgrowth of [6], and can be downloaded from its website: http://www.mtdnacommunity.org/downloads.aspx.
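The suffix rule described above can be sketched in a few lines of Python.  This is only an illustration of the stated assumption (transitions only) and not the author's actual procedure; the function name `add_transition_suffix` and the notation of reference base, position, and optional variant base are assumptions made for the example.

```python
# Transition partner of each base: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def add_transition_suffix(mutation: str) -> str:
    """Append the transition base to a mutation that has no suffix.

    `mutation` is assumed to look like "A263" (no suffix) or "A263G"
    (suffix present). Entries that already carry a suffix are returned
    unchanged; transversions are never generated.
    """
    ref_base, rest = mutation[0].upper(), mutation[1:]
    if ref_base not in TRANSITION:
        raise ValueError(f"expected a reference base prefix in {mutation!r}")
    if rest.isdigit():  # only a position follows the prefix, so no suffix yet
        return f"{ref_base}{rest}{TRANSITION[ref_base]}"
    return mutation


# Illustrative entries only, not taken from Supplementary Data 2.
for entry in ("A263", "T16519", "A263G"):
    print(entry, "->", add_transition_suffix(entry))
```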
All searches and alignment comparisons were performed with BLAST™ search/match software[7, 8] against the GenBank Nucleotide Database on the National Center for Biotechnology Information (NCBI) website: http://www.ncbi.nlm.nih.gov/.  The use of this database and search/match software is free to the general public, so that every result reported here can be verified.  Such is not the case in Ketchum et al.[1]
 
2.  Statistical
 
Random, extra or “private” DNA mutations are known to follow the statistical Poisson Distribution, for example [9].  Such mutations do not define the haplogroup and are considered “extra” in that context.  The Poisson Distribution describes any process which involves multiple occurrences, each with a fixed probability per unit length (such as a DNA sequence), unit area, unit volume, or unit time. A common example is the number of telephone calls per hour to a service center.  In our case the probability of a particular human sample having X extra mutations in the complete mitochondrial genome is
  
           
              
$$\Pr(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \qquad (1)$$

where k = 0, 1, 2, …, and λ equals the average value of X over many independent trials, here different humans.  This relationship will be applied to mutations in NCBI database entries (haplogroup H1a), and the resulting value of λ will be used to calculate probabilities of occurrence for the number of mutations in the Ketchum Supplementary Data 2 samples with complete mitochondrial sequences.  The likelihood of these samples being from the known human population will be calculated from Equation (1) as Pr(X=k) for samples with k mutations.
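Equation (1) can be evaluated directly.  The following Python sketch is an illustration only (the calculations in this paper were done in Microsoft Excel™); the function name `poisson_pmf` is an assumption, and the value of λ is the one obtained for H1a in Results and Discussion.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson distribution with mean lam, as in Equation (1)."""
    # Evaluate in log space so that k! cannot overflow for large k.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


LAMBDA_H1A = 2.3714  # average number of extra mutations reported for H1a

for k in (0, 6, 7, 16):
    p = poisson_pmf(k, LAMBDA_H1A)
    print(f"k = {k:2d}   Pr(X=k) = {p:.3e}   one chance in {1 / p:,.0f}")
```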
A more sophisticated statistical model is the negative binomial distribution, in which λ is allowed to vary according to the different probabilities of a mutation at different sites.[10]  This model requires much larger data sets (high hundreds to thousands of samples) to be justified. 
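For reference, one common way to write the negative binomial distribution is as a Poisson distribution whose λ is itself gamma distributed.  The parameterization below, with mean μ and shape parameter r, is a textbook form given here as an assumption; it is not necessarily the notation used in [10].

```latex
\Pr(X = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}
             \left(\frac{r}{r + \mu}\right)^{r}
             \left(\frac{\mu}{r + \mu}\right)^{k},
\qquad k = 0, 1, 2, \ldots
```

In this form the mean is μ and the variance is μ + μ²/r, so the distribution approaches the Poisson form of Equation (1) with λ = μ as r becomes large.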
All statistical calculations and graphics were done with Microsoft Excel™.
 
3. Biological
 
Blind buccal swab samples of cat (Felis catus), dog (Canis lupus familiaris) and horse (Equus caballus) and two each from two unrelated humans (one male, one female) as controls were submitted to a reputable DNA sequencing laboratory as human samples according to their written protocols, which required that duplicate samples be submitted.  Results were returned through their website.
 
Results and Discussion
 
Results of the biological sample submissions were as follows:
F. catus:  Failure to Sequence.
C. lupus fam.:  Failure to Sequence.
E. caballus:   Failure to Sequence.
Control 1 (male) 1) HVR-1 and HVR-2:  Clade H;   2) Complete sequence: Haplogroup H27.
Control 2 (female) 1) HVR-1 and HVR-2:  Clade W;   2) HVR-1 and HVR-2: Clade W.
In the case of each human control, the reported HVR-1 and HVR-2 mutations were identical for submissions 1) and 2).  From this data it is seen that the primers used by this laboratory are specific to human mtDNA and will not amplify nonhuman mammalian mtDNA.  Also, the laboratory’s sequencing is reproducible.  Whole genome mutations from Control 1, submission 2) were inputted to FASTmtDNA, and those results to mtDNAble, which computed the haplogroup H27 as a check, the same haplogroup determined by the commercial laboratory (above).     
Initial focus was on S26 (the bear with haplogroup H1a).  The NCBI Nucleotide database was queried for entries with “H1a” in the title or the “haplogroup =” field.  Thirty-five were found.  Insertions at positions 309 and 515-522 were discounted, as were any mutations at 16182, 16183, 16193, or 16519, as these are very common and not used in the construction of the mtDNA Phylotree Build 16 of van Oven[5], which was used to verify every extra mutation.  Also, missing topological mutations were considered reverse mutations and therefore included in the number of extra mutations.  The average number of extra mutations in this set was 2.3714 (83 extra mutations in 35 entries), which was used as the value of λ, from which the Poisson Distribution of mutations (n=35) was calculated by Equation (1) for comparison.  These results are tabulated in Table 1 and graphed in FIG. 1.  From these it is seen that S26, with 16 extra mutations, is an outlier with a probability of only 0.000000004, or 1 chance in 224,056,304, of being in the known human population.  The statistical comparison in FIG. 1 is very similar to other applications of the Poisson Distribution to human mtDNA mutations[9], but is tighter (lower value of λ) because only one haplogroup is involved, whereas [9] involved entire geographical populations in each statistical analysis.  Divergence causes λ to increase.
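The estimate of λ and the probability quoted for S26 can be reproduced from the sample counts in Table 1.  The Python sketch below is an illustration, not the original Excel™ workbook; the variable names are assumptions.

```python
import math

# Number of H1a database entries observed with k extra mutations (Table 1).
observed = {0: 3, 1: 7, 2: 9, 3: 10, 4: 3, 5: 2, 6: 1}

n_entries = sum(observed.values())                       # entries in the set
total_extra = sum(k * n for k, n in observed.items())    # all extra mutations
lam = total_extra / n_entries                            # sample mean, used as lambda


def poisson_pmf(k: int, lam: float) -> float:
    """Equation (1): probability of exactly k extra mutations."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Expected number of entries with k mutations in a set of this size.
expected = {k: n_entries * poisson_pmf(k, lam) for k in range(9)}

# S26 is listed in Table 2 with 16 extra mutations.
p_s26 = poisson_pmf(16, lam)

print(f"lambda = {lam:.4f} ({total_extra} extra mutations / {n_entries} entries)")
for k, e in expected.items():
    print(f"k = {k}: observed {observed.get(k, 0):2d}, expected {e:.2f}")
print(f"S26: Pr(X=16) = {p_s26:.3e}, one chance in {1 / p_s26:,.0f}")
```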
 
 

 
 

 
 

Table 1.  Poisson Distribution for H1a

| No. of Mutations (k) ᵃ | No. of Samples ᵃ | Pr(X=k) ᵇ | One Chance In ᶜ | 35 × Pr(X=k) ᵈ |
|---|---|---|---|---|
| 0 | 3 | 0.093347278 | 11 | 3.3 |
| 1 | 7 | 0.221366401 | 4.5 | 7.7 |
| 2 | 9 | 0.262477305 | 3.8 | 9.2 |
| 3 | 10 | 0.20748206 | 4.8 | 7.3 |
| 4 | 3 | 0.123007221 | 8.1 | 4.3 |
| 5 | 2 | 0.058340568 | 17 | 2.0 |
| 6 | 1 | 0.023058415 | 43 | 0.81 |
| 7 | 0 | 0.007811626 | 128 | 0.27 |
| 8 | 0 | 0.002315589 | 432 | 0.081 |
| 9 | 0 | 0.000610139 | 1,639 | 0.021 |
| 10 | 0 | 0.00014469 | 6,911 | 0.005 |
| 11 | 0 | 3.1193E-05 | 32,059 | 0.001 |
| 12 | 0 | 6.16432E-06 | 162,224 | 0.0002 |
| 13 | 0 | 1.12448E-06 | 889,299 | 0.00004 |
| 14 | 0 | 1.90473E-07 | 5,250,081 | 0.000007 |
| 15 | 0 | 3.01129E-08 | 33,208,345 | 0.000001 |
| 16 | 0 | 4.46316E-09 | 224,056,304 | 0.0000002 |
| 17 | 0 | 6.22593E-10 | 1,606,186,760 | 0.00000002 |
| Sum | 35 | 1.000000000 | | 35.00000000 |

ᵃ Corrected to discount common mutations not used to construct Phylotree Build 16[5], see text.  Blue bar graph in FIG. 1.

ᵇ From Equation (1) with λ = 2.3714, the average k for 35 H1a samples, see text.

ᶜ = 1/Pr(X=k)

ᵈ The expected value for k mutations in a set of 35 samples, rounded off to conserve space.  Red bar graph in FIG. 1.

Sums reflect full accuracy of each entry.

 
 

 
 


FIG. 1.  Distribution of mutations from H1a in Phylotree[5]: 35 H1a samples (blue) and the corresponding Poisson Distribution (n=35) with the same value of λ = 2.3714, as calculated from Equation (1) (red).  Above eight mutations the probabilities are so low that the bars are not plottable or observable on this scale (see Table 1).

 

 

Table 2 presents mutation results for all 18 samples with a complete mitochondrial genome sequence.  Haplogroups and raw rCRS based mutations were taken from Ketchum et al.[1] Supplementary Data 2, and the latter were converted to a haplogroup and an RSRS based list of mutations.  Numbers of mutations were corrected as described above for H1a.  Most haplogroup differences from Ketchum to mtDNAble were minor.  Major differences for S26, S39b, and S44 were due to large numbers of extra mutations.  These samples did not fit into the established human mitochondrial phylogenetic tree of mutations at www.phylotree.org,[5] as discussed below.  Assuming the same Poisson Distribution as for H1a in Table 1, eight samples had mutations with probabilities less than 0.01 (less than 1% chance, or 1 in over 100), most of these considerably less.  As a cross check, all samples were aligned against the NCBI Nucleotide database, which had 27,156 complete human mitochondrial sequences as of July 20, 2014.  The same eight samples had poor matches against the database, i.e. unacceptably high mismatches and gaps.  Note that although the %ID for these eight samples vs. their best matches was greater than 99.9%, the human mitochondrial genome is so well defined that a 0.1% deviation from established haplogroups is an outlier.  In fact, the results for H1a database samples in Table 1 suggest that more than 6 extra mutations out of 16,568 positions (0.036%) constitute an outlier worthy of reexamination (less than 1% probability).  An additional 20 H1a samples found on the mtDNA Community website (http://www.mtdnacommunity.org/downloads.aspx, “NCBI samples utilized by mtDNA Community”) also had only 1-3 extra mutations each, consistent with the distributions in Table 1 and FIG. 1.
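The outlier rule applied here (point probability Pr(X=k) below 0.01 under the H1a Poisson Distribution) can be sketched as follows.  The mutation counts are those listed in Table 2; the variable names and the dictionary layout are assumptions made for the illustration.

```python
import math

LAMBDA_H1A = 2.3714      # from Table 1
GENOME_LENGTH = 16_568   # positions, the denominator used in the text
CUTOFF = 0.01            # less than 1% probability


def poisson_pmf(k: int, lam: float) -> float:
    """Equation (1): probability of exactly k extra mutations."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Smallest k whose point probability falls below the cutoff.
first_outlier_k = 0
while poisson_pmf(first_outlier_k, LAMBDA_H1A) >= CUTOFF:
    first_outlier_k += 1

# Corrected numbers of extra mutations, k, per sample ID (Table 2).
table2_k = {
    "1": 2, "2": 13, "4": 4, "11": 4, "12": 6, "24": 3, "26": 16,
    "28": 8, "29": 7, "31": 4, "35": 0, "36": 3, "37": 2, "38": 1,
    "39b": 12, "44": 17, "46": 12, "138": 7, "C-3": 4, "ES-1": 1, "ES-2": 2,
}

outliers = [s for s, k in table2_k.items() if poisson_pmf(k, LAMBDA_H1A) < CUTOFF]

max_normal_k = first_outlier_k - 1
print(f"Outlier if k >= {first_outlier_k}")
print(f"{max_normal_k} / {GENOME_LENGTH} = {100 * max_normal_k / GENOME_LENGTH:.3f}%")
print("Outlier samples:", ", ".join(outliers))
```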
A survey of T2b sequences in the Nucleotide Database confirmed that S2 is an outlier, while S1, S12, S36, ES-1 and ES-2 are within the normal range of extra mutations for T2b.  Similarly, S29, S44, S46, and S138 are outside the normal range of extra mutations for H2a2.  In both cases distributions of extra mutations were similar to H1a, confirming that the H1a statistics in Table 1 and FIG. 1 are valid enough for other haplogroups to identify outliers as in Table 2.  Much greater numbers of samples might reveal slightly different distributions (λ) for each haplogroup, however.  Such an analysis is beyond the scope of this study.   


 

 

Table 2. Samples with Complete mtDNA Sequences

| Sample ID ᵃ | Haplogroup (Ketchum) | Haplogroup (mtDNAble) | No. of Mutations, k ᵇ | Poisson Pr(X=k) ᶜ | One Chance In ᵈ | Best Match ᵉ |
|---|---|---|---|---|---|---|
| 1 | T2b | T2b | 2 | 0.262 | 3.8 | 3,0 (JX153739.1) |
| 2 * | T2b | T2b | 13 | 0.000001 | 889,299 | 14,0 (JX153739.1) |
| 4 | H3 | H3 | 4 | 0.123 | 8.1 | 3,1 (JX153639.1) |
| 11 | A6L2c | L2c3 | 4 | 0.123 | 8.1 | 9,0 (DQ304989.1) |
| 12 | T2b | T2b | 6 | 0.023 | 43 | 8,0 (JX297190.1) |
| 24 | H1s | H1ba | 3 | 0.207 | 4.8 | 4,0 (JQ702799.1) |
| 26 * | H1a | H5e | 16 | 0.000000004 | 224,056,304 | 16,0 (JX153188.1) |
| 28 * | H1 | H1ba | 8 | 0.002 | 432 | 7,1 (KF161997.1) |
| 29 * | H2a2 | H2a2 | 7 | 0.008 | 128 | 8,0 (JX153451.1) |
| 31 | L0d2a | L0d2a1 | 4 | 0.123 | 8.1 | 5,0 (KC346174.1) |
| 35 | H10 | H10e | 0 | 0.093 | 11 | 0,0 (KC257308.1) |
| 36 | T2b | T2b | 3 | 0.207 | 4.8 | 5,0 (JX297190.1) |
| 37 | H3 | H3 | 2 | 0.262 | 3.8 | 2,0 (JX153533.1) |
| 38 | V2 | V2c | 1 | 0.221 | 4.5 | 0,0 (JQ705254.1) |
| 39b * | T2 | R2'JT | 12 | 0.000006 | 162,224 | 17,0 (KJ690074.1) |
| 44 * | H2a2 | T2 | 17 | 0.0000000006 | 1,606,186,760 | 15,0 (JQ703290.1) |
| 46 * | H2a2 | H2a2 | 12 | 0.000006 | 162,224 | 13,0 (JX153451.1) |
| 138 * | H2a2 | H2a2 | 7 | 0.008 | 128 | 8,0 (JX153451.1) |
| C-3 | HV | HV | 4 | 0.207 | 4.8 | 4,0 (KC765916.1) |
| ES-1 ᶠ | none | T2b | 1 | 0.221 | 4.5 | 2,0 (JN106403.1) |
| ES-2 ᶠ | none | T2b | 2 | 0.262 | 3.8 | 3,0 (JX153739.1) |

* Outlier, <1% probability of being in the normal human population.  Pr(X=k) < 0.01, or One Chance In > 100.

ᵃ From Ketchum et al.[1] Table 1 and Supplementary Data 2.

ᵇ Corrected to discount common mutations not used to construct Phylotree Build 16[5], see text.

ᶜ Calculated from Equation (1) with λ = 2.3714, same as in Table 1.  Rounded off to fit the table.

ᵈ = 1/Pr(X=k) as in Table 1.

ᵉ Best match in Nucleotide database as:  mismatches, gaps (Accession Number).

ᶠ Extra Sample not contained in Ketchum et al.[1] Table 1 or haplogrouped in Supplementary Data 2.  Possible controls (?).

 

 
The existence of an evolutionary phylogenetic tree requires that mutations accumulate in a continuous fashion down each branch of the tree only.  Crossovers between branches are not allowed, since a pattern of evolution with crossovers would resemble a maze or a lattice, not a tree.  Yet this is what samples such as S26, S39b and S44 would require to be human, since they don’t match any one haplogroup very well.  Supplementary FIG. 1 demonstrates this point for S26.  S39b and S44 present similar phylogenetic dilemmas.
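The tree constraint can be illustrated with a toy example.  In the Python sketch below every haplogroup name (X, X1, X2) and mutation label (m1 to m5) is a hypothetical placeholder, not real Phylotree data, and the scoring (private mutations plus missing path mutations counted as reversions) is a simplified stand-in for the haplogroup assignment done by mtDNAble.

```python
# Toy tree: haplogroup -> (parent, mutations that define the branch).
# All names and mutation labels are hypothetical placeholders.
TREE = {
    "root": (None, set()),
    "X": ("root", {"m1"}),
    "X1": ("X", {"m2", "m3"}),
    "X2": ("X", {"m4", "m5"}),
}


def path_mutations(haplogroup):
    """Union of defining mutations on the single path from the root."""
    mutations = set()
    while haplogroup is not None:
        parent, defining = TREE[haplogroup]
        mutations |= defining
        haplogroup = parent
    return mutations


def extra_mutation_count(sample, haplogroup):
    """Private mutations plus missing path mutations (counted as reversions)."""
    return len(sample ^ path_mutations(haplogroup))


def best_haplogroups(sample):
    """Haplogroups with the fewest extra mutations, and that minimum count."""
    scores = {hg: extra_mutation_count(sample, hg) for hg in TREE}
    best = min(scores.values())
    return sorted(hg for hg, score in scores.items() if score == best), best


clean_sample = {"m1", "m2", "m3"}               # fits the X1 branch exactly
mixed_sample = {"m1", "m2", "m3", "m4", "m5"}   # carries both sister branches

print(best_haplogroups(clean_sample))
print(best_haplogroups(mixed_sample))
```

For the mixed sample no single root-to-tip path accounts for all of the mutations: the two sister branches tie, each leaving two mutations unexplained, which is the kind of ambiguity described for S26.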
The Phylotree of mtDNA was built on over 20,000 human samples.  As new data is added, radically new deep-rooted branches do not occur at this advanced stage of Phylotree (Build 16).  New human mtDNA phylogeny is nearly always added near the tips of the existing branches.  It would take a lot more samples to reconcile S26, S39b and S44 with any future build of the Phylotree.  Such is not likely to occur.  It may even be impossible.   
The 11 samples of HVR-1 (only) mutations were not analyzed statistically because of the many examples of ambiguity involved in this screening technique, primarily due to homoplasy.[11]  For this reason, DNA laboratories usually report at least HVR-1 and HVR-2 mutations in deciding a haplogroup (actually only the clade), as in the case of the biological control samples above.  A complete haplogroup requires a complete sequence.  In our case, eight samples have an extra mutation, and one is misgrouped but otherwise normal (S33 should be U5).  Three of the eight (S71, S117, and S118) have equally likely alternate haplogroups (L3d or L3e4), all according to [6], Supplementary Table S2.  These three present the same phylogenetic conundrum as S26 mentioned above (SUPP. FIG. 1).  There could be other extra mutations in the HVR-2 and coding regions which were not determined by this limited laboratory analysis.  Thus, only three of the 11 should be considered as phylogenetically human by this very limited HVR-1 criterion.
 
Conclusions
 
Based on a limited study of cat, dog, horse, and human samples, nonhuman samples do not amplify or sequence with human mtDNA primers.  Consequently, samples that analyze for human mtDNA and animal nuDNA MUST BE MIXTURES of the species, e.g. S26 and S140.  The 1,000- to 10,000-fold per-cell copy number advantage of mtDNA over nuDNA allows a much smaller number of human cells (0.01 to 0.1% of total) to show up in mtDNA analysis; and if only human primers are used, the results can be misleading.  Over a year ago, the author recommended to Dr. Ketchum that nonhuman primers (such as bear and dog) be used on some of the ambiguous samples.  No response was received.  If only human primers are used, only human mtDNA will be sequenced.  A sea of nonhuman mtDNA would be missed.  The same applies to nuDNA at specific gene loci.  This subject will be addressed in a future paper.
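One way to read the 0.01 to 0.1% figure is as the fraction of human cells at which human mtDNA copies equal the nonhuman nuDNA copies in a mixed sample.  The Python sketch below works through that arithmetic; the parity criterion and the function name `human_fraction_for_parity` are assumptions made for illustration, not a model of any particular laboratory protocol.

```python
def human_fraction_for_parity(copy_ratio: float) -> float:
    """Fraction f of human cells at which human mtDNA copies equal the
    nuDNA copies of the remaining (nonhuman) cells.

    With `copy_ratio` mtDNA copies per nuDNA copy in each cell, parity
    means f * copy_ratio = 1 - f, so f = 1 / (copy_ratio + 1).
    """
    return 1.0 / (copy_ratio + 1.0)


# mtDNA-to-nuDNA copy ratios taken from the range quoted in the text.
for ratio in (1_000, 10_000):
    fraction = human_fraction_for_parity(ratio)
    print(f"copy ratio {ratio:>6,}: human cells = {100 * fraction:.3f}% of total")
```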
Samples 2, 26, 28, 29, 39b, 44, 46, and 138 are not human as defined by the accepted Phylotree for human mtDNA (all with less than 1% probability of occurring, most much less).   However, they do not match any other animal nearly as closely as they match human (top 500 matches were all human in every case). 
Of the 11 HVR-1-only samples eight have one extra mutation.  Another one was misgrouped but normal, which leaves a total of three that are potentially human as far as the limited HVR-1 information can tell.  However, the remarkable claim of sasquatch mtDNA demands a full sequence for credibility.
The two most likely causes for these anomalies are:
(1)  Sequencing errors, due to small amounts of sample and/or contamination, and/or degradation.
(2)  Primate male/human female hybridization events, as suggested by Ketchum et al., in the sufficiently distant past that additional mtDNA mutations have since occurred along nonhuman evolutionary lines.  Keep in mind that Denisovan mtDNA matches human 99.7%.  Neanderthal matches human 98.9%.  Denisovan matches Neanderthal 99.7%.  So the purported sasquatch hybridization events resulting in present day 0.036 – 0.1% mismatches for 18 samples (99.96-99.9% agreement) would have had to occur more recently than the divergence of these two subspecies (assuming equal rates of mutation throughout).
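The percentages in (2) can be put on the same footing as the mutation counts in Table 2 by converting them to numbers of differing positions.  This Python sketch assumes that each percentage applies across the full 16,568 positions used earlier in this paper; the function name `expected_differences` is an assumption.

```python
GENOME_LENGTH = 16_568  # positions, the denominator used earlier in this paper


def expected_differences(percent_mismatch: float) -> float:
    """Number of differing positions implied by a percent mismatch."""
    return GENOME_LENGTH * percent_mismatch / 100.0


# Percent mismatches quoted in the text (100% minus the percent match).
comparisons = {
    "Outlier threshold (0.036%)": 0.036,
    "Upper figure for the 18 samples (0.1%)": 0.1,
    "Denisovan vs. human (99.7% match)": 0.3,
    "Neanderthal vs. human (98.9% match)": 1.1,
}

for label, mismatch in comparisons.items():
    print(f"{label}: about {expected_differences(mismatch):.0f} positions")
```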
Sample 26 remains an enigma.  Bear nuDNA cannot go with nearly human mtDNA.  Human and bear mtDNA align only about 75%, and the species have different numbers of chromosomes (46 and 74, respectively).  We are left with (1) as the cause for mtDNA anomalies in this sample.
Sample 140 (dog nuDNA) has an anomalous mtDNA mutation, 16176?, in the HVR-1 region, which might also be due to (1).  Dog nuDNA cannot go with human mtDNA either (78 vs. 46 chromosomes, respectively).
Sample 31 (human nuDNA) does indeed have an acceptable number of mtDNA mutations (4) for a human sample.  This sample IS human by ALL DNA measures.  Could it also be a sasquatch?  Only if sasquatch are feral humans. In that case, why do so many of the mtDNA samples fail the test, i.e. fall outside the range of human mutations, as shown above?  Are more than one species or subspecies involved?  Are hybridization events sufficiently spaced out in time to allow greatly different numbers of subsequent mutations (as in Table 2)?  Could these be along nonhuman evolutionary lines?  Answers to these and other such questions require much more data, good data from controlled samples.
There is enough information in this paper for anyone to validate every single result. 
Future work will address the specific gene sequences in Ketchum et al.[1], Supplementary Data 3, and on the Sasquatch Genome Project website: http://sasquatchgenomeproject.org/.  These too are not all what they were reported to be. 
Conflict of Interest 
 
The author declares no conflicting interests.
 
Acknowledgement
 
Thanks go to the Sasquatch Genome Project for sharing their data online.  No financial support was received for this work.
 
References
 
[1]        Ketchum, M. S. et al. Novel North American Hominins: Next Generation Sequencing of Three Whole Genomes and Associated Studies. DeNovo, 2013, 1:1, Online only: http://sasquatchgenomeproject.org/view-dna-study/
[2]        Khan, T.; White, B.  Final Report on the Analysis of Samples Submitted by Tyler Huggins, Wildlife Forensic DNA Laboratory Case File 12-019; Trent University Oshawa: Peterborough, Ontario, Canada, 2012. http://www.bigfootbuzz.net/bart-cutino-tyler-huggins-release-sierra-kills-sample-dna-results/
 
[3]        Sykes, B. C.; Mullis, R. A.; Hagenmuller, C.; Melton, T. W.; Sartori, M. Genetic Analysis of Hair Samples Attributed to Yeti, Bigfoot and Other Anomalous Primates.  Proc. R. Soc. B, 2014, 281, 20140161.
 
[4]        Hart, H. V.  Methodology and New Metrics for Distinguishing Related Species from Incomplete nuDNA. Unpublished. 
 
[5]        van Oven, M. Revision of the mtDNA Tree and Corresponding Haplogroup Nomenclature. Proc. Natl. Acad. Sci. USA, 2010, 107(11), E38-E39.   http://dx.doi.org/10.1073/pnas.0915120107  
[6]        Behar, D. M.; van Oven, M.; Rosset, S.; Metspalu, M.; Loogväli, E.-L.; Silva, N. M.; Kivisild, T.; Torroni, A.; Villems, R.  A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from Its Root.  Am. J. Hum. Genet., 2012, 90(4), 675-684. http://dx.doi.org/10.1016/j.ajhg.2012.03.002
[7]        Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J.  Basic Local Alignment Search Tool.  J. Mol. Biol., 1990, 215 (3), 403-410.
[8]        Madden, T. The BLAST Sequence Analysis Tool, In The NCBI Handbook; McEntyre J; Ostell J., Eds.; National Center for Biotechnology Information: Bethesda, MD, 2003; http://www.ncbi.nlm.nih.gov/books/NBK21097/.
[9]       Di Rienzo, A.; Wilson, A.C.  Branching Pattern in the Evolutionary Tree for Human Mitochondrial DNA.  Proc. Nat. Acad. Sci. USA, 1991, 88, 1597-1601.
[10]     Tamura, K.; Nei, M.  Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees. Mol. Biol. Evol., 1993, 10 (3), 512-526.
[11]      Behar, D. M.; Rosset, S.; Blue-Smith, J.; Balanovsky, O.; Tzur, S.; Comas, D.; Mitchell, R. J.; Quintana-Murci, L.; Tyler-Smith, C.; Wells, R. S.  The Genographic Project Public Participation Mitochondrial DNA Database.   PLOS Genet., 2007, 3(9), e169.
 
 

 

 
 
 

 
 
SUPP. FIG. 1.  Alternate haplogroups for S26:  haplogroup H1a with evolution along (1) in red, and haplogroup H5e, with evolution along (2) in blue, are equally likely, but crossovers to include all six mutations are not allowed.  In either case, there are left three more extra mutations, for a total of 16 as seen in Table 2.  Black segments are schematic and incomplete.  For complete subbranches see [5, 6].  Directions (1A) and (2A) represent hypothetical (?) sasquatch (nonhuman) evolution paths SINCE a hybridization event involving a H1a human female or a H5e human female, respectively.  Each path would have 16 more mutations (not listed above) to reach S26, which would include the three mutations shown above on the red or blue path to the other haplogroup.  Similar phylogenetic conundrums exist for S39b and S44 also (see Table 2).      
 
 

 
 







 

 
 
 
 
 
 
 
