Thursday, February 7, 2019

Largest Shared Segment Variation from Half Life Equation

HAM Group #1 

 





This table reports the variation (error) from the autosomal Half Life equation for HAM Y-DNA Group #1. This post also mentions a few related problems.
  

HAM Group #1 Largest Shared Segment Variation from Half Life Equation


Here, I apply the autosomal Half Life equation to the largest shared segment and compare the result with the closest (confirmed) cousin level.

Comparing Maximum cMs for the Half Life Equation

Although setting the maximum cMs per chromosome should, in general, improve the estimates, an "Excess IBD" segment can produce a much larger error. For example, between 4C1R T133xxx and T611xxx, using the maximum cM values produces a significant error. In this case, the segment begins just within a known "Excess IBD" area and ends well outside it.

The table shows that on average, the error from the use of 281.5 cMs is slightly less than the error from using the maximum per chromosome. I should emphasize that these are the relationships as I have them today.

For a different comparison of error, 5th cousins T074xxx and T611xxx share a different segment on chr 21, which is outside the known "Excess IBD" area. The cause may or may not be endogamy, so it would be interesting to examine why using chromosome maximums with the Half Life equation produces error for a segment outside an "Excess IBD" area.

Endogamy or Excess IBD?

Wendell and Dave are related in more than one way. The basic idea is that William Ham, Jr. and Thomas Ham were two brothers who married two sisters, Catherine Eldridge and Ann Eldridge. Also, Wendell descends twice from our common ancestor William Ham, Sr. and his wife.

So for me that is:

Dave -> Rufus -> 1) Ambrose 2) Joshua 3) Eli 4) William Ham, Jr. 5) William Ham, Sr.

Or, 5 generations back.

For Wendell, that goes like this:

Wendell -> mother -> 1) Nettie 2) Nelson 3) Larkin 4) Thomas Ham 5) William Ham, Sr.

For the two Eldridge sisters:

Dave -> Rufus -> 1) Ambrose 2) Joshua 3) Eli 4) Catherine Eldridge 5) Zachariah Eldridge
Wendell -> mother -> 1) Nettie 2) Nelson 3) Larkin 4) Ann Eldridge 5) Zachariah Eldridge

On top of that, Wendell is also my 5C1R cousin via Levi Ham:

Wendell -> mother -> 1) Nettie 2) Tabitha 3) Luvene 4) Levi 5) Thomas Ham 6) William Ham, Sr.
Wendell -> mother -> 1) Nettie 2) Tabitha 3) Luvene 4) Levi 5) Ann Eldridge 6) Zachariah Eldridge

Therefore, Wendell and I can relate at the 5th cousin level to these common ancestors:

1) William Ham, Sr.
2) the wife of William Ham, Sr.
3) Zachariah Eldridge via daughter Catherine Eldridge, the wife of William Ham, Jr.
4) the wife of Zachariah Eldridge
5) Thomas Ham, the son of William Ham, Sr.
6) the wife of Thomas Ham, Ann Eldridge (i.e. daughter of Zachariah Eldridge)

Just to emphasize, I can relate to Wendell *as 5th cousin* through either Zachariah Eldridge, the wife of Zachariah Eldridge, the daughter of Zachariah Eldridge, via William Ham, Sr, or his son Thomas Ham, depending upon which line that you happen to follow.

To be brief, the closest ancestor that Wendell and I share in common is at the 5th cousin level. But there are multiple shared ancestors at that level for us for multiple reasons.

To be specific about each 5th cousin ancestor requires a comparison of the starting and ending locations of the matching shared segment. So, with enough participants, we should eventually be able to sort out which segment belongs to which ancestor.

Basically, the procedure to separate out multiple ancestors in common is to see who else shares the same starting and/or ending location at about the same size. Often, that leads you back to a recurring surname (from the general population).

Most vendors currently set a minimum requirement for a match at 7 cMs.

Of the above 17 known relationships for this Group #1 autosomal study:

 - 60 % (3 out of 5) of the 4th cousins do not meet the minimum matching requirements.
 - 57 % (4 out of 7) of the 5th cousins do not meet the minimum matching requirements.

Among those above 3rd cousins, 58 % (7 out of 12) do not meet the minimum matching requirements.
Which means overall, 41 % (7 out of 17) do not meet the minimum matching requirements.
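
As a quick check of the percentages above, here is a minimal Python sketch. The counts are copied from the text. The `percent` helper and the labels are only illustrative names.

```python
# (label, relationships below the vendor minimum, total relationships),
# copied from the figures stated above
counts = [
    ("4th cousins", 3, 5),
    ("5th cousins", 4, 7),
    ("beyond 3rd cousins", 7, 12),
    ("all known relationships", 7, 17),
]


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


for label, missed, total in counts:
    print(f"{label}: {missed} of {total} = {percent(missed, total):.0f}%")
```

The rounded values come out to 60, 57, 58, and 41 percent, which agrees with the figures quoted above.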

Another Example:

What strikes me as interesting is the relationship between 5th cousins Jimmy and Dave (T103xxx and T611xxx). The two do not show up as a match at Family Tree DNA (FTDNA), yet have a similar relationship where two first cousins married two Brinegar females. (Isaac Ham married Mary Brinegar and Eli Ham married Celia Brinegar).

Note: T103xxx and T611xxx are an exact match on the Y-DNA at 111 markers (40777 and 478168).

So, at the 5th cousin level:

 - they should relate in more than one way (Ham and Brinegar),
 - the segment is NOT in a known "Excess IBD" area,
 - they are an exact match at 111 markers on the Y-DNA,

and yet they do not show up as a match from the vendor because the largest shared matching segment is 6.7 cMs (which is below the vendor's lower limit threshold of 7 cMs).

The sum of shared segments greater than 1 cM between T103xxx and T611xxx is 68.7 cMs over 29 segments. Again, this is not compatible with the Shared cM Project, due to the difference in the parameters set at GedMatch using the 'One to One' Utility.

Data has not yet been collected from GedMatch Genesis.

Multiple Relationships?

I should note that if the Brinegar ladies are sisters (not yet confirmed), then the closest relationship between T103xxx and T611xxx would be at the 3rd cousin level (5th cousins by Ham, and 3rd cousins by Brinegar). T611xxx and A658xxx would also be 5th cousins and 3rd cousins in the same manner. It is possible that the Brinegar ladies are cousins, instead of sisters.

This means that these relationships will not be clear until each segment can be verified at its starting and ending locations from multiple kits, along with more documentation in some cases. We would want to see "who else" matches at the same starting and ending locations.

Therefore, the variation in error reported may be different than it now appears. For example, the sums could work out better if the Eldridge and Brinegar lines were better known.

It could also mean that FTDNA is not picking up 3rd cousins using the autosomal DNA either.

Of those participating in the tiny autosomal study, we have similar starting and ending locations, and for the most part, the same size. However, they are not yet matching exactly:

Kit #1     Kit #2     Chr   Start        End          cMs    SNPs

A561xxx    A832xxx    20    55,600,000   60,800,000   19.9   1750
A438xxx    A832xxx    20    55,900,000   57,900,000    6.2    640
A658xxx    T074xxx    20    56,700,000   58,400,000    5.9    563
T103xxx    T611xxx    20    57,500,000   59,000,000    6.7    511
T133xxx    T368xxx    20    59,200,000   60,700,000    6      505

This is an indication that we have the same ancestor in common at about the 5th cousin level, although the segments do not yet exactly triangulate. More data should fill that in and confirm that "hint" by matching starting and ending locations much better. We will not know with any confidence whether this ancestor is male or female until we can fill in the gaps and see if another surname fits there.
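
To illustrate what "not yet matching exactly" looks like, here is a minimal Python sketch that compares the chromosome 20 segments from the table above, pair by pair. It is only an illustration, not the procedure used for the study. The function names and the 100,000 bp tolerance are assumptions made up for this example.

```python
from itertools import combinations

# (kit 1, kit 2, start, end) copied from the chromosome 20 table above
segments = [
    ("A561xxx", "A832xxx", 55_600_000, 60_800_000),
    ("A438xxx", "A832xxx", 55_900_000, 57_900_000),
    ("A658xxx", "T074xxx", 56_700_000, 58_400_000),
    ("T103xxx", "T611xxx", 57_500_000, 59_000_000),
    ("T133xxx", "T368xxx", 59_200_000, 60_700_000),
]


def overlap_bp(a, b):
    """Number of base pairs shared by two segments (0 if they do not overlap)."""
    return max(0, min(a[3], b[3]) - max(a[2], b[2]))


def same_endpoints(a, b, tolerance_bp=100_000):
    """True if both start and end agree within the (hypothetical) tolerance."""
    return abs(a[2] - b[2]) <= tolerance_bp and abs(a[3] - b[3]) <= tolerance_bp


for a, b in combinations(segments, 2):
    label = f"{a[0]}/{a[1]} vs {b[0]}/{b[1]}"
    print(f"{label}: overlap {overlap_bp(a, b):,} bp, "
          f"same endpoints: {same_endpoints(a, b)}")
```

With these five rows, several pairs overlap, but no pair agrees on both the starting and ending locations within the assumed tolerance.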

The Half Life equation would put the 19.9 cM segment at about the 3.8 cousin level, but because we have older 'almost matches' in our common surname study, we may presume that the area already existed previously at about the 5th cousin level.

For those of you in the Group #1 "Tiny Segment Study," that is the reason I sent out the spreadsheets at about the end of October. In order to view that, see:

 "Spreadsheet_Combine_All_at_GEDMatch_Output_ByChr_101818"

Ancient DNA?


Of the 9 kits in the listing above, it looks like brother and sister A561xxx and A832xxx have attached an "Excess IBD" to the older segment, which appears to fill in for all. This would make it an apparent "recombination." This is not a known "Excess IBD" area, so this is more likely to be due to either a "persistent" ancient DNA segment or endogamy at the 4th cousin level. You might notice that the RATIO of SNPs/cMs for all of the 9 kits is about 100 or less, whereas the largest segment between mother and son has a RATIO of about 200 (except for our 23andMe data). I usually suspect ancient DNA when the ratio falls below 200. The chromosome 20 ratio using maximum cMs and SNPs is about 155.
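
For reference, the SNPs/cMs RATIO for the five segments in the chromosome 20 table can be recomputed with a short Python sketch. The rows are copied from that table. The function name is only an illustrative choice.

```python
# (kit 1, kit 2, cMs, SNPs) copied from the chromosome 20 table above
rows = [
    ("A561xxx", "A832xxx", 19.9, 1750),
    ("A438xxx", "A832xxx", 6.2, 640),
    ("A658xxx", "T074xxx", 5.9, 563),
    ("T103xxx", "T611xxx", 6.7, 511),
    ("T133xxx", "T368xxx", 6.0, 505),
]


def snps_per_cm(snps, cms):
    """RATIO as used in this post: SNPs divided by cMs."""
    return snps / cms


for kit1, kit2, cms, snps in rows:
    print(f"{kit1} / {kit2}: RATIO = {snps_per_cm(snps, cms):.1f}")
```

All five values fall well below both the mother-and-son figure of about 200 and the chromosome 20 figure of about 155. The A438xxx / A832xxx pair comes out marginally above 100.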

I should note here that FTDNA and GedMatch have a match between T611xxx and T133xxx at a different location entirely (45,000,000 to 49,000,000 at 6.8 cMs to 8 cMs).

Of the above 9 kits, they have:

A438xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
A561xxx Most Common Ancestor from NC? d: 1858 Hickman, Arkansas
A832xxx Most Common Ancestor from NC? d: 1858 Hickman, Arkansas
A658xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T074xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T103xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T133xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T368xxx Most Common Ancestor is from Patrick County, Virginia
T611xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA



Shared cM Project

Sums of segments have been included for comparison with the Shared cM Project.
Here, the sums are not compatible with the Shared cM Project, mainly due to the difference in parameters (for example: 250 SNPs, Bunching Limit of 100 SNPs, and minimum 1 cM).

Here, sums by themselves are not a good indicator of (distant) relationships.

Updated Feb 7, 2019 to indicate that Wendell and Dave would be 5C1R via Levi Ham, instead of 6th cousin.

Further Reading:


"Autosomal DNA Half Life Equation"  Ham Country blog

 http://hamcountry-blog.blogspot.com/2018/02/autosomal-dna-half-life-equation.html

"Spreadsheet_Combine_All_at_GEDMatch_Output_ByChr_101818" sent to participants in this study by Dave Hamm

"Visualizing Data From the Shared cM Project" by The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/05/29/visualizing-data-from-the-shared-cm-project/

"The Shared cM Project – An Update" by The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/05/25/the-shared-cm-project-an-update/

"The Shared cM Project – Version 3.0 (August 2017)" by Blaine T. Bettinger
  (PDF file)
 https://thegeneticgenealogist.com/wp-content/uploads/2017/08/Shared_cM_Project_2017.pdf

"Collecting Sharing Information for Known Relationships (Part 1)" The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/03/04/collecting-sharing-information-for-known-relationships/

"Collecting Sharing Information for Known Relationships – Part II" The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/04/06/collecting-sharing-information-for-known-relationships-part-ii/



Articles on Triangulation of segments by Starting and Ending Locations:



"A Study Utilizing Small Segment Matching" by Roberta Estes at DNAeXplained

 https://dna-explained.com/2015/01/21/a-study-utilizing-small-segment-matching/

"A Triangulation Intervention" The Genetic Genealogist

 https://thegeneticgenealogist.com/2016/06/19/a-triangulation-intervention/


More Information for HAM DNA Group #1 at HAM Country:

 http://ham-country.com/HamCountry/HAM_DNA_Project/Groups/HAM_DNA_Group001.html











Thursday, September 20, 2018

Ancient DNA Clovis Anzick and HAM DNA Group #1





Because of the destroyed or missing records in Virginia, I had been working on the autosomal Half Life equation in order to tie our group to Somerset by use of autosomal DNA. Previously, we had seen that two kits from Somerset, England are a match to HAM DNA Group #1 (I1-M253).

I had recently noticed that the Half Life equation was throwing off errors, that is, variation from what one would expect to see from the equation. This was particularly troublesome when the SNP density ratio was between 1.0 and 1.5, where RATIO = SNPs/(100*cMs).

So, when I ran across an article on Ancient DNA by Roberta Estes, I became curious as to what the Half Life equation might look like when used on Ancient DNA.

"Analyzing the Native American Clovis Anzick Ancient Results" DNAeXplained – Genetic Genealogy


Roberta had been talking to Felix Chandrakumar about the Clovis results that had been uploaded to GEDMatch, and Roberta had noticed that the Clovis upload was matching living people. She had found 1466 matches to Clovis at GEDMatch above the 7 cM level.

This is a stunning result. For a little background on the Half Life equation, you can find its limit by plugging in "1 cM," which shows that the equation is built to return about the 8th cousin level (or 9 generations) for a shared segment of 1 cM in size.

   Half Life = - LN(1/281.5)/.693147


   Half Life = 8.1

For the 11th cousin level, it needs a segment size of 0.1 cMs.
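
The calculation above can be sketched in Python as follows. This is only an illustration: the function name is made up for this example, and the 281.5 and 0.693147 constants are taken directly from the equation as written above.

```python
import math

TOTAL_CM = 281.5   # constant from the equation above
LN_2 = 0.693147    # natural log of 2, as written in the equation above


def half_life_cousin_level(segment_cm, total_cm=TOTAL_CM):
    """Return the estimated cousin level for a shared segment size in cMs."""
    if segment_cm <= 0:
        raise ValueError("segment size must be positive")
    return -math.log(segment_cm / total_cm) / LN_2


for cm in (1.0, 0.1):
    print(f"{cm} cM -> cousin level {half_life_cousin_level(cm):.1f}")
```

A 1 cM segment gives about 8.1, matching the figure above, and a 0.1 cM segment gives about 11.5, in line with the 11th cousin level.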

Clearly, if the Clovis sample is 12,500 years old and is matching living people at 7 cMs and above, then the Half Life equation is useless in its current form.

Now, the usual argument might be that these Clovis samples are Identical by State (IBS), and not Identical by Descent (IBD). For example, the current ISOGG statistics show that the smallest shared segment equivalent to 5th cousins is 3.32 cMs.

See:

"Autosomal DNA statistics"   at ISOGG


Or see also:

"Cousin statistics"   at ISOGG


See the scientific paper for the expected (i.e., theoretical) number of cMs at the 5th cousin level:

"Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples"

   
 
Table 1. Expected extent of IBD and number of cousins for 1st–10th degrees of cousinship.

https://doi.org/10.1371/journal.pone.0034267.t002



Also, it is instructive to note that a good quality 10 cM segment was extracted from the Altai Neanderthal who lived 50,000 years ago in Siberia.


 - see "The complete genome sequence of a Neanderthal from the Altai Mountains"
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4031459/



"To estimate the extent of their relatedness, we scanned the genome for 1Mb regions where most non-overlapping 50-kb-windows were devoid of heterozygous sites and merged adjacent regions (SI 10). The Neandertal genome has 20 such regions longer than 10cM whereas the Denisovan genome has one."

Finally, Roberta Estes wrote an article regarding possible sampling errors, due to the nature of conversion for upload to GEDMatch. The basic concern appears to be "No-Call" rates:

"Ancient DNA Matching – A Cautionary Tale"   DNAeXplained – Genetic Genealogy, by Roberta Estes


Roberta also explains that subsequent comparisons do not match previous comparisons. I have seen that with GEDMatch data, particularly when the version of the 'One to One' Utility changes (it is now at version 2.1.1(c)). I have had to re-work the data for Group #1 several times over the years, just to keep the data consistent with the current version of the GEDMatch 'One to One' Utility.



Also note that these Clovis segments for kit F999919 match to 12 cMs on living Native Americans.


With that, the first thing I wanted to look at was the SNP density RATIO for the Clovis comparisons. How do ancient Clovis segments compare on SNP density RATIO, and do these ancient segments also throw off variations from the Half Life equation?


  RATIO = SNPs/(100*cMs)

I took a look at the matching segments for HAM DNA Group #1, plus our 'control group' kit, Arnold (22 kits). Taking the seven largest segments matching Clovis for each kit, I found that all but five segments (about 3.2%) had a RATIO of less than 2.0.

To put that into perspective, 7 x 22 = 154 segments

    5 segments/154 segments = 3.2%

That means, for our sample (the Ancient Clovis DNA matching shared segments), 97% have an SNP density RATIO of less than 2.0
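
The RATIO definition and the arithmetic above can be checked with a short Python sketch. The function and variable names are only illustrative. The counts (22 kits, seven segments each, five segments at or above 2.0) are taken from the text.

```python
def snp_density_ratio(snps, cms):
    """RATIO as defined above: SNPs / (100 * cMs)."""
    return snps / (100.0 * cms)


kits = 22                  # Group #1 kits plus the control kit
segments_per_kit = 7       # seven largest Clovis segments per kit
total_segments = kits * segments_per_kit

segments_at_or_above_2 = 5
share_above = segments_at_or_above_2 / total_segments
share_below = 1.0 - share_above

print(f"total segments: {total_segments}")
print(f"RATIO of 2.0 or more: {share_above:.1%}")
print(f"RATIO below 2.0: {share_below:.1%}")
```

This gives 154 segments, with about 3.2% at a RATIO of 2.0 or more and roughly 97% below 2.0.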

Among the Clovis shared segments for the group, there are about 8% that are in 'Excess IBD' regions.

Below is a summary table of the results.



 
Clovis Largest Shared Segment and HAM DNA Group01


  

The kit in HAM DNA Group #1 with the largest matching segment to Clovis:
   A404xxx at 5.9 cMs

- Kits with matching Clovis segments with the largest matching starting and/or ending locations:

   Clovis     Kit       Chr   Start Location   End Location   cMs   SNPs

   F999919    T074xxx     1        1,751,874      3,003,550   4.2    261
   F999919    T133xxx     1        1,751,874      3,003,550   4.2    258
   F999919    T630xxx     9       38,523,004     70,819,104   4.0    286
   F999919    T682xxx     9       38,694,680     70,536,108   3.4    148
   F999919    A561xxx    12       11,840,131     12,861,007   3.8    347
   F999919    A832xxx    12       11,840,131     12,861,007   3.8    350
   F999919    A438xxx    17       13,813,353     14,461,941   3.6    251
   F999919    T368xxx    17       13,785,798     14,618,990   4.7    354
   F999919    A984xxx    19        8,282,431      9,460,034   4.2    292
   F999919    T611xxx    19        8,214,446      9,934,324   5.5    401

I would think that these matching Clovis segments would imply a Native American ancestor for the above kits (Amelia County, Ashe County, and the Arkansas lines).

This also implies that if kit A171xxx (of Somerset) does not have a Native American ancestor, then the timeline to connection could be considerably further back in time and may have an impact upon how the Half Life equation should function.

For example, a person in Somerset with a documented line may have a very small chance of having a Native American ancestor. A Clovis match would push the age back at least 12,500 years, much more than the 9 generations from the current 1 cM limit of the Half Life equation.

Sample:                  Clovis-Anzick-1
Location:                Montana, North America
GEDMatch:                F999919
Sex:                     M
Y-DNA:                   Q-Z780
Mt-DNA:                  D4h3a
Approx. Age by authors:  12,500 years
Felix Chandrakumar Analysis or Comments:  Matches Living people.

http://www.y-str.org/2014/09/clovis-anzick-dna.html



References:



"Analyzing the Native American Clovis Anzick Ancient Results" DNAeXplained – Genetic Genealogy, Roberta Estes

https://dna-explained.com/2014/09/23/analyzing-the-native-american-clovis-anzick-ancient-results/


"Ancient DNA Matching – A Cautionary Tale"   DNAeXplained – Genetic Genealogy, by Roberta Estes
 

https://dna-explained.com/2014/09/30/ancient-dna-matching-a-cautionary-tale/

"Matching DNA of Living Native Descendants to DNA of Native Ancestors"

https://nativeheritageproject.com/2014/09/25/matching-dna-of-living-native-descendants-to-dna-of-native-ancestors/

"Autosomal DNA statistics"
https://isogg.org/wiki/Autosomal_DNA_statistics

Or see also:

"Cousin statistics"
https://isogg.org/wiki/Cousin_statistics

The scientific paper for the expected (i.e., theoretical) number of cMs at the 5th cousin level:

"Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples"

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034267 

 
GEDMatch   John Olson depends upon financial contributions.


Clovis-Anzick-1    Montana, North America    F999919    M    Q-Z780    D4h3a    12,500 years    Matches Living people.

http://www.y-str.org/2014/09/clovis-anzick-dna.html


Autosomal DNA Half Life Equation
http://hamcountry-blog.blogspot.com/2018/02/autosomal-dna-half-life-equation.html



Friday, May 4, 2018

Y-DNA STR Genetic Distance And The Probability of Error




A Brief Review of HAM DNA Group #1



This topic concerns whether or not Y-DNA is an adequate predictor of relationships. That is, are DNA matches using Y-DNA a good indicator of close relationships at up to 111 markers?

“Genetic Distance” is a term used to show how well the DNA matches when compared to another person. For example, a Genetic Distance of 0 at 111 Y-DNA STR markers is a perfect “match.” Genetic genealogists generally want to associate Genetic Distance with how closely related the lineages may be. Genetic Distance combined with the concept of Time to Most Recent Common Ancestor (TMRCA) should deliver an indication of when the lines converge.


However, projects are finding that Genetic Distance for Y-DNA STRs is not a good indication of how closely related two individuals may be. In many cases, it does provide a fairly reliable account of surname history.

This article was written because it came as a surprise to me when a certain Facebook group, the "Y-DNA - Applied Genealogy & Paternal Origins" group, apparently censored my comments regarding Genetic Distance for Y-DNA.


I wanted to address what Genetic Distance means in terms a beginner could understand, as we had a little bit of a conversation about it.

Another Facebook group had also censored my comments regarding the DNA analysis by law enforcement in the recent “Golden State Killer” case.

To review a few Genetic Distance examples, my line has the following figures regarding Genetic Distance.

This first group connects at circa 1755 from Grayson County, Virginia. The values below are the Genetic Distance to me:

Jimmy 5th cousin, once removed GD 0 - 111 markers
Julian 5th cousin GD 1 - 37 markers
Gene 5th cousin GD 4 - 37 markers
Bill 5th cousin GD 2 - 37 markers
Steven 5th cousin GD 0 - 12 markers
Brick 5th cousin GD 0 - 67 markers



Most listed above descend from John Ham (1780-1850) of Grayson County, Virginia:

Julian and Jimmy are 3rd cousins and have a Genetic Distance of 1 on 37 markers.
Steven and Bill are 3rd cousins, with a Genetic Distance of 1 on 12 markers

Gene is 4th cousin to Jimmy with a Genetic Distance of 4 on 37 markers
Gene is 4th cousin to Julian with a Genetic Distance of 4 on 37 markers
Gene is 4th cousin to Bill with a Genetic Distance of 5 on 37 markers
Gene is 4th cousin to Steven with a Genetic Distance of 0 on 12 markers


Brick is about 5th cousin from everybody else above, as he descends from Thomas HAM of Ashe County, NC (1795-1865)

Brick is 5th cousin to Jimmy with a Genetic Distance of 0 on 67 markers
Brick is 5th cousin to Julian with a Genetic Distance of 1 on 37 markers
Brick is 5th cousin to Gene with a Genetic Distance of 4 on 37 markers
Brick is 5th cousin to Bill with a Genetic Distance of 2 on 37 markers
Brick is 5th cousin to Steven with a Genetic Distance of 0 on 12 markers

Brick is 5th cousin to Dave, as previously mentioned.


At the Greater Than 5th cousin level (to me), the following three have a line from another geographic area, Franklin County, North Carolina, and connect to the above prior to 1755:

Marvin Greater Than 5th cousin GD 5 - 111 markers
Leonard GT 5th cousin GD 4 - 37 markers
James GT 5th cousin GD 2 - 37 markers

That is, James from a completely different line has a Genetic Distance “as close” or closer than two of my actual 5th cousins. The three above are from the same Franklin County line.
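
The comparison above can be reproduced with a small Python sketch using the figures listed earlier in this post. The data layout and the function name are only illustrative. Comparisons are restricted to results at the same number of markers.

```python
# (name, documented relationship to the author, Genetic Distance, markers compared),
# copied from the lists above
results = [
    ("Jimmy", "5th cousin, once removed", 0, 111),
    ("Julian", "5th cousin", 1, 37),
    ("Gene", "5th cousin", 4, 37),
    ("Bill", "5th cousin", 2, 37),
    ("Steven", "5th cousin", 0, 12),
    ("Brick", "5th cousin", 0, 67),
    ("Marvin", "greater than 5th cousin", 5, 111),
    ("Leonard", "greater than 5th cousin", 4, 37),
    ("James", "greater than 5th cousin", 2, 37),
]


def fifth_cousins_not_closer(name, markers):
    """List documented 5th cousins, at the same marker level,
    whose Genetic Distance is equal to or larger than the named person's."""
    target = next(r for r in results if r[0] == name and r[3] == markers)
    return [
        r[0]
        for r in results
        if r[1] == "5th cousin" and r[3] == markers and r[2] >= target[2]
    ]


print(fifth_cousins_not_closer("James", 37))
```

At 37 markers this returns Gene and Bill, the two 5th cousins whose Genetic Distance is no closer than that of James.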

Above, between themselves, Marvin & Leonard have a Genetic Distance of 1 on 37 markers.
Between Marvin & James, they have a Genetic Distance of 1 on 37 markers.

Marvin and Leonard descend from Robert Solomon Ham, and appear to be about 2nd cousins. James descends from Francis (Frank) James Hamm, and appears to be about 3rd cousin to Marvin and Leonard.

Continuing on with the Genetic Distance to me:

Jon GT 6th cousin (Somerset, England) GD 5 - 111 markers
Tony GT 5th cousin (Somerset, England) GD 1 - 37 markers

[Tony and Jon have a Genetic Distance of 3 between the two of them.]

Tony has a Most Recent Common Ancestor in England and has a closer Genetic Distance to me than at least three of my 5th cousins, although we know Tony has to relate further back, as my line has been in this country since before 1783, and Tony’s genealogical information shows no connection (Tony’s line arrived in the U.S. circa 1850).

Michael Gene Greater Than 5th cousin (Patrick County, Virginia) & myself have a GD 2 on 111 markers.

That is, Michael Gene is from a completely different line from a different geographic area and has a closer Genetic Distance than two of my 5th cousins (at 111 markers), but we know that we must relate further back than 5th cousins from the genealogical information.

Occasionally, FTDNA has made changes that have made these values jump around a bit. At one time, FTDNA had Michael Gene and me at a Genetic Distance of one on 111 markers.


The guidance given by Family Tree DNA on 111 markers says that at 50% confidence level, an exact match on 111 markers should be within 2 generations. That is, 2 generations or less. Obviously, if I am an exact match to my 5th cousin Jimmy at 111 markers, then we certainly do NOT want to use 50% confidence levels.

Fortunately, FTDNA provides other confidence levels for 111 markers: 90%, 95%, and 99%.

The 90% level also fails for the GD of zero between my 5th cousin Jimmy and myself. At 90% confidence level, the table says that Jimmy & I should be 4th cousins or less.

It is only when we reach the 95% or 99% confidence level that FTDNA returns a valid TMRCA for a Genetic Distance of 0 on 111 markers of at least 5 generations. Since we are 5th cousins, Jimmy and I would be the 6th generation, meaning only the 99% confidence level actually meets that criterion.

If you are using Dean McGee's Y-Utility, you will want to use the highest probability for general purpose use.

Anybody looking at Genetic Distance should be thinking in terms of “X” generations OR LESS. For example, I typically refer to an exact match at 37 markers as “Any time after 1600,” as Ron Blevins has reported seeing that in his project.

Another genetic genealogist who has mentioned how unreliable Genetic Distance may be in determining relationships is Jim Owston, in his 2014 article “Is Genetic Distance an Adequate Predictor of Relationships?” (Updated Jan 23, 2018).

Jim Owston mentions:

“Therefore, it is unlikely that two people with a GD=4 are close relatives; however, a GD=0 could represent numerous relationships from very close relatives to those who are very distant, as a genetic distance of zero is all over the road.”

Jim Owston has information back to 13th cousins, where 12th cousins or more are estimated.

We have few in the HAM DNA Project that can claim accurate documentation that far back. However, the Grayson County group does have a similar number of known 5th cousins who have tested with the Y-DNA.

In comparison, Jim Owston lists roughly eight 5th cousins, and I list roughly eleven 5th cousin relationships above, among 7 kits. Jim has roughly eleven 4th cousins listed, and I have five 4th cousin relationships listed above. Otherwise, Jim Owston has multiple dozens of relationships listed at 8th cousins or more.

Jim Owston now has 253 relationships at 43 markers and 153 relationships at 37 markers on record, compared to 59 kits in the HAM DNA Project, and 17 autosomal kits in the HAM DNA Group #1 study. I do not know offhand how many relationships that represents for the HAM Group #1, but a reasonable guess would be roughly two dozen. That is tiny in comparison to Jim Owston's study.

In an effort to obtain a better TMRCA, Jim Owston is considering a study of the BigY results (the BigY-500 product provides over 500 STRs, and is largely based on SNPs).

For an improved TMRCA, I have been looking at autosomal results. There are 16 kits in Group #1 now participating in the autosomal study, with at least 7 kits from the Grayson County line. My initial autosomal DNA studies indicate that the autosomal DNA may deliver better TMRCA results than does up to 111 Y-DNA STR markers.

However, for the autosomal DNA, the immediate issues include the apparent removal of “Excess IBD” segments from GEDMatch reports, vendor conversion issues (such as 23andMe conversion issues), slight differences in starting locations when compared to the vendor, how to verify data that falls below the vendor’s lowest threshold, privacy issues, etc. It is not yet known if the autosomal DNA will retain any accuracy when taken to the 13th cousin level that Jim Owston has in his study. According to the Autosomal Half Life Equation, the thresholds would have to be taken down to about 0.01 cMs in order to deliver 14th cousin relationships. GEDMatch cannot be set lower than 1 cM (about 8th or 9th cousin level, according to the Half Life Equation). If concepts such as the “EndogamyFactor” could be considered to be a valid evaluation, then perhaps the lowest 1 cM threshold at GEDMatch may deliver results even further back than 9th cousin.
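
To put rough numbers on those thresholds, here is a minimal Python sketch. It assumes the form of the Half Life equation given in the "Autosomal DNA Half Life Equation" post, cousin level = -LN(cMs/281.5)/0.693147, and simply inverts it. The function names are only illustrative.

```python
import math

TOTAL_CM = 281.5  # constant assumed from the Half Life equation post


def cousin_level(segment_cm, total_cm=TOTAL_CM):
    """Cousin level estimated from a shared segment size in cMs."""
    return math.log2(total_cm / segment_cm)


def segment_cm_for_level(level, total_cm=TOTAL_CM):
    """Inverse: segment size in cMs that corresponds to a given cousin level."""
    return total_cm * 2.0 ** (-level)


for level in (8, 9, 14):
    print(f"cousin level {level}: about {segment_cm_for_level(level):.4f} cMs")

print(f"0.01 cMs -> cousin level {cousin_level(0.01):.1f}")
```

Under that assumption, a 14th cousin level corresponds to a segment of roughly 0.017 cMs, and a 0.01 cM threshold corresponds to a cousin level of about 14.8. Both are far below the 1 cM minimum available at GEDMatch.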

Related Topics:

Y-DNA Mutation Rates – A Case Study

Y-DNA Project Grouping with Genetic Distance

Tree Building for Y-DNA Surname Projects

HAM DNA Output From Dean McGee’s Y-DNA Utility

Is Genetic Distance an Adequate Predictor of Relationships?

Autosomal Small Segment Triangulation HAM DNA Group #1

Autosomal Small Segment Phylogenetic Tree

Autosomal DNA Half Life Equation

FTDNA's Interpreting Genetic Distance for 37 Markers

FTDNA's Interpreting Genetic Distance for 67 Markers

FTDNA's Interpreting Genetic Distance for 111 Markers

FTDNA BigY-500 product

GEDMatch