Thursday, February 7, 2019

Largest Shared Segment Variation from Half Life Equation

HAM Group #1 

 





This table reports the variation in error from the autosomal Half Life equation for HAM Y-DNA Group #1. This post also mentions a few related problems.
  

HAM Group #1 Largest Shared Segment Variation from Half Life Equation


Here, I apply the autosomal Half Life equation to the largest shared segment and compare the result to the closest (confirmed) cousin level.

Comparing Maximum cMs for the Half Life Equation

Although it would appear that using the maximum cMs per chromosome should in general improve the estimates, an "Excess IBD" segment can produce a much larger error. For example, between 4C1R T133xxx and T611xxx, using the maximum cM values produces a significant error. In this case, the segment begins just within a known "Excess IBD" area and ends well outside it.

The table shows that on average, the error from the use of 281.5 cMs is slightly less than the error from using the maximum per chromosome. I should emphasize that these are the relationships as I have them today.

For a different comparison of error, 5th cousins T074xxx and T611xxx share a different segment on chr 21, which is outside the known "Excess IBD" area. The answer may or may not be endogamy, so it might be interesting to examine why using chromosome maximums with the Half Life equation might produce error in a non-"Excess IBD" area.

Endogamy or Excess IBD?

Wendell and Dave are related in more than one way. The basic thought there is that William Ham, Jr. and Thomas Ham were two brothers who married two sisters, Catherine Eldridge and Ann Eldridge. Also, Wendell descends twice from our common ancestor William Ham, Sr. and wife.

So for me that is:

Dave -> Rufus -> 1) Ambrose 2) Joshua 3) Eli 4) William Ham, Jr. 5) William Ham, Sr.

Or, 5 generations back.

For Wendell, that goes like this:

Wendell -> mother -> 1) Nettie 2) Nelson 3) Larkin 4) Thomas Ham 5) William Ham, Sr.

For the two Eldridge sisters:

Dave -> Rufus -> 1) Ambrose 2) Joshua 3) Eli 4) Catherine Eldridge 5) Zachariah Eldridge
Wendell -> mother -> 1) Nettie 2) Nelson 3) Larkin 4) Ann Eldridge 5) Zachariah Eldridge

On top of that, Wendell is also my 5C1R cousin via Levi Ham:

Wendell -> mother -> 1) Nettie 2) Tabitha 3) Luvene 4) Levi 5) Thomas Ham 6) William Ham, Sr.
Wendell -> mother -> 1) Nettie 2) Tabitha 3) Luvene 4) Levi 5) Ann Eldridge 6) Zachariah Eldridge

Therefore, Wendell and I can relate at the 5th cousin level to these common ancestors:

1) William Ham, Sr.
2) the wife of William Ham, Sr.
3) Zachariah Eldridge via daughter Catherine Eldridge, the wife of William Ham, Jr.
4) the wife of Zachariah Eldridge
5) Thomas Ham, the son of William Ham, Sr.
6) the wife of Thomas Ham, Ann Eldridge (i.e. daughter of Zachariah Eldridge)

Just to emphasize, I can relate to Wendell *as 5th cousin* through Zachariah Eldridge, the wife of Zachariah Eldridge, the daughter of Zachariah Eldridge, William Ham, Sr., or his son Thomas Ham, depending upon which line you happen to follow.

To be brief, the closest ancestor that Wendell and I share in common is at the 5th cousin level. But there are multiple shared ancestors at that level for us for multiple reasons.

Attributing a segment to a specific 5th cousin ancestor requires comparing the starting and ending locations of the matching shared segments. So, with enough participants, we should eventually be able to sort out which segment belongs to which ancestor.

Basically, the procedure to separate out multiple ancestors in common is to see who else shares the same starting and/or ending location at about the same size. Often, that leads you back to a recurring surname (from the general population).

Most vendors currently set a minimum requirement for a match at 7 cMs.

Of the above 17 known relationships for this Group #1 autosomal study:

 - 60 % (3 out of 5) of the 4th cousins do not meet the minimum matching requirements.
 - 57 % (4 out of 7) of the 5th cousins do not meet the minimum matching requirements.

Among those more distant than 3rd cousins, 58 % (7 out of 12) do not meet the minimum matching requirements.
Overall, 41 % (7 out of 17) do not meet the minimum matching requirements.
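
As a quick check of the arithmetic, the percentages can be recomputed from the counts given above. This is only a sketch: it assumes that the "7 out of 12" figure is the 4th and 5th cousin rows added together, which is my reading of the numbers rather than something stated outright.

```python
# Counts from the summary above: (below the 7 cM minimum, total relationships)
below_minimum = {"4th cousins": (3, 5), "5th cousins": (4, 7)}
ALL_RELATIONSHIPS = 17  # total known relationships in the Group #1 study


def percent(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


for level, (missed, total) in below_minimum.items():
    print(f"{level}: {percent(missed, total)} % ({missed} out of {total})")

# Assumption: the "more distant than 3rd cousin" group is the two rows combined.
distant_missed = sum(missed for missed, _ in below_minimum.values())
distant_total = sum(total for _, total in below_minimum.values())
print(f"distant: {percent(distant_missed, distant_total)} % "
      f"({distant_missed} out of {distant_total})")
print(f"overall: {percent(distant_missed, ALL_RELATIONSHIPS)} % "
      f"({distant_missed} out of {ALL_RELATIONSHIPS})")
```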

Another Example:

What strikes me as interesting is the relationship between 5th cousins Jimmy and Dave (T103xxx and T611xxx). The two do not show up as a match at Family Tree DNA (FTDNA), yet have a similar relationship where two first cousins married two Brinegar females. (Isaac Ham married Mary Brinegar and Eli Ham married Celia Brinegar).

Note: T103xxx and T611xxx are an exact match on the Y-DNA at 111 markers (40777 and 478168).

So, at the 5th cousin level:

 - they should relate in more than one way (Ham and Brinegar),
 - the segment is NOT in a known "Excess IBD" area,
 - they are an exact match at 111 markers on the Y-DNA,

and yet they do not show up as a match from the vendor because the largest shared matching segment is 6.7 cMs (which is below the vendor's lower limit threshold of 7 cMs).

The sum of shared segments greater than 1 cM between T103xxx and T611xxx is 68.7 cMs over 29 segments. Again, this is not compatible with the Shared cM Project, due to the different parameters set at GedMatch in the "One to One" utility.

Data has not yet been collected from GedMatch Genesis.

Multiple Relationships?

I should note that if the Brinegar ladies are sisters (not yet confirmed), then the closest relationship between T103xxx and T611xxx would be at the 3rd cousin level (5th cousins by Ham, and 3rd cousins by Brinegar). Also, T611xxx and A658xxx would be 5th cousins and 3rd cousins in the same manner. It is possible that the Brinegar ladies are cousins, instead of sisters.

This means these relationships will not be clear until each segment can be verified at its starting and ending locations from multiple kits, along with more documentation in some cases. We would want to see "who else" matches at the same starting and ending locations.

Therefore, the variation in error reported may be different than it now appears. For example, the sums could work out better if the Eldridge and Brinegar lines were better known.

It could also mean that FTDNA is not picking up 3rd cousins using the autosomal DNA either.

Of those participating in the tiny autosomal study, we have similar starting and ending locations, and for the most part, the same size. However, they are not yet matching exactly:

Kit #1     Kit #2     Chr   Start        End          cMs    SNPs

A561xxx    A832xxx    20    55,600,000   60,800,000   19.9   1750
A438xxx    A832xxx    20    55,900,000   57,900,000    6.2    640
A658xxx    T074xxx    20    56,700,000   58,400,000    5.9    563
T103xxx    T611xxx    20    57,500,000   59,000,000    6.7    511
T133xxx    T368xxx    20    59,200,000   60,700,000    6      505

This is an indication that we have the same ancestor in common at about the 5th cousin level, but the segments do not yet exactly triangulate. More data should fill that in and confirm that "hint" by matching starting and ending locations much better. We will not know with any confidence whether this ancestor is male or female until we can fill in the gaps and see if another surname fits there.
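
To make "similar but not exact" a little more concrete, here is a minimal sketch that compares the starting and ending locations of the five chromosome 20 segments listed above. The 100,000 base pair tolerance is a number I picked only for illustration; it is not a GedMatch or FTDNA setting, and the positions are the rounded values from the table.

```python
from itertools import combinations

# Chromosome 20 segments from the table above: (kit 1, kit 2, start, end)
segments = [
    ("A561xxx", "A832xxx", 55_600_000, 60_800_000),
    ("A438xxx", "A832xxx", 55_900_000, 57_900_000),
    ("A658xxx", "T074xxx", 56_700_000, 58_400_000),
    ("T103xxx", "T611xxx", 57_500_000, 59_000_000),
    ("T133xxx", "T368xxx", 59_200_000, 60_700_000),
]


def overlap(a, b):
    """Return the number of base pairs two segments share (0 if none)."""
    return max(0, min(a[3], b[3]) - max(a[2], b[2]))


def same_endpoints(a, b, tolerance=100_000):
    """True when both start and end agree within `tolerance` base pairs.

    The tolerance is a hypothetical value chosen for this sketch.
    """
    return abs(a[2] - b[2]) <= tolerance and abs(a[3] - b[3]) <= tolerance


for a, b in combinations(segments, 2):
    label = f"{a[0]}/{a[1]} vs {b[0]}/{b[1]}"
    print(f"{label}: overlap {overlap(a, b):,} bp, "
          f"same endpoints: {same_endpoints(a, b)}")
```

With these rounded positions the 19.9 cM segment overlaps all four of the smaller ones, but no two rows share both a start and an end, which is the "not yet matching exactly" situation described above.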

The Half Life equation would put the 19.9 cM segment at about the 3.8 cousin level, but because we have older 'almost matches' in our common surname study, we may presume that the area already existed previously at about the 5th cousin level.
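
The equation itself is spelled out in the "Autosomal DNA Half Life Equation" post listed under Further Reading, not here. As a rough sketch only, if I assume it takes the form of a 281.5 cM starting value (the figure used earlier in this post) that halves once per cousin level, then the 3.8 figure above can be reproduced like this. The function name and the exact form are assumptions made for illustration.

```python
import math

# Assumption: 281.5 cMs is the starting size that gets halved at each level.
REFERENCE_CM = 281.5


def cousin_level(largest_segment_cm, reference_cm=REFERENCE_CM):
    """Estimate the cousin level as the number of halvings from reference_cm.

    Assumed form: largest_segment_cm = reference_cm / 2 ** level
    """
    return math.log2(reference_cm / largest_segment_cm)


print(round(cousin_level(19.9), 1))  # 3.8, the figure quoted above
```

Under the same assumption, the 6.7 cM segment between T103xxx and T611xxx would come out a little above the 5th cousin level.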

For those of you in the Group #1 "Tiny Segment Study," that is the reason I sent out the spreadsheets at about the end of October. In order to view that, see:

 "Spreadsheet_Combine_All_at_GEDMatch_Output_ByChr_101818"

Ancient DNA?


Of the 9 kits in the listing above, it looks like brother and sister A561xxx and A832xxx have attached an "Excess IBD" to the older segment, which appears to fill in for all. This would make it an apparent "recombination." This is not a known "Excess IBD" area, so this is more likely to be due either to a "persistent" ancient DNA segment or to endogamy at the 4th cousin level. You might notice that the RATIO of SNPs/cMs for all of the 9 kits is about 100 or less, when the largest segment between mother and son has a RATIO of about 200 (except for our 23andMe data). I usually suspect ancient DNA when the ratio falls below 200. The chromosome 20 ratio using maximum cMs and SNPs is about 155.

I should note here that FTDNA and GedMatch have a match between T611xxx and T133xxx at a different location entirely (45,000,000 to 49,000,000 at 6.8 cMs to 8 cMs).

The 9 kits above have these most common ancestors:
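
The SNPs/cMs RATIO for each row can be worked out directly from the chromosome 20 listing above. This is a minimal sketch using only the cMs and SNPs columns from that listing; the 200 benchmark is the mother and son figure mentioned in the paragraph above.

```python
# (kit 1, kit 2, cMs, SNPs) for the chromosome 20 segments listed above
rows = [
    ("A561xxx", "A832xxx", 19.9, 1750),
    ("A438xxx", "A832xxx", 6.2, 640),
    ("A658xxx", "T074xxx", 5.9, 563),
    ("T103xxx", "T611xxx", 6.7, 511),
    ("T133xxx", "T368xxx", 6.0, 505),
]

BENCHMARK = 200  # approximate mother/son ratio mentioned above


def snp_ratio(cms, snps):
    """Return the number of SNPs per cM for one segment."""
    return snps / cms


ratios = {(k1, k2): snp_ratio(cms, snps) for k1, k2, cms, snps in rows}

for (k1, k2), ratio in ratios.items():
    flag = "below benchmark" if ratio < BENCHMARK else "at or above benchmark"
    print(f"{k1} / {k2}: {ratio:.1f} SNPs per cM ({flag})")
```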

A438xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
A561xxx Most Common Ancestor from NC? d: 1858 Hickman, Arkansas
A832xxx Most Common Ancestor from NC? d: 1858 Hickman, Arkansas
A658xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T074xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T103xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T133xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA
T368xxx Most Common Ancestor is from Patrick County, Virginia
T611xxx Most Common Ancestor is from Ashe County, NC / Grayson County, VA



Shared cM Project

Sums of segments have been included for comparison with the Shared cM Project.
Here, the sums are not compatible with the Shared cM Project, mainly due to the difference in parameters (for example: 250 SNPs, Bunching Limit of 100 SNPs, and minimum 1 cM).

Here, sums by themselves are not a good indicator of (distant) relationships.

Updated Feb 7, 2019 to indicate that Wendell and Dave would be 5C1R via Levi Ham, instead of 6th cousin.

Further Reading:


"Autosomal DNA Half Life Equation"  Ham Country blog

 http://hamcountry-blog.blogspot.com/2018/02/autosomal-dna-half-life-equation.html

"Spreadsheet_Combine_All_at_GEDMatch_Output_ByChr_101818" sent to participants in this study by Dave Hamm

"Visualizing Data From the Shared cM Project" by The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/05/29/visualizing-data-from-the-shared-cm-project/

"The Shared cM Project – An Update" by The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/05/25/the-shared-cm-project-an-update/

"The Shared cM Project – Version 3.0 (August 2017)" by Blaine T. Bettinger
  (PDF file)
 https://thegeneticgenealogist.com/wp-content/uploads/2017/08/Shared_cM_Project_2017.pdf

"Collecting Sharing Information for Known Relationships (Part 1)" The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/03/04/collecting-sharing-information-for-known-relationships/

"Collecting Sharing Information for Known Relationships – Part II" The Genetic Genealogist

 https://thegeneticgenealogist.com/2015/04/06/collecting-sharing-information-for-known-relationships-part-ii/



Articles on Triangulation of segments by Starting and Ending Locations:



"A Study Utilizing Small Segment Matching" by Roberta Estes at DNAeXplained

 https://dna-explained.com/2015/01/21/a-study-utilizing-small-segment-matching/

"A Triangulation Intervention" The Genetic Genealogist

 https://thegeneticgenealogist.com/2016/06/19/a-triangulation-intervention/


More Information for HAM DNA Group #1 at HAM Country:

 http://ham-country.com/HamCountry/HAM_DNA_Project/Groups/HAM_DNA_Group001.html










