Y-DNA STR Genetic Distance And The Probability of Error
A Brief Review of HAM DNA Group #1
This topic concerns whether or not Y-DNA is an adequate predictor of relationships. That is, are Y-DNA matches a good indicator of close relationships when tested at up to 111 markers?
“Genetic Distance” is a term used to show how well one person's DNA matches when compared to another person's. Which is to say, a perfect “match” at 111 Y-DNA STR markers is a Genetic Distance of zero. Genetic genealogists generally want to associate Genetic Distance with how closely related the lineages may be. Genetic Distance combined with the concept of Time to Most Recent Common Ancestor (TMRCA) should deliver an indication of when the lines converge.
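For a beginner, the counting behind Genetic Distance can be sketched in a few lines of Python. This is a minimal sketch, assuming a simple stepwise count (the sum of repeat-count differences over the markers both men tested). Vendors may treat some markers differently, so it is not presented as FTDNA's exact method. The function name and the repeat counts are hypothetical and are not taken from any real kit.

```python
from typing import Mapping


def genetic_distance(a: Mapping[str, int], b: Mapping[str, int]) -> int:
    """Sum of absolute repeat-count differences over markers both kits tested.

    Assumption: a plain stepwise count; real vendor rules may differ.
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)


# Hypothetical repeat counts, for illustration only (not real kit data).
kit_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
kit_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

print(genetic_distance(kit_a, kit_a))  # 0, an exact match
print(genetic_distance(kit_a, kit_b))  # 2, one step on each of two markers
```

Note that the count only says how many steps apart two kits are. It says nothing by itself about when the lines converge.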
However, projects are finding that Genetic Distance for Y-DNA STRs is not a good indication of how closely related two individuals may be. In many cases, though, it does provide a fairly reliable account of surname history.
This article is written because it came as a surprise to me when a certain Facebook group apparently censored my comments regarding Genetic Distance for Y-DNA.
That is, the "Y-DNA - Applied Genealogy & Paternal Origins" group on Facebook.
I wanted to address what Genetic Distance means in terms a beginner could understand, as we had a little bit of a conversation about it.
Another Facebook group had also censored my comments regarding the DNA analysis by law enforcement in the recent “Golden State Killer” case.
To review a few Genetic Distance examples, my line has the following figures regarding Genetic Distance.
This first group connects at circa 1755 from Grayson County, Virginia. The values below are the Genetic Distance to me:
Jimmy 5th cousin, once removed GD 0 - 111 markers
Julian 5th cousin GD 1 - 37 markers
Gene 5th cousin GD 4 - 37 markers
Bill 5th cousin GD 2 - 37 markers
Steven 5th cousin GD 0 - 12 markers
Brick 5th cousin GD 0 - 67 markers
Most listed above descend from John Ham (1780-1850) of Grayson County, Virginia:
Julian and Jimmy are 3rd cousins and have a Genetic Distance of 1 on 37 markers.
Steven and Bill are 3rd cousins, with a Genetic Distance of 1 on 12 markers.
Gene is 4th cousin to Julian with a Genetic Distance of 4 on 37 markers.
Gene is 4th cousin to Bill with a Genetic Distance of 5 on 37 markers.
Gene is 4th cousin to Steven with a Genetic Distance of 0 on 12 markers.
Brick is about 5th cousin to everybody else above, as he descends from Thomas HAM of Ashe County, NC (1795-1865).
Brick is 5th cousin to Jimmy with a Genetic Distance of 0 on 67 markers.
Brick is 5th cousin to Julian with a Genetic Distance of 1 on 37 markers.
Brick is 5th cousin to Gene with a Genetic Distance of 4 on 37 markers.
Brick is 5th cousin to Bill with a Genetic Distance of 2 on 37 markers.
Brick is 5th cousin to Steven with a Genetic Distance of 0 on 12 markers.
Brick is 5th cousin to Dave, as previously mentioned.
At the Greater Than 5th cousin level (to me), the following three are from a line in another geographic area, Franklin County, North Carolina, and connect to the above prior to 1755:
Marvin Greater Than (GT) 5th cousin GD 5 - 111 markers
Leonard GT 5th cousin GD 4 - 37 markers
James GT 5th cousin GD 2 - 37 markers
That is, James, from a completely different line, has a Genetic Distance as close as or closer than two of my actual 5th cousins. The three above are from the same Franklin County line.
Between themselves, Marvin & Leonard have a Genetic Distance of 1 on 37 markers.
Marvin & James also have a Genetic Distance of 1 on 37 markers.
Marvin and Leonard descend from Robert Solomon Ham, and appear to be about 2nd cousins. James descends from Francis (Frank) James Hamm, and appears to be about 3rd cousin to Marvin and Leonard.
Continuing on with the Genetic Distance to me:
Jon GT 6th cousin (Somerset, England) GD 5 - 111 markers
Tony GT 5th cousin (Somerset, England) GD 1 - 37 markers
[Tony and Jon have a Genetic Distance of 3 between the two of them.]
Tony has a Most Recent Common Ancestor in England and has a closer Genetic Distance to me than at least three of my 5th cousins, although we know Tony has to relate further back, as my line has been in this country since before 1783, and Tony’s genealogical information shows no connection (Tony’s line arrived in the U.S. circa 1850).
Michael Gene, a Greater Than 5th cousin (Patrick County, Virginia), and I have a GD of 2 on 111 markers.
That is, Michael Gene is from a completely different line from a different geographic area and has a closer Genetic Distance than two of my 5th cousins (at 111 markers), but we know from the genealogical information that we must relate further back than 5th cousins.
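The overlap described above can be checked mechanically. The sketch below is only an illustration: it encodes the Genetic Distance values to me exactly as listed in this article, then reports every more distant match whose GD is as close as or closer than a documented 5th cousin tested at the same marker level. The field names and the `overlaps` helper are my own hypothetical choices, Jimmy (5th cousin, once removed) is treated as part of the documented group, and comparing only within the same marker level is an assumption made to keep like with like.

```python
from collections import namedtuple

Match = namedtuple("Match", "name relationship gd markers documented_5th")

# Genetic Distance to me, as listed in this article.
MATCHES = [
    Match("Jimmy", "5th cousin, once removed", 0, 111, True),
    Match("Julian", "5th cousin", 1, 37, True),
    Match("Gene", "5th cousin", 4, 37, True),
    Match("Bill", "5th cousin", 2, 37, True),
    Match("Steven", "5th cousin", 0, 12, True),
    Match("Brick", "5th cousin", 0, 67, True),
    Match("Marvin", "GT 5th cousin", 5, 111, False),
    Match("Leonard", "GT 5th cousin", 4, 37, False),
    Match("James", "GT 5th cousin", 2, 37, False),
    Match("Jon", "GT 6th cousin", 5, 111, False),
    Match("Tony", "GT 5th cousin", 1, 37, False),
    Match("Michael Gene", "GT 5th cousin", 2, 111, False),
]


def overlaps(matches):
    """Map each more distant match to the documented 5th cousins (same
    marker level) whose GD is as large as or larger than that match's GD."""
    result = {}
    for far in matches:
        if far.documented_5th:
            continue
        not_closer = [
            near.name
            for near in matches
            if near.documented_5th
            and near.markers == far.markers
            and near.gd >= far.gd
        ]
        if not_closer:
            result[far.name] = not_closer
    return result


for name, cousins in overlaps(MATCHES).items():
    print(f"{name}: GD as close as or closer than {', '.join(cousins)}")
```

At 37 markers this flags Leonard, James, and Tony, which is the point of the examples: the GD ranking does not follow the documented relationships.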
Occasionally, FTDNA has made changes that cause these values to jump around a bit. At one time, FTDNA had Michael Gene and me at a Genetic Distance of 1 on 111 markers.
The guidance given by Family Tree DNA (FTDNA) for 111 markers says that at the 50% confidence level, an exact match on 111 markers should be within 2 generations. That is, 2 generations or less. Obviously, if I am an exact match to my 5th cousin Jimmy at 111 markers, then we certainly do NOT want to use the 50% confidence level.
Fortunately, FTDNA provides other confidence levels for 111 markers: 90%, 95%, and 99%.
The 90% level also fails for the GD of zero between my 5th cousin Jimmy and myself. At the 90% confidence level, the table says that Jimmy & I should be 4th cousins or less.
It is only when we reach the 95% or 99% confidence level that FTDNA returns a valid TMRCA of at least 5 generations for a Genetic Distance of 0 on 111 markers. Since we are 5th cousins, Jimmy and I would be the 6th generation from our common ancestor, meaning only the 99% confidence level actually meets that criterion.
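The cousin-to-generation arithmetic used here is easy to get wrong, so a small sketch may help. It assumes the usual convention that nth cousins share an ancestor n + 1 generations back, and it treats Jimmy as a plain 5th cousin, as the text does. Only the two bounds that the text states with an explicit figure (50% and 90%) are encoded; the names below are hypothetical.

```python
def generations_to_common_ancestor(cousin_degree: int) -> int:
    """nth cousins share an ancestor n + 1 generations back."""
    return cousin_degree + 1


# Upper bounds (in generations) for GD 0 at 111 markers, as quoted in this
# article. The 95% and 99% figures are not given explicitly, so they are
# left out rather than guessed.
QUOTED_BOUNDS = {
    0.50: 2,                                   # "within 2 generations"
    0.90: generations_to_common_ancestor(4),   # "4th cousins or less"
}

needed = generations_to_common_ancestor(5)  # documented 5th cousins

for level, bound in QUOTED_BOUNDS.items():
    verdict = "covers" if bound >= needed else "falls short of"
    print(f"{level:.0%} bound of {bound} generations {verdict} the documented {needed}")
```

Both quoted bounds fall short of the 6 generations that the paper trail requires.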
If you are using Dean McGee's Y-Utility, you will want to use the highest probability for general purpose use.
Anybody looking at Genetic Distance should be thinking in terms of “X” generations OR LESS. For example, I typically refer to an exact match at 37 markers as “Any time after 1600,” as Ron Blevins has reported seeing that in his project.
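To see why an exact match only gives an “X generations OR LESS” statement, here is a rough textbook-style sketch. It is not FTDNA's or Dean McGee's calculation. It assumes every marker mutates independently at one shared rate, that mutations over the two lines of descent follow a Poisson process, and that all TMRCA values are equally likely beforehand. The mutation rate used is a placeholder chosen for illustration, not a measured figure, and the function names are hypothetical.

```python
import math


def prob_exact_match(generations: float, markers: int, mutation_rate: float) -> float:
    """P(GD = 0) when both men are `generations` below the common ancestor.

    Two lines of descent give 2 * generations transmissions per marker.
    """
    return math.exp(-2 * generations * markers * mutation_rate)


def generations_or_less(confidence: float, markers: int, mutation_rate: float) -> float:
    """G such that P(TMRCA <= G | GD = 0) equals `confidence`,
    under a flat prior on TMRCA (an assumption of this sketch)."""
    return -math.log(1 - confidence) / (2 * markers * mutation_rate)


ASSUMED_RATE = 0.003  # hypothetical per-marker, per-generation rate

for confidence in (0.50, 0.90, 0.95, 0.99):
    g = generations_or_less(confidence, 111, ASSUMED_RATE)
    print(f"{confidence:.0%}: {g:.1f} generations or less")
```

Whatever rate is plugged in, the bound grows as the confidence level rises, which is why the higher confidence levels are the safer ones to quote.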
Another genetic genealogist who has mentioned how unreliable Genetic Distance may be in determining relationships is Jim Owston, in his 2014 article “Is Genetic Distance an Adequate Predictor of Relationships?” (updated Jan 23, 2018).
Jim Owston mentions:
“Therefore, it is unlikely that two people with a GD=4 are close relatives; however, a GD=0 could represent numerous relationships from very close relatives to those who are very distant, as a genetic distance of zero is all over the road.”
Jim Owston has information back to 13th cousins, where relationships of 12th cousin or more are estimated.
We have few in the HAM DNA Project who can claim accurate documentation that far back. However, the Grayson County group does have a similar number of known 5th cousins who have taken the Y-DNA test.
In comparison, Jim Owston lists roughly eight 5th cousins, and I list roughly eleven 5th cousin relationships above, among 7 kits. Jim has roughly eleven 4th cousins listed, and I have five 4th cousin relationships listed above. Otherwise, Jim Owston has multiple dozens of relationships listed at 8th cousin or more.
Jim Owston now has 253 relationships at 43 markers and 153 relationships at 37 markers on record, compared to 59 kits in the HAM DNA Project and 17 autosomal kits in the HAM DNA Group #1 study. I do not know offhand how many relationships that represents for HAM Group #1, but a reasonable guess would be roughly two dozen, which is tiny in comparison to Jim Owston's study.
In an effort to obtain a better TMRCA, Jim Owston is considering a study of the BigY results (the BigY-500 product provides over 500 STRs, and is largely based on SNPs).
For an improved TMRCA, I have been looking at autosomal results. There are 16 kits in Group #1 now participating in the autosomal study, with at least 7 kits from the Grayson County line. My initial autosomal DNA studies indicate that the autosomal DNA may deliver better TMRCA results than do up to 111 Y-DNA STR markers.
However, for the autosomal DNA, the immediate issues include the apparent removal of “Excess IBD” segments from GEDMatch reports, vendor conversion issues (such as 23andMe conversion issues), slight differences in starting locations when compared to the vendor, how to verify data that falls below the vendor’s lowest threshold, privacy issues, etc. It is not yet known if the autosomal DNA will hold any accuracy when taken to the 13th cousin level that Jim Owston has in his study. According to the Autosomal Half Life Equation, the thresholds would have to be taken down to about 0.01 cM in order to deliver 14th cousin relationships. GEDMatch cannot be set lower than 1 cM (about the 8th or 9th cousin level, according to the Half Life Equation).
If concepts such as the “EndogamyFactor” could be considered to be a valid evaluation, then perhaps the lowest 1 cM threshold at GEDMatch may deliver results even further back than 9th cousin.
Related Topics:
Y-DNA Mutation Rates – A Case Study
Y-DNA Project Grouping with Genetic Distance
Tree Building for Y-DNA Surname Projects
HAM DNA Output From Dean McGee’s Y-DNA Utility
Is Genetic Distance an Adequate Predictor of Relationships?
Autosomal Small Segment Triangulation HAM DNA Group #1
Autosomal Small Segment Phylogenetic Tree
Autosomal DNA Half Life Equation
FTDNA's Interpreting Genetic Distance for 37 Markers
FTDNA's Interpreting Genetic Distance for 67 Markers
FTDNA's Interpreting Genetic Distance for 111 Markers
FTDNA BigY-500 product
GEDMatch