
Thursday, February 1, 2018

Autosomal DNA Half Life Equation



In the previous article, I had talked about creating a phylogenetic tree for autosomal data.
 
Andrew Millard suggested that because the Genetic Distance value of 1/cMs is not linear, I might try an exponential equation of the form -ln(cm/7200).

I tried that suggested formula, and I was not really happy with the results. The Genetic Distance did not correspond well to cousin level, and the upper limit for data greater than 1 cM appeared to be about 8 (generations), even for the results for Lauren Shutt (M804xxx), who was not even a match to the group.

Andrew had also suggested that I not use kitsch, but I have not yet found a different program that works with Genetic Distance.

So, here, I have used the half life decay equation typically used for radioactive decay, under the presumption that we are looking at Half Identical Regions (HIRs). The results mostly returned the expected cousin level (or number of generations).



Individual Segments:


Hypothetical Genetic Distance derived from half life decay rate:

Nt = N0*e^(-kt)

Solving for t, with Nt/N0 = cMs*a, the Hypothetical Genetic Distance for an individual segment is given as:

t = -1*ln(cMs*a)/k = -1*ln(cMs*.0035524)/0.693147

where 'ln' is the natural log function, cMs is the size of the segment in centimorgans, and a and k are the constants defined below.


The largest segment (i.e., the most obvious starting point) for mother/child or other parent/child half life calculations sets the initial size N0, which is 281.5 cMs at current GEDMatch parameters:

a = 1/281.5 cMs = 0.0035524

and for parent/child half life decay (one generation halves the size, so N0/Nt = 2):


k = ln(N0/Nt) = ln(2) = 0.693147


Excel spreadsheet:

For a spreadsheet, the equation for individual segments should be something like this:

=-LN(F1234*0.0035524)/0.693147


 - Where 'LN' is the natural log function 
 - F1234 is the size of the segment in cMs in column "F" on line (row) 1234.
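
For anyone who would rather script this than use a spreadsheet, the same calculation might be sketched in Python as follows. This is only a sketch under the assumptions already stated (281.5 cMs as the starting size, one halving per generation). The function name half_life_generations is made up for illustration and is not part of any GEDMatch tool.

```python
import math

# Largest segment at current GEDMatch parameters, taken from the text above.
MAX_SEGMENT_CM = 281.5


def half_life_generations(segment_cm, max_cm=MAX_SEGMENT_CM):
    """Hypothetical Genetic Distance (in generations) for one segment.

    Equivalent to the spreadsheet formula =-LN(F*0.0035524)/0.693147,
    but written with 1/max_cm and ln(2) instead of the rounded constants.
    """
    if segment_cm <= 0:
        raise ValueError("segment size must be greater than 0 cMs")
    # -ln(segment/max) is the same as ln(max/segment)
    return math.log(max_cm / segment_cm) / math.log(2)


# A full-size segment is generation 0; each halving adds one generation.
for cm in (281.5, 140.75, 70.375, 7.0):
    print(cm, round(half_life_generations(cm), 2))
```

Using 1/281.5 and ln(2) directly avoids the small rounding error that comes from typing 0.0035524 and 0.693147 by hand.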

Comparison to 23AndMe data:

The default setting of 500 SNPs does not usually generate sufficient total size for kits from the vendor 23AndMe. With the default, the results will not be compatible with the individual segment equation, and may be poor. This is a GEDMatch vendor conversion issue. If you need to compare kits from 23AndMe (kits Mxxxx at GEDMatch), I would suggest lowering the SNP limit to 250 instead of the default 500 SNPs.

Below is the link to the data and phylogenetic tree resulting from the use of the half life decay rate calculations. Basically, this is an update to the previous article, made by applying the autosomal half life decay equation.

Article:  Autosomal Half Life Equation

 
Autosomal Half Life Equation Largest Segment Table

Autosomal Half Life Equation Phylogenetic Tree


See below for more information about the "Endogamy Correction Factor."

Total SUMS in cMs:

The Total Sum of segments (i.e., the most obvious starting point) for mother/child or other parent/child half life calculations sets the initial size N0, which is about 3585 cMs at current GedMatch thresholds*:

a = 1/3585 cMs = 0.00027894

and for parent/child half life decay (one generation halves the size, so N0/Nt = 2):


k = ln(N0/Nt) = ln(2) = 0.693147


Total Sums Excel spreadsheet:

For a spreadsheet, the equation for total sum of all segments should be something like this:

=-LN(F1234*0.00027894)/0.693147


 - Where 'LN' is the natural log function 
 - F1234 is the total sum of all segments in cMs in column "F" on line (row) 1234.
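
As a rough illustration of what the "Total SUM" version expects, here is a short Python sketch. It assumes only the 3585 cMs starting total given above and one halving per generation. The function name total_sum_generations is hypothetical, and the totals in the loop are simply 3585 divided by powers of two, not measured data.

```python
import math

# Approximate parent/child total at current GEDMatch thresholds (from the text).
TOTAL_CM = 3585.0


def total_sum_generations(total_cm, max_total_cm=TOTAL_CM):
    """Hypothetical Genetic Distance (in generations) from a total sum in cMs."""
    if total_cm <= 0:
        raise ValueError("total sum must be greater than 0 cMs")
    return math.log(max_total_cm / total_cm) / math.log(2)


# Under this model, each halving of the shared total adds one generation.
for generation in range(5):
    expected_total = TOTAL_CM / 2 ** generation
    print(generation, round(expected_total, 1),
          round(total_sum_generations(expected_total), 2))
```

If the totals reported for known relationships come out well below these figures, the equation will return more generations than the paper trail shows.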


* You will need to modify the SNP limit and set the minimum segment size to 1 cM in order to use the "Total SUM" version of the equation. I am not getting total sums consistent with this equation, due to either:

a) GEDMatch under-reporting matching segments. GEDMatch has apparently attempted to remove some "Excess IBD" areas, which will affect the total sum of segments; or

b) a GEDMatch vendor conversion problem with the vendor 23AndMe.

Therefore, I have not been able to adequately test the "Total SUM" version of the Half Life equation.

Two things that you should know when using Total SUMs:

 a) The default setting of 500 SNPs does not usually generate sufficient total size for kits from the vendor 23AndMe. With the default, the results will not be compatible with the individual segment equation, and may be poor. This is a GEDMatch vendor conversion issue. If you need to compare kits from 23AndMe (kits Mxxxx at GEDMatch), I would suggest lowering the SNP limit to 250 instead of the default 500 SNPs.


 b) The "Total SUM" natural log equation is not delivering adequate results from the data, as given by GEDMatch. Comparing my data from 2015 to today's data, it appears that GEDMatch has made an effort to NOT report some of the "Excess IBD" areas with the results. That will affect the Half Life equation for Total SUMs, because the sums are now under-reported at GEDMatch.

ISOGG message from CeCe Moore on Thu, Jun 10, 2010:

"Hi All,
    I had a very fascinating interview with Bennett today and wanted to share something very important that I learned since I know it has been debated here quite a bit. I asked him about the reliability of using the combined smaller segments in "Total cMs" to predict relatedness. He stated that FTDNA only uses "Total cMs" for relationship predictions of 2nd cousin once removed and closer. From that point on, they only use the longest blocks to predict relationship. The "Total cMs" is only included in FF summaries because it was something that many people were interested in seeing.
    CeCe"



Per Chromosome Maximum


The equation can be customized per chromosome by using the maximum value of centimorgans per chromosome. Ann Turner has explained that you can get this by comparison to yourself. This can be done programmatically. If you are not a whiz on a spreadsheet, you can create a column that holds the maximum value for the chromosome of each segment, then point the Half Life equation at that "max cMs" column, like this:

=-1*(LN(F5/J5))/LN(2)

where F5 is the segment value in centimorgans in column F on line 5
and J5 is the maximum cMs for that segment's chromosome in column J on line 5


Maximum cMs per chromosome, by vendor:

Chromosome    FTDNA [A]    GEDMatch [B]    23andMe [C]

     1         267.21        281.5           284
     2         253.06        263.7           269
     3         219.1         224.2           223
     4         206.75        214.4           214
     5         199.6         209.3           204
     6         189.14        194.1           192
     7         180.79        187.0           187
     8         161.76        169.2           168
     9         160.36        167.2           166
    10         176.25        174.1           181
    11         155.78        161.1           158
    12         167.39        176.0           175
    13         126.48        131.9           126
    14         111.66        125.2           119
    15         118.07        132.4           141
    16         131.90        133.8           134
    17         124.33        137.3           128
    18         119.39        129.5           117
    19          99.07        111.1           108
    20         104.20        114.8           108
    21          58.99         70.1            62.7
    22          53.03         79.1            72.7
 
Warning: Chromosomes 21 and 22 have fairly low maximum values, and may require a different treatment, because a segment can quickly become large relative to the maximum, as in an 'Excess IBD' region or a 'Recombination' area. The idea with using individual chromosome maximum cMs is to apply the equation to each segment with the maximum for its own chromosome, then take the average.
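
The per-chromosome idea could be done programmatically along these lines. This is a sketch only: it uses the GEDMatch column [B] of the table above, assumes that "take the average" means averaging the per-segment results for one match, and the function names and the example segments are invented for illustration.

```python
import math

# GEDMatch column [B] of the table above: maximum cMs per chromosome.
GEDMATCH_MAX_CM = {
    1: 281.5, 2: 263.7, 3: 224.2, 4: 214.4, 5: 209.3, 6: 194.1,
    7: 187.0, 8: 169.2, 9: 167.2, 10: 174.1, 11: 161.1, 12: 176.0,
    13: 131.9, 14: 125.2, 15: 132.4, 16: 133.8, 17: 137.3, 18: 129.5,
    19: 111.1, 20: 114.8, 21: 70.1, 22: 79.1,
}


def chromosome_generations(chromosome, segment_cm):
    """Same as =-1*(LN(F5/J5))/LN(2), with J5 looked up by chromosome."""
    if segment_cm <= 0:
        raise ValueError("segment size must be greater than 0 cMs")
    return math.log(GEDMATCH_MAX_CM[chromosome] / segment_cm) / math.log(2)


def average_generations(segments):
    """segments: list of (chromosome, size in cMs) pairs for one match."""
    values = [chromosome_generations(c, cm) for c, cm in segments]
    return sum(values) / len(values)


# Made-up segments, for illustration only.
example = [(1, 35.2), (7, 11.7), (15, 8.3)]
print(round(average_generations(example), 2))
```

Swapping in the FTDNA or 23andMe column only requires replacing the dictionary values.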


NOTES: 

The 23AndMe vendor does not generate sufficient results for a valid comparison in many instances. Currently, 23AndMe will only generate one small segment, and does not supply enough information from vendor conversion for sites like GEDMatch to make a good comparison. Try lowering the limit for SNPs to 250 for the vendor 23AndMe.

Removal of the Excess IBD regions has about the same effect on individual segments as that of using the "Endogamy Correction Factor." Either will produce some error for various reasons. However, if the Excess IBD regions have been removed, then this will affect how the Total SUM version of Half Life equation works.

If you want to use the "Endogamy Correction Factor" on the excess IBD segments instead of removing them:

- Endogamy Correction Factor:     [(100*cMs)/SNPs]

t = -1*ln[(cMs*.0035524*100*cMs)/SNPs]/0.693147


- where cMs is the size of the segment and SNPs is the number of SNPs in that segment
- where the size in cMs equals 0, set t to 11 as an arbitrary upper limit
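
A small Python sketch of the corrected equation may make the effect of the factor easier to see. It assumes the same 281.5 cMs starting size as the individual segment equation, and the function name corrected_generations is hypothetical. Note that the factor (100*cMs)/SNPs equals 1 when a segment has exactly 100 SNPs per cM, so the correction only changes the result for segments that are denser or sparser than that.

```python
import math

A = 1 / 281.5        # same constant as the individual segment equation
UPPER_LIMIT = 11.0   # arbitrary upper limit from the text, used when cMs is 0


def corrected_generations(segment_cm, snps):
    """t = -ln[(cMs * a * 100 * cMs) / SNPs] / ln(2), capped when cMs is 0."""
    if segment_cm == 0:
        return UPPER_LIMIT
    if segment_cm < 0 or snps <= 0:
        raise ValueError("cMs must not be negative and SNPs must be above 0")
    correction = (100 * segment_cm) / snps  # Endogamy Correction Factor
    return -math.log(segment_cm * A * correction) / math.log(2)


# Made-up values: the same 10 cM segment at two SNP densities.
print(round(corrected_generations(10, 1000), 2))  # factor is 1, no correction
print(round(corrected_generations(10, 2000), 2))  # factor is 0.5, one more generation
print(corrected_generations(0, 500))              # falls back to the upper limit
```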
 




Updated 10/20/2018 to include table of maximum cMs per chromosome.
Updated 02/26/2018 arbitrary upper limit changed from 14 to 11, in order to avoid exponential results at the upper limit of phylogenetic trees.
Updated 02/26/2018 to add the equation for total sums in cMs and link to reference table.
Updated 02/17/2018 to add spreadsheet version of the equation.
Updated 03/27/2018 to add SNP parameters for Total SUM calculation and a note about 23andMe problems.
Updated 03/29/2018 Correct the reference regarding MyHeritage to 23AndMe (vendor indicated at GedMatch starting with an "M"). Note that Total SUMs is not giving adequate results. Added a quote from a public post by CeCe Moore from the ISOGG email list.
Updated 04/03/2018 Corrected to report that GEDMatch does not report out "Total SUMs" properly, due to an apparent removal of Excess IBD regions. Included equation for "Endogamy Correction Factor."




References:


HAM Group #1 Information

HAM Y-DNA Project Phylogenetic Tree

HAM Group #1 Initial Tiny Autosomal Segment Triad Study 


ISOGG Autosomal DNA statistics


Maximum Values for Centimorgans

cM Values Per Chromosome  (table by Ann Turner)

GEDMatch


FamilyTreeDNA

HAM DNA Project Dean McGee's Utility output

HAM DNA Project Y-DNA Results at HAM Country

HAM DNA Project at FTDNA

How to Read HAM DNA Phylograms (video)





  
  
 

Monday, May 26, 2014

Family Finder Checklist

 


If you are new to the Family Finder product from Family Tree DNA, here's a handy checklist that might help you move along a little more quickly.

I think the first thing that strikes Family Finder participants is the absolutely huge amount of work yet to be done. The second thing that strikes them is that although FTDNA will provide the data, along with some brief analysis and basic tools, the bulk of the data remains in uncharted territory.

Because of the very large amount of work left yet to do, I would like to review what the goals are in DNA testing, and what it is that folks expect to get out of the testing. Then briefly review some of the things that can be obtained from the data that you may or may not have pursued.

As for the goal of DNA research, there are a few things that I have noticed about the DNA participants. For the most part, there are mainly three items that people are looking for:

1) Information about their heritage. What, in general, can the DNA tell us about our ancestral origins?
2) Is there any way that the DNA can confirm Native American heritage? Believe it or not, a lot of folks have asked me that question.
3) Can the DNA confirm or deny relationships where no paper trail exists? For many, this is equivalent to getting beyond the end of the paper trail, which usually ends in a brick wall. For most, this is getting further back to their origins, but for some, this also means confirming a relationship when the paper trail just does not quite fit.


As with all DNA, the most important item is to find your matching DNA. For Y-DNA, this means finding your matching STR and SNP haplotype group. For mtDNA, this means finding your matching Full Genome Sequence haplotype group. For Family Finder, this means finding your matching Centimorgans and starting and ending locations.

It all starts with downloading your data.

After you have checked out your genetic origins, what's next?

Family Tree DNA provides a chromosome browser with limited functionality. Basically, it allows you to compare your autosomal results to about 5 other people in a browser that displays matching locations on the chromosome. However, when you have literally thousands of matching segments to sort out, this is of limited value. What you want to do first is download your data.

There are two types of data to download from FTDNA. One is your raw results. The second is your matching chromosome browser results. You will want to download your raw data in order to use other utilities, such as uploading to Gedmatch.com, for example. You will also want to download all of your matching chromosome browser data so that you can work on matching DNA segments from a spreadsheet at home.

Next Step: Upload your GEDCOM file.

You will want to update your GEDCOM file for all of your lines, and upload the file to Family Tree DNA. This will be useful for people that match your chromosome segments for two main purposes.

1) To determine which ancestor the matching segment applies to (or which it does not apply to), and
2) To map the ancestor to your Master spreadsheet in order to track your progress.

Use the Utilities on the Internet

DNAGedcom

You will find it helpful to locate and sort matching data by chromosome, starting location, and size of matching segments (in CentiMorgans). The web site DNAGedcom will do this for you and can return the results in a nice graphical display. The results come in the form of HTML, which you can save off to your desktop at home. They also include data from 23andMe, so you get results from more than one vendor. Very useful in quickly locating areas to work on.

Once you have your autosomal data from Family Tree DNA, sign onto DNAGedcom and upload your data.

  http://www.dnagedcom.com/

You will want to run the "Autosomal Segment Analyzer" and start searching for common ancestors by using the largest segments.
Those that align along the diagonal should be recognizable common ancestors. You may notice that not all segments fall along the diagonal; these may be examples of endogamous relationships, which are usually difficult to track down.

Therefore, you want to focus on the largest matching segments along the diagonal.




DNAGedcom Autosomal DNA Segment Analyzer


 

Gedmatch

Also very helpful is the GedMatch web site. This web site does not provide the same sorting function that I like from DNAGedcom, but it gives me something else that DNAGedcom does not. In fact, it also provides utilities that FTDNA does not. By far, my favorite on this site is the 'Admixture' utilities. These utilities provide two views, one for 'Admixture Proportions,' and another for 'Admixture Chromosome Painting.'

   http://gedmatch.com/

The Admixture Proportions can approximate the proportions that you get from the National Genographic Project (minus the supporting information that Geno 2.0 provides). If you have not yet tested with Geno 2.0, and would like a preview using your autosomal data from FTDNA, then it is worthwhile to take a look at this Gedmatch utility.

But what impresses me the most about Gedmatch is its ability to pick up my Native American heritage (from the Family Finder data), whereas Family Tree DNA may not do this. Although you are using data from FTDNA, this utility can help you identify something about the heritage of your matching segments. I use the Admixture Chromosome Painting feature to map out the segments containing Native American heritage, and add a column on my Master spreadsheet so that I can identify matches that may contain Native American heritage.

To briefly summarize, for me, the Family Finder product did not identify any Native American heritage at all, but Geno 2.0 shows me to have 2% Native American. The Gedmatch utility has helped me break that down into Native American components:

   - Native American (i.e., HarappaWorld analysis says Oklahoma Cherokee),
   - Beringian, and
   - SouthEast Asian components.

Pretty sweet.



GedMatch Chromosome Painting Example
GedMatch Admixture Chromosome Painting - HarappaWorld view
   

  


An example of Chromosome Painting from GedMatch, with the color coding insert. Here, I am using the HarappaWorld view so that I can identify the Native American segments. For me, I would expect the Native American segments to show up as components from:

 - NE Asia
 - SE Asia
 - American (or Amerindian, Oklahoma Cherokee)
 - Beringian
 - Siberian

What I can then do with it is add descriptions of the segments on my Master spreadsheet that match the Native American locations, in the hope that I will be able to identify the Native American segments (start/end locations) with specific ancestors.



Look at an Existing Example of Ancestor Mapping from Tim Janzen:

Tim Janzen provides an example of his autosomal spreadsheet online. It is a useful model if you are mapping specific regions of your autosomal chromosomes to specific ancestors using data from 23andMe and FTDNA's Family Finder. Tim has been working on this for his mother's phased data. He has uploaded a zipped Excel file to his Dropbox account that shows his mother's chromosomes correlated with the ancestors she received them from:

  http://dl.dropbox.com/u/21841126/chromosome%20map%20Betty%20Janzen.zip

Tim also has a phasing utility (for two parents and a child), similar to David Pike's utility (below). Instructions to Tim Janzen's phasing utility:

  http://dl.dropbox.com/u/21841126/phasing%20program%20instructions.rtf

Phasing Utilities

David Pike provides some utilities for examination of autosomal data.
David A. Pike, PhD, FTICA, is with the Department of Mathematics and Statistics, Memorial University of Newfoundland.
Here's a link to his utilities:

    http://www.math.mun.ca/~dapike/FF23utils/



Get familiar with the terminology

You will find that the best place to start is with the matches that have the largest value in Centimorgans. Then, you will want to sort those largest matches by chromosome start and ending locations. That will help you triangulate which starting and ending location matches to a particular ancestor. And you will want to record your findings as you progress.

Get familiar with what is meant by IBD (Identical by Descent), IBS (Identical by State), ICW (In Common With), phasing, triangulation, CentiMorgans, etc. You will make the most progress if you start with the largest matching segments, and track which ancestor matches to those larger segments.

Keep a separate 'Master' spreadsheet.

In order to track your progress, you will want to keep a record of your work in a separate spreadsheet. This is because you don't want to lose track of the work that you have already done, and because there are new people signing up for autosomal DNA every day. So the data containing your matching chromosome browser results will continue to be updated. You will want to keep your results separate from the updated data.

Visit Helpful DNA blogs

There have been several blogs that have posted helpful information about autosomal DNA. The ones that come to mind as most helpful to me are those of Roberta Estes, CeCe Moore, and Debbie Kennett, and, on the technical side, Dienekes' blog.

Roberta Estes maintains "DNA Explained": http://dna-explained.com/category/family-finder/
On the bottom right of her screen she lists "Categories," which is helpful in looking up information about how to use Family Finder, what to expect from it, etc.

CeCe Moore maintains a blog with some helpful information. CeCe Moore is an independent professional genetic genealogist and television consultant.

  http://www.yourgeneticgenealogist.com/


Dienekes Pontikos provides a blog dedicated to human population genetics, physical anthropology, archaeology, and history. Dienekes' Anthropology Blog often carries short notes about the technical studies in the field of human DNA, and is interesting to read for the scientific content.

  http://dienekes.blogspot.com/

Sign up for the Autosomal DNA email lists


DNA: GENEALOGY-DNA Mailing List

Anyone with DNA (i.e., anyone!) who would like to ask questions, discuss methods and share results of DNA testing as applied to genealogical research:

http://lists.rootsweb.ancestry.com/index/other/DNA/GENEALOGY-DNA.html

DNA: AUTOSOMAL-DNA Mailing List

A mailing list for the discussion of the various aspects of autosomal DNA testing for genealogical purposes:

http://lists.rootsweb.ancestry.com/index/other/DNA/AUTOSOMAL-DNA.html

Finally, keep a record of your matches

It is important to track your progress. Keep a running record of those that agreed upon a Common Ancestor.
The best way to do this is by having a column on your Master spreadsheet for a Common Ancestor for the matching segments.
Tim Janzen shows an example of 'How To' do this. Personally, my Master spreadsheet is split up by:

a) Chromosome
b) Sorted by matching segment Starting and Ending locations.

This is in the style that you see from DNAGedcom's "Autosomal DNA Segment Analyzer."
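
If you keep your Master spreadsheet as a CSV file, the same ordering can be sketched in a few lines of Python. Everything here is an assumption for illustration: the column names, the sample rows, and the helper names load_segments and sort_for_master are made up, and a real download from a vendor will use different headings.

```python
import csv
import io

# Hypothetical export; real column names will differ by vendor and tool.
SAMPLE = """match,chromosome,start,end,cm,common_ancestor
Match A,7,52000000,71000000,18.4,
Match B,1,152000000,170500000,22.1,Ancestor X
Match C,1,20500000,41000000,31.7,
Match D,7,51500000,69000000,15.2,
"""


def load_segments(text):
    """Read the rows and convert the numeric columns from text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["chromosome"] = int(row["chromosome"])
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["cm"] = float(row["cm"])
    return rows


def sort_for_master(rows):
    """Order by chromosome, then by segment start and end locations."""
    return sorted(rows, key=lambda r: (r["chromosome"], r["start"], r["end"]))


for row in sort_for_master(load_segments(SAMPLE)):
    print(row["chromosome"], row["start"], row["end"], row["cm"],
          row["match"], row["common_ancestor"] or "-")
```

Sorted this way, overlapping segments on the same chromosome land next to each other, which makes it easier to spot candidates for a shared Common Ancestor column.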

Expect this step to take time.

Jim Barrett says that he now has 200 ancestors identified and mapped out on his Master spreadsheet.
This was done one ancestor at a time, comparing matching segments and contacting the matching participants. Jim says that you will have some false starts and other problems along the way, so expect it to take a long time to map out all of your ancestors' matching autosomal segments.