Monday, May 26, 2014

Family Finder Checklist

 

  

If you are new to the Family Finder product from Family Tree DNA, here's a handy checklist that might help you move along a little more quickly.

I think the first thing that strikes Family Finder participants is the absolutely huge amount of work yet to be done. The second thing that strikes them is that although FTDNA will provide the data, along with some brief analysis and basic tools, the bulk of the data remains in uncharted territory.

Because of the very large amount of work left to do, I would like to review what the goals are in DNA testing, and what it is that folks expect to get out of the testing. I will then briefly review some of the things that can be obtained from the data, which you may or may not have pursued.

As for the goal of DNA research, I have noticed a few things about the DNA participants. For the most part, there are three main items that people are looking for:

1) Information about their heritage. What, in general, can the DNA tell us about our ancestral origins?
2) Is there any way that the DNA can confirm Native American heritage? Believe it or not, a lot of folks have asked me that question.
3) Can the DNA confirm or deny relationships where no paper trail exists? For many, this is equivalent to getting beyond the end of the paper trail, which usually ends in a brick wall. For most, this is getting further back to their origins, but for some, this also means confirming a relationship when the paper trail just does not quite fit.


As with all DNA, the most important item is to find your matching DNA. For Y-DNA, this means finding your matching STR and SNP haplotype group. For mtDNA, this means finding your matching Full Genome Sequence haplotype group. For Family Finder, this means finding your matching Centimorgans and starting and ending locations.

It all starts with downloading your data.

After you have checked out your genetic origins, what's next?

Family Tree DNA provides a chromosome browser with limited functionality. Basically, it allows you to compare your autosomal results to about 5 other people in a browser that displays matching locations on the chromosome. However, when you have literally thousands of matching segments to sort out, this is of limited value. What you want to do first is download your data.

There are two types of data to download from FTDNA. One is your raw results. The second is your matching chromosome browser results. You will want to download your raw data in order to use other utilities, such as uploading to Gedmatch.com, for example. You will also want to download all of your matching chromosome browser data so that you can work on matching DNA segments from a spreadsheet at home.

Next Step: Upload your GEDCOM file.

You will want to update your GEDCOM file for all of your lines, and upload the file to Family Tree DNA. This will be useful to people who match your chromosome segments, for two main purposes:

1) To determine which ancestor the matching segment applies to (or which it does not apply to), and
2) To map the ancestor to your Master spreadsheet in order to track your progress.

Use the Utilities on the Internet

DNAGedcom

You will find it helpful to locate and sort matching data by chromosome, starting location, and size of matching segments (in CentiMorgans). The web site DNAGedcom will do this for you and can return the results in a nice graphical display. The results come in the form of HTML, which you can save off to your desktop at home. They also include data from 23andMe, so you get results from more than one vendor. Very useful in quickly locating areas to work on.
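The same kind of sort can also be sketched at home on the downloaded chromosome browser file. This is only a minimal sketch: the column names below are an assumption about the export layout, and the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows; the column names are an assumed export layout.
SAMPLE = """NAME,MATCHNAME,CHROMOSOME,START LOCATION,END LOCATION,CENTIMORGANS,MATCHING SNPS
Me,Cousin A,2,1000000,9000000,12.5,2500
Me,Cousin B,1,5000000,20000000,18.2,4100
Me,Cousin C,1,5000000,12000000,8.1,1900
Me,Cousin D,X,3000000,8000000,7.4,900
"""

def chromosome_key(label):
    """Order chromosomes 1-22 numerically, with X after them."""
    return int(label) if label.isdigit() else 23

def sort_segments(rows):
    """Sort by chromosome, then start location, then largest segment first."""
    return sorted(
        rows,
        key=lambda r: (
            chromosome_key(r["CHROMOSOME"]),
            int(r["START LOCATION"]),
            -float(r["CENTIMORGANS"]),
        ),
    )

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
sorted_rows = sort_segments(rows)
for r in sorted_rows:
    print(r["CHROMOSOME"], r["START LOCATION"], r["CENTIMORGANS"], r["MATCHNAME"])
```

To use it on a real download, the `io.StringIO(SAMPLE)` part would be replaced by an opened file, and the column names adjusted to whatever the file actually contains.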

Once you have your autosomal data from Family Tree DNA, sign onto DNAGedcom and upload your data.

  http://www.dnagedcom.com/

You will want to run the "Autosomal Segment Analyzer" and start searching for common ancestors by using the largest segments.
Those that align along the diagonal should be recognizable common ancestors. You may notice that not all segments fall along the diagonal. This may be an example of endogamous relationships, which will usually be difficult to track down.

Therefore, you want to focus on the largest matching segments along the diagonal.




DNAGedcom Autosomal DNA Segment Analyzer


 

Gedmatch

Also very helpful is the GedMatch web site. This web site does not provide the same sorting function that I like from DNAGedcom, but it gives me something else that DNAGedcom does not. In fact, it also provides utilities that FTDNA does not. By far, my favorite on this site is the 'Admixture' utilities. These utilities provide two views, one for 'Admixture Proportions,' and another for 'Admixture Chromosome Painting.'

   http://gedmatch.com/

The Admixture Proportions can approximate the proportions that you get from the National Genographic Project (minus the supporting information that Geno 2.0 provides). If you have not yet tested with Geno 2.0, and would like a preview using your autosomal data from FTDNA, then it is worthwhile to take a look at this Gedmatch utility.

But what impresses me the most about Gedmatch is its ability to pick up my Native American heritage (from the Family Finder data), whereas Family Tree DNA may not do this. Although you are using data from FTDNA, this utility can help you identify something about the heritage of your matching segments. I use the Admixture Chromosome Painting feature to map out the segments containing Native American heritage, and add a column on my Master spreadsheet so that I can identify matches that may contain Native American heritage.

To briefly summarize, for me, the Family Finder product did not identify any Native American heritage at all, but Geno 2.0 shows me to have 2% Native American. The Gedmatch utility has helped me break that down into Native American components:

   - Native American (i.e., HarappaWorld analysis says Oklahoma Cherokee)
   - Beringian, and
   - SouthEast Asian components.

Pretty sweet.



GedMatch Chromosome Painting Example
GedMatch Admixture Chromosome Painting - HarappaWorld view
   

  


An example of Chromosome Painting from GedMatch, with the color coding insert. Here, I am using the HarappaWorld view so that I can identify the Native American segments. For me, I would expect to see Native American segments with components originating from:

 - NE Asia
 - SE Asia
 - American (or Amerindian, Oklahoma Cherokee)
 - Beringian
 - Siberian

What I can then do with it is add descriptions of the segments on my Master spreadsheet that match the Native American locations, in the hope that I will be able to identify the Native American segments (start/end locations) with specific ancestors.



Look at an Existing Example of Ancestor Mapping from Tim Janzen:

Tim Janzen provides an example of his autosomal spreadsheet online.
It is a useful model if you are mapping specific regions of your autosomal chromosomes
to specific ancestors using data from 23andMe and FTDNA's Family Finder.
Tim has been working on this with his mother's phased data. He has uploaded to his Dropbox account a zipped Excel file that shows his mother's chromosomes correlated with the ancestors she received them from:

  http://dl.dropbox.com/u/21841126/chromosome%20map%20Betty%20Janzen.zip

Tim also has a phasing utility (for two parents and a child), similar to David Pike's utility (below). Instructions to Tim Janzen's phasing utility:

  http://dl.dropbox.com/u/21841126/phasing%20program%20instructions.rtf

Phasing Utilities

David Pike provides some utilities for examination of autosomal data.
David A. Pike, PhD, FTICA, Department of Mathematics and Statistics, Memorial University of
Here's a link to his utilities:

    http://www.math.mun.ca/~dapike/FF23utils/



Get familiar with the terminology

You will find that the best place to start is with the matches that have the largest value in Centimorgans. Then, you will want to sort those largest matches by chromosome start and ending locations. That will help you triangulate which starting and ending location matches to a particular ancestor. And you will want to record your findings as you progress.
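As a rough illustration of that sorting step, here is a minimal sketch that looks for matches whose segments overlap on the same chromosome. The segment values are made up, and an overlap found this way is only a candidate for follow-up; it does not by itself confirm a common ancestor.

```python
from itertools import combinations

# Hypothetical segments: (match name, chromosome, start, end, cM)
segments = [
    ("Cousin A", "1", 5_000_000, 20_000_000, 18.2),
    ("Cousin B", "1", 12_000_000, 30_000_000, 15.0),
    ("Cousin C", "1", 40_000_000, 55_000_000, 11.3),
    ("Cousin D", "2", 12_000_000, 25_000_000, 9.8),
]

def overlap(a, b):
    """Return the shared (start, end) range of two segments, or None."""
    if a[1] != b[1]:  # different chromosomes never overlap
        return None
    start, end = max(a[2], b[2]), min(a[3], b[3])
    return (start, end) if start < end else None

def candidate_pairs(segs):
    """List every pair of matches that overlap on the same chromosome."""
    results = []
    for a, b in combinations(segs, 2):
        shared = overlap(a, b)
        if shared:
            results.append((a[0], b[0], a[1], shared))
    return results

pairs = candidate_pairs(segments)
for first, second, chrom, (start, end) in pairs:
    print(f"{first} and {second} overlap on chromosome {chrom}: {start}-{end}")
```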

Get familiar with what is meant by IBD (Identical by Descent), IBS (Identical by State), ICW (In Common With), phasing, triangulation, CentiMorgans, etc. You will make the most progress if you start with the largest matching segments, and track which ancestor matches to those larger segments.

Keep a separate 'Master' spreadsheet.

In order to track your progress, you will want to keep a record of your work in a separate spreadsheet. This is because you don't want to lose track of the work that you have already done, and because there are new people signing up for autosomal DNA every day. So the data containing your matching chromosome browser results will continue to be updated. You will want to keep your results separate from the updated data.
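One way to picture keeping the Master separate from the updated data is sketched below. It assumes (my assumption, not a fixed format) that each row is identified by match name, chromosome, start, and end, and that the Master carries an extra "common_ancestor" note. New segments from a fresh download are added, while rows already in the Master are left alone so the notes survive.

```python
# Hypothetical layout: rows keyed by (match name, chromosome, start, end).
master = {
    ("Cousin A", "1", 5000000, 20000000): {"cM": 18.2, "common_ancestor": "Ancestor 1"},
}
download = {
    ("Cousin A", "1", 5000000, 20000000): {"cM": 18.2},
    ("Cousin E", "7", 30000000, 41000000): {"cM": 10.4},
}

def merge_into_master(master_rows, new_rows):
    """Add segments the Master does not have yet; never overwrite existing rows."""
    merged = dict(master_rows)
    for key, row in new_rows.items():
        if key not in merged:
            merged[key] = {**row, "common_ancestor": ""}  # blank note to fill in later
    return merged

updated = merge_into_master(master, download)
for key, row in updated.items():
    print(key, row)
```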

Visit Helpful DNA blogs

There have been several blogs that have posted helpful information about autosomal DNA. The ones that have been most helpful to me are those of Roberta Estes, CeCe Moore, and Debbie Kennett, and, on the technical side, Dienekes' blog.

Roberta Estes maintains "DNA Explained": http://dna-explained.com/category/family-finder/
On the bottom right of her screen she lists "Categories," which is helpful in looking up information about how to use Family Finder, what to expect from it, etc.

CeCe Moore maintains a blog with some helpful information. CeCe Moore is an independent professional genetic genealogist and television consultant.

  http://www.yourgeneticgenealogist.com/


Dienekes Pontikos provides a blog dedicated to human population genetics, physical anthropology, archaeology, and history. Dienekes' Anthropology Blog often carries short notes about the technical studies in the field of human DNA, and is interesting to read for the scientific content.

  http://dienekes.blogspot.com/

Sign up for the Autosomal DNA email lists


DNA: GENEALOGY-DNA Mailing List

Anyone with DNA (i.e., anyone!) who would like to ask questions, discuss methods and share results of DNA testing as applied to genealogical research:

http://lists.rootsweb.ancestry.com/index/other/DNA/GENEALOGY-DNA.html

DNA: AUTOSOMAL-DNA Mailing List

A mailing list for the discussion of the various aspects of autosomal DNA testing for genealogical purposes:

http://lists.rootsweb.ancestry.com/index/other/DNA/AUTOSOMAL-DNA.html

Finally, keep a record of your matches

It is important to track your progress. Keep a running record of those that agreed upon a Common Ancestor.
The best way to do this is by having a column on your Master spreadsheet for a Common Ancestor for the matching segments.
Tim Janzen shows an example of 'How To' do this. Personally, my Master spreadsheet is split up by:

a) Chromosome
b) Sorted by matching segment Starting and Ending locations.

This is the style that you see in DNAGedcom's "Autosomal DNA Segment Analyzer."

Expect this step to take time.

Jim Barrett says that he now has 200 ancestors identified and mapped out on his Master spreadsheet.
This was done one ancestor at a time, comparing matching segments and contacting the matching participants. Jim says that you will have some false starts and other problems along the way, so expect it to take a long time to map out all of your ancestors' matching autosomal segments.
















Monday, March 24, 2014

Autosomal DNA Cousin Calculator



Can you calculate your genetic cousins?






The thought occurred to me to respond to a query from Lucy Sinkular on the Rootsweb Genealogy-DNA email list regarding matching autosomal chromosome segments to an adopted person, along with her known cousins. I thought I would mention something that I had put into my autosomal DNA spreadsheet to estimate cousin relationships. I used my "CousinCalc" equation (from my spreadsheet) to inform her that I estimated her adopted genetic cousin to be on the order of a 10th cousin.

Back on September 28, 2011, Jared Roach, M.D., Ph.D., Senior Research Scientist at the Institute for Systems Biology, posted a note on the Genealogy-DNA email list about the logic behind the prediction of cousin relationships. The theory is that the number of segments, when combined with the size of the segments, can be used to estimate distant relationships.


"Maximum-likelihood Estimation of Recent Shared Ancestry (ERSA)," Genome Res., May 21, 2011. ( see http://genome.cshlp.org/content/21/5/768/F1.expansion.html)


Long autosomal segments are unlikely to be from distant relationships, and short segments can either be from close or distant relationships. The equation given in the paper is based on an equation given by Thomas in 1994:


   P(t) = e^(-dt/100)

d = number of meioses
t = length of segment in cM
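To get a feel for the numbers, here is a small sketch that evaluates the equation for a 10 cM segment. It takes the exponent as negative, so that the result stays between 0 and 1, and it reads P(t) as the chance that a shared segment is still longer than t cM after d meioses. Both of those readings, and the sample values of d, are my assumptions for illustration.

```python
import math

def segment_survival(d, t_cm):
    """Evaluate P(t) = e^(-d*t/100) for d meioses and a segment of t_cm cM."""
    return math.exp(-d * t_cm / 100)

# Hypothetical values of d, chosen only to show the trend.
for d in (2, 6, 10):
    print(d, round(segment_survival(d, 10.0), 3))
```

The value drops as d grows, which matches the statement that long segments are unlikely to come from distant relationships.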



For these past 20 years or so, the "number of meiosis" has been taken to be the number of segments. Terms have been introduced to define valid segments (Identical By Descent, or IBD) and invalid segments (Identical By State, or IBS). Segments are considered to be IBS if, in general, they are small (less than 5 to 7 cM, or less than 500 SNPs).
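As a sketch of how that rule of thumb might be applied to a list of segments, the function below flags a segment as likely IBS when it is short in cM or covers few SNPs. The default cutoffs (7 cM and 500 SNPs) are taken from the ranges mentioned above; the choice of 7 rather than 5 is arbitrary, and the sample values are made up.

```python
def looks_ibs(cm, snps, min_cm=7.0, min_snps=500):
    """Flag a segment as likely IBS when it is short in cM or covers few SNPs."""
    return cm < min_cm or snps < min_snps

# Hypothetical segments: (cM, matching SNPs)
for cm, snps in [(3.2, 600), (12.0, 2500), (8.0, 400)]:
    label = "likely IBS" if looks_ibs(cm, snps) else "treated as IBD"
    print(cm, snps, label)
```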



The above equation does not always work well, so a large number of probability distribution functions and Monte Carlo simulations have been invented in order to help make some reasonable estimates of relatedness between two matching individuals. The topic is popular because predicting relatedness has a number of applications, from family history to medicine.


However, the thought that was nagging me was: why were these scientists not using SNPs vs. cMs?


So, I thought I would try to find out. Upon investigating, I found that this equation worked for predicting my fourth and fifth cousins:


      CousinCalc = (1,000 x Total_cMs) / Total_SNPs
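Written as a small function, the equation looks like the sketch below. The totals in the example are hypothetical numbers picked to make the arithmetic easy to follow; they are not taken from any real match.

```python
def cousin_calc(total_cm, total_snps):
    """CousinCalc = (1,000 x total cMs) / total SNPs, as given in the post."""
    return 1000 * total_cm / total_snps

# Hypothetical totals for one matching person.
estimate = cousin_calc(40.0, 10000)
print(estimate)
```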


When I ignore the concept of IBD and IBS (and just use the figures as given by Family Tree DNA in their Family Finder product), this equation works for the distant cousins that I know about thus far. IBD and IBS at present are terms derived from the use of segment counts. The 'CousinCalc' equation does not use segment counts, so my thought is that the current definitions of IBD and IBS do not apply to the use of this equation.

But the question that bothered me was, 'Is my sample too small?'
Would this be a statistically valid equation?



I don't have enough data to answer that, so I asked around.

I looked at Tim Janzen's autosomal segment matches to his mother, which he has made publicly available. My "CousinCalc" came back with a cousin estimation of 6th cousin. Clearly, my "CousinCalc" equation does not work for relationships closer than first cousin. But, to be fair, Tim Janzen does not list his IBS segments, so the sum of the IBS segments is an unknown part of the equation. Yet, in use of my equation, I do find that the sum of all segments delivers nearly the same result as the sum of IBD segments, so I presume that the expectation is valid that the "CousinCalc" equation will not hold for relationships closer than first cousin.

For more distant relationships, you should begin to see a departure from IBD reflecting the results of this equation. So, be careful about using IBD instead of sums.

Ann Turner had the most patience with this idea. However, she wasn't exactly warming up to the idea of using this new equation. She explained that 23andMe uses segments vs. cMs, as in the article “Cryptic Distant Relatives are Common in both Isolated and Cosmopolitan Samples,” and the chart (Fig. 3) is given on this page:

   http://blog.23andme.com/news/announcements/how-many-relatives-do-you-have/


Simulated data showing the relationship between shared identical segments of DNA (IBD-half) and # of shared segments for different degrees of relatedness in a population with European ancestry. http://blog.23andme.com/news/announcements/how-many-relatives-do-you-have/
In short, this is the basis for the "Relative Finder" tool available at 23andMe, as described here:     ( https://www.23andme.com/ancestry/relfinder/ )

I should note that the Team at Huff Lab has a web site available that enables you to plug in your autosomal segment information, and they will do the calculations for you. The ERSA software is freely available for download from their web site:


The data chosen for the chart were IBD segments, which basically means that the chart includes matching segments that are larger than 10 cMs. In other words, the number of IBD segments over 10 cMs among the matches to your autosomal DNA is an indication of how closely related they might be to you. Because segments do not fit very well, there are some fairly heavy duty probability equations behind the above chart.

Then I thought, well, let me draw up what SNPs vs. cMs might look like.

Here is what my fourth cousin's matching segments look like if the segments are not summed up:


and here is what my fifth cousin's matching segments look like if the segments are not summed up:

Individual autosomal segments vs. Centimorgans (cMs)


It looked like a direct (linear) relationship between SNPs and cMs. However, it was perhaps not statistically valid (not enough sample data). So, I thought I would see if Family Tree DNA's Chromosome Browser matching segments might give enough data to support the same direct (linear) relationship between SNPs and cMs. That produced the following chart:


When matching autosomal DNA segments are summed, the sums produce a chart that shows a direct relationship between total SNPs and total cMs. 1076 individual segments where SNPs and cMs have been summed per matching person.
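The summing per matching person can be sketched as follows. The rows are hypothetical, and the layout (match name, cM, matching SNPs for each segment) is an assumption about how the chromosome browser data has been arranged in the spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (match name, cM, matching SNPs) for each shared segment.
segments = [
    ("Cousin A", 18.2, 4100),
    ("Cousin A", 6.3, 1400),
    ("Cousin B", 9.5, 2500),
]

def totals_per_match(rows):
    """Sum cMs and SNPs over all segments shared with each matching person."""
    totals = defaultdict(lambda: [0.0, 0])
    for name, cm, snps in rows:
        totals[name][0] += cm
        totals[name][1] += snps
    return {name: (round(cm, 2), snps) for name, (cm, snps) in totals.items()}

totals = totals_per_match(segments)
for name, (total_cm, total_snps) in totals.items():
    print(name, total_cm, total_snps)
```

Each pair of totals is one point on a chart of total SNPs vs. total cMs, and is also what would be plugged into the 'CousinCalc' equation.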




Elizabeth Harris wrote me to say that she did not use SNPs because they did not work for her. Basically, she tested with 23andMe, and the calculation did not work for 4 of her cousins with matching segments on chromosome 15. The 'CousinCalc' equation came out to be 33rd cousins for that segment, and the math is rather tortuous if you want the equation to come out as fourth cousins for that particular segment on chromosome 15.


Ann Turner cited this chart from Rutgers University:


http://web.archive.org/web/20070113005025/http://compgen.rutgers.edu/maps/compare.pdf


The basic point there is that the marker positions along chromosome 15 begin at about 20 Mb. The same is true of chromosomes 13 and 14, but it is not a common phenomenon among chromosome measurements.


However, I should point out that, unfortunately, Elizabeth only gave the values for chromosome 15. She did not provide data for any other matching chromosome, so I did not get to see what 'CousinCalc' looks like for her 4th cousins using the sums across all matching chromosomes, or what the data looks like from 23andMe.


And finally, Ann Turner also pointed out that chromosome 15 has some poorly reported regions, and a number of other chromosomes have the same problem. She cited Table 3 of this article: "Relationship Estimation from Whole-Genome Sequence Data," Hong Li et al., Jan 2014.


  http://www.plosgenetics.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pgen.1004144&representation=PDF


That Table 3 shows two segments on chromosome 15: one starting at 20,967,673 and ending at 25,145,260 with a length of 10.46 cMs, and the other starting at 27,115,823 and ending at 30,295,750 with a length of 9.29 cMs. That's two to three times what you might expect to be the length in cMs. Other chromosomes showing this type of anomaly include chromosomes 1, 2, 6, 8, 9, 10, 16, 17, 21, and 22, for a total of some 14 regions.
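The "two to three times" figure can be checked with a little arithmetic on the two chromosome 15 segments quoted above. The comparison assumes the common rule of thumb of roughly 1 cM per million base pairs; that rule of thumb is my assumption here, not a number from the article.

```python
# The two chromosome 15 regions quoted above: (start, end, length in cM).
regions = [
    (20_967_673, 25_145_260, 10.46),
    (27_115_823, 30_295_750, 9.29),
]

def cm_per_mb(start, end, cm):
    """Centimorgans per million base pairs for one region."""
    return cm / ((end - start) / 1_000_000)

rates = [round(cm_per_mb(*region), 2) for region in regions]
print(rates)  # [2.5, 2.92]
```

Against a rough expectation of 1 cM per million base pairs, both regions come out at about 2.5 to 2.9 times the expected length.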


Finally, the GedMatch.com web site has a utility that plugs relationship calculations into some of their reports. However, I think that Tim Janzen mentioned that GedMatch has not yet converted to Build 37. Family Tree DNA is now at Build 37, so the results may look slightly different at GedMatch than what you may see at your vendor.


     http://gedmatch.com/


Having given an overview of why SNPs are not used vs. Centimorgans, here is a display of what the cousin calculator equation looks like for a few of my cousins:


Autosomal DNA 'CousinCalc' equation for fourth cousin using SNPs vs. cMs.



Autosomal DNA 'CousinCalc' equation for fourth cousin, once removed using SNPs vs. cMs.

Autosomal DNA 'CousinCalc' equation for fourth cousin using SNPs vs. cMs.

Autosomal DNA 'CousinCalc' equation for fifth cousin using SNPs vs. cMs.


The next step should be to try to gather enough data regarding the results of this equation in order to determine if this is a valid calculation that I can use in my spreadsheet. If the statistics do not bear it out as valid, then the following step should be to determine if the problems mentioned above could be remedied (or avoided) by use of a program.


Updated 03/27/2014 - fix for math error in table for Frank, made hyperlinks active.













Sunday, August 18, 2013

British Monarchy / HAM Y-DNA Comparison Study






August, 2013



  
In August, 2013 Bradley T. Larkin released his study of the “Y-DNA of the British Monarchy.” The first study of its kind, the review attempted to identify the Y-DNA of some major branches of the British Monarchy. The main lines with Y-DNA allele values given were Mountbatten, Stuart, and Windsor. The data was composed of roughly 24 Y-DNA allele (repeat) values, and described as R1b with terminal SNPs appearing to match either U106 or L21.
  

Whereas 24 Y-DNA marker values are notoriously insufficient for an accurate comparison analysis, it may nevertheless be interesting to see who matches within the HAM DNA Project.
  

One final cautionary note: possible errors could also be attributed to the sampling assumptions made by Mr. Larkin.
  

Examination of the suggested Monarchy haplotypes found that there was a distant match to the largest U106 HAM DNA Group (#2), but detailed examination showed that the closest matches in our project appear to be HAM DNA Groups 3, 6, 8, and 12. However, to compound the problem of accuracy, the matching groups generally only tested to 25 to 37 markers.
  
  


HAM DNA Project / British Monarchy Comparison


 


  
  
It appears that there were not enough markers given for the modals of U106 and L21 to provide much resolution between the two.
  

Given the above phylogenetic graph, the area of interest is immediately apparent:
 




 

Genetic Distance







The closest Genetic Distance overall was from HAM DNA Group #3. 
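For readers new to the term, one common way to count Genetic Distance is to add up the step differences in repeat values across the markers that two haplotypes share. The sketch below uses that simple step count. The marker values are made up, and this is not necessarily the model used to produce the figures in this study.

```python
def genetic_distance(a, b):
    """Sum of absolute differences in repeat values over the shared markers."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)

# Hypothetical repeat values for four markers.
haplotype_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
haplotype_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10}

distance = genetic_distance(haplotype_1, haplotype_2)
print(distance)
```

A smaller total means a closer match, but with only a couple dozen markers compared, a small distance is weak evidence on its own.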
  


The next illustration shows the closest matches within the HAM DNA Project to the proposed British Monarchy:






The HAM DNA participant with the closest Genetic Distance is found to be kit 43250, but that kit has only tested to 25 markers. The number of markers, as we see later, affects the accuracy of the TMRCA.
  
  
  
TMRCA
  
   
  
HAM British Monarchy TMRCA Table 1

 

Of the closest matching groups in the current HAM DNA Project, Group #2 had its closest match to Windsor, with an estimated TMRCA of 1575 years before present (+/- 354 years).
  

Group #3 had its closest match to Stuart, with an estimated TMRCA of 1150 years before present (+/- 259 years).
  

Group #6 had its closest match to Stuart, with an estimated TMRCA of 950 years before present (+/- 214 years).
  

Group #8 had its closest match to Windsor, with an estimated TMRCA of 1150 years before present (+/- 259 years).
  

Group #12 had its closest match to Stuart, with an estimated TMRCA of 1550 years before present (+/- 349 years).
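To see how much these estimates overlap, the plus and minus figures can be turned into ranges. The sketch below uses only the numbers listed above; the layout of the table is my own.

```python
# (group, closest line, TMRCA in years before present, +/- years), as listed above.
estimates = [
    ("Group #2", "Windsor", 1575, 354),
    ("Group #3", "Stuart", 1150, 259),
    ("Group #6", "Stuart", 950, 214),
    ("Group #8", "Windsor", 1150, 259),
    ("Group #12", "Stuart", 1550, 349),
]

def tmrca_range(tmrca, error):
    """Return the (earliest, latest) bounds in years before present."""
    return (tmrca - error, tmrca + error)

ranges = {group: tmrca_range(tmrca, error) for group, _, tmrca, error in estimates}
most_recent = min(estimates, key=lambda row: row[2])

for group, line, tmrca, error in estimates:
    low, high = ranges[group]
    print(f"{group} ({line}): {low} to {high} years before present")
print("Most recent estimate:", most_recent[0])
```

The ranges overlap heavily, which is a reminder that with this few markers the ordering of the groups is only approximate.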

 



It would appear that, given the information that we have today, kit 82227 has the most recent estimated connection to the Stuart line of the British Monarchy.
  
The group with the closest Genetic Distance would be Group #3.

 

 

Sources:



Larkin, Bradley T. “Y-DNA of the British Monarchy” August, 2013

HAM DNA Project (Y-DNA results from Family Tree DNA)

Dean McGee’s Y-DNA Comparison Utility

Phylip software, Kitsch package, Fitch-Margoliash method