
  • Homozygous calling from Exome DNA-seq

    I am working with matched tumor/normal samples from the TCGA breast cancer cohort to determine the percentage of samples with a germline homozygous deletion.

    From the 1000 Genomes Project and other publications, there are known genes that are homozygously deleted in the germline.

    For example, the gene I am looking at has been reported, based on PCR, to be homozygously deleted in a large percentage of Caucasians. In the 1000 Genomes data the deletion is a very precise removal of the gene with minimal impact on neighboring genes, and it is called as a homozygous deletion in a large percentage of the samples. The view is that this deletion conferred an advantage at some point in our ancestry.

    I am trying to determine whether this deletion is protective against cancer. Using the matched tumor/normal TCGA breast data, I want to find the percentage of samples that carry the homozygous deletion.

    Using samtools, I counted sequence reads for the gene of interest and for very close neighboring genes; this region is gene-dense. Given what is known about this germline deletion from 1000 Genomes, a sample with a homozygous deletion of the gene should yield 0 reads from this region, so I expected some number of samples with 0 reads. If the percentage of samples with 0 reads matched what is expected by chance, that would indicate that the deletion is not protective against cancer.

    Here is the problem, and I need help finding ways to keep challenging the findings. All 60 matched tumor/normal samples analyzed so far contain reads from the gene. Taken at face value, this gives strong support for the hypothesis that people who are missing both copies of this gene will not get cancer. That is a bold statement that needs challenging.
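To put a number on "expected by chance": if the homozygous deletion had no effect on cancer risk, the chance of seeing no homozygous-deleted individuals among 60 independent samples is (1 - f)^60, where f is the population frequency of the homozygous deletion. The sketch below assumes independent samples, and the frequencies it loops over are purely illustrative placeholders, not published values for this gene.

```python
def prob_no_homozygous(n_samples: int, hom_del_freq: float) -> float:
    """Probability that none of n_samples independent individuals
    carries the homozygous deletion, given its population frequency."""
    if not 0.0 <= hom_del_freq <= 1.0:
        raise ValueError("hom_del_freq must be between 0 and 1")
    return (1.0 - hom_del_freq) ** n_samples


# Illustrative frequencies only (assumptions, not taken from the literature).
for freq in (0.01, 0.05, 0.10, 0.20):
    p = prob_no_homozygous(60, freq)
    print(f"f = {freq:.2f}: P(0 of 60 homozygous deleted) = {p:.3g}")
```

A small probability here only says that 0 of 60 is unlikely under the null; it does not distinguish a protective effect from a technical reason, such as misaligned reads hiding true deletions.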

    As a control I use a neighboring gene that is roughly the same size and has a similar exon/intron pattern, and I normalize the samtools read count for the gene of interest by the read count of that neighboring gene. I still need to compute a more formal RPKM and filter on Phred score, but a quick comparison shows 31 samples with a read ratio of about 20%, 7+ samples at about 50%, and 20+ samples above 80% relative to the neighboring gene. It is tempting to call the 20% group heterozygous germline deletions, but I would feel much better if the ratio were 50%. I suspect that computing RPKM will move the ratio closer to the expected 50%. The average number of reads across all samples for the region known to be deleted is 5,800. For the 20% ratio group, the average number of reads in that region is 2,000 and the minimum is 658. For the neighboring gene used for normalization, the average number of reads is 12,000.
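A minimal sketch of the normalization described above, assuming the standard RPKM definition (reads per kilobase of gene per million mapped reads). The gene lengths and total mapped read count are hypothetical placeholders; only the read counts of 2,000 and 12,000 come from the averages quoted in the post. Note that when both genes come from the same sample, the total mapped reads cancel in the ratio, so only a difference in gene length can move the ratio away from the raw read-count ratio.

```python
import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def length_normalized_ratio(target_reads: int, target_len_bp: int,
                            control_reads: int, control_len_bp: int) -> float:
    """Ratio of reads-per-base between the target gene and the control gene.
    Within one sample this equals rpkm(target) / rpkm(control)."""
    return (target_reads / target_len_bp) / (control_reads / control_len_bp)


# Hypothetical lengths and library size; read counts are the post's averages.
target_len = control_len = 10_000   # assumption: genes of roughly equal size
library_size = 50_000_000           # assumption: total mapped reads in the sample

ratio = length_normalized_ratio(2_000, target_len, 12_000, control_len)
ratio_from_rpkm = (rpkm(2_000, target_len, library_size)
                   / rpkm(12_000, control_len, library_size))
print(f"ratio = {ratio:.3f}, via RPKM = {ratio_from_rpkm:.3f}")
assert math.isclose(ratio, ratio_from_rpkm)
```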

    Is it reasonable to assume that a homozygous deletion in germline should result in 0 sequences read for that region?

    Contamination is an issue, but I would not expect it to reach 20% in almost half the samples.

    The reads could be originating from a pseudogene or another gene with sequence homology, if the exon capture library is not specific enough.

    I took a couple of the reads from the deleted region of interest and ran a BLAST search, and they hit the expected gene at 100%. This tells me the coordinates are correct.

    The mapping quality for some of the reads is not very good, but they have Phred scores of 30+, so the fact that they map to a known sequence suggests the reads are probably valid.
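One way to test whether low mapping quality reads explain the counts is to recount with a minimum MAPQ threshold. The sketch below wraps `samtools view -c` (count alignments) with `-q` (minimum mapping quality); it assumes samtools is on the PATH and the BAM file is coordinate-sorted and indexed. The file name, region string, and threshold of 20 in the usage comment are hypothetical.

```python
import subprocess


def build_count_command(bam_path: str, region: str, min_mapq: int = 0) -> list[str]:
    """Build a samtools command that counts alignments overlapping a region,
    optionally skipping alignments with MAPQ below min_mapq."""
    cmd = ["samtools", "view", "-c"]
    if min_mapq > 0:
        cmd += ["-q", str(min_mapq)]
    cmd += [bam_path, region]
    return cmd


def count_reads(bam_path: str, region: str, min_mapq: int = 0) -> int:
    """Run samtools and return the alignment count as an integer."""
    result = subprocess.run(
        build_count_command(bam_path, region, min_mapq),
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


# Example usage (hypothetical file and coordinates):
#   all_reads = count_reads("sample.bam", "chr1:1000-2000")
#   confident = count_reads("sample.bam", "chr1:1000-2000", min_mapq=20)
#   print(confident / all_reads if all_reads else float("nan"))
```

If most reads in the deleted region disappear at a modest MAPQ cutoff while the neighboring gene keeps its reads, that would point toward multi-mapping reads from homologous sequence rather than a retained copy of the gene.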

    The BAM files were mapped by TCGA.

    Looking at the region in IGV, the reads show good exon distribution, with peaks in the middle of each exon.

    I welcome any feedback or advice on what else I can do to validate that having X reads in a region means the region is not homozygously deleted.

    If you have specific expertise in this area and can contribute to the data analysis, I am looking for co-authors.

  • #2
    The best way to confirm your hypothesis would be to have several tumor samples that are known to be homozygous or heterozygous for the deletion to use as controls in your experiments. This should be possible, since the deletion may reduce cancer risk but the reduction cannot be 100%. Your data point toward the region not being deleted, but you cannot be sure, because some reads may be misaligned. For example, if you look for chromosome Y genes in a female exome you always find reads aligned there, but usually at lower coverage than in a male.
