  • Allelic Imbalance with GSNAP

    Hi,

    I'm working on a simple pipeline for allelic imbalance analysis for our lab. We want to analyze allelic imbalance in F1 crosses between two mouse strains (CAST-EiJ and 129S1-SvImJ), so one allele comes from the CAST-EiJ genome and the other from the 129S1-SvImJ genome. I need to map reads to the two alleles and then run various tests.
    The first issue is reference mapping bias. To overcome it, I want to use the variant-aware aligner GSNAP. It requires one FASTA sequence (treated as the 'reference' for this task) and a list of SNPs between the two alleles (in my case, this is the same as the SNPs between the two strains). I have two FASTA files with the sequences of the two mouse strains. I also downloaded VCF files for both strains, but these VCF files describe the differences between each strain and the reference genome, mm10. So I probably need to create my own VCF file from just the two FASTA sequences. Could you please help me here? Do I need to write my own script to do this (probably align and list differences), or is there a program that does it?
    Or maybe I'm overcomplicating things and all this can be done another way? Thanks a lot!

  • #2
    I think, maybe, you are over-complicating it. The point of mapping is to place reads on their origin; discarding reads with low identity to their origin incurs ref-bias. So, reference-bias is generally an artifact of mapping programs that have insufficient sensitivity. I suggest you try BBMap, which has very high sensitivity (meaning, it can align reads with low identity to the reference).

    In your specific case, since you have fasta files for two different mouse strains, I suggest using BBSplit to allocate the reads to the different strains, then using BBMap to map each set of reads to its strain independently.

    • #3
      Brian, thank you for the answer!

      Reference bias is generally not about sensitivity. It is allelic mapping bias: a read carrying the alternative allele of a variant has at least one mismatch, and thus has a lower probability of aligning correctly than a read carrying the reference allele. This would be true regardless of the overall sensitivity of the aligner.
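
      To make the argument concrete, here is a toy calculation (a sketch, not a model of any real aligner). It assumes a hypothetical aligner that keeps a read only if it has at most k mismatches, independent sequencing errors at per-base rate e, and reads of n bases. The values of n, e, and k and the helper binom_cdf are made up for illustration. A reference-allele read only carries its sequencing errors, while an alternative-allele read starts with one mismatch at the SNP.

```shell
# binom_cdf N E K: probability of at most K errors among N bases,
# each wrong independently with probability E (binomial CDF).
binom_cdf() {
  awk -v n="$1" -v e="$2" -v k="$3" 'BEGIN {
    term = (1 - e) ^ n                          # probability of zero errors
    sum = 0
    for (i = 0; i <= k; i++) {
      sum += term
      term *= (n - i) / (i + 1) * e / (1 - e)   # step from i to i+1 errors
    }
    printf "%.4f\n", sum
  }'
}

# Illustrative values only (assumptions, not measured data).
n=100    # read length
e=0.01   # per-base sequencing error rate
k=2      # maximum mismatches the hypothetical aligner tolerates

# Reference-allele read: all n bases may carry errors, up to k allowed.
p_ref=$(binom_cdf "$n" "$e" "$k")
# Alternative-allele read: the SNP already uses one mismatch, so only
# k-1 errors are allowed among the remaining n-1 bases.
p_alt=$(binom_cdf "$((n - 1))" "$e" "$((k - 1))")

echo "reference-allele reads kept:   $p_ref"
echo "alternative-allele reads kept: $p_alt"
```

      Under these assumptions the alternative-allele reads are kept less often than the reference-allele reads. With a large enough k (a more permissive aligner) the gap becomes negligible.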

      BBMap sounds good, thank you. You wrote:

      "I suggest using BBSplit to allocate the reads to the different strains".

      What do you mean by "allocate"? Are you suggesting mapping the reads to the individual genomes? And would pileup.sh then be able to calculate coverage for the two alleles? Thank you!

      • #4
        Originally posted by kintany
        Reference bias is generally not about sensitivity. It is allelic mapping bias: a read carrying the alternative allele of a variant has at least one mismatch, and thus has a lower probability of aligning correctly than a read carrying the reference allele.

        I'm not sure I agree with that. Basically, a perfect aligner would map all reads somewhere. So even if a read has some mismatches, it should get mapped to its origin, as long as the sensitivity is sufficient. In rare cases, the changes would make it map better somewhere else, which would incur ref bias; but in my experience, the leading cause of ref bias is mapper insensitivity (meaning, reads that don't match the reference simply don't get mapped), rather than coincidental matches to other parts of the genome due to mutations or errors.

        BBMap sounds good, thank you. You wrote:

        "I suggest using BBSplit to allocate the reads to the different strains".

        What do you mean by "allocate"? Are you suggesting mapping the reads to the individual genomes? And would pileup.sh then be able to calculate coverage for the two alleles? Thank you!

        If you give BBSplit multiple reference fastas, it will take a single input fastq (or two paired fastqs) and produce multiple output fastqs, one per reference. Each output will contain the reads that best match that reference. With the "ambig2" flag (ambig2=toss, ambig2=all, etc.), you can specify what should be done with reads that match multiple references equally well.
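
        A minimal sketch of such a call, for single-end input. The wrapper name split_by_strain and all file names are hypothetical. The flags are assumed to behave as in the BBSplit help text (in=, ref= with a comma-separated list, basename= with % standing for the reference name, ambig2=), so check the help of your installed version before relying on them.

```shell
# split_by_strain READS REFS AMBIG2
#   READS   input fastq
#   REFS    comma-separated reference fastas, one per strain
#   AMBIG2  what to do with reads that match several references
#           equally well (for example toss or all)
split_by_strain() {
  bbsplit.sh in="$1" ref="$2" basename=out_%.fq ambig2="$3"
}

# Example with hypothetical file names (uncomment to run):
# split_by_strain reads.fq CAST_EiJ.fa,129S1_SvImJ.fa toss
```

        With ambig2=toss, reads that fit both strains equally well are discarded rather than assigned to either strain.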

        As for Pileup - all it does is calculate the coverage according to a sam/bam file. So, for example, if all reads were mapped correctly:

        Code:
        pileup.sh in=mapped.sam out=stats.txt
        That would tell you the coverage on a per-scaffold basis. It does not have any understanding of multiple alleles, but it will correctly report the coverage of a sam/bam file that was mapped to multiple concatenated references representing different alleles.
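
        One way to prepare such a concatenated reference is sketched below. It assumes the two strain fastas may use the same scaffold names (for example chr1), so each header gets a strain label before the files are joined. The function names, labels, and file names are hypothetical, and the bbmap.sh call in the comments is only an assumption about how the sam file would be produced.

```shell
# label_fasta LABEL FILE: print FILE with LABEL_ prefixed to every header.
# LABEL is pasted into a sed expression, so it must not contain / or &.
label_fasta() {
  sed "s/^>/>${1}_/" "$2"
}

# make_combined_ref OUT LABEL1 FASTA1 LABEL2 FASTA2
# Writes both labeled fastas, one after the other, into OUT.
make_combined_ref() {
  { label_fasta "$2" "$3"; label_fasta "$4" "$5"; } > "$1"
}

# Example with hypothetical file names (uncomment to run):
# make_combined_ref combined.fa CAST CAST_EiJ.fa 129S1 129S1_SvImJ.fa
# bbmap.sh ref=combined.fa in=reads.fq out=mapped.sam nodisk
# pileup.sh in=mapped.sam out=stats.txt
```

        The per-scaffold coverage in stats.txt should then be listed separately for the CAST_ and 129S1_ copies of each scaffold.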
