  • R diggity
    Member
    • Jun 2010
    • 12

    Aligning numerous reads to several small references

    Hi all,

    I'm trying to align a large set of Illumina reads (over 18 million) to a reference. My reference consists of multiple candidate sequences that vary in size and location across the genome. I used Maq to map my paired-end reads to just one of these individual sequences, but was only able to map around 10% of the reference. I have just begun this project and am wondering:

    - How should I approach the preparation of my reference file(s)?
    - Should I narrow down my set of reads?
    - What is the scope of a typical Maq project in terms of read number and reference size?

    Any advice or helpful tutorials/references would be very welcome!
  • adamdeluca
    Member
    • Jul 2010
    • 95

    #2
    Originally posted by R diggity:
    My reference consists of multiple candidate sequences varying in size and location across the genome.

    Careful: if you align only to your regions of interest, you will often end up with false mappings. Generally the best approach is to map to the entire genome and then filter the results down to your regions of interest.

    10% mapping is not surprising for a hybridization-based capture of a small region (I am assuming this is what you are doing). I did an Agilent capture / GA2 sequencing run in human and got 16% of reads mapping to the 0.3 Mbase of target regions.
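
    The "map to the whole genome, then filter" step can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: alignments and target regions are plain `(name, start, end)` tuples with 0-based, half-open coordinates, and the function names `build_index`, `overlaps_target`, and `filter_on_target` are hypothetical, not part of Maq or any other tool. Real aligner output would first have to be parsed into this form.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) target regions by chromosome, sorted and merged.

    Assumption: coordinates are 0-based and half-open (BED-style).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or adjacent regions are merged into one.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def overlaps_target(index, chrom, start, end):
    """Return True if the alignment [start, end) overlaps a target on chrom."""
    intervals = index.get(chrom)
    if not intervals:
        return False
    # Last merged region that starts before the alignment ends.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start


def filter_on_target(alignments, regions):
    """Keep only the alignments that overlap at least one target region."""
    index = build_index(regions)
    return [a for a in alignments if overlaps_target(index, a[0], a[1], a[2])]


# Made-up example data: three target regions and four 75 bp alignments.
regions = [("LG1", 1000, 2000), ("LG1", 1500, 2500), ("LG3", 0, 300)]
alignments = [
    ("LG1", 990, 1065),   # overlaps the merged LG1 target
    ("LG1", 2500, 2575),  # starts exactly where the target ends: off target
    ("LG2", 10, 85),      # no targets on LG2
    ("LG3", 250, 325),    # overlaps the LG3 target
]
on_target = filter_on_target(alignments, regions)
fraction = len(on_target) / len(alignments)
print(f"{len(on_target)} of {len(alignments)} alignments on target ({fraction:.0%})")
```

    The on-target fraction printed at the end is the same kind of figure as the 16% quoted above: reads are aligned against everything, and only afterwards counted against the regions of interest.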


    • R diggity
      Member
      • Jun 2010
      • 12

      #3
      Thanks for the advice. I suppose I will have to construct my reference genome from quite a few separate linkage groups. Given that my reads are 75 bp in length, will I have to manually manipulate the reference sequence so that it has gaps greater than 75 bp between chromosomes?

      Edit: I found a FASTA file containing the entire genome with the linkage groups treated as separate sequences. Does Maq understand this?
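
      As a sanity check on a multi-sequence FASTA file like this, a short Python sketch can list each record's name and length. It does not say anything about how Maq treats the file; it only confirms that each linkage group is present as its own record. The function name `fasta_lengths` and the example records are hypothetical.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Made-up example: two linkage groups stored as separate records.
example = [
    ">LG1 linkage group 1",
    "ACGTACGTAC",
    "GTAC",
    "",
    ">LG2",
    "GGGCCC",
]
for record_name, record_length in fasta_lengths(example):
    print(record_name, record_length)
```

      For a real file, the same function can be given an open file handle, for example `fasta_lengths(open(path))`, where `path` is wherever the genome FASTA is stored.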

      Edit2: I used easyrun to map paired ends to the genome, and only mapped 18.24%. I'm fairly certain I'm doing something incorrectly.
      Last edited by R diggity; 07-10-2010, 02:05 PM.


      • Nomijill
        Member
        • Sep 2009
        • 25

        #4
        multiple reference sequences

        I do not know whether you have tried the CLC bio software, but it should be able to handle your data in a variety of ways. First, you can easily map your Illumina reads to multiple reference sequences. If these reference sequences are a subset of a larger genome, you can also use our targeted resequencing tool to get a report of how your reads map to the targeted area versus the non-targeted area. The tools are fairly flexible, so there are many different ways to apply them to your data. The software is commercial, but you can use the trial for two weeks to see whether it solves any of your problems. The download is available from the CLC web site: http://clcbio.com/index.php?id=1240. I hope you'll try it.

        Note: I work for CLC.

