  • Aligning numerous reads to several small references

    Hi all,

    I'm trying to align a large set of Illumina reads (over 18 million) to a reference. My reference consists of multiple candidate sequences varying in size and location across the genome. I used Maq to map my paired-end reads to just one of these individual sequences but was only able to map around 10% of the reference. I just began this project and am wondering:

    - How should I approach the preparation of my reference file(s)?
    - Should I narrow my reads?
    - What is the scope of a typical project in Maq in terms of read number and reference size?

    Any advice or helpful tutorials/references would be greatly appreciated!

  • #2
    Originally posted by R diggity
    My reference consists of multiple candidate sequences varying in size and location across the genome.
    Careful: if you align only to your regions of interest, you will often end up with false mappings, because reads that really come from elsewhere in the genome get forced onto the nearest-looking target. Generally the best approach is to map to the entire genome and then filter the results to your regions of interest.

    10% mapping is not surprising for a hybridization-based capture of a small region (I am assuming this is what you are doing). I did an Agilent capture / GA2 sequencing in human and got 16% mapping to the 0.3 Mbase of target regions.
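
The map-then-filter approach described in this post can be sketched roughly as below. This is a minimal Python illustration, not Maq's own output handling: the tuple layout `(read_name, sequence_name, 1-based start)`, the region list, the names `LG1`/`LG2`/`LG3`, and the helper functions are all hypothetical, and the 75 bp read length is taken from the follow-up post. It assumes the reads were already mapped to the whole genome and the hits exported in some tabular form.

```python
from collections import defaultdict


def build_region_index(regions):
    """Group (seq_name, start, end) target regions by sequence name."""
    index = defaultdict(list)
    for seq_name, start, end in regions:
        index[seq_name].append((start, end))
    return index


def overlaps_target(index, seq_name, pos, read_len):
    """True if a read starting at 1-based pos overlaps any target on seq_name."""
    read_end = pos + read_len - 1
    return any(start <= read_end and pos <= end
               for start, end in index.get(seq_name, []))


def on_target(alignments, regions, read_len=75):
    """Keep only whole-genome alignments that overlap a region of interest."""
    index = build_region_index(regions)
    return [a for a in alignments
            if overlaps_target(index, a[1], a[2], read_len)]


# Hypothetical regions of interest (inclusive, 1-based coordinates).
regions = [("LG1", 1000, 2000), ("LG2", 500, 800)]

# Hypothetical whole-genome hits: (read_name, sequence_name, start).
alignments = [
    ("read1", "LG1", 950),   # spans 950-1024, overlaps LG1:1000-2000
    ("read2", "LG1", 5000),  # same sequence, far from the target
    ("read3", "LG2", 801),   # starts one base past LG2:500-800
    ("read4", "LG3", 100),   # sequence with no targets at all
]

kept = on_target(alignments, regions)
fraction = len(kept) / len(alignments)
print(f"{len(kept)} of {len(alignments)} mapped reads are on target ({fraction:.0%})")
```

The linear scan over regions is fine for a handful of targets; with many regions, a sorted list plus binary search or an interval tree would be the usual replacement.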

    • #3
      Thanks for the advice. I suppose I will have to construct my reference genome from quite a few separate linkage groups. Given that my reads are 75 bp in length, will I have to manually edit the reference sequence so that it has gaps greater than 75 bp between chromosomes?

      Edit: I found a FASTA file containing the entire genome with the linkage groups treated as separate sequences. Does Maq understand this?

      Edit2: I used easyrun to map paired ends to the genome, and only mapped 18.24%. I'm fairly certain I'm doing something incorrectly.
      Last edited by R diggity; 07-10-2010, 02:05 PM.
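
Before handing a multi-sequence FASTA file to any mapper, it can help to confirm that the linkage groups really are stored as separate records. The sketch below is a plain Python check under stated assumptions: the function name `fasta_lengths` and the example records are hypothetical, and every header line is assumed to carry a name after the `>`. It says nothing about what Maq itself accepts; it only reports the record names and lengths found in the file.

```python
def fasta_lengths(lines):
    """Return {record_name: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Record name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Hypothetical two-record FASTA; a real file would be read with open(path).
example = """>LG1 linkage group 1
ACGTACGTAC
GTACG
>LG2
TTGACA
""".splitlines()

lengths = fasta_lengths(example)
for record, size in lengths.items():
    print(record, size)
```

If each linkage group shows up as its own record, there is no concatenated sequence in the file for a read to span.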

      • #4
        multiple reference sequences

        I do not know if you have tried the CLC bio software at all, but it should be able to handle your data in a variety of ways. First, you can easily map your Illumina reads to multiple reference sequences. If these reference sequences are a subset of a larger genome, you can also use our targeted resequencing tool to get a report of how your reads map to the targeted area versus the non-targeted area. The tools are pretty flexible, so there are a lot of different ways that you can apply them to your data.

        The software is commercial, but you can use the trial for two weeks to see whether it solves any of your problems. The download is available from the CLC web site (http://clcbio.com/index.php?id=1240). I hope you'll try it.

        Note: I work for CLC.
