  • Aligning numerous reads to several small references

    Hi all,

    I'm trying to align a large set of Illumina reads (over 18 million) to a reference. My reference consists of multiple candidate sequences varying in size and location across the genome. I used Maq to map my paired-end reads to just one of these individual sequences, but was only able to map around 10% of the reference. I have just begun this project and am wondering:

    - How should I approach the preparation of my reference file(s)?
    - Should I narrow my reads?
    - What is the scope of a typical project in Maq in terms of read number and reference size?

    Any advice or helpful tutorials/references would be very welcome!

  • #2
    Originally posted by R diggity
    My reference consists of multiple candidate sequences varying in size and location across the genome.
    Careful: if you align only to your regions of interest, you will often end up with false mappings. Generally the best approach is to map to the entire genome and then filter the results to your regions of interest.

    10% mapping is not surprising for a hybridization-based capture of a small region (I am assuming this is what you are doing). I did an Agilent capture / GA2 sequencing run in human and got 16% mapping to the 0.3 Mbase of target regions.
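
    The "map to the whole genome, then filter" step could be sketched roughly as follows. This is only an illustration, not Maq's own tooling: the function names, the record layout `(read_name, reference_name, start, read_length)`, and the 1-based inclusive coordinates are all assumptions, so alignment output would need to be parsed into that shape first.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (ref, start, end) regions by reference; sort and merge overlaps.

    Coordinates are assumed to be 1-based and inclusive (hypothetical layout).
    """
    by_ref = defaultdict(list)
    for ref, start, end in regions:
        by_ref[ref].append((start, end))
    index = {}
    for ref, intervals in by_ref.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlaps the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[ref] = merged
    return index


def overlaps(index, ref, start, length):
    """Return True if the alignment overlaps any target region on `ref`."""
    intervals = index.get(ref)
    if not intervals:
        return False
    end = start + length - 1
    # Find the last merged interval that starts at or before the read's end.
    i = bisect_right(intervals, (end, float("inf"))) - 1
    return i >= 0 and intervals[i][1] >= start


def filter_on_target(alignments, regions):
    """Keep alignments (name, ref, start, length) that overlap a region."""
    index = build_index(regions)
    return [a for a in alignments if overlaps(index, a[1], a[2], a[3])]


if __name__ == "__main__":
    # Made-up regions and alignments, purely for illustration.
    regions = [("LG1", 1000, 2000), ("LG2", 500, 900)]
    alignments = [
        ("r1", "LG1", 950, 75),
        ("r2", "LG1", 3000, 75),
        ("r3", "LG2", 900, 75),
        ("r4", "LG3", 10, 75),
    ]
    kept = filter_on_target(alignments, regions)
    print(len(kept) / len(alignments))  # on-target fraction
```

    Dividing the number of kept alignments by the total number of mapped reads gives an on-target fraction comparable to the percentages quoted above.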



    • #3
      Thanks for the advice. I suppose I will have to construct my reference genome from quite a few separate linkage groups. Given that my reads are 75 bp in length, will I have to manually manipulate the reference sequence so that it has gaps greater than 75 bp between chromosomes?

      Edit: I found a FASTA file containing the entire genome with the linkage groups treated as separate sequences. Does Maq understand this?

      Edit2: I used easyrun to map the paired-end reads to the genome, and only 18.24% mapped. I'm fairly certain I'm doing something incorrectly.
      Last edited by R diggity; 07-10-2010, 02:05 PM.
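
      Before mapping against a multi-sequence FASTA file, it can help to confirm that each linkage group really is a separate record and to check its length. The sketch below is a minimal, assumption-laden example: `fasta_lengths` is a hypothetical helper, the sequence name is taken to be the first whitespace-delimited token of each `>` header, and the file name in the usage comment is a placeholder.

```python
def fasta_lengths(lines):
    """Return {sequence_name: length} for FASTA-formatted text lines.

    The name is assumed to be the first token after '>' in each header.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its bases to the current record.
            lengths[name] += len(line)
    return lengths


# Hypothetical usage with a placeholder file name:
# with open("genome.fasta") as handle:
#     for name, length in fasta_lengths(handle).items():
#         print(name, length)
```

      If every linkage group shows up as its own entry, there should be no need to concatenate the sequences by hand or to pad them with gaps.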



      • #4
        Multiple reference sequences

        I do not know whether you have tried the CLC bio software, but it should be able to handle your data in a variety of ways. First, you can easily map your Illumina reads to multiple reference sequences. If these reference sequences are a subset of a larger genome, you can also use our targeted resequencing tool to get a report of how your reads map to the targeted area versus the non-targeted area. The tools are flexible, so there are many different ways to apply them to your data.

        The software is commercial, but you can use the trial for two weeks to see whether it solves any of your problems. The download is available from the CLC web site: http://clcbio.com/index.php?id=1240

        I hope you'll try it.

        Note: I work for CLC.
