  • Matepair BWA vs Ssaha

    Hey!
    We have to map some Illumina mate-pair reads, and we are having problems with the results we get from different mappers. The insert size is 2000 bp per mate pair.
    We have tried BWA, Ssaha2 and the brand new RazerS 3. So far we have results from Ssaha2 and BWA. We ran BWA twice, once on the reads as delivered ("normal") and once on the reverse-complemented reads, to check whether the mapper handles mate pairs correctly.
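For readers who want to reproduce the reverse-complemented input, here is a minimal Python sketch. The post does not say which tool was used for this step, so the function names are hypothetical and the handling of `N` bases is an assumption.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(name: str, seq: str, qual: str):
    """Reverse-complement one read; the quality string is reversed to stay aligned."""
    return name, reverse_complement(seq), qual[::-1]


if __name__ == "__main__":
    print(reverse_complement_record("read1", "AAC", "IJK"))
```

The quality string is reversed along with the bases so that each score still refers to the same base.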

    We are aware of the possible paired-end contamination. In total we have 80 million reads. With BWA, roughly 8 million of the normal reads and 7.7 million of the reverse-complemented reads could be identified as mate pairs.

    Ssaha2, on the other hand, identifies 48 million of the normal reads as mate pairs (the reverse-complement run has not finished yet).



    Here are some quick statistics:

    BWA
    Genome RC vs Genome N
    # N mapped reads 8.262.260
    # RC mapped reads 7.765.615
    # similar reads (name,diff of distances <=5) 7.178.290 (92% of RC reads,87% of N)
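The percentages quoted for the BWA runs can be rechecked with a few lines of Python. The counts are copied from the statistics above; the total of 80 million is the rounded figure given in the post, so the last value is only approximate.

```python
def percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Counts taken from the BWA statistics in the post.
n_mapped = 8_262_260    # normal reads mapped as mate pairs
rc_mapped = 7_765_615   # reverse-complemented reads mapped as mate pairs
shared = 7_178_290      # reads counted as "similar" in both runs
total_reads = 80_000_000  # rounded total quoted in the post

if __name__ == "__main__":
    print(f"shared / RC reads: {percent(shared, rc_mapped):.1f}%")
    print(f"shared / N reads:  {percent(shared, n_mapped):.1f}%")
    print(f"N mapped / total:  {percent(n_mapped, total_reads):.1f}% (approximate)")
```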

    Ssaha vs BWA
    Genome_ssaha vs Genome_bwa Normal Reads
    # mapped reads in BWA 8.262.260
    # mapped reads in Ssaha 48.340.873
    # similar reads (name, diff of distances <=5) 7.402.903
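One way to compute a "similar reads" count like the one above is sketched below. It assumes that "diff of distances <=5" means the mate distance reported for the same read name differs by at most 5 bp between the two runs; the post does not spell this out. The function name and the toy data are hypothetical.

```python
def count_similar(dist_a: dict, dist_b: dict, tolerance: int = 5) -> int:
    """Count read names present in both mappings whose mate distances
    differ by at most `tolerance` bases."""
    shared_names = dist_a.keys() & dist_b.keys()
    return sum(
        1 for name in shared_names
        if abs(dist_a[name] - dist_b[name]) <= tolerance
    )


if __name__ == "__main__":
    # Toy example: read name -> mate distance reported by each mapper.
    bwa = {"r1": 2000, "r2": 1990, "r3": 2500}
    ssaha = {"r1": 2003, "r2": 2100, "r4": 2000}
    print(count_similar(bwa, ssaha))
```

In practice the two dictionaries would be filled from each mapper's output, keyed by read name.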

    What do you guys think of the results? Can BWA work with mate-pair reads? How can we verify our results? Is the number of mate pairs mapped by Ssaha2 believable?

    Thank you in advance.

  • #2
    It's hard to judge without knowing what command lines you used for each program.


    • #3
      bwa: sampe -a 3000 -o 1 -P -n 3
      ssaha: -solexa -pair 100,2000


      • #4
        try the SMALT aligner

        What rate of variation do you expect between the DNA you are sequencing and the genomic reference used for mapping? The sensitivity of BWA starts to deteriorate for variation (error) rates above 2%.

        You may want to try the SMALT aligner. It uses an approach similar to ssaha2, but is much more efficient for most applications.
        Wellcome Sanger Institute tools directory


        You build an index with 'smalt index <index_name> <genome_fasta>'.
        Then map with 'smalt map -i 4000 -j 100 <index_name> <fastq_mate_1> <fastq_mate_2>'.


        • #5
          Hey.
          We expect much lower variation rates than 2%. We are comparing the genome of the domesticated guinea pig (reference, fully annotated) with the wild guinea pig.


          • #6
            You could give RTG (http://www.realtimegenomics.com/) a go. I've used their software to map 2 kb, 5 kb and 10 kb mate-pair reads for cows. They have a free version for researchers which would probably suit your needs.

            You'll need to tweak their parameters to allow mate pairs at a distance of 2 kb to be paired (say -m 1500 -M 2500; it's all explained in their manual). With 80M reads it should only take a few hours to run (depending on your hardware, maybe less than an hour).

            Also, what read length are you using? When our mate-pair data arrived, Illumina supplied both 100 bp reads and reads trimmed to 50 bp, and recommended we use only the 50 bp reads, because there was a reasonable chance that the 100 bp reads had read into the adapter sequence, which would cause problems when mapping or otherwise using them.
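If only untrimmed reads are available, hard-trimming them to 50 bp is straightforward. The sketch below is not Illumina's procedure; it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], length: int = 50) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to `length` bases.

    Assumes each record is exactly four lines: header, sequence, '+', qualities.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each 4-line record, line 2 is the sequence and line 4 the qualities.
        if i % 4 in (1, 3):
            line = line[:length]
        yield line


if __name__ == "__main__":
    record = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIHHHH\n"]
    for out in trim_fastq(record, length=4):
        print(out)
```

This keeps the first `length` bases of each read, which is the part least likely to contain adapter sequence.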
