  • How does BWA deal with reads that are shorter than expected?

    Hello all,

    I am very new to bioinformatics so this might just be a stupid question....

    I am using BWA and Samtools to map my sequence reads to a reference mitogenome, but the number of reads that can be mapped is very low - only about 100 reads. I suspect that BWA is only mapping the "complete" reads, i.e. the reads that are a full 100 bp (we did PE reads of 100 bp on an Illumina HiSeq).

    However, since we are working with highly degraded samples, we expect most of the fragments to be no longer than ~70 bp. So if shorter reads are automatically discarded, I am losing >90% of my reads.

    Does anyone know if it is BWA itself that discards "shorter-than-expected" reads? Or how I can recover the shorter reads? Any help, suggestions, thoughts, comments, etc. are highly appreciated!

    Cheers,
    Johanna

  • #2
    Can you show us your bwa commands?
    Did you index the "mitogenome" using the same BWA version that you aligned with? Mixing an index built with a version before 0.6 and an aligner from 0.6 or later (or the other way round) will not work, because the index format changed.

    How many input reads?

    Do the input reads look good? Try BLATing a few hundred random reads by hand, using command-line blat against your custom mitogenome. Are you not getting any really good hits?

    Try these commands to do a little QA ...

    wc -l input.fastq # divide the line count by 4 for the number of reads

    grep -c "NNNNNNN" input.fastq
    # counts lines containing a run of Ns (bad reads); you might get a few near the start/end of the file
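The same QA can be sketched in Python, which also gives the read-length distribution, useful when the question is whether short reads are present at all. This is only a minimal sketch: it assumes plain, uncompressed 4-line FASTQ records, and the names `read_fastq` and `fastq_summary` are made up for illustration. Unlike the grep above, it looks for runs of N in the sequence line only, so an "N" quality character cannot produce a false hit.

```python
import io
from collections import Counter


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: " + header)
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header[1:], seq, qual


def fastq_summary(lines, n_run=7):
    """Count reads, reads containing a run of n_run Ns, and read lengths."""
    n_reads = 0
    n_with_n_run = 0
    lengths = Counter()
    for _, seq, _ in read_fastq(lines):
        n_reads += 1
        lengths[len(seq)] += 1
        if "N" * n_run in seq:
            n_with_n_run += 1
    return {
        "reads": n_reads,
        "reads_with_n_run": n_with_n_run,
        "length_counts": dict(lengths),
    }


# Tiny in-memory example; for a real file, pass an open file handle instead.
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACNNNNNNNGT\n+\nIIIIIIIIIII\n"
)
summary = fastq_summary(example)
print(summary)
```

If every read in `length_counts` comes out at exactly 100 bp even though the fragments are expected to be around 70 bp, that would suggest the reads have not been adapter-trimmed yet.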


    • #3
      Originally posted by Jlap
      ...
      (we did PE reads of 100bp on an Illumina HiSeq).

      However, since we are working with highly degraded samples, we expect most of the fragments to be not longer than ~70bp.
      ...

      Did you clip adaptor sequences from your reads? If most of your fragments are around 70 bp and you did paired-end 100 bp sequencing, the reads would have run through the fragments into the adaptors, and those adaptor sequences might prevent your reads from aligning to the reference. If you haven't already, try a paired-end clipper/trimmer such as Trimmomatic.
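To make the read-through point concrete, here is a small Python sketch of what happens when a 70 bp fragment is sequenced with a 100 bp read, and of the basic idea behind 3' adapter clipping. It is a simplified illustration only: it uses exact string matching, whereas real trimmers such as Trimmomatic tolerate mismatches and can use the read pair. The function name `trim_3prime_adapter`, the `min_overlap` parameter, and the toy fragment are all assumptions made for the example; the adapter string is the commonly quoted start of the Illumina TruSeq adapter, so check it against your own library prep.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only.

    First looks for the full adapter anywhere in the read; failing that,
    looks for a prefix of the adapter (at least min_overlap bases) at the
    very end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter: longest read suffix that equals an adapter prefix.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


# Assumed adapter prefix (hypothetical for this data set).
ADAPTER = "AGATCGGAAGAGC"

# Toy 70 bp "fragment"; a 100-cycle read continues into the adapter and beyond.
fragment = ("ACGT" * 18)[:70]
read = (fragment + ADAPTER + "A" * 100)[:100]

trimmed = trim_3prime_adapter(read, ADAPTER)
print(len(read), len(trimmed))
```

In this toy case the last 30 bases of the untrimmed read do not come from the sample at all, which is why a read like this may fail to align until the adapter is removed.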


      • #4
        Thanks for the posts -

        In answer to the first question, I don't think I'm using different versions of BWA - I was only introduced to it a couple of days ago. Basically, I'm following this manual for the BWA commands: http://sourceforge.net/apps/mediawik...stall_software. (BWA/Samtools for dummies). So that's indexing the reference, running sampe on the reads, converting to .sam files and then to .bam to view with Tablet.

        Thanks for the QA commands; the number of reads comes up around 370,000 per file (8 files per read). It doesn't find any NNNN reads - I can only assume those were already taken out before I got my hands on the data.

        Following Arvid's suggestion, I've been trying to get Trimmomatic to work, but it keeps dropping 100% of the reads. I have the feeling this is because I am doing something wrong when creating the adapter fasta. I've tried half a dozen different set-ups of the fasta (forward, reverse, merged, not merged, with /1 - /2, without, etc.). I'm not sure how many other creative ideas I can apply here. Does anybody have some pointers?


        • #5
          Just in case anyone is interested: I came across an adapter-trimming tool which performs really well to trim read-through adapters, and it's very easy to use. It's called Cutadapt and can be found here:


