  • Low overall alignment rate

    Dear All,
    I aligned a FASTQ file with bowtie2 using the command below:

    ./bowtie2 -x ~/tan_analysis/rice1 -U ~/tan_analysis/GBS20130709_S1_L001_R1_001.fastq -S GBS_test.sam
    1604137 reads; of these:
    1604137 (100.00%) were unpaired; of these:
    1583639 (98.72%) aligned 0 times
    16736 (1.04%) aligned exactly 1 time
    3762 (0.23%) aligned >1 times
    1.28% overall alignment rate

    My question is: why is the overall alignment rate so low?
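The percentages in a bowtie2 summary can be checked by hand: the overall alignment rate is the number of reads aligned exactly once plus those aligned more than once, divided by all input reads. A minimal Python sketch that parses the single-end summary above and recomputes it (the regular expressions assume bowtie2's usual wording for unpaired runs; this is not an official parser):

```python
import re

# Single-end bowtie2 summary copied from the post above.
SUMMARY = """\
1604137 reads; of these:
  1604137 (100.00%) were unpaired; of these:
    1583639 (98.72%) aligned 0 times
    16736 (1.04%) aligned exactly 1 time
    3762 (0.23%) aligned >1 times
1.28% overall alignment rate
"""

def count(pattern: str, text: str) -> int:
    """Return the integer captured by the first line matching `pattern`."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"line not found: {pattern}")
    return int(match.group(1))

def unpaired_alignment_rate(summary: str) -> float:
    """Recompute bowtie2's overall alignment rate (%) for an unpaired run."""
    total = count(r"^(\d+) reads; of these:", summary)
    zero = count(r"^\s*(\d+) \(\S+\) aligned 0 times", summary)
    once = count(r"^\s*(\d+) \(\S+\) aligned exactly 1 time", summary)
    multi = count(r"^\s*(\d+) \(\S+\) aligned >1 times", summary)
    # The three categories should partition all input reads.
    if zero + once + multi != total:
        raise ValueError("alignment categories do not add up to the read total")
    return 100.0 * (once + multi) / total

rate = unpaired_alignment_rate(SUMMARY)
print(f"{rate:.2f}% overall alignment rate")
```

Here (16736 + 3762) / 1604137 ≈ 1.28%, so the reported rate is arithmetically correct; the open question is why about 98.7% of the reads found no alignment at all.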

  • #2
    It's rather difficult to say without seeing the data. Perhaps you need to trim your reads. Perhaps the reference is just not that similar. Perhaps your samples were swapped with someone else's. Try doing a local alignment and see if that helps. Alternatively, BLAST a few of the reads and see what you get.



    • #3
      Dear dpryan,
      Thank you very much for your suggestions.

      I think I should trim my reads, because I did not trim them.

      I am looking into how to trim.

      Do you have any suggestions on how to trim barcodes and improve the low alignment rate?
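One way to remove inline barcodes, sketched in Python under stated assumptions: the barcodes are known in advance (the list below is hypothetical), each one sits exactly at the 5' end of the read, and a read with no matching barcode is discarded. The same number of bases is cut from the quality string so the record stays valid FASTQ.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical barcodes for illustration only; use the ones from your own
# sample key file. Assumes each barcode sits at the very start (5' end) of the read.
BARCODES = ["ACGT", "TGCAAT", "CTAGCTTA"]

Record = Tuple[str, str, str]  # (header, sequence, quality)

def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual

def trim_barcode(rec: Record, barcodes: List[str]) -> Optional[Record]:
    """Remove an exactly matching 5' barcode from sequence and quality.

    Longer barcodes are tried first so a short barcode that is a prefix of a
    longer one cannot steal the match. Returns None when no barcode matches.
    """
    header, seq, qual = rec
    for bc in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(bc):
            return header, seq[len(bc):], qual[len(bc):]
    return None

EXAMPLE = [
    "@r1\n", "ACGTGGCCTTAA\n", "+\n", "IIIIIIIIIIII\n",
    "@r2\n", "TTTTGGCC\n", "+\n", "IIIIIIII\n",
]
for rec in read_fastq(EXAMPLE):
    print(rec[0], trim_barcode(rec, BARCODES))
```

This handles only exact 5' barcodes; adapter sequence further along the read needs its own trimming step, ideally with a dedicated tool that tolerates sequencing errors.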



      • #4
        Are you aligning to the correct reference genome? Take some of the sequences that Bowtie isn't aligning and BLAT them; do they align?
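Pulling out some of the sequences that bowtie2 did not align is a matter of checking bit 0x4 (segment unmapped) of the SAM FLAG field; on the command line, `samtools view -f 4` applies the same filter. A minimal Python sketch that reads SAM text and emits the first few unaligned reads as FASTA for BLAT or BLAST (the 50-read default limit is an arbitrary choice, and secondary/supplementary records are skipped so each read appears once):

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def unaligned_reads(sam_lines: Iterable[str], limit: int = 50) -> Iterator[Tuple[str, str]]:
    """Yield up to `limit` (name, sequence) pairs for unmapped primary SAM records."""
    found = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & (SECONDARY | SUPPLEMENTARY):  # keep one record per read
            continue
        if flag & UNMAPPED:
            if found >= limit:
                return
            yield name, seq
            found += 1

def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text for BLAT or BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

SAM = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTAAAA\tIIIIIIII\n",
]
print(to_fasta(unaligned_reads(SAM, limit=2)), end="")
```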



        • #5
          I don’t think a 1% alignment rate will be fixed by trimming. For more help, you should post some example output from FastQC; that will usually tell you whether your reads need trimming.

          If you’re using a reference genome that is even a little diverged from your species, you’ll need to loosen the bowtie parameters (even some outbred populations in certain species have enough sequence diversity for this to be an issue). You should also try the local alignment dpryan suggested; it’s more flexible than end-to-end alignment and could remove the need to trim.
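To see why local alignment can make trimming unnecessary, compare a read that must be scored end to end with one whose poorly matching ends may be clipped. The sketch below is a toy under stated assumptions: ungapped placements only, and fixed scores of +2 per match and -6 per mismatch chosen for illustration. It does not reproduce bowtie2's actual scoring or its gapped alignment.

```python
# Toy ungapped scoring for illustration only; these are NOT bowtie2's parameters.
MATCH, MISMATCH = 2, -6

def position_scores(read: str, ref: str, offset: int) -> list:
    """Per-base scores for the read laid against ref starting at `offset`."""
    return [MATCH if base == ref[offset + i] else MISMATCH for i, base in enumerate(read)]

def offsets(read: str, ref: str) -> range:
    # Only placements that keep the whole read inside the reference
    # (assumes len(read) <= len(ref)).
    return range(len(ref) - len(read) + 1)

def end_to_end_score(read: str, ref: str) -> int:
    """Best ungapped score when every base of the read must be aligned."""
    return max(sum(position_scores(read, ref, off)) for off in offsets(read, ref))

def local_score(read: str, ref: str) -> int:
    """Best ungapped local score: poorly matching ends may be clipped.

    For each placement, a Kadane maximum-subarray scan finds the best-scoring
    contiguous stretch of the read (0 means nothing worth keeping).
    """
    best = 0
    for off in offsets(read, ref):
        running = 0
        for s in position_scores(read, ref, off):
            running = max(0, running + s)
            best = max(best, running)
    return best

REF = "ACGTACGTAAAAAAAA"
READ = "ACGTACGT" + "CCCC"  # 8 bases from REF followed by 4 junk bases
print("end-to-end:", end_to_end_score(READ, REF))
print("local:     ", local_score(READ, REF))
```

The four junk bases drag the end-to-end score below zero, while the local scan drops them and keeps the 8-base match. In bowtie2 the corresponding switch is `--local`; the default mode is `--end-to-end`.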



          • #6
            A total of ~1.6 million reads seems like a very small number considering that the OP is aligning to the rice genome (what kind of experiment is this, by the way?).

            As others have said, you should do some QC (and trimming, if needed) before doing the alignments. It would not hurt to take a set of reads, convert them to FASTA, and BLAST them against GenBank to see whether you have the right sequence (i.e. rice).
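When taking "a set of reads" to BLAST, a random subset can be more representative than the first reads in the file. A minimal Python sketch using reservoir sampling (Algorithm R), which picks k records uniformly in a single pass without knowing the file size, and writes them as FASTA; the function name and the fixed seed are illustrative choices:

```python
import random
from typing import Iterable, List, Tuple

def sample_fastq_as_fasta(lines: Iterable[str], k: int, seed: int = 1) -> str:
    """Reservoir-sample k FASTQ records (Algorithm R) and return them as FASTA text."""
    rng = random.Random(seed)
    reservoir: List[Tuple[str, str]] = []
    it = iter(lines)
    for n, header in enumerate(it):
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line, not needed for FASTA
        name = header.strip()[1:].split()[0]  # drop '@' and any comment after the ID
        if n < k:
            reservoir.append((name, seq))
        else:
            j = rng.randint(0, n)  # uniform over 0..n inclusive
            if j < k:  # happens with probability k / (n + 1)
                reservoir[j] = (name, seq)
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)

EXAMPLE = [
    "@read1 1:N:0\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "GGCC\n", "+\n", "IIII\n",
    "@read3\n", "TTAA\n", "+\n", "IIII\n",
]
print(sample_fastq_as_fasta(EXAMPLE, k=2), end="")
```

With a real file, pass the open file handle instead of the inline list.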



            • #7
              Thank you very much for your suggestions.
              After I removed the barcodes and Illumina sequences from the data, I got these results:

              2398641 reads; of these:
              2398641 (100.00%) were unpaired; of these:
              269976 (11.26%) aligned 0 times
              1542588 (64.31%) aligned exactly 1 time
              586077 (24.43%) aligned >1 times
              88.74% overall alignment rate

              It looks better than the previous one.



              • #8
                Very low alignment rates with bowtie2 and BWA

                Hi,

                I am getting really low alignment rates too.

                Bowtie2 gives me the following output:

                #map the reads
                -bash-4.1$ ./bowtie2 -p 1 -x AER -1 S25_R1_001.fastq -2 S25_R2_001.fastq > S25_bowtie2.sam

                #output
                4240966 reads; of these:
                4240966 (100.00%) were paired; of these:
                4240777 (100.00%) aligned concordantly 0 times
                161 (0.00%) aligned concordantly exactly 1 time
                28 (0.00%) aligned concordantly >1 times
                ----
                4240777 pairs aligned concordantly 0 times; of these:
                10902 (0.26%) aligned discordantly 1 time
                ----
                4229875 pairs aligned 0 times concordantly or discordantly; of these:
                8459750 mates make up the pairs; of these:
                8168790 (96.56%) aligned 0 times
                49047 (0.58%) aligned exactly 1 time
                241913 (2.86%) aligned >1 times
                3.69% overall alignment rate
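The paired summary is easier to read once you notice that bowtie2's overall rate counts mates rather than pairs: concordantly and discordantly aligned pairs contribute two aligned mates each, and the remaining pairs contribute whatever single mates aligned. A short Python check that reproduces the reported figure from the numbers above:

```python
def paired_overall_rate(pairs: int, conc_once: int, conc_multi: int,
                        disc_once: int, mates_once: int, mates_multi: int) -> float:
    """Overall alignment rate (%) for a paired bowtie2 run, counted in mates."""
    total_mates = 2 * pairs
    # Concordant and discordant pairs contribute both of their mates ...
    aligned_mates = 2 * (conc_once + conc_multi + disc_once)
    # ... and among the remaining pairs, single aligned mates are counted individually.
    aligned_mates += mates_once + mates_multi
    return 100.0 * aligned_mates / total_mates

# Numbers from the bowtie2 summary above.
rate = paired_overall_rate(4240966, 161, 28, 10902, 49047, 241913)
print(f"{rate:.2f}% overall alignment rate")
```

That is 313,142 aligned mates out of 8,481,932, most of them single mates; only 189 pairs aligned concordantly.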

                And BWA gives me the following output (I ran the samtools flagstat command to see the overall alignment rate, which is 47% if I understand the output correctly):

                #map the reads
                -bash-4.1$ bwa mem -t 4 AER.fasta S25_R1_001.fastq S25_R2_001.fastq > S25_bwa.sam
                #convert to bam
                -bash-4.1$ ./samtools view -bS S25_bwa.sam > S25_bwa.bam
                #get flagstats
                -bash-4.1$ ./samtools flagstat S25_bwa.bam

                #output
                8816220 + 0 in total (QC-passed reads + QC-failed reads)
                0 + 0 secondary
                334288 + 0 supplementary
                0 + 0 duplicates
                4153225 + 0 mapped (47.11%:-nan%)
                8481932 + 0 paired in sequencing
                4240966 + 0 read1
                4240966 + 0 read2
                2605074 + 0 properly paired (30.71%:-nan%)
                3513674 + 0 with itself and mate mapped
                305263 + 0 singletons (3.60%:-nan%)
                901042 + 0 with mate mapped to a different chr
                41706 + 0 with mate mapped to a different chr (mapQ>=5)

                I have read that bwa mem is generally a very aggressive aligner, which probably explains the 47% rate.
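A note on reading the flagstat numbers: the `in total` line counts every SAM record, including the 334288 supplementary alignments that bwa mem writes for split reads, and the mapped percentage is taken over that total. The `-nan%` entries are the QC-failed column, which is 0/0 here. A minimal Python sketch that separates the printed rate from a per-read rate, assuming (as the SAM format implies) that every secondary and supplementary record is a mapped record:

```python
# QC-passed counts copied from the samtools flagstat output above.
TOTAL = 8816220
SECONDARY = 0
SUPPLEMENTARY = 334288
MAPPED = 4153225

def flagstat_rate(mapped: int, total: int) -> float:
    """The percentage flagstat prints: all mapped records over all records."""
    return 100.0 * mapped / total

def primary_mapping_rate(total: int, secondary: int, supplementary: int, mapped: int) -> float:
    """Percent of reads mapped, counting only primary records (one per read).

    Secondary and supplementary records are always mapped, so removing them from
    both numerator and denominator leaves one record per sequenced read.
    """
    primary_total = total - secondary - supplementary
    primary_mapped = mapped - secondary - supplementary
    return 100.0 * primary_mapped / primary_total

print(f"as printed by flagstat: {flagstat_rate(MAPPED, TOTAL):.2f}%")
print(f"primary records only:   {primary_mapping_rate(TOTAL, SECONDARY, SUPPLEMENTARY, MAPPED):.2f}%")
```

Either way, 45 to 47% of reads map with bwa mem versus 3.69% with bowtie2. Since bwa mem aligns locally and soft-clips read ends while bowtie2 was run in its default end-to-end mode, that difference may account for part of the gap.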

                I am looking into how to extract the unmapped reads so that I can BLAST them and check for contamination.
                The FastQC reports look fine (no red crosses), especially after I trim the first ~10 and last ~3 bases. Is there any other quality control I should do before mapping? What are the possible reasons for low alignment rates?
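A natural starting point for contamination screening of paired data is the pairs in which neither mate aligned, i.e. SAM records with both bit 0x4 (read unmapped) and bit 0x8 (mate unmapped) set; `samtools view -f 12 -F 256` selects the same records on the command line. A minimal Python sketch that collects them from SAM text and labels each mate /1 or /2 using bit 0x40 (first segment in the template):

```python
from typing import Iterable, List, Tuple

READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
BOTH_UNMAPPED = READ_UNMAPPED | MATE_UNMAPPED

def both_mates_unmapped(sam_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name/1 or name/2, sequence) for primary records of fully unmapped pairs."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if (flag & BOTH_UNMAPPED) == BOTH_UNMAPPED:
            mate = "1" if flag & FIRST_IN_PAIR else "2"
            hits.append((f"{fields[0]}/{mate}", fields[9]))
    return hits

SAM = [
    "p1\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n",            # mate 1, pair unmapped
    "p1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII\n",           # mate 2, pair unmapped
    "p2\t73\tchr1\t100\t60\t8M\t=\t100\t0\tGGGGAAAA\tIIIIIIII\n",   # mapped, mate unmapped
    "p2\t133\tchr1\t100\t0\t*\t=\t100\t0\tCCCCTTTT\tIIIIIIII\n",    # unmapped, mate mapped
]
for name, seq in both_mates_unmapped(SAM):
    print(f">{name}\n{seq}")
```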

