Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • RNASeq: Expected bowtie alignment percent to transcriptome?

    Hi all,

    I'm using Bowtie in Trinity to align PE RNASeq data to a reference transcriptome that was published on my study species. For each of my 24 individuals, I get alignment percentages around 50%. It may just be my naivety showing, but I expected more reads to be aligned to the reference transcriptome (albeit one assembled from other data). Does the 50% alignment rate sound reasonable or is there something else going on?

    Also, following rRNA filtering and adapter/quality trimming, the FastQC report looks great, so I shouldn't be having a quality issue. Thanks for your advice!

    Trinity command:
    ./align_and_estimate_abundance.pl --transcripts ./reference_transcriptome.fa --seqType fq --left ./R1_file.fasta --right ./R2_file.fasta --est_method RSEM --aln_method bowtie --output_prefix aligned_file --output_dir ./output_dir --max_ins_size 800 --trinity_mode

  • #2
    When something like this happens (50% sounds low) it is always good to capture reads that don't align and then check a few of them by blast@NCBI to see if there is any unexpected contamination.

    Comment


    • #3
      I've had a chance to blast some of the sequences that did not align and I'm not seeing clear signs of contamination. The majority (~70%) of top hits are poor matches that best align to a wide variety of plants and animals. Some sequences (~10%) have alignments that make sense with contamination with species that may co-occur with my study species (an ocean-dwelling crustacean). The rest (~20%) come back with my study species as one of the top hits, but must be different enough from the reference transcriptome that they don't align via Bowtie.

      I'd appreciate any insight that anyone can offer on this situation!!

      Comment


      • #4
        This still does not sound right. If you have a transcriptome available for your exact species then the alignment should be better. Unless the submitted transcriptome has errors in it.

        Did you assemble a transcriptome from your own data using trinity? Not clear from your original post.

        Comment


        • #5
          No, I did not assemble a reference transcriptome using my own data. I am only aligning against a published transcriptome for my exact species. Perhaps there are errors in that transcriptome and it would be wise for me to reach out to the authors.

          I did not try assembling a transcriptome from my data because my understanding is that because my data are from a MiSeq, I likely don't have the sequencing depth to do so.

          Comment


          • #6
            Before reaching out to the authors, I'd suggest that you post here the BOWTIE command you ran (there must be somethign in the trinity log), to add info on the one from trinity you posted.
            - How many mismatches did you set? (default is 2 if I don't go wrong)
            - How did you write the read files in the command?
            - Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
            - If you used indeed Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size, Maybe half of your libraries have an insert size bigger than that?
            Last edited by Macspider; 08-23-2016, 12:15 AM.

            Comment


            • #7
              - How many mismatches did you set? (default is 2 if I don't go wrong)
              I kept the default, which is indeed 2 mismatches.

              - How did you write the read files in the command?
              I'm sorry but I'm not certain what you mean besides what I indicated in the Trinity command in my initial post.

              - Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
              Because Trinity seems limited on available options for Bowtie, I ran the data through Bowtie and Bowtie2 outside of Trinity for the sake of addressing your question. I just ran default values in both. The code and the results for the same individual using each program are below.

              I also want to note that my current understanding is that when aligning to a reference transcriptome, an ungapped aligner (i.e., Bowtie) is preferable to a gapped aligner (i.e., Bowtie2). However, I see people using Bowtie2 for reference transcriptome alignments in the literature so perhaps I am wrong.

              Bowtie

              ./bowtie REF -1 R1_reads -2 R2_reads --un unaligned --time

              # reads processed 659098
              # reads with at least one reported alignment: 281199 (42.66%)
              # reads that failed to align: 377899 (57.34%)
              Reported 281199 paired-end alignments to 1 output stream(s)


              Bowtie2

              ./bowtie2 -x REF_Bowtie2 -1 R1_reads -2 R2_reads -S Results_Bowtie2 --un unaligned_Bowtie2

              659098 (100.00%) were paired; of these:
              299825 (45.49%) aligned concordantly 0 times
              248293 (37.67%) aligned concordantly exactly 1 time
              110980 (16.84%) aligned concordantly >1 times
              ----
              299825 pairs aligned concordantly 0 times; of these:
              39197 (13.07%) aligned discordantly 1 time
              ----
              260628 pairs aligned 0 times concordantly or discordantly; of these:
              521256 mates make up the pairs; of these:
              385635 (73.98%) aligned 0 times
              71968 (13.81%) aligned exactly 1 time
              63653 (12.21%) aligned >1 times

              70.75% overall alignment rate


              - If you used indeed Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size, Maybe half of your libraries have an insert size bigger than that?
              To check into insert sizes, I ran Bowtie with 3 different insert size values.

              250 (default): 42.66% alignment (same run as above)
              800: 45.88% alignment
              1500: 45.88% alignment

              Looks like insert size of 800 isn't an issue! I appreciate any additional insight that anyone can provide!!

              Comment


              • #8
                I am surprised by the fact that bowtie2 maps ~ 70% of the data while bowtie maps only 45%. I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.

                What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases. Is your species allotetraploid? (it might help)

                Comment


                • #9
                  What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases.
                  Unfortunately, Bowtie only allows up to 3 mismatches. I tried increasing the number of mismatches from 2 to 3 and it resulted in about a 1% alignment increase. Bowtie2 only allows 0 or 1 mismatches


                  Is your species allotetraploid? (it might help)
                  No, it is not, but your question led me to some new information. The species has a large number of small chromosomes (>100) and evidence of supernumerary chromosomes. Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?

                  I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.
                  Please correct me if I'm wrong, but I'm not sure this would be helpful because the programs work in such different ways. In particular, I think the big increases for alignments in Bowtie2 is due to it allowing for gapped alignments (http://bowtie-bio.sourceforge.net/bo...-from-bowtie-1). This is why I think Bowtie2 inflates the number of alignments- since I'm aligning transcripts to a transcriptome, there should be no gaps (right?).

                  However, looking through default settings for both programs, it appears the only major difference is in insert sizes, which we've already eliminated as an issue.

                  Comment


                  • #10
                    Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?
                    I'm not sure about this. Because the supernumerarity itself doesn't affect the alignment if the sequence is the same for each supernumerary chromosome. If the sequence diverges, then you lose a lot of reads because you allow only 3 mismatches and many of your reads were obtained from supernumerary chromosomes.

                    Do you know if the supernumerarity comes with a divergence in the sequence?
                    May I suggest you to try to align them with any other alignment program that allows you to provide any number of mismatches? I don't know which one could do the trick, though!

                    EDIT: If I were you I would give BLAT a try. It might be slow but you can specify the minimum sequence identity between target and query and if you do some parallel tryouts like at 80%, 85%, 90%, 95%, 99% then you will see at which of these steps you gain mapped reads.
                    Last edited by Macspider; 08-23-2016, 08:12 AM.

                    Comment


                    • #11
                      @michfish86: I am going to suggest that you try BBMap for these alignment. There is a separate thread for the program (suite) here. @Brian (author of BBMap) and I can help if you run into problems.

                      Comment


                      • #12
                        GenoMax: Thanks for suggesting BBMap. It seems like a very nice program/suite and quite intuitive to get running. Instead of retyping it all, I've attached a screenshot of my BBMap output. It appears to have mapped about 78% of the reads. I'm still trying to learn about the BBMap output so I'd appreciate any feedback you have.

                        Thanks again for helping me!

                        Edit: Just noticed I left off the read 2 output stats. They are very similar to read 1's output.
                        Attached Files
                        Last edited by michfish86; 08-23-2016, 01:11 PM.

                        Comment


                        • #13
                          It looks like you still don't map more than 20% or your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.

                          Comment


                          • #14
                            For trimming, I used Trimmomatic. I used two different TruSeq3 Illumina adapter sequence sets that come with the program to remove the correct adapters. To verify that it worked, I ran FastQC before and after trimming. Before trimming, the "over-represented sequences" output was dominated by the adapter sequences and after trimming there were no adapter sequences listed.

                            Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?
                            Last edited by michfish86; 08-24-2016, 04:04 AM.

                            Comment


                            • #15
                              Originally posted by Macspider View Post
                              It looks like you still don't map more than 20% or your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.
                              Where do you see the 20% map?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              18 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              22 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              17 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X