Seqanswers Leaderboard Ad

**GenoMax** · 08-19-2016, 08:20 AM

When something like this happens (50% sounds low) it is always good to capture reads that don't align and then check a few of them by blast@NCBI to see if there is any unexpected contamination.

**michfish86** · 08-22-2016, 06:27 AM

I've had a chance to blast some of the sequences that did not align and I'm not seeing clear signs of contamination. The majority (~70%) of top hits are poor matches that best align to a wide variety of plants and animals. Some sequences (~10%) have alignments that make sense with contamination with species that may co-occur with my study species (an ocean-dwelling crustacean). The rest (~20%) come back with my study species as one of the top hits, but must be different enough from the reference transcriptome that they don't align via Bowtie.

I'd appreciate any insight that anyone can offer on this situation!!

**GenoMax** · 08-22-2016, 06:53 AM

This still does not sound right. If you have a transcriptome available for your exact species then the alignment should be better. Unless the submitted transcriptome has errors in it.

Did you assemble a transcriptome from your own data using trinity? Not clear from your original post.

**michfish86** · 08-22-2016, 07:06 AM

No, I did not assemble a reference transcriptome using my own data. I am only aligning against a published transcriptome for my exact species. Perhaps there are errors in that transcriptome and it would be wise for me to reach out to the authors.

I did not try assembling a transcriptome from my data because my understanding is that because my data are from a MiSeq, I likely don't have the sequencing depth to do so.

**Macspider** · 08-23-2016, 12:12 AM

Before reaching out to the authors, I'd suggest that you post here the BOWTIE command you ran (there must be somethign in the trinity log), to add info on the one from trinity you posted.
- How many mismatches did you set? (default is 2 if I don't go wrong)
- How did you write the read files in the command?
- Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
- If you used indeed Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size, Maybe half of your libraries have an insert size bigger than that?

**michfish86** · 08-23-2016, 05:52 AM

- How many mismatches did you set? (default is 2 if I don't go wrong)

I kept the default, which is indeed 2 mismatches.

- How did you write the read files in the command?

I'm sorry but I'm not certain what you mean besides what I indicated in the Trinity command in my initial post.

- Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?

Because Trinity seems limited on available options for Bowtie, I ran the data through Bowtie and Bowtie2 outside of Trinity for the sake of addressing your question. I just ran default values in both. The code and the results for the same individual using each program are below.

I also want to note that my current understanding is that when aligning to a reference transcriptome, an ungapped aligner (i.e., Bowtie) is preferable to a gapped aligner (i.e., Bowtie2). However, I see people using Bowtie2 for reference transcriptome alignments in the literature so perhaps I am wrong.

Bowtie

./bowtie REF -1 R1_reads -2 R2_reads --un unaligned --time

# reads processed 659098
# reads with at least one reported alignment: 281199 (42.66%)
# reads that failed to align: 377899 (57.34%)
Reported 281199 paired-end alignments to 1 output stream(s)

Bowtie2

./bowtie2 -x REF_Bowtie2 -1 R1_reads -2 R2_reads -S Results_Bowtie2 --un unaligned_Bowtie2

659098 (100.00%) were paired; of these:
299825 (45.49%) aligned concordantly 0 times
248293 (37.67%) aligned concordantly exactly 1 time
110980 (16.84%) aligned concordantly >1 times
----
299825 pairs aligned concordantly 0 times; of these:
39197 (13.07%) aligned discordantly 1 time
----
260628 pairs aligned 0 times concordantly or discordantly; of these:
521256 mates make up the pairs; of these:
385635 (73.98%) aligned 0 times
71968 (13.81%) aligned exactly 1 time
63653 (12.21%) aligned >1 times

70.75% overall alignment rate

- If you used indeed Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size, Maybe half of your libraries have an insert size bigger than that?

To check into insert sizes, I ran Bowtie with 3 different insert size values.

250 (default): 42.66% alignment (same run as above)
800: 45.88% alignment
1500: 45.88% alignment

Looks like insert size of 800 isn't an issue! I appreciate any additional insight that anyone can provide!!

**Macspider** · 08-23-2016, 07:15 AM

I am surprised by the fact that bowtie2 maps ~ 70% of the data while bowtie maps only 45%. I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.

What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases. Is your species allotetraploid? (it might help)

**michfish86** · 08-23-2016, 08:03 AM

What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases.

Unfortunately, Bowtie only allows up to 3 mismatches. I tried increasing the number of mismatches from 2 to 3 and it resulted in about a 1% alignment increase. Bowtie2 only allows 0 or 1 mismatches

Is your species allotetraploid? (it might help)

No, it is not, but your question led me to some new information. The species has a large number of small chromosomes (>100) and evidence of supernumerary chromosomes. Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?

I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.

Please correct me if I'm wrong, but I'm not sure this would be helpful because the programs work in such different ways. In particular, I think the big increases for alignments in Bowtie2 is due to it allowing for gapped alignments (http://bowtie-bio.sourceforge.net/bo...-from-bowtie-1). This is why I think Bowtie2 inflates the number of alignments- since I'm aligning transcripts to a transcriptome, there should be no gaps (right?).

However, looking through default settings for both programs, it appears the only major difference is in insert sizes, which we've already eliminated as an issue.

**Macspider** · 08-23-2016, 08:08 AM

Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?

I'm not sure about this. Because the supernumerarity itself doesn't affect the alignment if the sequence is the same for each supernumerary chromosome. If the sequence diverges, then you lose a lot of reads because you allow only 3 mismatches and many of your reads were obtained from supernumerary chromosomes.

Do you know if the supernumerarity comes with a divergence in the sequence?
May I suggest you to try to align them with any other alignment program that allows you to provide any number of mismatches? I don't know which one could do the trick, though!

EDIT: If I were you I would give BLAT a try. It might be slow but you can specify the minimum sequence identity between target and query and if you do some parallel tryouts like at 80%, 85%, 90%, 95%, 99% then you will see at which of these steps you gain mapped reads.

**GenoMax** · 08-23-2016, 09:05 AM

@michfish86: I am going to suggest that you try BBMap for these alignment. There is a separate thread for the program (suite) here. @Brian (author of BBMap) and I can help if you run into problems.

**michfish86** · 08-23-2016, 12:34 PM

GenoMax: Thanks for suggesting BBMap. It seems like a very nice program/suite and quite intuitive to get running. Instead of retyping it all, I've attached a screenshot of my BBMap output. It appears to have mapped about 78% of the reads. I'm still trying to learn about the BBMap output so I'd appreciate any feedback you have.

Thanks again for helping me!

Edit: Just noticed I left off the read 2 output stats. They are very similar to read 1's output.

Attached Files

BBMap_output.png (61.9 KB, 9 views)

**Macspider** · 08-23-2016, 11:49 PM

It looks like you still don't map more than 20% or your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.

**michfish86** · 08-24-2016, 03:08 AM

For trimming, I used Trimmomatic. I used two different TruSeq3 Illumina adapter sequence sets that come with the program to remove the correct adapters. To verify that it worked, I ran FastQC before and after trimming. Before trimming, the "over-represented sequences" output was dominated by the adapter sequences and after trimming there were no adapter sequences listed.

Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?

**GenoMax** · 08-24-2016, 04:31 AM

Originally posted by Macspider View Post

It looks like you still don't map more than 20% or your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.

Where do you see the 20% map?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

RNASeq: Expected bowtie alignment percent to transcriptome?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News