**cjp** · 11-09-2011, 08:41 AM

The headers look a bit different than normal - normally they end in /1 or /2

e.g.,

@HWUSI-EAS611_14:8:1:1489:931/1
and

@HWUSI-EAS611_14:8:1:1489:931/2

for the paired end data - note the spaces in yours probably cause problems as well.

Chris

**rebrendi** · 11-09-2011, 08:48 AM

The files are as they were provided as output from Illumina HiSeq 2000. Are you sure the file format is corrupted?

**nickloman** · 11-09-2011, 08:51 AM

These are Illumina 1.8 pipeline files which means they are Sanger quality encoded, so the correct Bowtie flag is --phred33-quals, which I think is the default for Bowtie anyway. Pairs look fine. My money is on a very large insert, try -X 1000 to start with and then see what the average pair distance is.
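One way to sanity-check the quality encoding before choosing a Bowtie flag is to look at the lowest ASCII character used in the quality lines. The sketch below is only a heuristic, and `guess_phred_offset` is a hypothetical helper name, not part of Bowtie or any toolkit: characters below `;` (ASCII 59) can only occur with the +33 (Sanger / Illumina 1.8) offset, while a file whose lowest character is `@` (ASCII 64) or above is merely consistent with +64.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset from an iterable of FASTQ quality strings.

    Returns 33, 64, or None when the data are empty or ambiguous.
    This is a heuristic, not a guarantee.
    """
    lowest = min((ord(c) for line in quality_lines for c in line.strip()),
                 default=None)
    if lowest is None:
        return None
    if lowest < 59:
        # Characters this low are only possible with the +33 offset.
        return 33
    if lowest >= 64:
        # Consistent with +64; uniformly high-quality +33 data could look the same.
        return 64
    # 59-63: could be old Solexa +64 or ordinary +33 data.
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIIIHHGG#5", "IIIIIIII"]))
```

Feeding it the quality line (every fourth line) of the first few thousand records is usually enough to see a low-quality character if the file is +33 encoded.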

**rebrendi** · 11-09-2011, 08:58 AM

Originally posted by nickloman

My money is on a very large insert, try -X 1000 to start with and then see what the average pair distance is.

Yesss! Now it works. I've got 73% reads mapped. Looks like the issue is resolved. Thank you very much, all who contributed to this thread!

**afields** · 03-30-2012, 11:40 AM

Paired-end Solexa data mapping with Bowtie

Hello all,

I have a similar problem to rebrendi's. I am using Bowtie to map Illumina HiSeq reads to a reference genome. When I use the files before any quality filtering, 43.86% of my paired-end reads map to the reference. I then filtered by quality using the FASTX-Toolkit, which removed 29% of read1 and 39% of read2. When I tried to map the filtered reads to the same reference genome, 16 million reads paired up (according to flagstat), but only ~600 mapped. Individually, over 80% of each read file maps to the reference, but together there is almost nothing. By increasing the maximum insert size (-X) from 600 to 2000 I have increased the number of reads mapping to ~3000, but this is still a long way from the millions of reads I expect. I did run a Bioanalyzer on my samples and the fragment sizes should be around 500bp, so I did not expect a dramatic increase in mapped reads with a larger insert size. Does anyone have any recommendations about how to tweak the parameters of Bowtie to get better mapping? Thanks!!

**fkrueger** · 03-30-2012, 11:54 AM

Does your quality trimming also discard sequences if they are getting too short? In this case the sequence-by-sequence order which is required for paired-end alignments might have gotten out of sync, and this could well explain such a dramatic drop in mapping efficiency.
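A quick way to test the out-of-sync hypothesis is to walk both files record by record and compare read names. This is a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequences); `read_ids` and `first_mismatch` are hypothetical helper names. It strips a trailing `/1` or `/2` and ignores anything after the first whitespace, so it copes with both the older header style and the Illumina 1.8 style.

```python
import itertools


def read_ids(fastq_lines):
    """Yield the read name of every 4-line FASTQ record, without mate suffix."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            fields = line[1:].split()  # drop '@' and anything after whitespace
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(fastq1_lines, fastq2_lines):
    """Return the 0-based record index where the two files disagree, or None."""
    pairs = itertools.zip_longest(read_ids(fastq1_lines), read_ids(fastq2_lines))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:  # also triggers when one file runs out of records first
            return index
    return None


if __name__ == "__main__":
    r1 = ["@read_a/1\n", "ACGT\n", "+\n", "IIII\n",
          "@read_b/1\n", "ACGT\n", "+\n", "IIII\n"]
    r2 = ["@read_b/2\n", "TTTT\n", "+\n", "IIII\n"]  # read_a was discarded
    print(first_mismatch(r1, r2))
```

Both arguments can be open file handles, for example `first_mismatch(open(path1), open(path2))`, where `path1` and `path2` are your two read files.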

Alternatively, could it be that your fragment length is not as long as you expect? It is quite common for e.g. 2x100bp reads to completely overlap each other, like this:

------------------------------------> read 1
<----------------------------------- read 2

If reads are completely contained within each other, Bowtie 1 will regard the alignment as invalid (which is arguably not the most sensible thing to do...). To find out whether this is the case here you could either hard-trim all your sequences by 1bp on the 3' end, so that the reads do not start and end at the same position, like so:

----------------------------------->. read 1
.<----------------------------------- read 2

Or soft-trim by using the option --trim3 1.
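For the hard-trim route, the following is a minimal sketch of removing bases from the 3' end of every read while keeping the quality string the same length as the sequence. It assumes plain 4-line FASTQ records, and `trim_3prime` is a hypothetical helper name rather than an existing tool.

```python
def trim_3prime(fastq_lines, n=1):
    """Yield FASTQ lines with n bases (and n qualities) removed from the 3' end."""
    for i, line in enumerate(fastq_lines):
        text = line.rstrip("\n")
        if i % 4 in (1, 3) and n > 0:  # sequence line and quality line
            text = text[:-n]
        yield text + "\n"


if __name__ == "__main__":
    record = ["@read_a/1\n", "ACGTACGT\n", "+\n", "IIIIIIHH\n"]
    for out_line in trim_3prime(record):
        print(out_line, end="")
```

The header and `+` lines pass through unchanged, so record order and pairing are unaffected.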

Good luck!

**afields** · 03-30-2012, 12:31 PM

Thanks for the post, fkrueger. After my initial poor mapping, I was afraid that the QC had disrupted the order of the sequences, so I ran a method which made certain that even if all of a sequence was discarded due to poor quality, the rest of the record was not deleted, so the two files still have the same number of lines and they sync up. Taking your suggestion, I looked over the first few lines of both files and the reads are not reverse complements of each other, leading me to believe that they are not likely to be overlapping sequences.
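Eyeballing a few records can miss partial overlaps, so a programmatic check may be more reliable. The sketch below is an assumption-laden illustration (`reverse_complement` and `overlap_length` are hypothetical helpers): it reverse-complements read 2 and looks for the longest suffix of read 1 that exactly matches a prefix of it. Exact matching means sequencing errors will hide real overlaps, and fragments shorter than the read length are not handled.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(read1, read2, min_overlap=10):
    """Length of the longest exact overlap between read1 and revcomp(read2).

    Returns 0 if no overlap of at least min_overlap bases is found.
    """
    rc2 = reverse_complement(read2)
    for k in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[-k:] == rc2[:k]:
            return k
    return 0


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"    # toy 30 bp fragment
    read1 = fragment[:20]                          # forward read
    read2 = reverse_complement(fragment[10:])      # reverse read
    print(overlap_length(read1, read2))
```

An overlap equal to the full read length corresponds to the completely overlapping case fkrueger describes; a shorter overlap means the fragment is shorter than twice the read length.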
