
**Bukowski** · 11-08-2011, 02:50 PM

If only 1% map, then I'm sure taking the first 100,000 reads would give you plenty of sample data with which to tune your parameters without running the entire dataset through.
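The subsetting suggested here can be sketched in a few lines of Python. This is a minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per read; the function name `head_fastq` and the file names in the usage note are hypothetical. Run it with the same read count on both mate files so the pairs stay in sync.

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads):
    """Copy the first n_reads records from src_path to dst_path.

    Assumes every record is exactly 4 lines (no wrapped sequences).
    Returns the number of complete records written.
    """
    n_lines = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops after 4 * n_reads lines without reading the rest
        # of the (possibly very large) input file.
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4
```

For example, `head_fastq("filename1.fastq", "test1.fastq", 100000)` followed by the same call on the second mate file gives a pair of small test files.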

**rebrendi** · 11-09-2011, 06:07 AM

Still can't get them mapped. It's a good idea to use truncated files. I created test files with just the first 1000 reads from each of the two paired-end files. Now I can play with Bowtie parameters.

The question is: which parameters should I change? I already tried changing --fr/--rf/--ff, which did not help. What are the other possible options?

**ERG** · 11-09-2011, 06:33 AM

I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

Good luck!

**rebrendi** · 11-09-2011, 06:42 AM

Originally posted by ERG

I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

Good luck!

I tried this; it did not help.

**kmcarr** · 11-09-2011, 06:58 AM

Originally posted by rebrendi

Hello,

I am mapping paired-end reads from two files per lane, with the following Bowtie command line:

./bowtie -t -v 2 -p 8 -m 1 --solexa-quals mm9 -1 filename1.fastq -2 filename2.fastq outputfilename.map

The program processed 200 million reads in 9 hours, but as a result only about 1% of them mapped. (I expect ~80% reads to be mapped for this experiment). It is very time-consuming to play with the Bowtie parameters for such large files, so I ask for your help.

Any ideas what goes wrong?

Thank you!

"--solexa-quals" indicates the reads have Q-scores encoded from a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms), are you sure about this? (Though this may be irrelevant since you are aligning in -v mode which nominally ignores Q-scores.)

Increase -m to something > 1.
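One way to sanity-check the quality flag is to look at the range of characters in the quality lines. The sketch below is only a heuristic, under the usual assumption that Phred+33 qualities start at `!` (ASCII 33) while both +64 encodings start no lower than `;` (ASCII 59); the function name and the returned labels are hypothetical.

```python
def guess_quality_offset(quality_strings):
    """Guess the FASTQ quality encoding from the characters observed.

    Returns "phred33", "offset64", or "ambiguous". This is a heuristic:
    a small or unusually high-quality sample can look ambiguous.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return "ambiguous"
    if min(codes) < 59:
        # Characters below ';' cannot occur in Solexa+64 or Phred+64.
        return "phred33"
    if max(codes) > 74:
        # Characters above 'J' are unusual for Illumina Phred+33 data.
        return "offset64"
    return "ambiguous"
```

A quality line containing `#` (ASCII 35), for instance, would be reported as `phred33`, in which case neither `--solexa-quals` nor `--phred64-quals` would be appropriate.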

**cjp** · 11-09-2011, 06:59 AM

Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

Chris

**fkrueger** · 11-09-2011, 08:09 AM

It is probably indeed a matter of using the wrong quality settings and/or the -X parameter. The alignment summary at the end will tell you whether most reads were removed by the -m 1 parameter, but a reduction to 1% of alignments seems rather unrealistic.

Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.
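Equal line counts do not prove the mates are still in the same order, so it can help to compare read names record by record. The following is a minimal sketch, assuming four-line FASTQ records and read names that differ between mates only in a trailing `/1` or `/2` or in the comment after the first space; all function names are hypothetical.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the header of each 4-line FASTQ record, without the leading '@'."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:
                yield line.rstrip("\n")[1:]


def base_name(name):
    """Strip the mate-specific part of a read name."""
    parts = name.split()
    name = parts[0] if parts else ""      # drop a ' 1:N:0:' style comment
    if name.endswith(("/1", "/2")):       # drop an old-style mate suffix
        name = name[:-2]
    return name


def first_mismatch(path1, path2):
    """Return (record_index, name1, name2) for the first out-of-sync pair.

    Returns None if every record pairs up. A name of None means one file
    ran out of records before the other.
    """
    pairs = zip_longest(read_names(path1), read_names(path2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 is None or n2 is None or base_name(n1) != base_name(n2):
            return idx, n1, n2
    return None
```

If `first_mismatch("filename1.fastq", "filename2.fastq")` returns `None`, the two files are at least consistently ordered by name.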

**rebrendi** · 11-09-2011, 08:17 AM

Originally posted by kmcarr

"--solexa-quals" indicates the reads have Q-scores encoded from a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms), are you sure about this? (Though this may be irrelevant since you are aligning in -v mode which nominally ignores Q-scores.)

Increase -m to something > 1.

I tried mapping without "--solexa-quals". The same result.
I tried increasing -m to 3 and to 10. This increased the number of mapped reads to 2% and 4%, respectively. Still not much help.

**rebrendi** · 11-09-2011, 08:19 AM

Originally posted by cjp

Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

Chris

I tried changing -X/--maxins <int>. The same result.
I tried mapping the two files independently in single-read mode: 75% and 71% of the reads mapped for the first and second file, respectively. So the data seems OK, but the paired-end mapping still does not work.
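Since both files map well on their own, the two single-end outputs can be joined by read name to see how the mates actually sit relative to each other. The sketch below assumes Bowtie 1's default tab-separated output (read name, strand, reference name, 0-based offset, sequence, ...) rather than SAM, and at most one reported alignment per read; the function names are hypothetical.

```python
from collections import Counter


def parse_hits(lines):
    """Map read name -> (strand, reference, 0-based offset, read length)."""
    hits = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue                      # skip malformed or empty lines
        parts = fields[0].split()
        name = parts[0] if parts else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        hits[name] = (fields[1], fields[2], int(fields[3]), len(fields[4]))
    return hits


def pair_geometry(hits1, hits2):
    """Summarise mates that both mapped.

    Returns (orientations, spans): a Counter of strand combinations such
    as '+-', and a sorted list of outer distances on the same reference.
    """
    orientations = Counter()
    spans = []
    for name, (s1, ref1, off1, len1) in hits1.items():
        if name not in hits2:
            continue
        s2, ref2, off2, len2 = hits2[name]
        if ref1 != ref2:
            orientations["different_reference"] += 1
            continue
        orientations[s1 + s2] += 1
        spans.append(max(off1 + len1, off2 + len2) - min(off1, off2))
    return orientations, sorted(spans)
```

If most spans exceed the `-X` value in use, or the dominant strand combination is `++` or `--`, that would point at the insert-size or orientation settings; many `different_reference` pairs would instead suggest the mates are out of sync between the two files.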

**rebrendi** · 11-09-2011, 08:21 AM

Originally posted by fkrueger

Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.

I have checked: the two files have exactly the same length.

**fkrueger** · 11-09-2011, 08:24 AM

Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?

**cjp** · 11-09-2011, 08:34 AM

Did you try other aligners such as BWA or Bowtie2? They are much better at pairing reads. Bowtie2 is easy to run and pretty quick too, but you'll need to reindex your genome.

example command:

bowtie2 -x /path/to/ref/hg19 -X 650 -p4 -1 r1.fq -2 r2.fq -S r12.bowtie2.sam

Chris

**rebrendi** · 11-09-2011, 08:35 AM

Originally posted by fkrueger

Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?

Here are the first 4 lines of the first file:

@HWI-ST841:93

099JACXX:8:1101:1134:1866 1:N:0:
NGGTAAGTGAGAAAATCCCCCAAAGGAGACCAAGACNCTGTTTCCTGATGC
+
#1:ABBDDFCBDBEHHHHIGIIGEGEECFFGEC?BH#00B?D?BDFFEHG>
@HWI-ST841:93

099JACXX:8:1101:1117:1870 1:N:0:
NGACGCTGAGAGTTGTCATGCCTCGGTGNNNNNNNNNNNNNNNNNNNTGGC
+
#4:BBBDD?DDD+A@EIEIIIIIEFI;E#######################
@HWI-ST841:93

099JACXX:8:1101:1196:1879 1:N:0:
NGAAGGTCAACTTGATCCTGATTCAACTTTGGTACCTGGTATCTGTCCAGA
+
#1=DFFFFHHHHHJIJJJJJJJJJIJJJJJJJIIJJJJJJIIIJJJJJJHI
@HWI-ST841:93

099JACXX:8:1101:1236:1882 1:N:0:
NGGCAGGCAAGCTAACTGCTGCTGTGATGTTCAAGGCATGTGTTACCCATC

Here are the first 4 lines of the second file:

@HWI-ST841:93

099JACXX:8:1101:1134:1866 2:N:0:
AGCATCTGCGTCTCTGTTACTATTTTTCAGAATGAGGGAGGAATGGGATGG
+
@@@FDDADH?D<<CF+<A,A4,:AFHG########################
@HWI-ST841:93

099JACXX:8:1101:1117:1870 2:N:0:
AAGGGAGGAAGGTGTGTCACCAGCCTAAGTGAATGTGGACTGTGCTGTTTA
+
@?@FFBDDFFFHFHHIJBHIIGIDGH3:C?DGHDGGGIGEHGHGDGGFHG@
@HWI-ST841:93

099JACXX:8:1101:1196:1879 2:N:0:
AGATCCTGAAGAAATCCAAAACACCATCAGATCCTTCTACAAAAGGCTATA
+
CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJIJJJJJJJJJIJJIIIJJJJI
@HWI-ST841:93

099JACXX:8:1101:1236:1882 2:N:0:
AGGAGGAAGAAAGATTATAAAAGCTTTACAAAAGGTTCCGCCGTTGGAAGC

**rebrendi** · 11-09-2011, 08:38 AM

Originally posted by cjp

Did you try other aligners such as BWA or Bowtie2?

I tried Eland and ran into the same problem. I have not tried BWA or Bowtie2.


Paired-end Solexa data mapping with Bowtie
