![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Strange Bioanalyzer results | wzombie | RNA Sequencing | 5 | 01-11-2012 04:24 AM |
strange results of samtools | liying | Bioinformatics | 3 | 09-23-2011 12:02 AM |
BWA align SOLiD PE data gives poor mapping & 0.5% properly paired | alig | Bioinformatics | 3 | 07-08-2011 09:44 AM |
Strange DE results | SMcTaggart | Bioinformatics | 0 | 11-25-2010 05:53 AM |
Strange Results After maq2sam-long | wjeck | Bioinformatics | 4 | 08-13-2009 11:58 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: United States, TN Join Date: Jul 2010
Posts: 15
|
![]()
Hi,
I used bwa to map a pair-end SOLiD data. samtools flagstat showed 66% mapped reads, but only 0.17% properlly mapped. Here are some detail and my cmd lines: my input file: SOLID_F3.csfasta + SOLID_F3_QV.qual SOLID_R3.csfasta + SOLID_R3_QV.qual (1) make color-space reference: bwa index -p hg18_cs.fa -a bwtsw -c hg18.fa (2) convert solid to fastq: perl solid2fastq.pl SOLID_ test this resulted test.read1.fastq.gz test.read2.fastq.gz test.single.fastq.gz unzip them (3) bwa aln -t 4 -c hg18_cs.fa test.read1.fastq > test_aln_sa1.sai bwa aln -t 4 -c hg18_cs.fa test.read2.fastq > test_aln_sa2.sai (4) bwa sampe hg18_cs.fa test_aln_sa1.sai test_aln_sa2.sai test.read1.fastq test.read2.fastq > aln.sam (5) samtools view -bt hg18_cs.fa -o aln.bam aln.sam [samopen] SAM header is present: 25 sequences. (6) $ samtools flagstat aln.bam The resutl shows mapped (66.64%), properly paired (0.15%) Can anyone tell me where I did wrong? Thanks! |
![]() |
![]() |
![]() |
#2 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]()
Are you using the paired end or mate pair protocol? Was the insert size estimate from BWA around what you expect?
|
![]() |
![]() |
![]() |
#3 |
Member
Location: United States, TN Join Date: Jul 2010
Posts: 15
|
![]()
Hi Nil,
The data is paired end, one is 50bp while the other is 25bp. The original file name is actually F3 and F5-P2: I changed F5-P2 to R3 to make it recognizable by solid2fastq.pl provided in bwa. The insert size looks fine for me. Here are some example lines (when using ): [infer_isize] (25, 50, 75) percentile: (150, 160, 173) [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std [infer_isize] inferred external isize from 100442 pairs: 161.585 +/- 17.497 [infer_isize] skewness: 0.147; kurtosis: 0.337; ap_prior: 1.00e-05 [infer_isize] inferred maximum insert size: 285 (7.05 sigma) [bwa_sai2sam_pe_core] time elapses: 22.68 sec [bwa_sai2sam_pe_core] changing coordinates of 73 alignments. [bwa_sai2sam_pe_core] align unmapped mate... [bwa_paired_sw] 20 out of 79202 Q17 singletons are mated. [bwa_paired_sw] 316 out of 125430 Q17 discordant pairs are fixed. [bwa_sai2sam_pe_core] time elapses: 14.42 sec [bwa_sai2sam_pe_core] refine gapped alignments... 4.12 sec [bwa_sai2sam_pe_core] print alignments... 1.31 sec [bwa_sai2sam_pe_core] 89915392 sequences have been processed. [bwa_sai2sam_pe_core] convert to sequence coordinate... [infer_isize] (25, 50, 75) percentile: (150, 160, 173) [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std [infer_isize] inferred external isize from 97115 pairs: 161.534 +/- 17.614 [infer_isize] skewness: 0.137; kurtosis: 0.339; ap_prior: 1.00e-05 [infer_isize] inferred maximum insert size: 286 (7.05 sigma) |
![]() |
![]() |
![]() |
#4 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
Check out what Nils mentioned to confirm if that's the reason. Let us know what
you find. Also, I'd like to mention that BFast has a development branch called bfast2. It allows you to perform the mapping step (match) using bwa's algorithm. After that you'd perform the normal bfast process (localalign, postprocess). In addition, bfast2 has also a hybrid mode that allows you to perform matches using different algorithms on each end of your reads. This is particularly useful for PE(50+25) since bfast match does not perform well with 25bp reads.
__________________
-drd |
![]() |
![]() |
![]() |
#6 |
Member
Location: United States, TN Join Date: Jul 2010
Posts: 15
|
![]()
Thank you, Nils and Drio.
I searched online and got the following information: As explained in this file: http://www3.appliedbiosystems.com/cm...cms_080186.pdf my case should be the second one, where F3 tag is forward and F5-P2 tag is reverse. However, I saw this post in the BWA mailing list: http://sourceforge.net/mailarchive/f...e=bio-bwa-help In Heng's reply, it's said "For Illumina reads, only Forward-reverse pairs are set bit 2; for SOLiD reads, R3-forward-F3-forward or F3-reverse-R3-reverse pairs are set bit 2." So I'm guessing my case is: it is solid data but it's forward-reverse, so bwa correctly mapped most of them (66%) but failed to recognize the reads as perporly paired -- this is just my guessing. I would greatly appreciate if anyone can help me figure this out. |
![]() |
![]() |
![]() |
#7 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#8 |
Junior Member
Location: UK Join Date: Aug 2010
Posts: 2
|
![]()
Hi All,
Can anybody tell me how to run the script solid2fastq.pl. I have a cs.fasta file, I have to calls the SNPs and want to use MAQ for that. However I need the files in fastq. Using BWA I did get a colour space index. But after that if i try to run perl solid2fastq.pl file.sc.fasta out.cs.fa it is not working the cs.fasta is a F3 file. It tell me it cannot open the file. Many thanks |
![]() |
![]() |
![]() |
#9 | |
Member
Location: United States, TN Join Date: Jul 2010
Posts: 15
|
![]() Quote:
http://seqanswers.com/forums/showthread.php?t=5909 drio's answer. |
|
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: UK Join Date: Aug 2010
Posts: 2
|
![]()
ok, thanks, I will try.... but tell me what does this instruction means
<in.title> is the string showed in the `# Title:' line of a ".csfasta" read file. Then <in.title>F3.csfasta is read sequence file and <in.title>F3_QV.qual is the quality file. If <in.title>R3.csfasta is present, this script assumes reads are paired; otherwise reads will be regarded as single-end. The read name will be <out.prefix> ![]() tag and `2' for F3. Usually you may want to use short <out.prefix> I want to know what is it actually asking ? Many Thanks for the link |
![]() |
![]() |
![]() |
#11 |
Member
Location: Cambridge, UK Join Date: Mar 2010
Posts: 21
|
![]()
I have posted a patch plus an update perl script to the bwa malinglist so bwa now can handle PE data as well as MP.
|
![]() |
![]() |
![]() |
#12 |
Member
Location: NYC Join Date: Mar 2009
Posts: 14
|
![]()
Hi Brugger,
I saw your mail in the bwa mailing list about your patch. Do you know if the later version of BWA incorporates this? What I have installed is 0.5.9-r16 |
![]() |
![]() |
![]() |
Thread Tools | |
|
|