#1 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
I ran TopHat on a SOLiD paired-end dataset using the following command:
Code:
tophat --color --quals --library-type fr-secondstrand -r 125 -p 7 -o /home/me/data/raw_reads/bowtie_out/sample_secstrand -G /home/me/data/gtf_ref/Rattus_norvegicus.RGSC3.4.59.tophat.gtf rn4_c /home/me/data/raw_reads/for_bowtie/sample/sample_1.csfasta /home/me/data/raw_reads/for_bowtie/sample/sample_2.csfasta /home/me/data/raw_reads/for_bowtie/sample/sample_nh_1.qual /home/me/data/raw_reads/for_bowtie/sample/sample_nh_2.qual
Code:
samtools flagstat accepted_hits.bam
18837270 in total
0 QC failure
0 duplicates
18837270 mapped (100.00%)
18837270 paired in sequencing
5917586 read1
12919684 read2
4926 properly paired (0.03%)
565148 with itself and mate mapped
18272122 singletons (97.00%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
What is the distinction between a paired read and a 'properly paired' read, as designated in the flag field, and why don't these match up for my reads? Will this affect any of my downstream analysis? (using TopHat 1.2.0, Bowtie 0.12.7)
Last edited by AdamB; 01-26-2011 at 04:10 AM.
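As background to the flag question, here is a minimal Python sketch of how the SAM FLAG bits relate to the flagstat categories shown in this post. The bit values come from the SAM specification; the function name `flagstat_categories` is made up for illustration, and the category logic is my reading of what flagstat counts, so treat it as an approximation rather than the tool's exact behaviour. The point is that "paired" (0x1) only records that the read came from a paired library, while "properly paired" (0x2) is a judgement made by the aligner.

```python
# SAM FLAG bits (values from the SAM specification).
FLAG_PAIRED = 0x1          # read was paired in sequencing
FLAG_PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
FLAG_UNMAPPED = 0x4        # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8   # the mate is unmapped


def flagstat_categories(flag):
    """Return the flagstat-style categories a single FLAG value falls into.

    Hypothetical helper for illustration; approximates flagstat's counting.
    """
    categories = set()
    if not flag & FLAG_PAIRED:
        return categories
    categories.add("paired in sequencing")
    mapped = not flag & FLAG_UNMAPPED
    mate_mapped = not flag & FLAG_MATE_UNMAPPED
    if mapped and flag & FLAG_PROPER_PAIR:
        categories.add("properly paired")
    if mapped and mate_mapped:
        categories.add("with itself and mate mapped")
    if mapped and not mate_mapped:
        categories.add("singletons")
    return categories


if __name__ == "__main__":
    for flag in (99, 73, 65):
        print(flag, sorted(flagstat_categories(flag)))
```

A read with flag 73 (paired, mate unmapped, first in pair) is counted as a singleton, which is the category that dominates the output above.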
#2 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
How did you define the inner mate distance? I think "properly paired" is based on the values you provide (-r 125 with the default STD of 20 bp), so if the true inner mate distance is 180 bp with an STD of 32 bp, few of your reads will be called "properly paired", because their true inner mate distance falls outside the mean/STD range you provided. I've been aligning a 5-million-read subset against a transcriptome reference with BWA to get a data-based range that I feed into TopHat.
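To make the mean/STD argument concrete, here is a rough Python sketch. It does two things: it summarises a list of observed inner distances (for example from a pilot alignment) into a mean and STD, and it estimates what fraction of pairs would fall inside a given window if distances were normally distributed. Both function names are hypothetical, the normal model is an assumption, and the "within one STD" window is only an illustrative cutoff, not TopHat's documented rule.

```python
import math
import statistics


def estimate_inner_distance(distances, trim=0.01):
    """Return (mean, std) of inner distances after trimming both tails.

    `trim` is the fraction dropped from each end to limit the effect of
    outliers such as mis-mapped pairs.
    """
    ordered = sorted(distances)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept), statistics.pstdev(kept)


def normal_fraction_between(lo, hi, mean, std):
    """Fraction of a normal(mean, std) distribution lying in [lo, hi]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


if __name__ == "__main__":
    # Window implied by -r 125 with STD 20, versus a true 180 +/- 32 library.
    frac = normal_fraction_between(125 - 20, 125 + 20, mean=180, std=32)
    print(f"fraction inside window: {frac:.3f}")
```

Under those assumptions only about 13% of pairs from a 180±32 bp library would land inside 125±20 bp, which is the effect described above. The estimated mean and STD would then be passed to TopHat via -r and --mate-std-dev.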
#3 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
I have previously mapped the reads using Bioscope (ABI), and that gave me the mate-pair distance stats. I just tried TopHat using 125±50 bp, which I thought more accurately reflected the spread, but the number of 'properly paired' reads is almost the same. From my Bioscope mapping stats, there are approximately 4 million reads with a mate-pair distance of 75-175 bp, yet only 5,602 (0.14%) are 'properly paired'. It seems there is a glaring error somewhere?
Last edited by AdamB; 01-27-2011 at 03:14 AM.
#4 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
Also if anyone has tips on how to extract a subset of matching F3 and F5 reads from .csfasta and .qual files, please let me know.
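One possible approach to the subsetting question, sketched in Python. It assumes read headers look like `>1_2_3_F3` and `>1_2_3_F5-BC` (a bead ID followed by a tag after the last underscore), that each record is one header line plus one data line, and that lines starting with `#` are comments. All function names here are made up for illustration, and the same parser is used for both .csfasta and .qual files since they share headers.

```python
def read_records(lines):
    """Yield (bead_id, header, body) from csfasta or qual lines."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            header = line
        elif header is not None:
            # Drop the trailing tag (e.g. "_F3" or "_F5-BC") to get the bead ID.
            bead_id = header[1:].rsplit("_", 1)[0]
            yield bead_id, header, line
            header = None


def shared_ids(f3_lines, f5_lines):
    """Return bead IDs present in both the F3 and the F5 input."""
    f3 = {bead for bead, _, _ in read_records(f3_lines)}
    return {bead for bead, _, _ in read_records(f5_lines) if bead in f3}


def pick_subset(ids, n):
    """Choose n IDs deterministically so every file gets the same subset."""
    return set(sorted(ids)[:n])


def filter_records(lines, keep):
    """Return header/body lines for records whose bead ID is in `keep`."""
    out = []
    for bead, header, body in read_records(lines):
        if bead in keep:
            out.extend([header, body])
    return out


if __name__ == "__main__":
    f3 = ["# comment", ">1_2_3_F3", "T0123", ">1_2_4_F3", "T3210"]
    f5 = [">1_2_4_F5-BC", "G0011", ">1_2_5_F5-BC", "G1100"]
    keep = pick_subset(shared_ids(f3, f5), 250000)
    print("\n".join(filter_records(f3, keep)))
```

In practice you would pass open file handles instead of lists and apply `filter_records` with the same `keep` set to all four files. Holding every bead ID in a set costs memory for tens of millions of reads, so this is a starting point rather than a tuned solution.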
#5 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
If anyone has an answer to this question it would be much appreciated, thanks.
#6 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
An update:
I realised that the numbers of F3 and F5-BC reads are not the same:
F3 = 23,077,379
F5-BC = 26,929,508
So, for a sample of 250,000 reads, I extracted those reads for which both F3 and F5-BC were present, and mapped them with TopHat.
250,000 reads > samtools flagstat
Code:
186066 in total
0 QC failure
0 duplicates
186066 mapped (100.00%)
186066 paired in sequencing
70366 read1
115700 read2
96 properly paired (0.05%)
7190 with itself and mate mapped
178876 singletons (96.14%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
Code:
212529 in total
0 QC failure
0 duplicates
212529 mapped (100.00%)
212529 paired in sequencing
85982 read1
126547 read2
55096 properly paired (25.92%)
69488 with itself and mate mapped
143041 singletons (67.30%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
#7 | |
Member
Location: Beijing Join Date: Jul 2011
Posts: 74
#8 |
Senior Member
Location: Hong Kong Join Date: Dec 2008
Posts: 350
Just a random thought:
Are these transcriptome reads? How divergent is your genome reference from the sample you used for SOLiD? Is the ABI mate distance the external or the internal insert size? TopHat defines the insert size as the inner part enclosed by the two reads.
__________________
Marco |
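To illustrate the external versus internal distinction raised in this post, a small Python sketch. The function name is hypothetical and the read lengths in the example (50 and 35 bases) are assumptions chosen for illustration; substitute the actual F3 and F5 lengths for the run.

```python
def inner_distance(outer_span, len_read1, len_read2):
    """Convert an outer (end-to-end) fragment span to an inner mate distance.

    The inner distance is the gap between the two reads, so it is the outer
    span minus both read lengths. It can be negative if the reads overlap.
    """
    return outer_span - len_read1 - len_read2


if __name__ == "__main__":
    # Assumed read lengths; replace with the real ones for your library.
    print(inner_distance(125, 50, 35))
```

If the 125 bp figure from Bioscope were an outer span, the value to give TopHat's -r would be considerably smaller than 125 under these assumed read lengths.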
#9 | |
Member
Location: Beijing Join Date: Jul 2011
Posts: 74
The details of my problem are here: http://seqanswers.com/forums/showthread.php?t=13407. Do you have any suggestions?
#10 |
Junior Member
Location: philly Join Date: Mar 2012
Posts: 2
Is your experiment strand-specific RNA-seq, since you use --library-type?