![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Anyone know how Snowshoes-FTD finds junction-spanning reads? | thondeboer | Bioinformatics | 2 | 09-05-2013 12:55 AM |
counting junction reads in TopHat | schaffer | Bioinformatics | 3 | 10-21-2011 02:05 AM |
tophat does not find any junction | sbrohee | Bioinformatics | 12 | 10-12-2011 09:07 AM |
Tophat Junction reads are very low comapred to ERANGE | repinementer | Bioinformatics | 1 | 08-11-2010 03:41 AM |
Genomic reads and Junction reads-Tophat | repinementer | Bioinformatics | 0 | 08-11-2010 12:05 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Finland Join Date: Jun 2011
Posts: 24
|
![]()
Perhaps I missed something, but is there a straightforward way to extract reads which span an exon/exon junction (ie, spliced reads) from the accepted_hits.bam produced by Tophat? I thought I could filter to find reads that have an alignment distance (alignment start - alignment end) greater than the read length, but the .bam file only lists the start position. Looking at the .bam file and the SAM specification, I didn't find a flag that is set which I could filter on.
Am I missing something? Thanks! |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Boston Join Date: Nov 2009
Posts: 224
|
![]()
You can use bamtools filter to find reads with an N in the CIGAR string.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: Finland Join Date: Jun 2011
Posts: 24
|
![]()
Thanks, that's just what I was looking for. Sorry for not looking more carefully at the specification -- I checked the FLAGs, but not the CIGAR string.
|
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: Uppsala University, Sweden Join Date: Sep 2010
Posts: 8
|
![]()
Hmm, is it just me, or does the extracted reads (with N:s in the CIGAR) not match the junctions.bed in the tophat results? When i look at the positions from junctions.bed in the BAM file with IGV genome viewer, i cannot see the 400+ reads supposedly supporting that splice junction.
I have scrolled all the way down in IGV to see all the reads, but nowhere can i see anything that even remotely matches any of the junctions reported in the bed file. Python code: Picks out all reads from chr4 that contains N:s (3 since it is a BAM file) in the CIGAR. Should be working just fine. Using pysam library. Code:
. . . # get all the reads from the specified chromosome for read in bamFile.fetch("4"): # check if there is a splice junction in the read spliced = re.search('\(3, \d+\)',str(read.cigar)) if spliced: # write to outfile out.write(read) . . . Best regards Martin Last edited by dahlo; 09-27-2011 at 07:36 AM. |
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: Marburg, Germany Join Date: Oct 2009
Posts: 110
|
![]()
I don't quite follow your logic. If I read the spec correct, the cigar string should look like 15M230N19M (15 matched, 230 intron bp, 19 matched).
But your regexps matches '(3, 34)' for example. I guess you could just check with 'in': if 'N' in read.cigar: output.write(read) |
![]() |
![]() |
![]() |
Thread Tools | |
|
|