#1 |
Member
Location: Ljubljana Join Date: Jun 2010
Posts: 11
Dear colleagues,
I am building a pipeline to estimate gene abundance (expression) from RNA-seq data, and I am wondering whether my plan is reasonable:

a) Map reads with bowtie using -m 10 (for example), allowing up to 10 multiple hits per read.
a1) Here I don't understand how the MAPQ values will be set in the SAM output. I understand that when allowing only single hits (-m 1), all MAPQ values will be 255.
b) Take only the unmapped reads from a) and map them with tophat.
b1) Again the same question: with -g 40 (default), what are the MAPQ values in the SAM result?

OK, now I have alignments and I also have a GTF file for my organism (my.gtf).

c) Join the alignments from steps a) and b) into one sorted SAM file (a_b.sam).
d) cufflinks -G my.gtf a_b.sam

* How will cufflinks take the MAPQ values from the SAM file into account and so "weight" the multiple hits (giving more meaning to single hits, etc.)?
* Why is cufflinks mentioned with tophat all the time and not with bowtie as well?

Thank you for your answers,
Gregor
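For step c), here is a minimal Python sketch of joining two SAM outputs into one list of records sorted by reference name and then by position (columns 3 and 4). The function name `merge_sam_lines` is made up for this illustration, and the header handling is deliberately naive (it keeps each distinct `@` line once). A real merge should check that both files use the same reference names, so treat this as an assumption-laden outline and not as what tophat or cufflinks do internally.

```python
def merge_sam_lines(*sam_texts):
    """Merge SAM text from several inputs; records sorted by (RNAME, POS).

    Hypothetical helper for illustration only. Header lines (starting with
    '@') are kept once each, in first-seen order, ahead of the records.
    """
    headers = []
    records = []
    for text in sam_texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("@"):
                if line not in headers:
                    headers.append(line)
            else:
                records.append(line)

    def sort_key(line):
        fields = line.split("\t")
        # Column 3 is RNAME (compared as text), column 4 is POS (as integer).
        return (fields[2], int(fields[3]))

    records.sort(key=sort_key)
    return headers + records


if __name__ == "__main__":
    bowtie_sam = "@SQ\tSN:1\tLN:1000\nr1\t0\t1\t500\t255\t4M\t*\t0\t0\tACGT\tffff\n"
    tophat_sam = "@SQ\tSN:1\tLN:1000\nr2\t0\t1\t20\t255\t4M\t*\t0\t0\tACGT\tffff\n"
    for line in merge_sam_lines(bowtie_sam, tophat_sam):
        print(line)
```

The read names and coordinates in the usage part are invented test data, not output from either aligner.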
#2 |
Member
Location: Ljubljana Join Date: Jun 2010
Posts: 11
I am looking at accepted_hits.bam (output from TopHat):
read_id_0 16 1 2803 0 36M * 0 0 ACACATACACTGCGCTATTAAACAAGACACTTGTAC ffdfffefdfefffffffffffffffffffffffff NM:i:0 NH:i:14 CC:Z:= CP:i:7210

Does this file contain only alignments that mapped to splice sites? How can I tell how the read was spliced (both mapping locations)? Perhaps from the last part (the SAM tags?): NM:i:0 NH:i:14 CC:Z:= CP:i:7210

tnx,
Gregor
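To make the fields of that record easier to read, here is a small Python sketch that splits one SAM line into its 11 mandatory columns plus the optional TAG:TYPE:VALUE fields. The function name `parse_sam_record` and the dictionary keys are my own choices for illustration. Per the SAM specification, FLAG bit 0x10 marks a read aligned to the reverse strand, NH is the number of reported alignments for the read, and CC/CP give the reference name and position of the next hit ("=" meaning the same reference).

```python
def parse_sam_record(line):
    """Split one SAM alignment line into named fields (illustrative helper)."""
    # SAM is tab-separated; split() also tolerates the spaces that appear
    # when a record is pasted into a forum post.
    fields = line.split()
    record = {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
        "tags": {},
    }
    # Bit 0x10 of FLAG means the read aligned to the reverse strand.
    record["reverse_strand"] = bool(record["flag"] & 0x10)
    for tag_field in fields[11:]:
        tag, tag_type, value = tag_field.split(":", 2)
        # Type 'i' is an integer; other types are kept as strings here.
        record["tags"][tag] = int(value) if tag_type == "i" else value
    return record


if __name__ == "__main__":
    line = ("read_id_0 16 1 2803 0 36M * 0 0 "
            "ACACATACACTGCGCTATTAAACAAGACACTTGTAC "
            "ffdfffefdfefffffffffffffffffffffffff "
            "NM:i:0 NH:i:14 CC:Z:= CP:i:7210")
    rec = parse_sam_record(line)
    print(rec["cigar"], rec["mapq"], rec["tags"])
```

For the record quoted above, this reports a reverse-strand read with MAPQ 0 and NH of 14, i.e. a multi-mapped read with another hit on the same reference at position 7210.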
#3 |
Member
Location: illinois Join Date: Aug 2010
Posts: 10
Hi Gregor,
take a look at this file: http://samtools.sourceforge.net/SAM1.pdf

TopHat prints both spliced and non-spliced alignments. In this case you do not have a splice (36M). You will see a spliced alignment as XXMXXNXXM, where the XX values are, in order, the number of bases that matched on one exon, the number of bases skipped over the intron, and the number of bases that matched on the other exon.

I hope it helps.
Fernando

In item 2.2.3 of that file you have:

2.2.3. Extended CIGAR format
A CIGAR string is comprised of a series of operation lengths plus the operations. The conventional CIGAR format allows for three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format further allows four more operations, as is shown in the following table, to describe clipping, padding and splicing:

op | Description |
M | Alignment match (can be a sequence match or mismatch) |
I | Insertion to the reference |
D | Deletion from the reference |
N | Skipped region from the reference |
S | Soft clip on the read (clipped sequence present in <seq>) |
H | Hard clip on the read (clipped sequence NOT present in <seq>) |
P | Padding (silent deletion from the padded reference sequence) |
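As a rough sketch of how to read splicing out of the CIGAR string, the Python below treats each N operation from the table above as an intron gap and returns the reference blocks on either side of it. The names `parse_cigar`, `is_spliced` and `exon_blocks` are hypothetical helpers written for this post, not part of samtools or TopHat. Coordinates are 1-based and inclusive, following the SAM POS convention.

```python
import re

# One CIGAR element: a length followed by one of the extended operations.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP])")


def parse_cigar(cigar):
    """Return a list of (length, op) pairs, e.g. '36M' -> [(36, 'M')]."""
    return [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]


def is_spliced(cigar):
    """A read is spliced if its CIGAR contains a skipped region (N)."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def exon_blocks(pos, cigar):
    """Return (start, end) reference blocks separated by N operations."""
    blocks = []
    start = pos
    cursor = pos
    for length, op in parse_cigar(cigar):
        if op in "MD":
            # M and D consume reference bases inside the current block.
            cursor += length
        elif op == "N":
            # Close the block before the intron, then jump over the gap.
            if cursor > start:
                blocks.append((start, cursor - 1))
            cursor += length
            start = cursor
        # I, S, H and P do not advance along the reference.
    if cursor > start:
        blocks.append((start, cursor - 1))
    return blocks


if __name__ == "__main__":
    print(is_spliced("36M"), exon_blocks(2803, "36M"))
    # Made-up spliced example: 20 bases, a 500 base intron, 16 bases.
    print(is_spliced("20M500N16M"), exon_blocks(100, "20M500N16M"))
```

The first call uses the read from post #2 (one unspliced block); the second uses an invented CIGAR purely to show the two-block case.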
#4 |
Member
Location: Berkeley, CA Join Date: Feb 2010
Posts: 40
Greg, to answer your last question: tophat uses bowtie as the engine for its read -> genome mapping as part of the algorithm for finding spliced reads. Cufflinks in turn can use the tophat alignments. The programs are modular so that you can run Cufflinks using (spliced) read alignments made with other programs.