SEQanswers

11-11-2010, 05:35 AM   #1
rgregor
Member
Location: Ljubljana
Join Date: Jun 2010
Posts: 11

Abundance with bowtie, tophat and cufflinks

Dear colleagues,

I am building a pipeline to estimate gene abundance (expression) from RNA-seq data. I am wondering if my plan is reasonable:

a) map reads with bowtie using -m 10 (for example), allowing up to 10 hits per read
a1) here I don't understand how the MAPQ values will be set in the SAM output; as I understand it, when only single hits are allowed (-m 1), all MAPQ values will be 255

b) take only the unmapped reads from a) and map them with tophat
b1) again the same question: with -g 40 (default), what are the MAPQ values in the SAM result?

OK, now I have alignments and I also have a GTF file for my organism (my.gtf)

c) join alignments from step a) and b) into one sorted SAM file (a_b.sam)

d) cufflinks -G my.gtf a_b.sam
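
Step c) can be sketched in a few lines of Python. This is only an illustration of what "join and sort" means for SAM records, assuming tab-delimited SAM text and that the header of the first file is the one to keep; the function name merge_sam is made up for this example, and on real data a dedicated tool such as samtools would normally do this job.

```python
def merge_sam(lines_a, lines_b):
    """Merge two SAM files (given as lists of lines) into one
    coordinate-sorted list of lines. Hypothetical helper for illustration."""
    lines_a, lines_b = list(lines_a), list(lines_b)

    # Assumption: keep the header of the first file only.
    header = [l for l in lines_a if l.startswith("@")]
    records = [l for l in lines_a + lines_b
               if l.strip() and not l.startswith("@")]

    # Reference order comes from the @SQ lines of the header ...
    ref_order = {}
    for l in header:
        if l.startswith("@SQ"):
            for field in l.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order.setdefault(field[3:], len(ref_order))
    # ... and references missing from the header are appended
    # in order of first appearance.
    for rec in records:
        rname = rec.split("\t")[2]
        if rname != "*":
            ref_order.setdefault(rname, len(ref_order))

    def key(line):
        fields = line.split("\t")
        # Records without a reference ("*") sort after all mapped records.
        return (ref_order.get(fields[2], len(ref_order)), int(fields[3]))

    records.sort(key=key)
    return header + records
```

Reading both files with readlines(), passing them to merge_sam and writing the result out would give the a_b.sam used in step d).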

* How will cufflinks take the MAPQ values from the SAM file into account, and does it use them to "weight" multiple hits (giving more weight to single hits, etc.)?

* Why is cufflinks always mentioned together with tophat and not with bowtie as well?

Thank you for your answers,
Gregor
11-12-2010, 12:48 AM   #2
rgregor
Member
Location: Ljubljana
Join Date: Jun 2010
Posts: 11

I am looking at accepted_hits.bam (output from TopHat):

read_id_0 16 1 2803 0 36M * 0 0 ACACATACACTGCGCTATTAAACAAGACACTTGTAC ffdfffefdfefffffffffffffffffffffffff NM:i:0 NH:i:14 CC:Z:= CP:i:7210

Does this file contain only alignments that mapped to splice sites? And how can I tell how a read was spliced (both mapping locations)?

Perhaps from the last part (SAM TAGS?): NM:i:0 NH:i:14 CC:Z:= CP:i:7210
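
For reference, the record above can be taken apart with a short Python sketch. It assumes the trailing fields are standard SAM optional tags in TAG:TYPE:VALUE form, and it splits on whitespace only because the tabs were lost when the line was pasted here (real SAM is tab-delimited). The function name parse_sam_record is made up for this example.

```python
# The alignment line exactly as pasted in this post.
record = ("read_id_0 16 1 2803 0 36M * 0 0 "
          "ACACATACACTGCGCTATTAAACAAGACACTTGTAC "
          "ffdfffefdfefffffffffffffffffffffffff "
          "NM:i:0 NH:i:14 CC:Z:= CP:i:7210")

# Names of the 11 mandatory SAM columns, in order.
MANDATORY = ["qname", "flag", "rname", "pos", "mapq", "cigar",
             "rnext", "pnext", "tlen", "seq", "qual"]

def parse_sam_record(line):
    """Split one SAM alignment line into mandatory fields and optional tags."""
    fields = line.split()  # whitespace split: tabs were lost in the paste
    rec = dict(zip(MANDATORY, fields[:11]))
    for name in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[name] = int(rec[name])
    tags = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    rec["tags"] = tags
    return rec

rec = parse_sam_record(record)
print(rec["cigar"], rec["mapq"], rec["tags"])
```

Going by the SAM specification, NM is the edit distance to the reference, NH the number of reported alignments for the read, and CC/CP the reference name and leftmost coordinate of the next hit ("=" meaning the same reference). So this read would have 14 reported hits, the next one on the same reference at position 7210.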

tnx,
Gregor
11-14-2010, 08:11 AM   #3
fhb
Member
Location: illinois
Join Date: Aug 2010
Posts: 10

Hi Gregor,
take a look at this file: http://samtools.sourceforge.net/SAM1.pdf

TopHat prints both spliced and non-spliced alignments. In this case you do not have a splice (the CIGAR is 36M).

You will see a spliced alignment as XXMXXNXXM, where the three XX values are the number of bases that matched on one exon, the number of reference bases skipped over the intron, and the number of bases that matched on the other exon.

I hope it helps.
Fernando


In item 2.2.3 of that file you have:

2.2.3. Extended CIGAR format
A CIGAR string is comprised of a series of operation lengths plus the operations. The conventional CIGAR format allows for three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format further allows four more operations, as is shown in the following table, to describe clipping, padding and splicing:

op  Description
M   Alignment match (can be a sequence match or mismatch)
I   Insertion to the reference
D   Deletion from the reference
N   Skipped region from the reference
S   Soft clip on the read (clipped sequence present in <seq>)
H   Hard clip on the read (clipped sequence NOT present in <seq>)
P   Padding (silent deletion from the padded reference sequence)
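
To make the table concrete, here is a minimal Python sketch that walks a CIGAR string and reports which reference intervals the read covers, splitting at every N. It handles only the seven operations listed above; the function names and the 20M500N16M example are made up for illustration and are not taken from any tool.

```python
import re

# One CIGAR element: a length followed by one of the operations in the table.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP])")

def is_spliced(cigar):
    """True if the CIGAR contains at least one N (skipped region)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals covered
    by the read, one interval per exon-like block (split at each N)."""
    blocks = []
    ref = pos            # current position on the reference
    block_start = None   # start of the block being built, if any
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD":
            # M and D consume reference bases and stay in the same block.
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":
            # N closes the current block and jumps over the skipped region.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        # I, S, H and P do not consume reference bases.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks

print(aligned_blocks(2803, "36M"))        # the read from post #2
print(aligned_blocks(100, "20M500N16M"))  # made-up spliced read
```

For the read in post #2 (POS 2803, CIGAR 36M) this gives a single block, so there is no splice. The made-up spliced read gives two blocks, which are the "both locations" asked about.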
11-15-2010, 09:08 AM   #4
lpachter
Member
Location: Berkeley, CA
Join Date: Feb 2010
Posts: 40

Greg, to answer your last question: TopHat uses Bowtie as the engine for its read -> genome mapping, as part of its algorithm for finding spliced reads. Cufflinks in turn can use the TopHat alignments. The programs are modular, so you can also run Cufflinks on (spliced) read alignments made with other programs.