SEQanswers

Old 01-28-2016, 08:09 AM   #1
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19
Tuxedo suite / Parallel Processing

Hello,
I have a paired-end RNA-seq data set for two treatment conditions without any replicates. I want to check isoform variation in a particular gene and gene expression variation in general. Each of the two paired-end files for each sample was split into seven files when the data was generated. I want to run these data in parallel using the Tuxedo suite.
The thing is, I am not clear whether this tophat input command treats comma-separated files as replicates or as pieces of a single fastq file for each of the two paired-end files.
tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
And what would be the next steps in running the Tuxedo suite in parallel?
Could anyone please help me?
Thank you very much.
TPH
Old 01-28-2016, 08:16 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

You may want to concatenate the files for each sample into one and then use the multiple threads option for tophat to achieve faster processing.
Old 01-28-2016, 09:24 AM   #3
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19

Thank you very much. I really appreciate your help.
Old 01-28-2016, 09:35 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

I should clarify that you would want to concatenate all R1 pieces and all R2 pieces for each sample and then use the resulting R1 and R2 files for the tophat runs.
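
A minimal sketch of that concatenation, assuming the pieces are plain (uncompressed) fastq files named like R1_001.fastq ... R1_007.fastq. The helper name concat_pieces and the output file names are hypothetical. The important point is that the R1 and R2 pieces are joined in the same order, so that mates stay in step.

```shell
# Hypothetical helper: concatenate FASTQ pieces into a single file.
# Usage: concat_pieces OUTPUT PIECE1 [PIECE2 ...]
concat_pieces() {
  out=$1
  shift
  cat "$@" > "$out"
}

# Example (file names are assumptions):
#   concat_pieces sample_R1.fastq R1_00[1-7].fastq
#   concat_pieces sample_R2.fastq R2_00[1-7].fastq
# The shell expands each glob in sorted order, so piece 001 of R1 lines up
# with piece 001 of R2, piece 002 with piece 002, and so on.
```

The two resulting files can then be given to a single tophat run per sample.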
Old 01-28-2016, 09:44 AM   #5
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19

Thanks again. I saw in a post that it is not recommended to concatenate the data but to run the pieces in parallel instead. It's totally clear how concatenated data can be used for the analysis, but I do not understand how running the individual files in parallel works, or what the downside of concatenating the files is. Do you have any idea about that? It would be a great help.
Old 01-28-2016, 09:58 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

There are many ways to skin a cat and you could certainly do this in parallel (as Pierre suggests in the Biostars thread) with the original file pieces.

You would want to take into consideration the amount of hardware resources you have available. If you are on a cluster with plenty of nodes/RAM, by all means go for processing the individual paired chunks in parallel (with multiple threads). If you have limited hardware (e.g. a single server), you may want to either run the chunk jobs serially or combine the pieces and run them as one. If you did the analysis in chunks then you would use cuffmerge to merge your results.
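
As a rough sketch of the chunk-wise approach: the function below only prints one tophat command per chunk (a dry run), each with its own output directory so that runs do not overwrite each other. The function name, the tophat_out_NNN directory names, the R1_NNN.fastq / R2_NNN.fastq file names and the thread count are all assumptions for illustration, and other tophat options are left out.

```shell
# Print (do not run) one TopHat command per chunk.
# Usage: tophat_chunk_cmds INDEX N_CHUNKS
tophat_chunk_cmds() {
  index=$1
  n=$2
  i=1
  while [ "$i" -le "$n" ]; do
    chunk=$(printf '%03d' "$i")   # 1 -> 001, 2 -> 002, ...
    echo "tophat -p 8 -o tophat_out_${chunk} ${index} R1_${chunk}.fastq R2_${chunk}.fastq"
    i=$((i + 1))
  done
}

# Serial, on a single server:  tophat_chunk_cmds genome_index 7 | sh
# On a cluster: submit each printed line as its own job through the scheduler.
```

Printing the commands first makes it easy to check them before deciding whether to run them one after another or hand them to a scheduler.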
Old 01-28-2016, 10:13 AM   #7
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19

I work on a cluster. I ran the analysis by executing the tophat command separately on each of the seven files together with its paired file, without any concatenation. I realized later that the way I fed the data in was wrong, because the data were treated as seven different replicates. This is how I wrote the command, and I repeated it six more times.
tophat -p 8 -o tophat_out -G $genomeSeq $genomeIndex R1_001.fastq R2_001.fastq
If I want to process the data in parallel, what would be the best way to feed the data in? Could you please help me figure out the correct command for that?
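
One option that follows from the tophat usage line is to pass the pieces as comma-separated lists. As far as the TopHat manual describes it, the files in a list are aligned together as one sample into a single output directory, and the R2 list has to be in the same order as the R1 list. The sketch below only prints such a command. The function name and file names are assumptions, and other options such as -G are omitted for brevity.

```shell
# Print (do not run) a single TopHat command covering all pieces of one sample.
# Usage: tophat_one_sample_cmd OUT_DIR INDEX N_CHUNKS
tophat_one_sample_cmd() {
  out_dir=$1
  index=$2
  n=$3
  r1=""
  r2=""
  i=1
  while [ "$i" -le "$n" ]; do
    chunk=$(printf '%03d' "$i")
    # Append with a comma only when the list is already non-empty.
    r1="${r1:+$r1,}R1_${chunk}.fastq"
    r2="${r2:+$r2,}R2_${chunk}.fastq"
    i=$((i + 1))
  done
  echo "tophat -p 8 -o ${out_dir} ${index} ${r1} ${r2}"
}

# Example: tophat_one_sample_cmd tophat_out_condA genome_index 7
```

This gives one accepted_hits.bam per sample, so the pieces are not mistaken for replicates downstream, although the alignment itself is then a single (multi-threaded) job rather than seven parallel ones.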
Old 01-28-2016, 11:31 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,881

I assume you have 7 separate directories for the tophat output for the 7 files for each condition because of how you ran the analysis? You could merge the "accepted_hits.bam" files for each condition into one as Pierre suggested in the other thread. What are you going to use for the downstream analysis, cuffdiff?
Old 01-28-2016, 11:57 AM   #9
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19

Yes, that's the output I have. So using the "cat" command on the accepted_hits.bam files would work the same as concatenating the starting fastq files. Thank you very much.
Yes, I am using Cuffdiff for the final step.
Old 01-28-2016, 12:57 PM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

You can't concatenate BAM files with "cat", though you could with "samtools cat". I would strongly encourage you to "samtools merge" instead, though!
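
A small sketch of what the merge could look like, assuming one tophat output directory per chunk, each containing an accepted_hits.bam. The function name and the directory names in the example are hypothetical. The function only prints the samtools merge command so it can be checked before running.

```shell
# Print (do not run) a samtools merge command for one condition.
# Usage: merge_hits_cmd OUT_BAM TOPHAT_DIR1 [TOPHAT_DIR2 ...]
merge_hits_cmd() {
  out_bam=$1
  shift
  cmd="samtools merge ${out_bam}"
  for d in "$@"; do
    cmd="${cmd} ${d}/accepted_hits.bam"
  done
  echo "$cmd"
}

# Example (directory names are assumptions):
#   merge_hits_cmd condA_merged.bam tophat_out_001 tophat_out_002 tophat_out_003
```

The merged BAM for each condition can then be used as the input for that condition in cuffdiff.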
Old 01-28-2016, 12:59 PM   #11
TPH
Member
 
Location: USA

Join Date: Jan 2016
Posts: 19

Thank you so much.
All times are GMT -8.

