  • Tuxedo suite / Parallel Processing

    Hello,
    I have a paired-end RNA-seq data set for two treatment conditions without any replicates. I want to check isoform variation in a particular gene and gene expression variation in general. Each of the two paired-end files for each sample was split into seven files when the data was generated. I want to process these data in parallel using the Tuxedo suite.
    What I am not clear about is whether the tophat input command below treats comma-separated files as replicates or as pieces of a single FASTQ file for each of the two paired-end files.
    tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
    And what would be the next steps in running the Tuxedo suite in parallel?
    Could anyone please help me?
    Thank you very much,
    TPH

  • #2
    You may want to concatenate the files for each sample into one and then use the multiple threads option for tophat to achieve faster processing.


    • #3
      Thank you very much. I really appreciate your help.


      • #4
        I should clarify that you would want to concatenate all R1 pieces and all R2 pieces for each sample, and then use the resulting R1 and R2 files for the tophat runs.
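
        A minimal sketch of that concatenation step, assuming the pieces follow the R1_001.fastq / R2_001.fastq naming quoted later in this thread. The function name concat_pieces and the sampleA file names are hypothetical. The R1 and R2 pieces need to be listed in the same order so that mates stay in sync.

        ```shell
        # Join the pieces of one read file into a single FASTQ file.
        # concat_pieces is a hypothetical helper, not part of any tool.
        concat_pieces() {
            # usage: concat_pieces <output.fastq> <piece1.fastq> [piece2.fastq ...]
            out=$1
            shift
            # cat writes the pieces in the order given on the command line
            cat "$@" > "$out"
        }

        # Example calls (left as comments so nothing runs by accident):
        # concat_pieces sampleA_R1.fastq R1_001.fastq R1_002.fastq R1_003.fastq
        # concat_pieces sampleA_R2.fastq R2_001.fastq R2_002.fastq R2_003.fastq
        # tophat -p 8 -o sampleA_tophat "$genomeIndex" sampleA_R1.fastq sampleA_R2.fastq
        ```

        Here -p sets the number of threads and -o the output directory; $genomeIndex stands for your Bowtie index base name.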


        • #5
          Thanks again. I saw in a post that it is not recommended to concatenate the data, but to run it in parallel instead. It's totally clear how concatenated data can be used for the analysis, but I do not understand how running the individual files in parallel works, or what the downside of concatenating the files is. Do you have any idea about that? It would be a great help.


          • #6
            There are many ways to skin a cat, and you could certainly do this in parallel (as Pierre suggests in the Biostars thread) with the original file pieces.

            You would want to take into consideration the amount of hardware resources you have available. If you are on a cluster with plenty of nodes/RAM, by all means go for processing the individual paired chunks in parallel (with multiple threads). If you have limited hardware (i.e. a single server), you may want to either run the chunk jobs serially or combine the chunks and run them as one. If you did the analysis in chunks, then you would use cuffmerge to merge your results.
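
            A rough sketch of the per-chunk approach, assuming chunk files named R1_001.fastq / R2_001.fastq and so on. The function name print_chunk_commands and the tophat_out_NNN directory names are hypothetical. It only prints one tophat command per chunk pair, each with its own output directory so the runs do not overwrite each other.

            ```shell
            # Print one tophat command per chunk pair (nothing is executed here).
            # print_chunk_commands is a hypothetical helper.
            print_chunk_commands() {
                # usage: print_chunk_commands <genome_index_base> <threads> <number_of_chunks>
                index=$1
                threads=$2
                n=$3
                i=1
                while [ "$i" -le "$n" ]; do
                    # zero-pad the chunk number to three digits: 1 -> 001
                    id=$(printf '%03d' "$i")
                    echo "tophat -p $threads -o tophat_out_$id $index R1_$id.fastq R2_$id.fastq"
                    i=$((i + 1))
                done
            }

            # Example (commented out): write the seven commands to a file
            # print_chunk_commands "$genomeIndex" 8 7 > chunk_jobs.txt
            ```

            On a cluster each printed line could be submitted as its own job; on a single server the same file could simply be run top to bottom with sh.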


            • #7
              I work on a cluster. I ran the tophat command individually on each of the seven files with its paired file, without any concatenation. I realized later that the way I fed the data in was wrong, because it treated the data as seven different replicates. This is how I wrote the command, and I repeated it six more times.
              tophat -p 8 -o tophat_out -G $genomeSeq $genomeIndex R1_001.fastq R2_001.fastq
              If I want to process the data in parallel, what would be the best way to feed the data in? Could you please help me figure out the correct command for that?


              • #8
                I assume you have 7 separate directories of tophat output for the 7 files for each condition, because of how you ran the analysis? You could merge the "accepted_hits.bam" files for each condition into one, as Pierre suggested in the other thread. What are you going to use for the downstream analysis, cuffdiff?


                • #9
                  Yes, that's the output I have. So using the "cat" command on the accepted_hits.bam files would work the same as concatenating the starting FASTQ files. Thank you very much.
                  Yes, I am using Cuffdiff for the final step.


                  • #10
                    You can't concatenate BAM files with "cat", though you could with "samtools cat". I would strongly encourage you to "samtools merge" instead, though!
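
                    A small sketch of the merge, assuming one tophat output directory per chunk with an accepted_hits.bam in each. The function name merge_command and the directory names are hypothetical. It prints the samtools merge command rather than running it, so you can check it first. Directory names containing spaces would need extra quoting.

                    ```shell
                    # Build a "samtools merge" command from a list of tophat output directories.
                    # merge_command is a hypothetical helper; it only prints the command.
                    merge_command() {
                        # usage: merge_command <merged.bam> <tophat_dir1> [tophat_dir2 ...]
                        out=$1
                        shift
                        # samtools merge takes the output file first, then the input BAM files
                        cmd="samtools merge $out"
                        for dir in "$@"; do
                            cmd="$cmd $dir/accepted_hits.bam"
                        done
                        echo "$cmd"
                    }

                    # Example (commented out):
                    # merge_command conditionA.bam tophat_out_001 tophat_out_002 tophat_out_003
                    ```

                    The merged BAM for each condition would then be what you pass on to the downstream step.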


                    • #11
                      Thank you so much.
