
  • Optimise TopHat2 speed and results

    Hi!
    I have some questions about optimising the speed when using TopHat2 for mRNA analysis.

    As a testrun I used this command:

    tophat hg19 /path/to/x.fastq

    hg19 was manually indexed using Bowtie2.

    Here you see the Activity Monitor during the run:

    [Screenshot: Screen Shot 2013-08-16 at 2.07.45 PM.png]

    [Screenshot: Screen Shot 2013-08-16 at 2.08.08 PM.png]

    So, my questions:

    1. Is it normal to use so little of the total capacity?

    2. I have downloaded the pre-built iGenomes hg19 as well as the non-pre-built version, and the pre-built BT2 index files are not the same size as the ones I made myself. Should I switch to using the pre-built hg19?

    3. What is the "AbundantSequences" included in the pre-built hg19 and how can I use it in my mRNA analysis?

    4. Is there anything else I should note when working with mRNA analysis, such as excluding all introns in hg19?

    Thank you very much!!
    Last edited by sindrle; 08-16-2013, 04:38 AM.

  • #2
    1. You might tell tophat to use more threads. It'll take a lot longer to get results when using only a single thread.

    2. It doesn't much matter. I recall someone saying that the iGenomes indexes are missing some things (perhaps it was mitochondria). When in doubt, always make your own (it doesn't take that long).

    3. Abundant species would be things like rRNAs, which are often present in VERY high amounts but are not of interest. These are useful to give to cufflinks or similar to tell it to ignore those areas of the genome.

    4. For most common uses, the defaults are fine (just give it more threads).
    Last edited by dpryan; 08-16-2013, 04:54 AM. Reason: formatting
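
One way to check point 2 (whether a downloaded index is missing sequences such as the mitochondrial chromosome) is to list the sequence names stored in each index and compare them. The sketch below assumes `bowtie2-inspect` is installed; both index paths are hypothetical placeholders, and the comparison step is skipped when the tool or the index is not found.

```shell
#!/bin/sh
# Hypothetical index basenames (the prefix shared by the .bt2 files); adjust to your layout.
my_index="${MY_INDEX:-/path/to/my_build/hg19}"
igenomes_index="${IGENOMES_INDEX:-/path/to/iGenomes/Bowtie2Index/genome}"

# Print the sorted reference sequence names stored in a Bowtie2 index.
index_names() {
    bowtie2-inspect -n "$1" | sort
}

# Print names that appear in the first sorted list but not in the second.
missing_from() {
    comm -23 "$1" "$2"
}

# Only run the comparison when the tool and the first index are actually present.
if command -v bowtie2-inspect >/dev/null 2>&1 && [ -e "${my_index}.1.bt2" ]; then
    work_dir="$(mktemp -d)"
    index_names "$my_index" > "$work_dir/mine.txt"
    index_names "$igenomes_index" > "$work_dir/igenomes.txt"
    echo "Sequences in my index but not in the iGenomes index:"
    missing_from "$work_dir/mine.txt" "$work_dir/igenomes.txt"
    rm -r "$work_dir"
fi
```

If the two lists are identical, a size difference between the index files is not by itself a reason to switch.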



    • #3
      Thank you very much for the fast answers!

      Do you have a link describing thread settings for TopHat2?

      Also, one final question: the downloaded pre-built hg19 includes all the annotations. How can I use these with TopHat2?



      • #4
        Originally posted by sindrle:
        Do you have a link describing thread settings for TopHat2?

        Also, one final question: the downloaded pre-built hg19 includes all the annotations. How can I use these with TopHat2?
        From the TopHat manual:

        -p/--num-threads <int> Use this many threads to align reads. The default is 1.
        Start with 2 or 4 threads. Even if your computer has 8 "cores", you will not necessarily get a proportional speedup, since disk read speed starts to become the limiting factor.

        If you happen to get pre-built indexes from the iGenomes site, you can point TopHat at the Bowtie2Index files by giving the index basename, i.e. the path to the "genome.fa" file in the index folder without the ".fa" extension.
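
As a rough illustration of the advice above, the sketch below picks a thread count capped at 4 and builds a `tophat2` command line from an index basename and a FASTQ file. The paths and the `INDEX_BASE`, `READS`, and `RUN` variables are hypothetical; by default the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical locations; the index basename has no .fa or .bt2 extension.
index_base="${INDEX_BASE:-/path/to/Bowtie2Index/genome}"
reads="${READS:-/path/to/x.fastq}"

# Number of online CPU cores; fall back to 1 if getconf cannot report it.
cores="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Cap the thread count at 4 as a starting point, since disk I/O may limit further gains.
threads="$cores"
if [ "$threads" -gt 4 ]; then
    threads=4
fi

# Build the command as positional parameters so paths with spaces stay intact.
set -- tophat2 -p "$threads" "$index_base" "$reads"
cmd_line="$*"
echo "$cmd_line"

# Set RUN=1 in the environment to execute the command instead of only printing it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Timing a short run at 2, 4, and 8 threads on a subset of reads is a simple way to see where the speedup levels off on a particular machine.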



        • #5
          Hi again!
          Thank you for the input. I aborted the first run, but before I run the second test, can you check that this command is correct?

          I want to use 8 threads: -p 8
          I want to use the genes.gtf included in the iGenomes hg19: -G path/to/genes.gtf
          Since I use -G, I also need: --transcriptome-index=transcriptome_data
          I don't want coverage search: --no-coverage-search
          My hg19 genome and Bowtie2 indexes (genome.fa and .bt2 files) are in my PATH (usr/bin/indexes) as aliases. Does this work?
          Finally, I have the fastq file I want to analyse.

          So is this correct?

          tophat2 -p 8 -G /path/to/genes.gtf --transcriptome-index=transcriptome_data --no-coverage-search * genome /path/to/x.fastq

          "path/to" is just for simplicity.

          I'm also curious about the "*" after all the option codes. I also wonder where the "transcriptome_data" folder will be created.
          Last edited by sindrle; 08-17-2013, 01:09 PM.



          • #6
            The "transcriptome_data" folder will be created wherever you specify after "--transcriptome-index=". I have no clue what you're trying to achieve with the random asterisk, but I suspect it won't do whatever it is that you want. Aside from that, it should work.
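
To see why the stray asterisk is a problem: the shell expands an unquoted `*` into file names before `tophat2` is ever started, so the program receives extra positional arguments. The sketch below demonstrates this in a temporary directory with two dummy files (the directory is prefixed to the glob so the demo does not depend on the current directory), then prints the same command without the asterisk. All paths are placeholders and nothing is executed.

```shell
#!/bin/sh
# Create a scratch directory with two dummy files for the glob to match.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

# The glob is expanded by the shell into one word per matching file.
set -- tophat2 --no-coverage-search "$demo_dir"/* genome x.fastq
star_argc=$#
echo "with asterisk ($star_argc words): $*"
rm -r "$demo_dir"

# The command from the post with the asterisk removed and a full index basename.
set -- tophat2 -p 8 -G /path/to/genes.gtf \
    --transcriptome-index=transcriptome_data \
    --no-coverage-search \
    /path/to/Bowtie2Index/genome /path/to/x.fastq
clean_argc=$#
echo "without asterisk ($clean_argc words): $*"
```

In the first command the two dummy file names land where TopHat expects the index basename and the reads, which is why the asterisk should simply be dropped.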



            • #7
              I can't get the -G and --transcriptome-index=transcriptome_data/know options to work.

              Also, I have to cd to where my Bowtie2 indexes are to type the tophat2 command, even though I have the indexes in my PATH.
              Last edited by sindrle; 08-17-2013, 03:20 PM.



              • #8
                Having the indexes in your path won't work since they are not executable files. You will be better off providing full paths for them. There is no harm in providing full file paths (e.g. for tophat2 executable, indexes, output directories) in your command lines.

                Check the detailed tutorial at the end of this article for examples of various command lines for the TopHat/Cufflinks suite of programs: http://www.nature.com/nprot/journal/....2012.016.html
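
When switching to full paths, a quick sanity check is to confirm that all index files exist under the basename you intend to pass. The sketch below assumes a standard small Bowtie2 index, which consists of six `.bt2` files; the helper name `missing_index_files` and the default path are hypothetical.

```shell
#!/bin/sh
# Print every expected Bowtie2 index file that is missing for the given basename.
missing_index_files() {
    base="$1"
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        [ -e "${base}.${suffix}" ] || echo "${base}.${suffix}"
    done
}

# Hypothetical absolute path to the index basename (no extension).
index_base="${INDEX_BASE:-/path/to/Bowtie2Index/genome}"

missing="$(missing_index_files "$index_base")"
if [ -n "$missing" ]; then
    echo "Index incomplete; missing files:" >&2
    echo "$missing" >&2
else
    echo "Index looks complete: $index_base"
fi
```

An empty list means the basename is usable as the index argument from any working directory, so there is no need to cd into the index folder first.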



                • #9
                  Thanks! I fiddled around for some hours yesterday and it all works like a charm now!

                  Next step is to run Cufflinks2.


                  Thank you everyone!
