  • Optimise TopHat2 speed and results

    Hi!
    I have some questions about optimising the speed when using TopHat2 for mRNA analysis.

    As a testrun I used this command:

    tophat hg19 /path/to/x.fastq

    hg19 was indexed manually using Bowtie2.
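For reference: a Bowtie2 index built by hand (for example with `bowtie2-build hg19.fa hg19`, file names assumed) is not a single file but a set of files that share a basename. For a genome the size of hg19, the default "small" index has six files: `.1.bt2` to `.4.bt2` plus `.rev.1.bt2` and `.rev.2.bt2`. Indexes built as "large" use `.bt2l` instead. The hypothetical bash helper below checks that all six are present for a given basename before TopHat is started. It is a sketch, not part of TopHat or Bowtie2.

```shell
# Hypothetical helper: verify that a Bowtie2 index basename is complete.
# Assumes a "small" index (*.bt2); large indexes use *.bt2l instead.
# Returns 0 if all six files exist, 1 otherwise (missing files go to stderr).
check_bt2_index() {
  local prefix="$1"
  local missing=0
  local suffix
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [ ! -f "${prefix}.${suffix}" ]; then
      echo "missing: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, with placeholder paths: `check_bt2_index /path/to/hg19 && tophat2 -p 4 /path/to/hg19 /path/to/x.fastq`. The basename, not any one of the files, is what TopHat takes as its index argument.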

    Here you see the Activity Monitor during the run:

    [Screenshots: Activity Monitor during the run, taken 2013-08-16]

    So, my questions:

    1. Is it normal for TopHat to use so little of the computer's total capacity?

    2. I have downloaded both the pre-built iGenomes hg19 and the version that is not pre-built. The Bowtie2 (.bt2) index files are not the same size as the ones I built myself. Should I switch to the pre-built hg19?

    3. What is the "AbundantSequences" folder included in the pre-built hg19, and how can I use it in my mRNA analysis?

    4. Is there anything else I should keep in mind when doing mRNA analysis, such as excluding all introns from hg19?

    Thank you very much!!
    Last edited by sindrle; 08-16-2013, 04:38 AM.

  • #2
    1. You might tell tophat to use more threads. It'll take a lot longer to get results when using only a single thread.

    2. It doesn't much matter. I recall someone saying that the iGenomes indexes are missing something (perhaps it was mitochondria). When in doubt, always make your own (it doesn't take that long).

    3. Abundant sequences are things like rRNAs, which are often present in VERY high amounts but are not of interest. They are useful to give to Cufflinks or similar tools so that those regions of the genome are ignored.

    4. For most common uses, the defaults are fine (just give it more threads).
    Last edited by dpryan; 08-16-2013, 04:54 AM. Reason: formatting



    • #3
      Thank you very much for the fast answers!

      Do you have a link describing thread settings for TopHat2?

      Also, one final question: the downloaded pre-built hg19 includes all the annotations. How can I use them with TopHat2?



      • #4
        Originally posted by sindrle:
        Do you have a link describing thread settings for TopHat2?

        Also, one final question: the downloaded pre-built hg19 includes all the annotations. How can I use them with TopHat2?
        From the TopHat manual:

        -p/--num-threads <int> Use this many threads to align reads. The default is 1.
        Start with 2 or 4 threads. Even if your computer has 8 "cores", you will not necessarily get an equivalent speedup, since disk read speed starts to become the limiting factor.
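Following that advice, one way to choose a `-p` value is to take the number of available cores but cap it at 4. This is a sketch under assumptions: the cap of 4 comes from the advice above, not from any TopHat rule, and `pick_threads` is a hypothetical name. `nproc` is a GNU coreutils command that may be missing on macOS, so the helper falls back to `sysctl -n hw.ncpu` and finally to 1.

```shell
# Hypothetical helper: choose a thread count for tophat2 -p.
# Uses the core count, capped (default cap 4) because disk I/O tends to
# become the bottleneck before all cores are busy.
pick_threads() {
  local cap="${1:-4}"
  local cores
  # nproc exists on Linux (GNU coreutils); sysctl covers macOS; else assume 1.
  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
  if [ "$cores" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$cores"
  fi
}
```

Usage: `tophat2 -p "$(pick_threads)" /path/to/hg19 /path/to/x.fastq`, or `pick_threads 2` for a more conservative cap.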

        If you get pre-built indexes from the iGenomes site, you can point TopHat at the Bowtie2Index folder by giving the index basename, i.e. the path to "genome.fa" in that folder without the ".fa" extension (for example /path/to/Bowtie2Index/genome).
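As a sketch of that: TopHat's index argument is the Bowtie2 index basename rather than a file. In the iGenomes layout the Bowtie2Index folder holds `genome.fa` alongside `genome.1.bt2` and the other index files, so dropping the extension from the genome.fa path gives the value to pass. The function name below is hypothetical.

```shell
# Hypothetical helper: turn ".../Bowtie2Index/genome.fa" into the index
# basename ".../Bowtie2Index/genome" that tophat2 expects as its index argument.
bt2_basename_from_fasta() {
  local fasta="$1"
  case "$fasta" in
    *.fa)    echo "${fasta%.fa}" ;;
    *.fasta) echo "${fasta%.fasta}" ;;
    *)       echo "$fasta" ;;   # already a basename; leave unchanged
  esac
}
```

For example (path assumed, following the usual iGenomes directory layout): `tophat2 -p 4 "$(bt2_basename_from_fasta /data/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.fa)" /path/to/x.fastq`.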



        • #5
          Hi again!
          Thank you for the input. I aborted the first run, but before I start the second test, can you check that this command is correct?

          I want to use 8 threads: -p 8
          I want to use the genes.gtf included in the iGenomes hg19: -G path/to/genes.gtf
          Since I use -G, I also need: --transcriptome-index=transcriptome_data
          I don't want coverage search: --no-coverage-search
          My hg19 genome and Bowtie2 indexes (genome.fa and the .bt2 files) are in my PATH (usr/bin/indexes) as aliases. Does this work?
          Finally, I have the FASTQ file I want to analyse.

          So is this correct?

          tophat2 -p 8 -G /path/to/genes.gtf --transcriptome-index=transcriptome_data --no-coverage-search * genome /path/to/x.fastq

          "path/to" is just for simplicity.

          I'm also curious about the "*" after all the options. I also wonder where the "transcriptome_data" folder will be created.
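The command from this post can be assembled in bash as an array, which keeps each option a single argument. In this sketch the stray `*` is dropped (unquoted, the shell would expand it into every file name in the current directory and pass them all to tophat2), and the index is given as a full basename instead of relying on PATH. `run_tophat2` and `DRY_RUN` are hypothetical names; with `DRY_RUN=1` the function only prints the command so it can be checked first.

```shell
# Hypothetical wrapper for the tophat2 command discussed in this thread.
# Arguments: threads, GTF file, transcriptome index (dir/prefix),
# Bowtie2 index basename, FASTQ file. Requires bash (arrays).
run_tophat2() {
  local threads="$1" gtf="$2" tx_index="$3" bt2_index="$4" reads="$5"
  # Each array element stays one argument, even if a path contains spaces.
  local cmd=(
    tophat2
    -p "$threads"
    -G "$gtf"
    --transcriptome-index="$tx_index"
    --no-coverage-search
    "$bt2_index"
    "$reads"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command (space-joined) instead of running it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example: `DRY_RUN=1 run_tophat2 8 /path/to/genes.gtf transcriptome_data/known /path/to/Bowtie2Index/genome /path/to/x.fastq`, then the same call without `DRY_RUN=1` once the printed command looks right.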
          Last edited by sindrle; 08-17-2013, 01:09 PM.



          • #6
            The "transcriptome_data" folder will be created wherever you specify after "--transcriptome-index=". I have no clue what you're trying to achieve with the random asterisk, but I suspect it won't do whatever it is that you want. Aside from that, it should work.
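To illustrate the answer: the `--transcriptome-index` value is a directory plus a file-name prefix, so `transcriptome_data/known` would put files whose names start with `known` inside a `transcriptome_data` directory. Assuming a relative value is resolved against the directory tophat2 is started from (worth confirming on your own install), the hypothetical helper below prints where that would be. The TopHat manual also notes that later runs using the same `--transcriptome-index` value reuse the data built on the first run, without needing `-G` again.

```shell
# Hypothetical helper: show where a --transcriptome-index value would put
# its files, assuming relative values are resolved against the current
# working directory.
describe_tx_index() {
  local value="$1" dir name
  dir=$(dirname "$value")
  name=$(basename "$value")
  case "$dir" in
    /*) ;;                      # already absolute
    .)  dir="$PWD" ;;           # bare prefix, no directory part
    *)  dir="$PWD/$dir" ;;      # relative directory
  esac
  echo "directory: $dir"
  echo "file prefix: $name"
}
```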



            • #7
              I can't get the -G and --transcriptome-index=transcriptome_data/known options to work.

              Also, I have to cd to the directory containing my Bowtie2 indexes before typing the tophat2 command, even though the indexes are in my PATH.
              Last edited by sindrle; 08-17-2013, 03:20 PM.



              • #8
                Having the indexes in your PATH won't work, since PATH is only searched for executable programs and the index files are not executables. You will be better off providing full paths for them. There is no harm in providing full file paths (e.g. for the tophat2 executable, indexes, and output directories) in your command lines.

                Check the detailed tutorial at the end of this article for examples of various command lines for the TopHat/Cufflinks suite of programs: http://www.nature.com/nprot/journal/....2012.016.html
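Following the advice about full paths, relative paths can be turned into absolute ones before they go on the command line, so the command no longer depends on which directory you `cd` into. The helper below is a hypothetical sketch that uses only `cd`, `pwd`, `dirname` and `basename`, so it should also work on macOS, where the GNU options of `realpath`/`readlink -f` may be unavailable. It requires the parent directory to exist already; the file itself does not have to.

```shell
# Hypothetical helper: print the absolute form of a path whose parent
# directory exists (the file itself need not exist yet).
# Returns 1 if the parent directory cannot be entered.
abs_path() {
  local p="$1" dir base
  dir=$(dirname "$p")
  base=$(basename "$p")
  # The command substitution runs in a subshell, so the caller's cwd is untouched.
  dir=$(cd "$dir" 2>/dev/null && pwd) || return 1
  if [ "$dir" = "/" ]; then
    echo "/$base"
  else
    echo "$dir/$base"
  fi
}
```

Usage, with placeholder paths: `tophat2 -p 4 "$(abs_path indexes/genome)" "$(abs_path reads/x.fastq)"`.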



                • #9
                  Thanks! I fiddled around for some hours yesterday and it all works like a charm now!

                  Next step is to run Cufflinks2.


                  Thank you everyone!
