  • #31
    Originally posted by greggrant
    Tophat doesn't look like it will work for me, so I put together a plan to do it another way using bowtie. Then I found out that on my Mac Pro, I am getting an alignment speed of approximately 60 reads/hour. That's significantly less than the 25,000,000/hour that I expected. This run of one sequence against the m_musculus index that I downloaded from the bowtie site takes about 45 seconds. Here's the command I used. Any idea why it would take so long? I rebooted to make sure there was nothing else taxing the resources.

    bowtie -c /Applications/bowtie-0.10.0/indexes/m_musculus GAAAGTCATGCGTTTCAAGTTTGGCAAGGAATAGAAACAGACGGGCTTATGAAAATAAGGAAAACATCACCCCCAGGCG

    That sequence should have no spaces, I don't know why the forum inserts a space in the middle of it...

    Thanks in advance for any suggestions.
    Wait, that's only one read. Are you aligning all of them that way, individually on the command line?



    • #32
      Originally posted by Cole Trapnell
      Wait, that's only one read. Are you aligning all of them that way, individually on the command line?
      No, I wasn't, but I think I figured it out: it takes a long time to get up and running, but once it does, it moves fast. So speed is best tested by comparing the run times for N and M reads, which subtracts out the start-up time. Thanks for responding so quickly.
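
The N-versus-M comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes total run time is a fixed start-up cost plus a constant per-read cost, the function names are hypothetical, and the timings are made-up numbers rather than measurements from this thread.

```python
def steady_state_rate(n_reads, n_seconds, m_reads, m_seconds):
    """Reads per second once the start-up cost is subtracted out.

    Assumes total time = startup + reads / rate, so the fixed start-up
    term cancels when two runs of different size are compared.
    """
    if m_reads == n_reads or m_seconds == n_seconds:
        raise ValueError("the two runs must differ in both size and time")
    return (m_reads - n_reads) / (m_seconds - n_seconds)


def startup_seconds(n_reads, n_seconds, rate):
    """Estimated fixed start-up time implied by one run and the rate."""
    return n_seconds - n_reads / rate


# Hypothetical timings, for illustration only.
rate = steady_state_rate(1_000, 50.0, 101_000, 60.0)
print(f"{rate:.0f} reads/s, start-up ~{startup_seconds(1_000, 50.0, rate):.1f} s")
```

With timings like these, nearly all of the short run is start-up (index loading), which is why timing a single read says little about throughput.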



      • #33
        Bowtie works great; it's incredible that Illumina's bundled aligner is still so poor. But is it possible to use bowtie to also map to the transcriptome? Or is that best done another way? Thanks for your help!



        • #34
          454 support?

          Trimming all of the sequences to the same length seems like a bad idea. Any idea when tophat / bowtie will support 454 variable length reads?

          Greg -- how did you take your mouse reads and get them to be a uniform length? Straight trimming? Segmenting?

          Thanks,
          Anand



          • #35
            Bowtie already supports variable length reads. It's TopHat that has a problem with them. Fixing TopHat for 454 would require more work than I have time to do in the near future, as I am focused primarily on Cufflinks (and graduating). It's on the list of things to do, but so are many other things...

            One obstacle to me making the changes is that I don't currently have any 454 RNA-Seq data to work with, as none of my collaborators use it. If someone was willing to provide me with a small test set (perhaps a chromosome's worth), I could at least assess how much work it will actually be to add support for this. I'd keep it confidential, of course.



            • #36
              how much data?

              Cole,

              How much data is a "chromosome's worth"? (I ask only because some chromosomes are huge, and others not so huge).

              I've got 9 runs, 312-385 average read length, 110-170 Mbases per run.

              How much of that would you like? Fastq? SFF?

              Thanks,
              Anand



              • #37
                Tophat Coverage Plots

                I have a question regarding the coverage files output by TopHat. What exactly is required for a read to be included in a coverage map? If TopHat breaks up a read into four pieces and only one maps, is that piece included? Also, how are reads that map to multiple locations handled? Any information is greatly appreciated, thank you.



                • #38
                  Originally posted by greggrant
                  I have a question regarding the coverage files output by TopHat. What exactly is required for a read to be included in a coverage map? If TopHat breaks up a read into four pieces and only one maps, is that piece included? Also, how are reads that map to multiple locations handled? Any information is greatly appreciated, thank you.
                  Never mind, I figured it out. If a read maps to two locations, TopHat keeps both and includes both in the coverage plot. I think that fact should be documented somewhere in the manual.
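
To make the multi-mapping behaviour described above concrete, here is a minimal Python sketch of a coverage count in which every alignment contributes depth. It is not TopHat's code; the `coverage` function, the tuple layout, the single-chromosome coordinates, and the example alignments are all assumptions made for illustration.

```python
from collections import Counter


def coverage(alignments):
    """Per-base depth from (read_id, start, end) alignments, end exclusive.

    Every alignment counts, so a read that maps to two locations adds
    depth at both of them.
    """
    depth = Counter()
    for _read_id, start, end in alignments:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


# Hypothetical alignments: read "r1" maps to two places, "r2" to one.
alignments = [("r1", 100, 104), ("r1", 500, 504), ("r2", 102, 106)]
depth = coverage(alignments)
print(depth[102], depth[500], depth[105])
```

Under this counting scheme a repeat region can show inflated depth, because each multi-mapped read is counted once per location rather than once overall.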



                  • #39
                    Originally posted by acpatel
                    Trimming all of the sequences to the same length seems like a bad idea. Any idea when tophat / bowtie will support 454 variable length reads?

                    Greg -- how did you take your mouse reads and get them to be a uniform length? Straight trimming? Segmenting?

                    Thanks,
                    Anand
                    Sorry, I just saw this post; I realize I'm responding quite late. I tiled my 454 reads with equal-sized tiles. It didn't work, though: TopHat didn't produce any good results, surely because it has trouble mapping across those homopolymers. I'm off 454 altogether and am only working with Solexa data at this point, because we're getting 115-base reads from our machine now, so Solexa really seems to have superseded 454. Sell your 454 stock!
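
For readers wondering what tiling a variable-length read into equal-sized pieces might look like, here is a small Python sketch. The post does not say which tile length was used or whether tiles overlapped, so the non-overlapping scheme, the dropped remainder, and the `tile_read` name are assumptions.

```python
def tile_read(seq, tile_len):
    """Split seq into consecutive, non-overlapping tiles of tile_len bases.

    A trailing remainder shorter than tile_len is dropped so that every
    tile has the same length.
    """
    if tile_len <= 0:
        raise ValueError("tile_len must be positive")
    n_tiles = len(seq) // tile_len
    return [seq[i * tile_len:(i + 1) * tile_len] for i in range(n_tiles)]


# A made-up 22-base read cut into 5-base tiles; the last 2 bases are dropped.
tiles = tile_read("ACGTACGTACGTACGTACGTAC", 5)
print(tiles)
```

Each tile would then be aligned as if it were an independent fixed-length read, which loses the information that the tiles came from one molecule.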



                    • #40
                      Hi All,

                      I am getting this error message on Mac OS X 10.5.8

                      Could you please help me figure out what I installed wrong?

                      Thanks in advance!



                      tophat -r 20 test_ref reads_1.fq reads_2.fq

                      [2012-10-22 16:52:02] Beginning TopHat run (v2.0.5)
                      -----------------------------------------------
                      [2012-10-22 16:52:02] Checking for Bowtie
                      Bowtie version: 2.0.0.7
                      [2012-10-22 16:52:02] Checking for Samtools
                      Samtools version: 0.1.18.0
                      [2012-10-22 16:52:02] Checking for Bowtie index files
                      [2012-10-22 16:52:02] Checking for reference FASTA file
                      [2012-10-22 16:52:02] Generating SAM header for test_ref
                      format: fastq
                      quality scale: phred33 (default)
                      [2012-10-22 16:52:02] Preparing reads
                      [FAILED]
                      Error running 'prep_reads'
                      dyld: unknown required load command 0x80000022



                      • #41
                        Hi. I am a new user of TopHat, and I have Illumina-generated RNA-Seq single-end reads of 35 nt length. Here is the error I get when I try to run TopHat:

                        root@ubuntu:~# tophat -g 1 -p 4 -o '/media/bv/My Passport/output' '/home/bv/Desktop/b/bowtie2-2.1.0/index/hg19' '/media/bv/My Passport/Normal/normal.fastq'

                        [2013-07-02 12:46:03] Beginning TopHat run (v2.0.3)
                        -----------------------------------------------
                        [2013-07-02 12:46:03] Checking for Bowtie
                        Bowtie version: 2.0.0.6
                        [2013-07-02 12:46:03] Checking for Samtools
                        Samtools version: 0.1.19.0
                        [2013-07-02 12:46:03] Checking for Bowtie index files
                        [2013-07-02 12:46:03] Checking for reference FASTA file
                        Warning: Could not find FASTA file /home/bv/Desktop/b/bowtie2-2.1.0/index/hg19.fa
                        [2013-07-02 12:46:03] Reconstituting reference FASTA file from Bowtie index
                        Executing: /usr/bin/bowtie2-inspect /home/bv/Desktop/b/bowtie2-2.1.0/index/hg19 > /media/bv/My Passport/output/tmp/hg19.fa
                        [2013-07-02 12:51:29] Generating SAM header for /home/bv/Desktop/b/bowtie2-2.1.0/index/hg19
                        format: fastq
                        quality scale: phred33 (default)
                        [2013-07-02 12:51:40] Preparing reads
                        left reads: min. length=35, max. length=35, 15669232 kept reads (5155 discarded)
                        Warning: you have only one segment per read.
                        If the read length is greater than or equal to 45bp,
                        we strongly recommend that you decrease --segment-length to about half the read length because TopHat will work better with multiple segments
                        open: No such file or directory
                        [main_samview] fail to open "/media/bv/My" for reading.
                        Warning: junction database is empty!
                        open: No such file or directory
                        [main_samview] fail to open "/media/bv/My" for reading.
                        [2013-07-02 12:54:33] Reporting output tracks
                        [FAILED]
                        Error running /usr/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir /media/bv/My Passport/output/ --max-multihits 1 --max-seg-multihits 10 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --max-mismatches 2 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p4 --no-closure-search --no-microexon-search --sam-header /media/bv/My Passport/output/tmp/hg19_genome.bwt.samheader.sam --samtools=/usr/local/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /media/bv/My Passport/output/tmp/hg19.fa /media/bv/My Passport/output/junctions.bed /media/bv/My Passport/output/insertions.bed /media/bv/My Passport/output/deletions.bed /media/bv/My Passport/output/fusions.out /media/bv/My Passport/output/tmp/accepted_hits /media/bv/My Passport/output/tmp/left_kept_reads.bam
                        Loaded 0 junctions

                        Please help me learn what has gone wrong.



                        • #42
                          Hi Next,
                          I think you should know your read length and insert size. TopHat assumes an insert size of 250 by default.
                          Now my question is: how can we change the default option?
                          Could someone answer my question posted here: http://seqanswers.com/forums/showthr...098#post112098


                          Originally posted by Nextgenanalysis
                          HI.. I am a new user of tophat and I have illumina generated RNA seq single end reads of 35ntd length. Here is the error I am getting when I tried to run tophat...




                          • #43
                            Originally posted by Nextgenanalysis
                            HI.. I am a new user of tophat and I have illumina generated RNA seq single end reads of 35ntd length. Here is the error I am getting when I tried to run tophat...

                            root@ubuntu:~# tophat -g 1 -p 4 -o '/media/bv/My Passport/output' '/home/bv/Desktop/b/bowtie2-2.1.0/index/hg19' '/media/bv/My Passport/Normal/normal.fastq'
                            (... much deleted ...)
                            [main_samview] fail to open "/media/bv/My" for reading.
                            (... more deleted ...)
                            Please help me learn what has gone wrong.
                            You have spaces in your path/file names. Yes, I know that you quoted them in your tophat command line, so they should be OK even with a space. Nevertheless, in subsequent steps they became unquoted. I suggest that you put your files in a path without spaces in it; e.g., MyPassport instead of My Passport.

                            BTW, jp's comment has nothing to do with your problem.
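
The quoting problem can be reproduced in isolation with Python's standard `shlex` module. This is a generic sketch of how a path with a space falls apart when a command string is rebuilt without quoting; it does not show how TopHat actually assembles its internal commands, and the `samtools view` string here is only an illustrative stand-in.

```python
import shlex

out_dir = "/media/bv/My Passport/output"  # path taken from the post above

# Building a command by plain string concatenation loses the quoting.
unquoted = "samtools view " + out_dir + "/tmp/left_kept_reads.bam"
unquoted_args = shlex.split(unquoted)
print(unquoted_args)
# The path is split at the space, so the tool is handed "/media/bv/My".

# Quoting the argument keeps the whole path as one token.
quoted = "samtools view " + shlex.quote(out_dir + "/tmp/left_kept_reads.bam")
quoted_args = shlex.split(quoted)
print(quoted_args)
```

The first list contains `/media/bv/My` as a separate argument, which matches the `fail to open "/media/bv/My"` message in the log. Moving the data to a path without spaces avoids the issue regardless of how the downstream commands are built.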
