  • tophat with multiple fastq files

    Hi All,

    I'm probably missing something really obvious here which I can't figure out...

    The problem: TopHat2 doesn't seem to read the two lists of fastq files correctly when they are passed as comma-separated lists. In particular, it ignores the last file of the second list (the second mate).

    In more detail: I have a library that was sequenced on two lanes in paired-end mode, so I have two pairs of fastq files. For testing purposes, I reduced the number of reads in each file as follows:

    ## Lane 3:
    s_3_1.fq.gz: 25000 reads (mate 1)
    s_3_4.fq.gz: 25000 reads (mate 2)

    ## Lane 4:
    s_4_1.fq.gz: 50000 reads (mate 1)
    s_4_4.fq.gz: 50000 reads (mate 2)

    Now, if I run tophat like this:

    Code:
    tophat2 -o both1 -r 100 --mate-std-dev 80 --library-type fr-unstranded -G ${annotgtf} ${bwtidx} \
        s_3_1.fq.gz,s_4_1.fq.gz \
        s_3_4.fq.gz,s_4_4.fq.gz
    It appears that the "left" reads total 74991 (25K + 50K), while the "right" reads are only 24992 (that is, only those from s_3_4.fq.gz).

    Here are the first lines of the output:

    Code:
    [2012-07-09 15:21:43] Beginning TopHat run (v2.0.4)
    -----------------------------------------------
    [2012-07-09 15:21:43] Checking for Bowtie
    		  Bowtie version:	 2.0.0.5
    [2012-07-09 15:21:43] Checking for Samtools
    		Samtools version:	 0.1.18.0
    [2012-07-09 15:21:43] Checking for Bowtie index files
    [2012-07-09 15:21:43] Checking for reference FASTA file
    [2012-07-09 15:21:43] Generating SAM header for /lustre/sblab/berald01/reference_data/genomes/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
    	format:		 fastq
    	quality scale:	 phred33 (default)
    [2012-07-09 15:22:10] Reading known junctions from GTF file
    [2012-07-09 15:22:16] Preparing reads
    	 left reads: min. length=20, max. length=100, 74991 kept reads (9 discarded)
    	right reads: min. length=20, max. length=100, 24992 kept reads (8 discarded)
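To tell whether the missing reads come from the input files or from how the file lists are parsed, each file can be counted on its own and compared with TopHat's numbers (kept plus discarded, e.g. 74991 + 9 = 75000 for the left reads above). This is a minimal bash sketch, assuming gzip-compressed input and standard four-line FASTQ records with no wrapped sequence lines; the function name `count_fastq_reads` is hypothetical and not part of TopHat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<file> <read count>" for each gzipped FASTQ file.
# Assumes every record spans exactly four lines (header, sequence, '+', qualities).
count_fastq_reads() {
    local f lines
    for f in "$@"; do
        # Decompress to stdout (original file untouched) and count lines.
        lines=$(gzip -dc "$f" | wc -l)
        # Four lines per record; arithmetic expansion tolerates wc's padding.
        echo "$f $(( lines / 4 ))"
    done
}
```

For example, `count_fastq_reads s_3_1.fq.gz s_3_4.fq.gz s_4_1.fq.gz s_4_4.fq.gz` should report 25000, 25000, 50000 and 50000 if the subsampled files are as described.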
    So I tried inverting the order in which the fastq files are passed in the last and second-to-last arguments:

    Code:
    tophat2 -o both2 -r 100 --mate-std-dev 80 --library-type fr-unstranded -G ${annotgtf} ${bwtidx} \
        s_4_1.fq.gz,s_3_1.fq.gz \
        s_4_4.fq.gz,s_3_4.fq.gz
    Again, "left reads" are ~75000 but now the right reads are ~50000 (only from s_4_4.fq.gz, now ignoring s_3_4.fq.gz):

    Code:
    [2012-07-09 15:21:42] Beginning TopHat run (v2.0.4)
    ...
    [2012-07-09 15:21:50] Preparing reads
    	 left reads: min. length=20, max. length=100, 74991 kept reads (9 discarded)
    	right reads: min. length=20, max. length=100, 49949 kept reads (51 discarded)
    Running the two pairs of files separately produces the expected number of left and right reads, and so does concatenating the fastq files that belong to the same mate. So I guess the problem is not with the files themselves (which look fine to me anyway).
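Since concatenating per mate gives the expected counts, one workaround is to merge each mate's files before running TopHat. Gzip files can be joined directly with `cat`: the result is a multi-member gzip stream that decompresses as one file, so no recompression is needed. The one constraint is that both mates list the lanes in the same order, so that read N of the mate-1 file still pairs with read N of the mate-2 file. A minimal sketch; the helper name and the merged file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate gzipped FASTQ files into one output file.
# Concatenated gzip members form a valid gzip stream, so no recompression is needed.
# Usage: merge_fastq_gz OUTPUT INPUT...
merge_fastq_gz() {
    local out=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "merge_fastq_gz: no input files given" >&2
        return 1
    fi
    # Input order is preserved; keep it identical for mate 1 and mate 2.
    cat -- "$@" > "$out"
}
```

For this library that would be `merge_fastq_gz lanes34_1.fq.gz s_3_1.fq.gz s_4_1.fq.gz` and `merge_fastq_gz lanes34_4.fq.gz s_3_4.fq.gz s_4_4.fq.gz` (lane 3 first in both), followed by the usual tophat2 command with `lanes34_1.fq.gz lanes34_4.fq.gz` as the two read arguments.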

    Any ideas what's happening?

    Many thanks
    Dario

  • #2
    concatenating files before tophat?

    I notice that you didn't get an answer to your question. Did you figure it out yourself?
    I have a similar question.

    Hi,
    I have an RNA-seq library that has been sequenced multiple times, so I have four fastq files. My questions are:
    Do I need to concatenate them before aligning with tophat?
    Or can I just list the four files at the end of the tophat command, like this?

    If my files are fastq1, fastq2, fastq3 and fastq4,

    and I do:

    tophat -p 4 --segment-length 20 --no-novel-juncs -G /proj/seq/data/TAIR10_Ensembl/Annotation/Archives/archive-2013-03-06-09-54-25/Genes/genes.gtf -o C_ctrl_rep1_THout_6 /proj/seq/data/TAIR10_Ensembl/Sequence/Bowtie2Index/genome fastq1 fastq2 fastq3 fastq4
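The TopHat manual gives the usage as `tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]`: files belonging to the same mate are joined with commas and no spaces, and a second space-separated argument is taken as the mate-2 list. With the four files separated by spaces, `fastq2` would therefore presumably be read as the mates of `fastq1` rather than as more single-end reads (how any further arguments are handled is not something checked here). A small bash sketch for building the comma-separated list; the helper name is hypothetical and it assumes the file names contain no commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join its arguments with commas, e.g. to build TopHat's
# <reads1_1[,...,readsN_1]> argument. Assumes no commas inside file names.
join_by_comma() {
    # "$*" joins positional parameters with the first character of IFS.
    local IFS=,
    echo "$*"
}
```

Used as the last argument, e.g. `tophat -p 4 ... /proj/seq/data/TAIR10_Ensembl/Sequence/Bowtie2Index/genome "$(join_by_comma fastq1 fastq2 fastq3 fastq4)"`, this passes all four runs as one single-end list. For paired-end data, build the mate-1 and mate-2 lists with the files in the same order.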



    • #3
      Originally posted by colaneri
      I notice that you didn't get an answer to your question. Did you figure it out yourself?
      I have a similar question.

      Hi. No, I got no answer. If I remember correctly, I ended up concatenating the fastq files.

      If you get any insights, please post them on SEQanswers!

      Dario



      • #4
        tophat with multiple fastq files

        Hi guys, I was reading this thread because I felt it was close enough to my question, but I couldn't find an answer, so here is my question.

        I have 4 fastq files, all generated from the same RNA-seq library. Can I run them all together in a single tophat job?
        What would the command be? Do I need to list them separated by commas?

        I'm wondering whether I can skip the step of merging the files into one big fastq.
