Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems using TopHat with RNA-seq data

    Hi

    I am totally new to all this stuff and promptly ran into problems using Tophat for aligning RNAseq-data.

    I have an invertebrate draft genome around 100megabases in 7000 scaffolds and I wanted to align PE RNAseq-data to the genome with Tophat. I wanted to use the output in Maker for annotation of the genome, because I have not yet done a de-novo-assembly.

    Tophat finished without any errors but the results have been a tiny accepted_hits.bam and a huge unmapped.bam.

    So what I did was:

    - build the genome index with bowtie2-build
    - trimmed the 76 bp paired-end reads with Trimmomatic (leading: 20 trailing: 20 minlen: 50)
    - ran Tophat with default parameters, except: -r 150 (insert size 250 - 300bp)


    I am not sure which of the many logs are important, some also seem to contain redundant information, but here are:

    - tophat.log:

    [2015-02-06 17:50:08] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2015-02-06 17:50:08] Checking for Bowtie
    Bowtie version: 2.0.0.6
    [2015-02-06 17:50:08] Checking for Samtools
    Samtools version: 0.1.18.0
    [2015-02-06 17:50:08] Checking for Bowtie index files
    [2015-02-06 17:50:08] Checking for reference FASTA file
    Warning: Could not find FASTA file H2genome.fa
    [2015-02-06 17:50:08] Reconstituting reference FASTA file from Bowtie index
    Executing: /local64/usr_local/bin/bowtie2-inspect H2genome > ./tophat_out/tmp/H2genome.fa
    [2015-02-06 17:50:15] Generating SAM header for H2genome
    format: fastq
    quality scale: phred33 (default)
    [2015-02-06 17:50:17] Preparing reads
    left reads: min. length=50, count=31002899
    right reads: min. length=50, count=31002899
    [2015-02-06 18:15:30] Mapping left_kept_reads against H2genome with Bowtie2
    [2015-02-06 18:39:35] Mapping left_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 18:44:16] Mapping left_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 18:49:13] Mapping left_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 18:56:30] Mapping right_kept_reads against H2genome with Bowtie2
    [2015-02-06 19:20:49] Mapping right_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 19:25:54] Mapping right_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 19:31:06] Mapping right_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 19:36:59] Searching for junctions via segment mapping
    [2015-02-06 19:41:42] Retrieving sequences for splices
    [2015-02-06 19:41:48] Indexing splices
    [2015-02-06 19:43:16] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 19:45:25] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 19:47:56] Mapping left_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 19:50:27] Joining segment hits
    [2015-02-06 20:06:22] Mapping right_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 20:08:41] Mapping right_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 20:11:01] Mapping right_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 20:13:48] Joining segment hits
    [2015-02-06 20:28:45] Reporting output tracks

    - reports.log:

    [samopen] SAM header is present: 6973 sequences.
    Loaded 7 junctions
    Reporting final accepted alignments...done.
    Printing junction BED track...done
    Printing insertions...done
    Printing deletions...done
    Found 5 junctions from happy spliced reads

    - bowtie.left_kept_reads.fixmap.log:

    49466 out of 31052365 reads have been filtered out
    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    - bowtie.right_kept_reads.fixmap.log:

    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    So it seems that most of the reads align to the genome, right (>80%)? Although, to me, it looks a little bit odd that the right reads have exactly the same values as the left reads?

    Also, in any of the "bowtie.right/left_kept_reads_seg.fixmap.log" files, the alignment rate is much lower.

    e.g. bowtie.right_kept_reads_seg2.fixmap.log:

    9740245 reads; of these:
    9740245 (100.00%) were unpaired; of these:
    6453402 (66.26%) aligned 0 times
    2647488 (27.18%) aligned exactly 1 time
    639355 (6.56%) aligned >1 times
    33.74% overall alignment rate

    So I am really interested to understand what has happened in my run and why Tophat has found almost no accepted hits and junctions?

    Any suggestions?

    Thanks in advance!

  • #2
    Sorry,

    could be a false alarm. I may have simple messed up my fastq files. I am rerunning the alignment right now.

    Sorry again

    Comment


    • #3
      Q20 is a pretty high threshold for quality trimming. Depending on your data quality, you may shorten your reads substantially, which will make them less sensitive to alternative splicing. Also, since you are (probably) doing a quantitative analysis, very aggressive trimming and discarding of reads shorter than 50bp can cause bias, as there is a correlation between bases/motifs and quality values.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Advancing Precision Medicine for Rare Diseases in Children
        by seqadmin




        Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
        12-16-2024, 07:57 AM
      • seqadmin
        Recent Advances in Sequencing Technologies
        by seqadmin



        Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

        Long-Read Sequencing
        Long-read sequencing has seen remarkable advancements,...
        12-02-2024, 01:49 PM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 12-17-2024, 10:28 AM
      0 responses
      26 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 12-13-2024, 08:24 AM
      0 responses
      42 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 12-12-2024, 07:41 AM
      0 responses
      28 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 12-11-2024, 07:45 AM
      0 responses
      42 views
      0 likes
      Last Post seqadmin  
      Working...
      X