  • Problems with mapping ScriptSeq RNA-seq data

    Hello everyone,

    I have been trying to process some RNA-seq data prepared using the ScriptSeq protocol and sequenced on a HiSeq machine. My pipeline was to remove adaptors with Trimmomatic, align with bowtie2 against the transcriptome and then use eXpress to quantify the transcripts.

    However, when I ran eXpress I got a warning message: "The observed alignments appear disproportionately in the forward-reverse order". I have been trying to understand what could cause this on paired-end data. After aligning each mate individually against the transcriptome, I noticed that the first mate aligns to the forward strand most of the time, but the second mate seems to align to both strands. See below:

    pair_1
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5975216
    Mapped reads: 2530358 (42.3476%)
    Forward strand: 5836104 (97.6719%)
    Reverse strand: 139112 (2.32815%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    pair_2
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5994394
    Mapped reads: 2543964 (42.4391%)
    Forward strand: 3587426 (59.8463%)
    Reverse strand: 2406968 (40.1536%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this to happen? I am just puzzled by the data, since the pairs are always supposed to be forward-reverse, right?
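
One detail in the two reports above is worth checking: in each one, forward plus reverse equals the total read count rather than the mapped count, so unmapped reads appear to have been assigned a strand. The sketch below recomputes the strand split among mapped reads only. It assumes that every unmapped read was tallied as "forward" because it carries no reverse-strand flag; that is an assumption about the stats tool, not something the reports confirm. The function name is hypothetical.

```python
def mapped_strand_fractions(total, mapped, forward, reverse):
    """Return (forward_fraction, reverse_fraction) among mapped reads only.

    ASSUMPTION: unmapped reads were all counted in the 'forward' tally,
    because an unmapped record has no reverse-strand flag set.
    """
    if forward + reverse != total:
        raise ValueError("forward + reverse should equal the total read count")
    unmapped = total - mapped
    mapped_forward = forward - unmapped
    if mapped_forward < 0:
        raise ValueError("assumption violated: more unmapped reads than forward reads")
    return mapped_forward / mapped, reverse / mapped


# Numbers copied from the two reports in the post.
reports = {
    "pair_1": dict(total=5975216, mapped=2530358, forward=5836104, reverse=139112),
    "pair_2": dict(total=5994394, mapped=2543964, forward=3587426, reverse=2406968),
}

for name, numbers in reports.items():
    fwd, rev = mapped_strand_fractions(**numbers)
    print(f"{name}: forward {fwd:.1%}, reverse {rev:.1%} of mapped reads")
```

If the assumption holds, about 94.5% of mapped reads in pair_1 are on the forward strand and about 94.6% of mapped reads in pair_2 are on the reverse strand, which is the pattern a strand-specific library mapped to a transcriptome would be expected to give.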

  • #2
    Originally posted by Bacms View Post
    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this to happen? I am just puzzled by the data, since the pairs are always supposed to be forward-reverse, right?
    No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.

    Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.

    • #3
      Originally posted by Brian Bushnell View Post
      No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.
      But with the new Illumina protocol you are supposed to get strand specific reads right? Or do you expect to get 50% even with strand specific?

      Originally posted by Brian Bushnell View Post
      Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.
      This is Chlamydomonas reinhardtii, and the reason for using bowtie2 is that, as far as I can tell, there is no way of using TopHat/Cufflinks output as input to eXpress, but I may be wrong.
      What is the best way to attach the FastQC report? The attachment size limit is too small for it.

      • #4
        Originally posted by Bacms View Post
        But with the new Illumina protocol you are supposed to get strand specific reads right? Or do you expect to get 50% even with strand specific?
        My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.

        Originally posted by Bacms View Post
        What is the best way to attach the FastQC report? The attachment size limit is too small for it.
        Hmmm... I think you can output it as a PDF; PDF attachments appear to have a 19 MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.

        P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.
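
The expectation described above (read 1 almost entirely on one strand, read 2 on the other) can be checked directly on a paired alignment by tallying strand per mate from the SAM FLAG field. This is a minimal sketch in plain Python, not part of any tool mentioned in the thread. The function name and the example records are made up for illustration; the FLAG bit values are those defined in the SAM specification.

```python
from collections import Counter

# FLAG bits from the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80


def tally_strands(sam_lines):
    """Count mapped alignments by (mate, strand), skipping headers and unmapped records."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        flag = int(line.split("\t")[1])
        if flag & FLAG_UNMAPPED:
            continue  # unmapped records carry no meaningful strand
        if flag & FLAG_READ1:
            mate = "read1"
        elif flag & FLAG_READ2:
            mate = "read2"
        else:
            mate = "unpaired"
        strand = "reverse" if flag & FLAG_REVERSE else "forward"
        counts[(mate, strand)] += 1
    return counts


# Made-up records: only the first two columns matter for this tally.
example = [
    "@HD\tVN:1.0",
    "frag1\t99\ttranscriptA\t10",    # read 1, forward
    "frag1\t147\ttranscriptA\t150",  # read 2, reverse
    "frag2\t77\t*\t0",               # read 1, unmapped (skipped)
    "frag3\t83\ttranscriptB\t200",   # read 1, reverse
    "frag3\t163\ttranscriptB\t40",   # read 2, forward
]

for key, n in sorted(tally_strands(example).items()):
    print(key, n)
```

To use it on a real file, pass an open file handle, for example `tally_strands(open("alignments.sam"))` with a hypothetical file name. With a setting such as `-k 100`, every reported alignment of a read is counted, so multi-mapping reads contribute more than once.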

        • #5
          For ScriptSeq libraries, use fr-secondstrand (in TopHat and Cufflinks, --library-type fr-secondstrand). fr-secondstrand means that the strand being synthesized on the sequencer is the sense strand for Read 1.

          Olaf

          • #6
            Originally posted by Brian Bushnell View Post
            My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.

            OK, yes, I am aware that when aligning to the genome you should get roughly 50/50.


            Originally posted by Brian Bushnell View Post
            Hmmm... I think you can output it as a PDF; PDF attachments appear to have a 19 MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.
            I will double-check FastQC for an option to output PDF.

            Originally posted by Brian Bushnell View Post
            P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.
            Will do, but since I am using a Python script to run the system commands, I didn't have the individual commands for all steps to hand. Here they are now:
            #Run fastqc
            Running fastqc v0.11.2
            fastqc --outdir=../results/140526_I453_FCC4LT4ACXX_L1_Index1/ ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq

            #Running trimmomatic
            java -jar trimmomatic-0.32.jar PE -threads 24 -trimlog trim_log.txt ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1U.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2U.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40

            #Align against the transcriptome using bowtie2
            bowtie2 -k 100 -p 22 --phred64 --un-conc ../results/140526_I453_FCC4LT4ACXX_L1_Index1/unmapped.fq -x Creinhardtii_281_v5.5.transcript -1 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq -2 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq -S ../results/140526_I453_FCC4LT4ACXX_L1_Index1/cDNA.bowtie
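
Since the commands above come from a Python wrapper, one way to always have the exact command line available is to build each command as an argument list and log it before running it. The sketch below is not the poster's script. The function names are hypothetical, and the only bowtie2 options used are the ones that appear in the command above (`--un-conc` is left out for brevity).

```python
import shlex


def bowtie2_paired_cmd(index, mate1, mate2, out_sam,
                       threads=22, max_alignments=100, phred64=True):
    """Build a paired-end bowtie2 command as an argument list (nothing is executed)."""
    cmd = ["bowtie2", "-k", str(max_alignments), "-p", str(threads)]
    if phred64:
        cmd.append("--phred64")
    # Both mates go into one invocation so they are aligned as pairs.
    cmd += ["-x", index, "-1", mate1, "-2", mate2, "-S", out_sam]
    return cmd


def format_for_log(cmd):
    """Render an argument list as a shell-safe string for a log file or a forum post."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Placeholder file names, not the real paths from the post.
    cmd = bowtie2_paired_cmd(
        "Creinhardtii_281_v5.5.transcript",
        "sample_1P.fq",
        "sample_2P.fq",
        "cDNA.sam",
    )
    print(format_for_log(cmd))
```

The same list can then be handed to `subprocess.run(cmd, check=True)`, so the logged line and the executed command cannot drift apart.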
