  • paired end directionality

    Hello
    I recently prepared a paired-end directional RNA-seq library using Epicentre's ScriptSeq kit.

    With this kit, read 1 is always sense to the transcript and read 2 is always antisense to it.

    My question is: how does an aligner (TopHat, GSNAP, BWA, etc.) determine strandedness given this information?
    I would think that strandedness can be determined from read 1. Is this correct?
    When I look at my BAM file (SAM output below), I do not see anything that would assign strandedness to the read.

    HTML Code:
    D5N1JJN1:93:D09AFACXX:1:2105:1716:91336 163     chr3    131045  40      2S97M   =       131117  173     CTCCTGAATTCTTTCTTGCATAAGATCCAAGAACCCTCTTTTGGAGTCTGAATTAGGACCCCTTTCCTGCAACACCTATGCCATGCAAAGTTAACAACC     CCFFFFFHHHHHJJJJJJJJJJJJIJJJJJJEGHHJJJJJJJJJGIIJJGHIIJJJJJJJJJIJIJIJJJJHHHHFFFFCCEEEEDCDDCDDEEDDDDD MD:Z:97 NH:i:1  NM:i:0  SM:i:40
    D5N1JJN1:93:D09AFACXX:1:2105:1716:91336 83      chr3    131117  40      99M     =       131045  -173    CCTATGCCATGCAAAGTTAACAACCCACATACTGTGGATTAGGATATGGGTTGCCCCCCTTTGAAATATGGGGTCATTATTTTGCCTGCCACACTGCCC     DDC@ADDDEEDDDEDDEDDDBDDDDEEEEEDDDDDDDDDDEEDDCCDDDDDDDDJJJJJIHFFJIHHGEJJJJJIIJIIJGJIIJJIHHHHHFFFFFCC MD:Z:93G5       NH:i:1  NM:i:1  SM:i:40
    I don't think the 9th column is an accurate readout of strand, because according to the samtools manual: "The leftmost segment has a plus sign and the rightmost has a minus sign. The sign of segments in the middle is undefined." The sign therefore only reflects position within the pair, so it cannot indicate the strand of a transcript that is transcribed from right to left.

    Thanks for any help.

  • #2
    The strandedness is in the flag in column 2.

    163 = 128+32+2+1
    that means this read is from the second fastq, and is not reversed, but the mate is, and the pair is properly paired.

    83 = 64+16+2+1
    that means the read is from the first fastq, and it's reversed, but the mate is not, and the pair is properly paired.
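
The flag arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the bit values come from the SAM specification, while the `FLAG_BITS` table and `decode_flag` helper are names made up for this example and cover only the bits discussed here.

```python
# SAM flag bits relevant to pairing and orientation (values from the SAM spec).
# The dictionary and function names are hypothetical, chosen for this example.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x10: "read_reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the names of the listed bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (163, 83):
    print(flag, decode_flag(flag))
```

For 163 this lists the paired, proper-pair, mate-reversed, and second-in-pair bits; for 83 it lists paired, proper-pair, read-reversed, and first-in-pair.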

    If you aligned to the genome, aren't you going to have RNAs running in both directions? If the RNA at that locus runs backwards relative to the reference, then a reversed read 1 is still sense.
    Last edited by swbarnes2; 03-07-2012, 02:37 PM.

    • #3
      Ahh, ok. That makes sense.

      Sorry if this is a stupid question, but how would the algorithm know whether to reverse the read from the first fastq or keep it as is? Maybe I'm missing something elementary here, but for some pairs the first-fastq read is reversed and for others it is not. An example is below.

      I found another pair of reads where:

      147 = 128+16+2+1
      This is the second fastq and it is reversed; the mate is not, and the pair is properly paired.
      99 = 64+32+2+1
      This is the first fastq and it is not reversed; the mate is, and the pair is properly paired.

      Thanks!
      Last edited by zorph; 03-07-2012, 03:33 PM.

      • #4
        Your data makes perfect sense to me. Those four flags, 83,99,147,163, are exactly what you want. Two reads running in opposite directions, and the fact that they are properly paired means that they point at each other, just as they should.
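
To make the four-flag picture concrete, here is a rough sketch of how a transcript strand could be read off a flag. It assumes the premise from the first post (read 1 is sense, read 2 is antisense), which is not verified here; `transcript_strand` and its `read1_is_sense` parameter are hypothetical names for illustration, not part of any aligner.

```python
def transcript_strand(flag, read1_is_sense=True):
    """Infer the transcript strand ('+' or '-') from one read's SAM flag.

    Assumes a stranded paired-end protocol; read1_is_sense=True encodes the
    assumption that read 1 matches the transcript and read 2 is its complement.
    """
    is_read1 = bool(flag & 0x40)
    is_read2 = bool(flag & 0x80)
    if is_read1 == is_read2:
        raise ValueError("flag must mark exactly one of read 1 / read 2")

    # Strand the read itself aligned to: bit 0x10 set means reverse strand.
    read_on_plus = not (flag & 0x10)

    # Does this read have the same orientation as the transcript?
    same_as_transcript = (is_read1 == read1_is_sense)

    on_plus = read_on_plus if same_as_transcript else not read_on_plus
    return "+" if on_plus else "-"


for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag))
```

Under that assumption, the 83/163 pair points to a minus-strand transcript and the 99/147 pair to a plus-strand transcript, so both mates of a pair agree with each other.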

        I can't speak to whether your premise that the first fastq always contains the sense direction is accurate; I've never seen data from a run that preserved directionality. But the fact that you have all four good flags doesn't invalidate that premise. You are aligning to the genome, and RNA will run in both directions.

        I don't know that the aligner "knows" to reverse anything. It aligns both ends independently as best it can, and the best alignment has them in opposite directions, pointing at each other, because that's what your fragment actually looked like.
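
On the reversing question: per the SAM specification, SEQ is stored as it reads on the forward reference strand, so a read whose flag has bit 0x10 set appears reverse-complemented relative to the fastq. A small sketch of getting the original read back follows; `original_read_sequence` is a hypothetical helper name, and the short sequence is a toy example, not taken from the data above.

```python
# Complement table for the bases A, C, G, T plus N (upper case only).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def original_read_sequence(seq, flag):
    """Return the sequence as it appeared in the fastq.

    If the reverse-strand bit (0x10) is set, the SAM SEQ field holds the
    reverse complement of the sequenced read, so undo that here.
    """
    if flag & 0x10:
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


print(original_read_sequence("AACG", 83))  # reverse-strand read
print(original_read_sequence("AACG", 99))  # forward-strand read
```

So the aligner does not need to be told which reads to flip. It tries both orientations, reports the one that matches, and records the choice in the flag.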
