  • Illumina paired-end reads. More than 2 adapter sequences.

    Hi,

    I've been recently involved in a project where my task is to analyze double-stranded RNA sequencing data, and I'm relatively new to this field.

    Data:
    - Illumina MiSeq 2x150 reads
    - 2 samples - about 3.5 million paired-end reads per sample
    - Known linker/primer sequences
    - RNA library is supposed to include only dsRNA molecules > 150 nt

    I've started analyzing the data of sample 1 and I've found that a huge proportion of the reads have the following structure (R1 and R2 are mate reads):

    R1-5' => [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2] xxxx 3'
    R2-3' => xxx [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2] 5'

    (Note 1 : mate reads are complementary)
    (Note 2 : xxx portion ~10 nt long)

    Is there any explanation for this?

    I have a very basic understanding of the RNA-Seq process, but not enough to explain why I have more than 2 adapters per read.

    Thanks in advance.

  • #2
    Sorry, the read structure above was misaligned. Corrected below:

    R1-5' =>     [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2] xxxx 3'
    R2-3' => xxx [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2]      5'

    • #3
      Do you have any idea how the libraries were generated? This certainly does not look like a standard Illumina TruSeq library. It could be that something is going wrong during one of the adapter ligation steps that is causing fragments to concatamerize. It's also strange that adapter 1 is being sequenced first in both reads.

      • #4
        This looks very strange to me. I don't think any of your reads should start with adapter sequence. They can end with adapter sequence if your insert is shorter than your read length. Is your "some sequence 2" a short sequence of indexing nucleotides?

        I can only think that the actual Illumina adapters are outside of your strange adapter reads.

        • #5
          kcchan and microgirl123,

          Thanks a lot for your posts.

          A couple of clarifications:

          "Do you have any idea how the libraries were generated?"
          Hope this helps:
          - RNA fragmentation
          - Double strand cDNA synthesis
          - End repair
          - Linkers/adapters are ligated at the 3' and 5' ends of the double-stranded RNA (I refer to those sequences as adapter 1 and 2)
          - Denaturing
          - PCR amplification (primers are complementary to the linkers)
          - HTS

          "It's also strange that adapter 1 is being sequenced first in both reads"
          The actual sequences are:

          R1 => [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2] xxxx
          R2 => [Adapter 2] [some sequence 2] [Adapter 2] [some sequence 1] [Adapter 1] xxxx
          But since they are paired-end reads, I was trying to show that the two sequences are complementary:

          R1-5' =>     [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2] xxxx 3'
          R2-3' => xxx [Adapter 1] [some sequence 1] [Adapter 2] [some sequence 2] [Adapter 2]      5'
          Sorry about the confusion.

          "Is your 'some sequence 2' a short sequence of indexing nucleotides?"
          No, I've already checked for the indexing nucleotides. However, I've taken a closer look at this "some sequence 2", and I think it also contains Adapter 1, with a few mismatches.

          "I can only think that the actual Illumina adapters are outside of your strange adapter reads."
          Adapter sequences were provided to me by the experimental people, and they are those Adapter 1 and 2 that I show.
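One way to check for adapter copies that carry a few mismatches is a sliding-window Hamming comparison. The sketch below is only an illustration: the function names, the `max_mismatches` threshold, and the adapter string are hypothetical placeholders (the real adapter sequences are not given in this thread), and it ignores indels.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read, adapter, max_mismatches=2):
    """Return (start, mismatches) for each read window close to the adapter.

    Substitutions only: insertions or deletions in the adapter copy
    would need a proper alignment instead of a Hamming comparison.
    """
    k = len(adapter)
    hits = []
    for start in range(len(read) - k + 1):
        d = hamming(read[start:start + k], adapter)
        if d <= max_mismatches:
            hits.append((start, d))
    return hits


# Hypothetical adapter, NOT the one used in this library.
ADAPTER_1 = "ACGTTGCAGTCA"
read = "TTTT" + ADAPTER_1 + "CCCC" + "ACGATGCAGTCA" + "GG"
hits = find_adapter(read, ADAPTER_1)
```

Running this over a sample of reads, and over the reverse complement of the adapter as well, should show whether "some sequence 2" really is a degraded copy of Adapter 1.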

          • #6
            Ah, I see what's going on now. This is a directional RNA-Seq library. If I understand correctly, the 3' Adapter sequence is showing up multiple times. This is indicative of an adapter that was designed incorrectly. The oligo for the 3' adapter must have a modification at the 3' end in order to prevent it ligating to other inserts; either a dideoxy nucleotide or amino modification. I'm guessing this was not done and the shorter RNA fragments concatamerized during the 5' ligation reaction. The data is still good, but you'll have to do some work to separate the individual inserts in the reads.

            • #7
              Thanks kcchan!

              Any suggestion on how to proceed?

              Since the majority of paired reads (R1/R2) show a high degree of overlap, maybe I could build single reads from them, trim the adapters, and map the resulting short reads to the reference genome.
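A rough sketch of the "build single reads" idea: reverse-complement R2, slide it along R1, and keep the best-scoring overlap. Everything here (`merge_pair`, `min_overlap`, `max_mismatch_frac`) is a hypothetical name or threshold chosen for illustration, and the sketch ignores base qualities, so a dedicated read merger would be the safer choice on real data.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=20, max_mismatch_frac=0.1):
    """Merge R1 with the reverse complement of R2, or return None.

    shift is the position in R1 where the reverse-complemented R2 starts.
    A negative shift means the fragment is shorter than the read, so the
    overhangs on both sides are read-through and are dropped.
    """
    r2_rc = reverse_complement(r2)
    best = None  # (mismatch fraction, -overlap length, shift)
    for shift in range(-(len(r2_rc) - min_overlap), len(r1) - min_overlap + 1):
        start = max(0, shift)
        end = min(len(r1), shift + len(r2_rc))
        olen = end - start
        if olen < min_overlap:
            continue
        mismatches = sum(
            a != b for a, b in zip(r1[start:end], r2_rc[start - shift:end - shift])
        )
        frac = mismatches / olen
        if frac <= max_mismatch_frac:
            candidate = (frac, -olen, shift)
            if best is None or candidate < best:
                best = candidate
    if best is None:
        return None
    shift = best[2]
    frag_end = shift + len(r2_rc)  # fragment end, in R1 coordinates
    if frag_end <= len(r1):
        return r1[:frag_end]
    # Mismatching overlap positions keep the R1 base (no quality-aware consensus).
    return r1 + r2_rc[len(r1) - shift:]
```

Pairs for which the function returns `None` could still be kept as unmerged reads rather than discarded.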

              • #8
                Yeah, I think the best you can do with those reads is to use them as single-end reads. You're going to have to do a bit of work to get the reads cleaned up, however. Not only will you need to trim off the adapters, but you'll also need to separate each sequence between the adapters and treat each one as an individual read.
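At the sequence level, separating the inserts can be as simple as splitting on the adapter strings. This is a minimal sketch assuming exact adapter matches; `split_on_adapters`, `min_len`, and the toy adapter strings are hypothetical and stand in for the real sequences, which are not shown in this thread.

```python
import re


def split_on_adapters(seq, adapters, min_len=15):
    """Split a read at every exact adapter match and keep the longer pieces."""
    # Longest adapters first, so a shorter adapter cannot shadow a longer one.
    ordered = sorted(adapters, key=lambda a: (-len(a), a))
    pattern = "|".join(re.escape(a) for a in ordered)
    return [piece for piece in re.split(pattern, seq) if len(piece) >= min_len]


# Toy adapters for illustration only.
toy_adapters = ["AAAA", "CCCC"]
toy_read = "AAAA" + "GT" * 9 + "CCCC" + "TG" * 10 + "CCCC" + "GTGTG"
pieces = split_on_adapters(toy_read, toy_adapters)
```

Exact matching will miss adapter copies that contain sequencing errors, so real data would probably need a mismatch-tolerant search in place of the regular expression.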

                • #9
                  You could split the reads on the index sequences (forward and reverse complement) using Perl. You would also need to assign independent read names, keep track of the positional information (for the quality scores), and discard reads below a certain length. Unless these are supposed to be small RNAs or it's a SAGE type of experiment, the small insert sizes would cause me to question the sample quality.
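The same bookkeeping can be sketched in Python instead of Perl. The example below splits one FASTQ record on the adapters in both orientations, slices the quality string with the same coordinates, names each piece after its position in the parent read, and drops short pieces. The function name, the naming scheme, `min_len`, and the toy adapter are all assumptions made for illustration, and only exact adapter matches are handled.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_record(name, seq, qual, adapters, min_len=15):
    """Split one FASTQ record at adapter hits into (name, seq, qual) sub-reads."""
    both = set(adapters) | {reverse_complement(a) for a in adapters}
    ordered = sorted(both, key=lambda a: (-len(a), a))
    pattern = re.compile("|".join(re.escape(a) for a in ordered))
    # The sentinel span closes the segment that follows the last adapter hit.
    spans = [m.span() for m in pattern.finditer(seq)] + [(len(seq), len(seq))]
    sub_reads = []
    prev_end = 0
    for start, end in spans:
        if start - prev_end >= min_len:
            # The new name records where the piece sat in the parent read.
            sub_name = f"{name}_{prev_end}_{start}"
            sub_reads.append((sub_name, seq[prev_end:start], qual[prev_end:start]))
        prev_end = end
    return sub_reads


# Toy record; "AACC" is a placeholder adapter, its reverse complement is "GGTT".
toy_seq = "AACC" + "AC" * 9 + "GGTT" + "CA" * 10 + "GGTT" + "ACA"
toy_qual = "".join(chr(33 + (i % 40)) for i in range(len(toy_seq)))
sub_reads = split_record("read1", toy_seq, toy_qual, ["AACC"])
```

Keeping the coordinates in the read name makes it possible to trace any mapped piece back to its parent read later on.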
