  • small RNA-seq size filtering

    Hello all,

    I am relatively new to small RNA-seq (I have only worked with regular RNA-seq), and I have a question about my workflow. I am looking for differential expression of miRNAs between two sample types; I am not interested in novel miRNA discovery at the moment. I have 50 bp paired-end reads from an Illumina HiSeq, using a library prepared with the Illumina TruSeq Small RNA protocol.

    I was planning to use cutadapt to trim the adapters from the forward and reverse reads, since cutadapt can work with paired-end reads. My question is about the minimum and maximum read lengths to keep. It is my understanding that mature miRNAs are roughly 17 to 25 nt long, so would it be correct to have cutadapt keep only reads that are between 15 and 30 nt long after adapter trimming, to help eliminate other RNAs present in the data? I was planning to use bowtie to map the data to miRBase after the trimming.

    Another alternative I was thinking of was only keeping those reads in which the forward and reverse read are perfectly complementary to each other after trimming, but I was unsure of this method and/or how to perform it.

    Thanks in advance!

  • #2
    You can do this with BBMerge:

    bbmerge.sh in=x1.fq in2=x2.fq outm=y1.fq outm2=y2.fq mininsert=15 minoi=10 tbo

    That will trim the reads based on reverse-complement overlap and send only the successfully overlapped reads to the outm destination, retaining pairing. Alternatively, you could just output the consensus merged reads like this, so you don't have to deal with pairs anymore:

    bbmerge.sh in=x1.fq in2=x2.fq outm=y.fq mininsert=15 minoi=10

    The result will not contain anything with an insert shorter than 15 bp, but it can still contain reads with inserts longer than 35 bp, so you can filter out the remainder like this:

    reformat.sh in=y.fq out=z.fq maxlen=35

    Reformat will accept paired or unpaired input.
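As a rough illustration of the length filtering described above, here is a minimal Python sketch. It is not how Reformat is implemented; the function names `read_fastq` and `filter_by_length` and the example records are hypothetical, and the 15 to 35 nt window is simply taken from the commands in this post. It assumes plain 4-line FASTQ records.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def filter_by_length(records, min_len=15, max_len=35):
    """Keep only records whose sequence length is within [min_len, max_len]."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


# Hypothetical records: one too short, one miRNA-sized, one too long
example = [
    "@short", "ACGTACGTAC", "+", "I" * 10,
    "@mirna_like", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@long", "A" * 40, "+", "I" * 40,
]
kept = [header for header, _, _ in filter_by_length(read_fastq(example))]
```

For real data you would pass an open file handle instead of the `example` list, since the parser only needs an iterable of lines.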

    Note that this method (looking for reverse-complement overlap) may result in fewer reads being retained than looking for adapters, because reads that overlap in multiple different orientations will be discarded as ambiguous. So trimming based on adapters and filtering by resultant length is also viable, but that method will also miss some reads (the ones with high error rate in adapter sequence). The reads each method misses will be different. I have another program, BBDuk, which can trim reads based on both adapter sequence AND overlap, which would be optimal for your case... but currently it has some constants set for overlap mode that will not work well on small RNAs, so I need to make them parameters.
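To make the idea of reverse-complement overlap concrete, here is a small Python sketch for the case where the insert is shorter than the read. It is a simplified illustration, not BBMerge's actual algorithm: it only tests exact (or near-exact) agreement between the start of read 1 and the reverse complement of the start of read 2, and the names `reverse_complement` and `infer_insert_length` are hypothetical. The adapter strings in the example are placeholders chosen for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_insert_length(r1, r2, min_insert=15, max_mismatches=0):
    """Return the longest length L >= min_insert for which the first L bases
    of r1 match the reverse complement of the first L bases of r2, allowing
    up to max_mismatches. Return None if no such length exists."""
    longest = min(len(r1), len(r2))
    for length in range(longest, min_insert - 1, -1):
        candidate = reverse_complement(r2[:length])
        mismatches = sum(a != b for a, b in zip(r1[:length], candidate))
        if mismatches <= max_mismatches:
            return length
    return None


# Simulated pair: a 22 nt insert followed by placeholder adapter sequence,
# with both reads truncated to 40 bases
insert = "TGAGGTAGTAGGTTGTATAGTT"
adapter_r1 = "TGGAATTCTCGGGTGCCAAGG"
adapter_r2 = "GATCGTCGGACTGTAGAACTCTGAAC"
read1 = (insert + adapter_r1)[:40]
read2 = (reverse_complement(insert) + adapter_r2)[:40]

insert_len = infer_insert_length(read1, read2)
trimmed_read1 = read1[:insert_len] if insert_len is not None else None
```

Once the insert length is known, everything past that position in each read is adapter and can be trimmed, which is the effect described for the overlap-based approach.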
    Last edited by Brian Bushnell; 07-07-2014, 12:35 PM.



    • #3
      Thanks for your reply. Would this work even though the insert size is smaller than a single read? As far as I know, most of these programs are optimized to run with inserts larger than a single read, but smaller than two of the reads. Also, what about the adapters? Would those need to be trimmed beforehand?



      • #4
        It works fine with insert size shorter than read length, with the extra flags "mininsert=15 minoi=10". By default it doesn't look for insert sizes shorter than 35bp.

        You can trim adapters beforehand if you want, but it's not necessary unless the r1 adapter and the r2 adapter are close to being reverse-complementary. Still, it may improve the results.
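One rough way to check whether two adapters are "close to being reverse-complementary" is to slide the reverse complement of one along the other and report the best fraction of matching bases. The Python sketch below does that; the function name `best_revcomp_identity`, the `min_overlap` default, and the example sequences are all hypothetical, and the score it returns is only a heuristic, not a threshold used by BBMerge.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_revcomp_identity(adapter1, adapter2, min_overlap=8):
    """Return the highest fraction of matching bases between adapter1 and the
    reverse complement of adapter2 over all ungapped alignments that overlap
    by at least min_overlap bases (0.0 if no such alignment exists)."""
    rc2 = reverse_complement(adapter2)
    best = 0.0
    # shift > 0 moves rc2 to the right of adapter1's start; shift < 0 to the left
    for shift in range(-(len(rc2) - min_overlap), len(adapter1) - min_overlap + 1):
        start1 = max(shift, 0)
        start2 = max(-shift, 0)
        overlap = min(len(adapter1) - start1, len(rc2) - start2)
        if overlap < min_overlap:
            continue
        matches = sum(
            adapter1[start1 + i] == rc2[start2 + i] for i in range(overlap)
        )
        best = max(best, matches / overlap)
    return best


# Hypothetical adapters: the second is the exact reverse complement of the first
a1 = "ACGTTGCAAGGCTTAC"
a2 = reverse_complement(a1)
worst_case = best_revcomp_identity(a1, a2)
# Two poly-A adapters share no bases with each other's reverse complement
unrelated = best_revcomp_identity("A" * 12, "A" * 12)
```

A value near 1.0 would suggest the adapters could overlap with each other, in which case trimming them beforehand is the safer choice.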
