  • sfh838t
    Member
    • Apr 2014
    • 29

    why are sRNA output reads longer than siRNA?

    Hi,
    this is possibly a dumb question, but if my goal is to find siRNA (20-25 nt long), why are the Illumina reads 36 nt long, at least before quality trimming? If a 24 nt long piece of RNA (plus primers) is sequenced, how can the result be 36 nt long? Am I looking at this too simplistically?

    Also: if I can align (bowtie2) enough reads to cover my entire virus sequence, how come after assembling (velvet) the contigs cover only fractions of the ref seq? How much of it is covered depends mostly on the number of reads, and a little on k-mer size.
    Is there anything I can do to improve the assembly?
  • cmbetts
    Senior Member
    • Jun 2012
    • 120

    #2
    The quick answer for the first question is that the sequencer runs as many cycles as you tell it to, and that's how long the reads come out. If the insert is shorter than the read length, it reads into the adapter on the opposite side, and gibberish (mostly As) beyond that. The bases in the adapter need to be removed by sequence identity, not quality.

    I can't answer the second question, as I've never needed to do a genome assembly.
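
To make the point about cycles and adapter read-through concrete, here is a minimal Python sketch. The insert and adapter sequences are made-up stand-ins (not the real kit adapter), and `simulate_read` and `trim_adapter` are hypothetical helper names; real trimmers such as cutadapt also tolerate mismatches, which this sketch does not.

```python
# Illustrative sequences only: both are assumptions, not taken from the thread.
INSERT = "ACGTTGCA" * 3   # a 24-nt stand-in for a small RNA
ADAPTER = "TGGAATTC"      # short made-up stand-in for a 3' adapter
CYCLES = 36               # the number of cycles the run was set to


def simulate_read(insert, adapter, cycles):
    """Return `cycles` bases: insert, then adapter, then filler As."""
    template = insert + adapter
    padded = template + "A" * max(0, cycles - len(template))
    return padded[:cycles]


def trim_adapter(read, adapter, min_overlap=3):
    """Trim by sequence identity: cut at a full adapter match, or at an
    adapter prefix of at least `min_overlap` bases at the 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


read = simulate_read(INSERT, ADAPTER, CYCLES)
trimmed = trim_adapter(read, ADAPTER)
print(len(read), len(trimmed))
```

The raw read is always as long as the cycle count, whatever the insert length; only matching the adapter sequence recovers the 24-nt insert.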

    • sfh838t
      Member
      • Apr 2014
      • 29

      #3
      thank you. I can see the nonsense part. And yes, it was 36 nt after adapter removal.

      • MU Core
        Member
        • Apr 2008
        • 60

        #4
        If you are using bcl2fastq for adapter trimming, I believe the default minimum-trimmed-read-length is 35. If trimming would cut a read down to fewer than 35 bases, then the bases between the end of the trimmed read and position 35 are “masked” by replacing them with N’s. So the adapter remaining after the first 20 bases would be masked. Our group has set the minimum-trimmed-read-length to 10 for small RNA data sets. This may not be your situation, but I thought it worth mentioning.
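
The masking behaviour described above can be modelled in a few lines of Python. This is only a sketch of the description in this post, not bcl2fastq's actual implementation; `trim_and_mask` is a hypothetical function, and the sequences are made-up examples.

```python
def trim_and_mask(read, adapter, min_trimmed_len=35):
    """Model of the described behaviour: trim at the adapter, but if the
    result would be shorter than `min_trimmed_len`, keep the read at that
    length and replace the bases after the insert with N."""
    pos = read.find(adapter)
    if pos == -1:
        return read  # no adapter found, read is left alone
    trimmed = read[:pos]
    if len(trimmed) >= min_trimmed_len:
        return trimmed
    keep = min(len(read), min_trimmed_len)
    return trimmed + "N" * (keep - len(trimmed))


# Made-up example: 24-nt insert + 8-nt adapter stand-in + 4 filler As = 36 nt.
insert = "ACGTTGCA" * 3
raw = insert + "TGGAATTC" + "AAAA"

default_out = trim_and_mask(raw, "TGGAATTC")                       # threshold 35
small_rna_out = trim_and_mask(raw, "TGGAATTC", min_trimmed_len=10)
print(default_out)
print(small_rna_out)
```

Under this model, the default threshold yields a 35-nt read ending in a run of Ns, while a threshold of 10 yields the bare 24-nt insert.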

        • sfh838t
          Member
          • Apr 2014
          • 29

          #5
          I used cutadapt for adapter removal, which, as best I can tell, removes all parts of the search string no matter where they occur.
          It still puzzles me though why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1kb of nearly 8 kb. I know, aligning and assembling are two different things/algorithms, but still.
          Does anyone have an idea where else I could ask this question?

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by sfh838t
            It still puzzles me though why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1kb of nearly 8 kb.
            Let me see if I am understanding this right.

            If you align you can find reads covering the entire reference (8kb?) but if you try to assemble those reads then you can only get contigs that represent just 1 kb of the 8kb reference?

            Sequence assembly is a hard problem. If there are repeats in your reference (coupled with the short reads in your dataset) then that result is not surprising.

            • sfh838t
              Member
              • Apr 2014
              • 29

              #7
              yes, you did understand correctly.
              I used either BWA or bowtie2 to align the reads to the ref seq, then went through the samtools steps to keep only the reads that align, converted them back to fastq, then ran velvet or ABySS and got mostly nothing, depending on read depth.
              I have three plant samples with apparently varying degrees of virus infection; assembled contig coverage increases from 1 kb to 2 kb and 6 kb of the 8 kb total virus length with increasing read depth. However, for each sample I can use IGV to look at the read alignments and bedtools to give me numbers for them, and if I use all reads regardless of their length I have coverage of the entire target virus minus 1 to 6 nt.
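
One possible reason (an assumption, not a diagnosis of this data set) why full base-level coverage and fragmented contigs can coexist: a k-mer based assembler can only join reads that share whole k-mers, so every base can be covered while many k-mers spanning read boundaries are not. The toy Python sketch below uses hypothetical read positions and helper names, and it ignores repeats, sequencing errors, and coverage cutoffs.

```python
def base_coverage(ref_len, reads):
    """Per-base depth from (start, length) alignments."""
    depth = [0] * ref_len
    for start, length in reads:
        for i in range(start, min(start + length, ref_len)):
            depth[i] += 1
    return depth


def kmer_coverage(ref_len, reads, k):
    """Per-k-mer depth: a k-mer counts only if one read contains all of it."""
    n = ref_len - k + 1
    depth = [0] * n
    for start, length in reads:
        for i in range(start, start + length - k + 1):
            if 0 <= i < n:
                depth[i] += 1
    return depth


def contig_spans(kdepth, k):
    """Runs of consecutive covered k-mers, as (start, end) with end exclusive."""
    spans, run_start = [], None
    for i, d in enumerate(kdepth + [0]):  # trailing 0 closes the last run
        if d > 0 and run_start is None:
            run_start = i
        elif d == 0 and run_start is not None:
            spans.append((run_start, i - 1 + k))
            run_start = None
    return spans


# Toy data: 22-nt reads tiled across a 100-nt reference with tiny overlaps.
reads = [(0, 22), (20, 22), (40, 22), (60, 22), (78, 22)]
bases = base_coverage(100, reads)
spans = contig_spans(kmer_coverage(100, reads, k=15), k=15)
print(min(bases))  # every base is covered at least once
print(spans)       # but the covered k-mers fall into separate stretches
```

In this toy case every base has depth of at least 1, yet with k=15 the covered k-mers form five disconnected stretches, roughly one per read. Lowering k or adding reads that bridge the boundaries merges them, which is consistent with contig coverage growing with read depth.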

              • fanli
                Senior Member
                • Jul 2014
                • 197

                #8
                Is there something particular about your virus that you'd be trying to do assembly with really short reads? I don't think a lot of the assemblers out there are optimized for this...

                • sfh838t
                  Member
                  • Apr 2014
                  • 29

                  #9
                  looking for variants, maybe strain identification etc.
                  velvet seems to be commonly used for this, any suggestions for a different assembler?

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Is what we are discussing now unrelated to the original question or is this an ssRNA virus? (I can split later posts into a new thread if that is so).

                    Is there a reason you are trying to assemble the virus (when you have a reference)? (Edit: Looks like @fanli already asked this question while I was typing this).

                    If you have some time take a look at tadpole.sh from BBMap. It may provide a fresh option. I would also look into BBSplit to separate the viral reads before doing the assembly with tadpole.

                    • sfh838t
                      Member
                      • Apr 2014
                      • 29

                      #11
                      it was the second question, so I don't know if it should be split.
                      I will look into tadpole and the other suggestions, thanks!
