  • why are sRNA output reads longer than siRNA?

    Hi,
    This is possibly a dumb question, but if my goal is to find siRNA (20-25 nt long), why are the Illumina reads 36 nt long, at least before quality trimming? If a 24 nt piece of RNA (plus primers) is sequenced, how can the result be 36 nt long? Am I looking at this too simplistically?

    Also: if I can align (bowtie2) enough reads to cover my entire virus sequence, how come the contigs I get after assembly (velvet) cover only fractions of the ref seq? How much of it is covered depends mostly on the number of reads, and a little on the k-mer size.
    Is there anything I can do to improve the assembly?

  • #2
    The quick answer for the first question is that the sequencer runs as many cycles as you tell it to, and that's how long the reads come out. If the insert is shorter than the read length, it reads into the adapter on the opposite side, and gibberish (mostly As) beyond that. The bases in the adapter need to be removed by sequence identity, not quality.

    I can't answer the second question, as I've never needed to do a genome assembly.
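To make the point about removing adapter bases by sequence identity concrete, here is a minimal Python sketch. It is an illustration only: `trim_3p_adapter` is a hypothetical helper that does exact matching, the insert is made up, and the adapter string is assumed to be the TruSeq small RNA 3' adapter, so check the sequence for your own library kit. Real trimmers such as cutadapt also tolerate mismatches.

```python
def trim_3p_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Return the read with the 3' adapter (and everything after it) removed.

    Hypothetical helper, exact matching only.
    """
    # Case 1: the full adapter occurs somewhere inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so only a prefix
    # of the adapter is present at the 3' end of the read.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"      # assumed adapter; check your library prep kit
insert = "ACGTTGCATGCCGATAGCTAGCTA"    # made-up 24 nt small RNA
read_36 = (insert + ADAPTER)[:36]      # 36 cycles: 24 nt insert + 12 nt of adapter
trimmed = trim_3p_adapter(read_36, ADAPTER)
print(len(read_36), len(trimmed))      # 36 24
```

The 36 nt read is simply the 24 nt insert followed by the first 12 adapter bases; quality trimming alone would not remove those, because adapter bases are usually sequenced with good quality.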

    • #3
      Thank you. I can see the nonsense part. And yes, it was 36 nt after adapter removal.

      • #4
        If you are using bcl2fastq for adapter trimming, I believe the default minimum-trimmed-read-length is 35. If trimming would cut a read down to fewer than 35 bases, then the bases between the end of the trimmed read and position 35 are “masked” by replacing them with N’s. So the remaining adapter after 20 bases would be masked. Our group has set the minimum-trimmed-read-length to 10 for small RNA data sets. This may not be your situation, but I thought it worth mentioning.
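The masking behaviour described above can be sketched in a few lines of Python. This only mirrors the description in the post and is not bcl2fastq's actual implementation; `mask_to_min_length` is a hypothetical name, and the default of 35 is taken from the post.

```python
def mask_to_min_length(trimmed_read: str, original_length: int,
                       min_trimmed_length: int = 35) -> str:
    """Pad an adapter-trimmed read with N up to min_trimmed_length.

    Hypothetical helper that follows the behaviour described in the post.
    """
    # A read can never be padded beyond the number of cycles that were run.
    target = min(min_trimmed_length, original_length)
    if len(trimmed_read) >= target:
        return trimmed_read
    return trimmed_read + "N" * (target - len(trimmed_read))


insert_20 = "ACGTTGCATGCCGATAGCTA"                  # made-up 20 nt insert
print(mask_to_min_length(insert_20, 36))            # 20 bases followed by 15 N
print(mask_to_min_length(insert_20, 36, 10))        # 20 bases, nothing masked
```

With the threshold lowered to 10, as the poster's group does for small RNA data, a 20 nt insert comes out as a plain 20 nt read.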

        • #5
          I used cutadapt for adapter removal, which, as best I can tell, will remove all parts of the search string no matter where they occur.
          It still puzzles me, though, why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1 kb of nearly 8 kb. I know aligning and assembling are two different things/algorithms, but still.
          Does anyone have an idea where else I could ask this question?

          • #6
            Originally posted by sfh838t
            It still puzzles me though why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1kb of nearly 8 kb.
            Let me see if I am understanding this right.

            If you align you can find reads covering the entire reference (8kb?) but if you try to assemble those reads then you can only get contigs that represent just 1 kb of the 8kb reference?

            Sequence assembly is a hard problem. If there are repeats in your reference (coupled with the short reads in your dataset) then that result is not surprising.
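As a rough illustration of why repeats plus short reads hurt assembly: Velvet and ABySS are de Bruijn graph assemblers, and a repeat longer than k-1 bases creates a branch in the graph where the assembler typically ends a contig. The toy sequence below is made up, and `branching_nodes` is a hypothetical helper that builds the graph from the sequence itself rather than from reads, so treat it as a sketch of the idea only.

```python
from collections import defaultdict


def branching_nodes(sequence: str, k: int) -> int:
    """Count (k-1)-mers with more than one distinct successor in the de Bruijn graph."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
        successors[kmer[:-1]].add(kmer[1:])
    return sum(1 for nxt in successors.values() if len(nxt) > 1)


repeat = "ACGTTCAGGA"  # made-up 10 nt repeat that occurs twice
genome = "TTGCA" + repeat + "CCTAG" + repeat + "GGATC"

for k in (6, 12):
    print(k, branching_nodes(genome, k))
# 6 1   (k-1 is shorter than the repeat: the path out of the repeat is ambiguous)
# 12 0  (k-1 is longer than the repeat: no ambiguity)
```

Since k cannot exceed the read length, 20-25 nt reads leave little room to choose a k large enough to span even short repeats.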

            • #7
              Yes, you did understand correctly.
              I used either BWA or bowtie2 to align reads to the ref seq, then went through the samtools steps to keep only the reads that align, converted those back to fastq, then ran velvet or ABySS and got mostly nothing, depending on read depth.
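For readers unfamiliar with the filtering step, here is a small pure-Python sketch of what keeping only mapped reads and converting them back to FASTQ involves (roughly the job that `samtools view -F 4` followed by `samtools fastq` does in practice). `mapped_sam_to_fastq` is a hypothetical helper and the SAM lines are made up; the column positions and flag bits follow the SAM format.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mapped_sam_to_fastq(sam_lines):
    """Yield FASTQ records for mapped primary alignments in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x10:                   # reverse strand: SAM stores the reverse complement
            seq, qual = revcomp(seq), qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"


sam = [
    "@SQ\tSN:virus\tLN:8000",
    "r1\t0\tvirus\t101\t42\t8M\t*\t0\t0\tACGTTGCA\tIIIIHHGG",
    "r2\t16\tvirus\t205\t42\t8M\t*\t0\t0\tAACCGGTA\tABCDEFGH",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTT\tIIIIIIII",
]
records = list(mapped_sam_to_fastq(sam))
print("".join(records), end="")
```

Only r1 and r2 survive; r2 is written back in its original orientation, which matters if the reads are later compared against the raw FASTQ.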
              I have three plant samples with apparently varying degrees of virus infection; assembled contig coverage increases from 1 kb to 2 kb and 6 kb of the 8 kb total virus length with increasing read depth. However, for each sample I can use IGV to look at the read alignments and bedtools to give me numbers for them, and if I use all reads regardless of their length I have coverage of the entire target virus minus 1 to 6 nt.
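One possible contributor to the gap between alignment coverage and contig coverage, sketched under simple assumptions: every base can be covered by some read while most k-mers are still never seen intact, because a de Bruijn assembler needs consecutive reads to overlap by at least k-1 bases. The reference below is made up, the reads are error-free, and `base_coverage` and `kmer_recovery` are hypothetical helpers.

```python
def base_coverage(ref_len, alignments):
    """Fraction of reference positions covered by at least one (start, length) alignment."""
    covered = [False] * ref_len
    for start, length in alignments:
        for pos in range(start, min(start + length, ref_len)):
            covered[pos] = True
    return sum(covered) / ref_len


def kmer_recovery(reference, reads, k):
    """Fraction of the reference's k-mers that occur intact inside at least one read."""
    read_kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    ref_kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    return sum(kmer in read_kmers for kmer in ref_kmers) / len(ref_kmers)


# Made-up reference; the values 24, 20 and 21 are illustrative assumptions.
reference = ("ATGGCTAGCTTACGGATCCATGCAAGTCTGACCTGAAGTTCGATAGGCTCAATGCG"
             "TTACCGGATATCGTGCAACTTGGACCTAGTCAGGCTTAAGCCGTAC")
read_len, step, k = 24, 20, 21

# Sparse sampling: 24 nt reads every 20 nt, so neighbours overlap by only 4 nt.
starts = range(0, len(reference), step)
sparse_reads = [reference[s:s + read_len] for s in starts]
alignments = [(s, len(r)) for s, r in zip(starts, sparse_reads)]

# Dense sampling: one 24 nt read starting at every position.
dense_reads = [reference[s:s + read_len]
               for s in range(len(reference) - read_len + 1)]

print(base_coverage(len(reference), alignments))        # every base is covered
print(kmer_recovery(reference, sparse_reads, k))        # only a minority of 21-mers seen
print(kmer_recovery(reference, dense_reads, k))         # every 21-mer seen
```

This would be consistent with contig coverage growing with read depth even though base-level coverage already looks complete, although sequencing errors and the assembler's coverage cutoffs can also play a role.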

              • #8
                Is there something particular about your virus that you'd be trying to do assembly with really short reads? I don't think a lot of the assemblers out there are optimized for this...

                • #9
                  Looking for variants, maybe strain identification, etc.
                  Velvet seems to be commonly used for this. Any suggestions for a different assembler?

                  • #10
                    Is what we are discussing now unrelated to the original question or is this an ssRNA virus? (I can split later posts into a new thread if that is so).

                    Is there a reason you are trying to assemble the virus (when you have a reference)? (Edit: Looks like @fanli already asked this question while I was typing this).

                    If you have some time take a look at tadpole.sh from BBMap. It may provide a fresh option. I would also look into BBSplit to separate the viral reads before doing the assembly with tadpole.

                    • #11
                      it was the second question, so I don't know if it should be split.
                      I will look into tadpole and the other suggestions, thanks!
