SEQanswers

Old 03-15-2016, 11:50 AM   #1
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23
why are sRNA output reads longer than siRNA?

Hi,
This is possibly a dumb question, but if my goal is to find siRNA (20-25 nt long), why are the Illumina reads 36 nt long, at least before quality trimming? If a 24 nt RNA piece (plus primers) is sequenced, how can the result be 36 nt long? Am I looking at this too simplistically?

Also: if I can align (bowtie2) enough reads to cover my entire virus sequence, how come the contigs cover only fractions of the ref seq after assembling (velvet)? How much of it is covered depends mostly on the number of reads, and a little on kmer size.
Is there anything I can do to improve the assembly?
Old 03-15-2016, 12:42 PM   #2
cmbetts
Member
 
Location: Bay Area

Join Date: Jun 2012
Posts: 88

The quick answer for the first question is that the sequencer runs as many cycles as you tell it to, and that's how long the reads come out. If the insert is shorter than the read length, it reads into the adapter on the opposite side, and gibberish (mostly As) beyond that. The bases in the adapter need to be removed by sequence identity, not quality.
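To make the read-through concrete, here is a minimal Python sketch of adapter removal by sequence identity. It is not how cutadapt or any other trimmer is implemented: it only does exact matching, and the adapter and insert sequences below are placeholders chosen for illustration, so substitute the 3' adapter from your own library prep.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (and anything after it) by exact sequence match.

    First looks for the complete adapter anywhere in the read, then for a
    partial adapter prefix hanging off the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial adapter: longest adapter prefix that is also a suffix of the read.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


# Placeholder sequences, assumed for illustration only.
adapter = "TGGAATTCTCGGGTGCCAAGG"
insert_24 = "ACGTTGCAAGTCCGATAGGCTTAA"   # a 24 nt small RNA
insert_10 = "ACGTTGCAAG"                 # a 10 nt fragment

# 36 cycles on a 24 nt insert: 24 nt of insert plus the first 12 nt of adapter.
read_a = insert_24 + adapter[:36 - len(insert_24)]
# 36 cycles on a 10 nt insert: insert, the full adapter, then poly-A filler.
read_b = insert_10 + adapter + "A" * 5

print(len(read_a), trim_3prime_adapter(read_a, adapter))
print(len(read_b), trim_3prime_adapter(read_b, adapter))
```

Both raw reads are 36 nt long because 36 cycles were run; only after trimming do they shrink to the insert length.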

I can't answer the second question, as I've never needed to do a genome assembly.
Old 03-15-2016, 01:50 PM   #3
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23

Thank you. I can see the nonsense part. And yes, it was 36 nt after adapter removal.
Old 03-16-2016, 06:30 AM   #4
MU Core
Member
 
Location: Columbia, Missouri

Join Date: Apr 2008
Posts: 50

If you are using bcl2fastq for adapter trimming, I believe the default minimum-trimmed-read-length is set to 35. If trimming would cut a read down to less than 35 bases, then the bases between the end of the trimmed read and position 35 are “masked” by replacing them with N’s. So the remaining adapter after 20 bases would be masked. Our group has set the minimum-trimmed-read-length to 10 for small RNA data sets. This may not be your situation, but I thought it worth mentioning.
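The masking described above can be sketched in a few lines of Python. This only mimics the behaviour as described in this post; it is not bcl2fastq code, and the function name, parameters, and sequences are made up for illustration.

```python
def mask_short_trimmed_read(read, adapter_start, min_trimmed_len=35):
    """Trim at adapter_start, padding with N up to min_trimmed_len if needed.

    adapter_start is the 0-based position where the adapter begins.
    """
    if adapter_start >= min_trimmed_len:
        # The trimmed read is long enough: plain trimming, no masking.
        return read[:adapter_start]
    # Trimmed read would be too short: keep the insert, mask the rest with N.
    masked_upto = min(min_trimmed_len, len(read))
    return read[:adapter_start] + "N" * (masked_upto - adapter_start)


# Hypothetical 36 nt read: 20 nt of insert followed by 16 nt of adapter.
insert = "ACGTTGCAAGTCCGATAGGC"
read = insert + "TGGAATTCTCGGGTGC"

masked = mask_short_trimmed_read(read, adapter_start=20)                      # threshold 35
trimmed = mask_short_trimmed_read(read, adapter_start=20, min_trimmed_len=10)  # threshold 10

print(masked, len(masked))
print(trimmed, len(trimmed))
```

With the threshold at 35 the 20 nt insert comes out padded with N's to 35 nt; with the threshold at 10 it comes out as the bare 20 nt insert.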
Old 03-16-2016, 06:53 AM   #5
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23

I used cutadapt for adapter removal, which, as best I can tell, will remove all parts of the search string no matter where they occur.
It still puzzles me though why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1kb of nearly 8 kb. I know, aligning and assembling are two different things/algorithms, but still.
Does anyone have an idea where else I could ask this question?
Old 03-16-2016, 07:02 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,574

Quote:
Originally Posted by sfh838t
It still puzzles me though why I can find reads that align (regardless of read length) covering literally my whole ref seq but can only come up with contigs covering 1kb of nearly 8 kb.
Let me see if I am understanding this right.

If you align you can find reads covering the entire reference (8kb?) but if you try to assemble those reads then you can only get contigs that represent just 1 kb of the 8kb reference?

Sequence assembly is a hard problem. If there are repeats in your reference (coupled with the short reads in your dataset) then that result is not surprising.
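As a rough illustration of why repeats plus short reads hurt: de Bruijn graph assemblers such as velvet work on k-mers, and k cannot exceed the read length. Any repeat longer than k-1 bases produces a node with more than one possible continuation, and contigs typically stop there. The Python sketch below counts such branch points in a toy sequence; the sequence and the function name are invented for the example, and real assemblers do far more (error correction, coverage cutoffs, and so on).

```python
from collections import defaultdict


def branching_nodes(sequence, k):
    """Return the (k-1)-mers that are followed by more than one distinct base."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        successors[kmer[:-1]].add(kmer[-1])
    return {node: sorted(nxt) for node, nxt in successors.items() if len(nxt) > 1}


# Toy reference in which the 7 nt repeat GATTACA occurs twice,
# followed by T the first time and by G the second time.
toy_reference = "CCGATTACATGGTTGATTACAGCC"

print(branching_nodes(toy_reference, k=5))   # small k: the repeat creates a branch
print(branching_nodes(toy_reference, k=9))   # k larger than the repeat: no branch
```

With 20-25 nt reads k is capped at roughly 20, so any repeated stretch of about that length or more in the reference cannot be spanned.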
Old 03-16-2016, 07:12 AM   #7
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23

yes, you did understand correctly.
I used either BWA or bowtie2 to align reads to the ref seq, then went through the samtools steps to keep only the reads that align, converted those back to fastq, then ran velvet or ABySS and got mostly nothing, depending on read depth.
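For readers following along, the workflow described above might look roughly like the commands assembled by this Python sketch. Nothing is executed; the function only builds the command lines. The file names, the index name, and k=21 are assumptions for illustration, not the settings actually used in this thread, and older samtools versions may need different options.

```python
def build_pipeline(reference_index, reads_fastq, out_prefix, k=21):
    """Return (argv, stdout_file) pairs for align -> filter -> fastq -> assemble."""
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.mapped.bam"
    mapped_fastq = f"{out_prefix}.mapped.fastq"
    velvet_dir = f"{out_prefix}_velvet"
    return [
        # Align single-end reads to the viral reference.
        (["bowtie2", "-x", reference_index, "-U", reads_fastq, "-S", sam], None),
        # Keep only mapped reads (-F 4 drops reads flagged as unmapped).
        (["samtools", "view", "-b", "-F", "4", "-o", bam, sam], None),
        # Convert the mapped reads back to FASTQ (written to stdout).
        (["samtools", "fastq", bam], mapped_fastq),
        # Hash the reads for velvet with the chosen k, then build contigs.
        (["velveth", velvet_dir, str(k), "-fastq", "-short", mapped_fastq], None),
        (["velvetg", velvet_dir], None),
    ]


# Hypothetical names, for illustration only.
commands = build_pipeline("virus_idx", "trimmed.fastq", "sample1")
for argv, stdout_file in commands:
    line = " ".join(argv)
    if stdout_file is not None:
        line += f" > {stdout_file}"
    print(line)
```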
I have three plant samples with apparently varying degrees of virus infection. Assembled contig coverage increases from 1kb to 2kb and 6kb of the 8kb total virus length with increasing read depth. However, for each sample I can use IGV to look at the read alignments and bedtools to give me numbers for them, and if I use all reads regardless of their length I have coverage of the entire target virus minus 1 to 6 nts.
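One possible contributor to this gap, offered as a sketch rather than a diagnosis: a de Bruijn assembler needs k-mer coverage, not base coverage. The Velvet manual relates the two as Ck = C * (L - k + 1) / L, where C is base coverage, L is read length, and k is the hash length. With reads barely longer than k, each read contributes only a handful of k-mers. The numbers below are invented for illustration.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L; zero if reads are shorter than k."""
    if k > read_length:
        return 0.0
    return base_coverage * (read_length - k + 1) / read_length


# Illustrative numbers only: 30x base coverage from 24 nt reads.
for k in (15, 21):
    print(k, kmer_coverage(30, 24, k))

# Reads shorter than k contribute no k-mers to the assembly at all.
print(kmer_coverage(30, 20, 21))
```

So a position can look well covered in IGV, especially when reads of all lengths are counted, while the assembler sees far thinner or even zero coverage at the chosen k.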
Old 03-16-2016, 07:14 AM   #8
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 197

Is there something particular about your virus that makes you want to do assembly with really short reads? I don't think a lot of the assemblers out there are optimized for this...
Old 03-16-2016, 07:17 AM   #9
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23

Looking for variants, maybe strain identification etc.
velvet seems to be commonly used for this; any suggestions for a different assembler?
Old 03-16-2016, 07:23 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,574

Is what we are discussing now unrelated to the original question or is this an ssRNA virus? (I can split later posts into a new thread if that is so).

Is there a reason you are trying to assemble the virus (when you have a reference)? (Edit: Looks like @fanli already asked this question while I was typing this).

If you have some time take a look at tadpole.sh from BBMap. It may provide a fresh option. I would also look into BBSplit to separate the viral reads before doing the assembly with tadpole.
Old 03-16-2016, 08:16 AM   #11
sfh838t
Member
 
Location: Mountain Grove, MO, USA

Join Date: Apr 2014
Posts: 23

it was the second question, so I don't know if it should be split.
I will look into tadpole and the other suggestions, thanks!
Tags
sirna, velvet

All times are GMT -8.