Old 02-09-2017, 06:10 AM   #4

These are single-end reads from an RNA sequencing project. Adapter sequences have been removed.
Out of some 37 million reads, I get 4000 reads aligning to a virus just under 8000 bp long that we know is present in the sample. These are not siRNAs in the 20 nt length range, though; the reads range from about 30 to 100 nt long.
I am sure there is duplication, but I do not know how to find or eliminate it, and I would appreciate any hints on where to read about this or which tools to use.
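To make the duplication question concrete, here is a minimal sketch of what I mean by finding duplicates: counting reads whose sequence is exactly identical. It assumes a plain 4-line-per-record FASTQ (no wrapped sequence lines), and the three example reads are made up for illustration, not taken from my data. It only catches exact duplicates, so it is probably a lower bound on the real duplication.

```python
from collections import Counter
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def count_duplicates(sequences):
    """Return (per-sequence counts, total reads, number of distinct sequences)."""
    counts = Counter(sequences)
    total = sum(counts.values())
    unique = len(counts)
    return counts, total, unique


# Made-up example records standing in for a real FASTQ file.
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nTTTTGGGG\n+\nIIIIIIII\n"
)
counts, total, unique = count_duplicates(read_fastq_sequences(example))
print(f"{total} reads, {unique} distinct sequences")
```

With a real file, the `io.StringIO` object would be replaced by `open("reads.fastq")`, where the file name is a placeholder.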
Out of about 160 contigs, 40 are very short, in fact the same length as the shortest reads.
The reads matching the virus (aligned with bwa, visualized in IGV) appear to cover most of the virus sequence, while the 27 contigs cover maybe 30% of it.
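My coverage figures above are eyeballed from IGV. A rough sketch of how the covered fraction could be computed instead is to merge the aligned intervals and divide by the virus length. The coordinates below are hypothetical (0-based, half-open) and only chosen to illustrate the arithmetic; real intervals would come from the alignment.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of positions covered by at least one half-open [start, end) interval."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical contig placements on an 8000 bp reference.
contig_hits = [(0, 1000), (500, 1500), (4000, 4900)]
fraction = covered_fraction(contig_hits, 8000)
print(f"{fraction:.0%} of the reference is covered")
```

The same function works for read alignments, which would make the comparison between read coverage and contig coverage less subjective.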
Thanks for the help!
sfh838t