SEQanswers

Old 07-09-2012, 04:33 PM   #1
ndeshpan
Member
 
Location: Sydney

Join Date: Nov 2009
Posts: 29
HiSeq small RNA data adapter trimming using Adapter_trim.pl (miRTools)

Hi,

My HiSeq small RNA data looks like this:


@HWI-ST705:254:C0G8HACXX:5:1101:1674:1995 1:N:0:AGTCAA
TGAGATGAAGCACTGTAGCTCTGGAATTCTCGGGT
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJH
@HWI-ST705:254:C0G8HACXX:5:1101:1765:1986 1:N:0:AGTCAA
TGAGAACTGAATTCCATAGGCTGTTGGAATTCTCG
+
BCCFFFFDHFHHHIJJJIJJJJJHHJJEHIJIJJJ
@HWI-ST705:254:C0G8HACXX:5:1101:1785:1990 1:N:0:AGTCAA
GCTCTGTGATGAACCCTGGAATTCTCGGGTGCCAA
+
=?@DFFFA=AADFHIJJJJGAHHGIIIJI?CFGHC
@HWI-ST705:254:C0G8HACXX:5:1101:1825:1999 1:N:0:AGTCAA
TTTGGCAATGGTAGAACTCACACCTGGAATTCTCG
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJ
@HWI-ST705:254:C0G8HACXX:5:1101:2182:1985 1:N:0:AGTCAA
CAACNGAATCCCAAAAGCAGCTGTGGAATTCTCGG
+
@@@D#2=BDDHFHBGHHIIIGHHHIGGGHH<?DHI
@HWI-ST705:254:C0G8HACXX:5:1101:2106:1988 1:N:0:AGTCAA
TAGCTTATCAGACTGATGTTGACTTGGAATTCTCG
+
??@FFFD+=CFFFHGIJJGIHHHHHJCFHEHHHDH
@HWI-ST705:254:C0G8HACXX:5:1101:2543:1995 1:N:0:AGTCAA
TTCACAGTGGCTAAGTTCTGCTGGAATTCTCGGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJH

When I use the Adapter_trim.pl script from miRTools with format option "3" (for Illumina format 1.3+), or even "2" for the older formats, I get an empty output file.

The previous Illumina datasets had the complete IDs repeated before the quality value lines, and the script used to work well for me.

Any suggestions?

regards,

Nandan
The read-grouping script from miRanalyzer also gives me issues with the HiSeq dataset (again, this worked well for GAII Illumina data).
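
The reads above all run into what looks like the start of the Illumina small RNA 3' adapter (TGGAATTCTCGG...), and the `+` lines no longer repeat the read ID. A minimal Python sketch of 3' adapter trimming that reads the FASTQ as plain four-line records, so it does not care what follows the `+`, might look like this. The full adapter sequence, the `min_overlap` and `min_len` thresholds, and the function names are assumptions for illustration; this is not a description of how Adapter_trim.pl works internally.

```python
import io

# Assumption: the library used the Illumina TruSeq small RNA 3' adapter.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime(seq, adapter=ADAPTER, min_overlap=8):
    """Cut seq at a full adapter match, or at an adapter prefix running off the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest possible partial match first, down to min_overlap bases.
    first = max(0, len(seq) - len(adapter) + 1)
    for start in range(first, len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:]):
            return seq[:start]
    return seq


def trim_fastq(handle_in, handle_out, min_len=15):
    """Trim every 4-line FASTQ record; return the number of reads kept."""
    kept = 0
    while True:
        header = handle_in.readline().rstrip("\n")
        if not header:
            break
        seq = handle_in.readline().rstrip("\n")
        handle_in.readline()  # '+' line; the ID may or may not be repeated here
        qual = handle_in.readline().rstrip("\n")
        trimmed = trim_3prime(seq)
        if len(trimmed) >= min_len:
            handle_out.write(f"{header}\n{trimmed}\n+\n{qual[:len(trimmed)]}\n")
            kept += 1
    return kept


# First record from the post above, used as a small demo.
record = (
    "@HWI-ST705:254:C0G8HACXX:5:1101:1674:1995 1:N:0:AGTCAA\n"
    "TGAGATGAAGCACTGTAGCTCTGGAATTCTCGGGT\n"
    "+\n"
    "CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJH\n"
)
out = io.StringIO()
trim_fastq(io.StringIO(record), out)
print(out.getvalue(), end="")
```

Reading by line count rather than by looking for `@` also avoids tripping over quality strings that start with `@`, as one of the records above does. The sketch only does exact matching, so a read with a sequencing error inside the adapter is left untrimmed; dedicated trimmers tolerate mismatches.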
Old 07-09-2012, 11:38 PM   #2
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

I don't know miRTools, but I am fairly sure Biopieces will be helpful for this, especially find_adaptor.
Old 08-21-2012, 06:11 AM   #3
vikas0633
Junior Member
 
Location: denmark

Join Date: Mar 2012
Posts: 3

To my knowledge, the best way to do adapter filtering/trimming is the FASTX-Toolkit:

http://hannonlab.cshl.edu/fastx_toolkit/

If you are not a fan of Unix systems, then try Galaxy:
https://main.g2.bx.psu.edu/
Old 08-21-2012, 06:24 AM   #4
sgcsd
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 7

Hi all:
I am new to Illumina sequencing and have a very basic question. Do the sequences we get after running the Illumina pipeline contain adapters in all the reads, or only in a few reads?

Recently we did a multiplexed sequencing run on a HiSeq 2000, and in the FASTQ file only a few reads (5%) contain adapters or primers. I searched for the adapter and primer sequences used in library prep (from Illumina TruSeq).

I read somewhere that when the pipeline demultiplexes, it trims the reads and removes the barcode. Is that true?

Please reply or direct me to some literature that explains the basics.

Thank you
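
On the question of why only some reads contain adapter: the 3' adapter shows up in a read only when the sequenced insert is shorter than the read length, so a library with long inserts will show adapter in a small fraction of reads. A small Python sketch of that reasoning follows. The adapter prefix is assumed to be the common start of the TruSeq adapter read-through, the fragments are made up, and the function names are only for illustration.

```python
# Assumption: common start of the TruSeq adapter read-through sequence.
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def simulate_read(insert, adapter, read_len):
    """A read covers the insert first and reaches the adapter only if the insert is shorter than the read."""
    return (insert + adapter)[:read_len]


def adapter_fraction(reads, seed=TRUSEQ_PREFIX):
    """Fraction of reads that contain the adapter seed anywhere (exact match)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(seed in read for read in reads) / len(reads)


short_insert = "ACGT" * 5       # 20 nt, shorter than the read: adapter is sequenced
long_insert = "ACGGTCAT" * 10   # 80 nt, longer than the read: no adapter in the read
reads = [simulate_read(ins, TRUSEQ_PREFIX, 50) for ins in (short_insert, long_insert)]
print(adapter_fraction(reads))
```

Running `adapter_fraction` over the sequences of a real FASTQ file gives the same kind of percentage as the 5% mentioned above, at least for exact matches to the seed.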
Old 02-26-2013, 12:13 AM   #5
bharat_iyengar
Member
 
Location: Delhi, India

Join Date: Dec 2012
Posts: 20

I am facing a similar problem.

I was interested in getting some information from a publicly available HiSeq 2000 small RNA-seq dataset from Drosophila.
However, the library was prepared by cloning and not using TruSeq (as reported in the SRA, accession number SRR513393).

This isn't a great concern. I used FastQC to analyse the reads, and the quality distribution seemed to be pretty okay (PFA). However, no overrepresented sequence was detected, and I am unsure of the adapter sequences. The reads are 50 nt long (more than twice the size of any miRNA or similar RNA).

I used bowtie v0.12.9 to align the reads against the Drosophila transcriptome index that I built from FlyBase transcripts release v5.49, with options (-v 2 --norc -a --best --strata). No reads aligned, and I suspect that this might be because of some bogus sequence filling up the ends. I am not able to work out what those bogus sequences might be.

Any tips for preprocessing?
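
When the adapter is unknown and FastQC reports no overrepresented sequence, one rough way to look for a shared 3' sequence is to count k-mers that occur past the expected small RNA length across many reads. A minimal Python sketch, assuming the inserts are roughly 18 nt or longer and the adapter therefore starts at a variable position; the function name and the `k`, `skip`, and `n` defaults are illustrative choices, and the demo adapter is an arbitrary sequence, not a claim about SRR513393.

```python
from collections import Counter


def top_kmers(seqs, k=12, skip=18, n=5):
    """Count k-mers starting at or after position `skip` and return the n most common."""
    counts = Counter()
    for seq in seqs:
        for i in range(skip, len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts.most_common(n)


# Demo with made-up inserts of different lengths followed by the same arbitrary adapter.
demo_adapter = "CTGTAGGCACCATCAATCGTATGCCGTCTT"
demo_inserts = [
    "ACGTTGCAAGCTTGACCTGA",
    "TTGACCGATAGGCTAACGGTCC",
    "GGATCCATTGCAGTACGATCAGTG",
]
demo_reads = [(ins + demo_adapter)[:50] for ins in demo_inserts]
for kmer, count in top_kmers(demo_reads):
    print(kmer, count)
```

A k-mer seen in a large share of reads is a candidate adapter fragment. Abundant small RNAs can also produce frequent k-mers, so any candidate should be checked by eye against a handful of reads before trimming with it.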

Also, when I used TopHat to align the reads against the genome index with annotations from the FlyBase GFF file v5.49, TopHat stopped with the report "gtf_to_fasta returned an error" (isn't TopHat supposed to accept GFF v3 files?).
Attached image: per_base_quality.png (10.1 KB)

Tags
hiseq, illumina, mirtools, small rna
