SEQanswers

Old 02-05-2009, 10:51 PM   #1
satp
Member
 
Location: GuangZhou

Join Date: Jun 2008
Posts: 13
How to trim the adaptor sequence from the Solexa small RNA sequencing data?

Hi,

I have downloaded some Solexa small RNA sequencing data from NCBI GEO (like this: http://www.ncbi.nlm.nih.gov/geo/quer...?acc=GSM273725). I want to trim the adaptor sequence from these reads first. Could anyone tell me which software can do this efficiently on a personal computer?

Leo
Old 02-06-2009, 01:27 AM   #2
regyre
Member
 
Location: Umeå, Sweden

Join Date: Jan 2009
Posts: 11
Bioconductor ShortRead package

Hi,

If you're familiar with R and Bioconductor, you can use some of the functionality from the ShortRead package: http://www.bioconductor.org/packages...ShortRead.html. Actually, you should use R 2.9.0 (the development version) to get the latest version of the ShortRead package, which has many more capabilities implemented.

There have been some emails with code samples posted to this mailing list: https://stat.ethz.ch/mailman/listinf...sig-sequencing back in December-January.

Hope this helps,

N.
Old 02-06-2009, 05:01 AM   #3
satp
Member
 
Location: GuangZhou

Join Date: Jun 2008
Posts: 13

Thanks for replying. But I'm not familiar with R and Bioconductor, though I want to learn them in the future.

I wonder whether there is any software that is specifically developed for trimming the adaptor of deep-sequencing data.
Old 02-06-2009, 06:01 AM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,172

If you are doing this as part of an alignment pipeline you could use Novoalign from Novocraft. (Check the sticky thread of software packages for a link.) It will trim the adapter on the fly. Its default screening sequence is the appropriate adapter for Illumina small RNA data, but you can enter your own sequence as well.

I have also used the program fuzznuc which is part of the EMBOSS package. You provide a short sequence to screen for, which may include ambiguous codes and allowances for mismatches. The program does not trim the read, just outputs the locations where a match is found; you could then use this information with a perl script to actually trim the reads.
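For the last step (using reported match locations to actually trim the reads), here is a minimal Python sketch rather than Perl. It assumes you have already parsed the search tool's report into a mapping of read ID to 0-based match start; that parsing step, the function name `trim_at_matches`, and the example adapter string are all assumptions for illustration, not part of fuzznuc or EMBOSS.

```python
def trim_at_matches(reads, match_starts):
    """Cut each read at the 0-based start of its reported adapter match.

    reads:        dict mapping read ID -> sequence
    match_starts: dict mapping read ID -> 0-based match start
                  (hypothetical; assumed to be parsed from the search
                  tool's report beforehand)
    Reads with no reported match are returned unchanged.
    """
    trimmed = {}
    for read_id, seq in reads.items():
        start = match_starts.get(read_id)
        trimmed[read_id] = seq if start is None else seq[:start]
    return trimmed


# Example: a 22-nt insert followed by the start of an illustrative adapter.
reads = {
    "r1": "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGTCTT",
    "r2": "ACGTACGT",
}
match_starts = {"r1": 22}  # adapter reported at position 22 in r1 only
result = trim_at_matches(reads, match_starts)
```

If the report uses 1-based coordinates, subtract 1 from each start before passing it in.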
Old 02-06-2009, 07:32 AM   #5
satp
Member
 
Location: GuangZhou

Join Date: Jun 2008
Posts: 13

Thanks for your advice. It seems that Novoalign can't be run on a PC with little RAM (my PC only has 2 GB). I haven't used Novoalign yet; can it just trim the adaptor without aligning the sequences to the genome?

Just now I found that Shijun et al. (http://www.ncbi.nlm.nih.gov/pubmed/18342361) suggested using vectorstrip from the EMBOSS package to trim the adaptor in their paper. I will try fuzznuc and vectorstrip now.

Does any other adaptor-trimming software exist?
Old 04-16-2009, 01:05 PM   #6
demis001
Member
 
Location: USA

Join Date: Apr 2009
Posts: 10

I had the same problem six months ago. The easiest and fastest way is to write a script that removes the adapter and filters garbage reads out of your data. To do that you need the adapter sequence information from whoever sequenced your data.

You need the nucleotide sequences of the 3' and 5' adapters. Then you read each line and look for the adaptor using a regular expression.

The most annoying part is that you will only find 1-2 million reads out of 10 to 15 million containing part of either the 3' or the 5' adapter. A large proportion of the reads do not contain adapter.

DD
Bioinformatics Research Analyst
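A rough Python sketch of the regex approach described above, for the 3' adapter only. It matches the full adapter anywhere in the read, or a truncated adapter (a prefix of at least `min_partial` bases) running off the 3' end. The function names, the `min_partial` default, and the adapter string are assumptions for illustration; use the adapter sequence supplied by your sequencing provider.

```python
import re


def build_adapter_regex(adapter, min_partial=4):
    """Compile a regex matching the full adapter anywhere in a read,
    or a prefix of it (>= min_partial bases) at the very 3' end."""
    partials = [
        re.escape(adapter[:n]) + "$"
        for n in range(len(adapter) - 1, min_partial - 1, -1)
    ]
    return re.compile("|".join([re.escape(adapter)] + partials))


def clip_3prime(read, adapter_re):
    """Return the read up to the first adapter match, or unchanged if none."""
    match = adapter_re.search(read)
    return read[:match.start()] if match else read


# Illustrative adapter string (assumption; substitute your own).
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"
adapter_re = build_adapter_regex(ADAPTER)

full = clip_3prime("TGAGGTAGTAGGTTGTATAGTT" + ADAPTER, adapter_re)
partial = clip_3prime("TGAGGTAGTAGGTTGTATAGTT" + "TCGTATGC", adapter_re)
untouched = clip_3prime("ACGTACGTACGT", adapter_re)
```

This only handles exact matches. A very small `min_partial` will clip genuine read ends that happen to look like the start of the adapter, so treat that value as a trade-off.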
Old 10-07-2010, 12:06 AM   #7
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124

Has anyone tried trimming both 3' and 5' end adapter sequences? We need to find adapters at both the 3' and 5' ends of Solexa reads.

I tried FASTX clipper, but it does not let me allow mismatches in the adapter sequence. Any better solutions?

Code:
$ fastx_clipper -h
	usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
Thanks.
Old 10-07-2010, 01:49 AM   #8
francois.sabot
Member
 
Location: France

Join Date: Dec 2009
Posts: 41

There is also cutadapt, which allows a percentage of mismatches.
In fastx_clipper, you can set the -M value lower than your adapter length to tolerate mutations at the adapter border. Just remember that fastx_clipper removes adapters one at a time, so run several instances in series using a pipe.
__________________
Francois Sabot, PhD

Be realistic. Demand the Impossible.
www.wikiposon.org
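To illustrate what "allowing a percentage of mismatches" means in practice, here is a minimal Python sketch of a mismatch-tolerant 3' adapter scan. It is a plain Hamming-distance scan (substitutions only, no indels) and is not meant to reproduce the algorithm of cutadapt or fastx_clipper. The function names, the `max_error_rate` and `min_overlap` parameters, and the adapter string are assumptions for illustration.

```python
def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=5):
    """Return the 0-based start of the leftmost adapter match, or None.

    The adapter may run off the 3' end of the read; only the overlapping
    part is compared, and the number of mismatches allowed scales with
    the overlap length (rounded down).
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= int(max_error_rate * overlap):
            return start
    return None


def trim_3prime(read, adapter, **kwargs):
    """Cut the read at the adapter match, or return it unchanged."""
    start = find_adapter(read, adapter, **kwargs)
    return read if start is None else read[:start]


# Illustrative adapter string (assumption; substitute your own).
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"
INSERT = "TGAGGTAGTAGGTTGTATAGTT"

# Adapter with one substitution (T -> A at adapter position 10).
one_mismatch = trim_3prime(INSERT + "TCGTATGCCGACTTCTGCTTG", ADAPTER)
# Adapter truncated by the end of the read.
truncated = trim_3prime(INSERT + "TCGTATGC", ADAPTER)
# Read with no adapter.
no_adapter = trim_3prime("ACGTACGTACGT", ADAPTER)
```

Because the allowance is rounded down, overlaps shorter than 10 bases must match exactly at the default 10% rate. Lowering `min_overlap` catches shorter adapter remnants at the cost of more false positives.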
Old 10-07-2010, 06:13 PM   #9
satp
Member
 
Location: GuangZhou

Join Date: Jun 2008
Posts: 13

Quote:
Originally Posted by seq_GA
Has anyone tried trimming both 3' and 5' end adapter sequences? We need to find adapters at both the 3' and 5' ends of Solexa reads.

I tried FASTX clipper, but it does not let me allow mismatches in the adapter sequence. Any better solutions?

Code:
$ fastx_clipper -h
	usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
Thanks.
You can try vectorstrip in EMBOSS. It can trim both 3' and 5' end adapter sequences and allow mismatches in the adapter sequences.
Old 11-01-2010, 02:34 PM   #10
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Where can I download vectorstrip?

Quote:
Originally Posted by satp
You can try vectorstrip in EMBOSS. It can trim both 3' and 5' end adapter sequences and allow mismatches in the adapter sequences.
Hi Satp,

I want to download vectorstrip so I can apply it to gigabytes of sequence data, but I can only find websites with a graphical interface to the program, which is too slow and time-consuming for my large number of sequences.

Do you know if I can download the program and execute it on the command line?

Thank YOU so much.
Old 11-01-2010, 03:47 PM   #11
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94

Never mind, I figured out that I need to install all of EMBOSS!
Old 11-17-2010, 02:08 PM   #12
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75

Just in case anyone else stumbles over this thread, I'd like to repeat what francois said and point people to cutadapt (of which I'm the author). It does find adapters in the 5' and 3' ends of reads and allows mismatches.