SEQanswers

SEQanswers > Applications Forums > RNA Sequencing




05-26-2011, 11:53 AM   #1
nsl
Member
Location: CA

Join Date: Jan 2011
Posts: 28
Clipping adapters

Hello readers,

I will start with the caveat that I am a newbie fumbling through the process of teaching myself how to analyse my TruSeq data.

I tried clipping the adapter listed below, and the output dwindled to ~15% of the original reads (5.3M of 36.8M). This seems wrong. I did not allow any wiggle room for mismatches; is that too stringent? Or is there a fundamental error in the way I went about things? What happens to all the non-clipped reads?

Thanks


Info: Clipping Adapter: CGATGT
Min. Length: 15
Non-Clipped reads - discarded.
Input: 36838900 reads.
Output: 5341173 reads.
discarded 1539984 too-short reads.
discarded 127004 adapter-only reads.
discarded 29709745 non-clipped reads.
discarded 120994 N reads.
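A quick check of the numbers in the log above: the five categories add up exactly to the input, and the "non-clipped" category alone accounts for most of the loss. A minimal Python sketch; the counts are copied from the log, and the dictionary keys are just labels chosen for illustration.

```python
# Read counts copied from the Galaxy clipper log above.
input_reads = 36838900
counts = {
    "output": 5341173,
    "too_short": 1539984,
    "adapter_only": 127004,
    "non_clipped": 29709745,
    "n_reads": 120994,
}

# Every input read should land in exactly one category.
assert sum(counts.values()) == input_reads

# Print each category as a count and a percentage of the input.
for label, n in counts.items():
    print(f"{label:>13}: {n:>10,} ({100 * n / input_reads:5.1f}%)")
```

Run as-is, it reports about 14.5% of reads kept and about 80.6% discarded as non-clipped, so the "non-clipped" category, rather than the too-short filter, accounts for most of the loss.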

05-26-2011, 12:05 PM   #2
chadn737
Senior Member
Location: US

Join Date: Jan 2009
Posts: 392

It looks to me like most of your discarded reads are "non-clipped." Why are you discarding these? If a read contains no adapter sequence, I am not sure why you would want to discard it.
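To make the "non-clipped" category concrete, here is a toy Python model of the per-read decision a clipper makes, with category names borrowed from the log. It is a hypothetical, simplified sketch (exact matching only, no partial adapters), not the FASTX-Toolkit's actual code; the point is that with discarding switched on, any read in which the adapter is simply not found gets thrown away.

```python
from typing import Optional, Tuple


def clip_read(seq: str, adapter: str, min_len: int = 15,
              discard_unclipped: bool = True) -> Tuple[str, Optional[str]]:
    """Classify one read into log-style categories (hypothetical model).

    Returns (category, kept_sequence); kept_sequence is None when discarded.
    """
    # Reads with ambiguous bases are dropped outright.
    if "N" in seq:
        return "n_read", None
    pos = seq.find(adapter)
    if pos == -1:
        # No adapter found: keep the read unchanged unless told to discard it.
        if discard_unclipped:
            return "non_clipped", None
        return "kept_unclipped", seq
    clipped = seq[:pos]  # everything before the adapter
    if not clipped:
        return "adapter_only", None
    if len(clipped) < min_len:
        return "too_short", None
    return "clipped", clipped


read = "ACGTACGTACGTACGTACGTACGT"  # contains no CGATGT
print(clip_read(read, "CGATGT"))
print(clip_read(read, "CGATGT", discard_unclipped=False))
```

The same adapter-free read is lost in the first call and kept intact in the second; only the discard setting differs.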

What program are you using for adapter trimming, and what are the settings?

The TruSeq adapters are actually 63 bp long, and the barcodes sit somewhere in the middle of them, if I remember correctly. Did you trim using the full 63 bp adapter, or just the 6 bp sequence you listed? Searching for only the 6 bp barcode could throw it off.
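To see where a 6 bp barcode sits inside a 63 bp adapter, the sketch below assembles an index adapter from two constant flanks plus the Index 2 barcode. The flank sequences are my recollection of the TruSeq index adapter layout, not something quoted in this thread, so verify them against the Illumina adapter sequence letter before relying on them.

```python
# Assumed TruSeq index adapter layout (verify against Illumina's adapter letter):
# 5' constant flank + 6 bp index + 3' constant flank.
FLANK_5P = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
FLANK_3P = "ATCTCGTATGCCGTCTTCTGCTTG"
INDEX_2 = "CGATGT"  # the barcode from the first post

adapter = FLANK_5P + INDEX_2 + FLANK_3P
index_start = adapter.find(INDEX_2)  # 0-based offset of the barcode

print(len(adapter))   # total adapter length
print(index_start)
```

If this layout is right, the barcode starts 33 bases into the 63 bp adapter. A search for the barcode alone therefore ignores the adapter sequence around it, and only finds reads that run at least that far into the adapter.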

Last edited by chadn737; 05-26-2011 at 12:16 PM.

05-26-2011, 12:21 PM   #3
nsl
Member
Location: CA

Join Date: Jan 2011
Posts: 28

Dear chadn737,

It is now clear to me that there is a "level zero," and I am at it... sigh.

I used the Clip tool in Galaxy. I left the settings pretty much at their defaults, except for changing the clip sequence to "custom".

I didn't know the adapters were 63 bp. I got the 6 bp sequence from a page of the Illumina manual posted by a blogger on this forum. We outsourced the library prep; they used Kit A with adapter indexes 2, 4, 5, 6, 7, and 12, which I assumed were 6 bp each.

Forgive my ignorance: what would it mean to have reads without the adapters?

Thanks

05-26-2011, 12:36 PM   #4
chadn737
Senior Member
Location: US

Join Date: Jan 2009
Posts: 392

To give an example of what I meant: we have a core facility that does our sequencing. They prep the libraries, sequence them, and then send us a FASTQ file of the reads. Part of their pipeline trims the adapter sequence from the 5' end, so in theory all the reads in my FASTQ file should be adapter-free. I know from experience that this is not the case: with longer read lengths, a certain percentage of my reads have adapter sequence at the 3' end that I then have to trim. (This happens when the insert is shorter than the read length, so the sequencer reads past the insert and into the adapter.)
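The 3' contamination described above can be illustrated with a toy model: a single-end read covers the insert first and, when the insert is shorter than the read length, continues into the adapter. This is a simplified sketch under stated assumptions: no sequencing errors, a hypothetical 25 bp insert, and an adapter string that is my assumption of the TruSeq read-through sequence.

```python
def simulate_read(insert: str, adapter: str, read_length: int) -> str:
    """Return the bases a single-end read of read_length would report.

    Simplified model: the read covers the insert first, then runs into the
    adapter ligated to its 3' end. No sequencing errors are modelled.
    """
    return (insert + adapter)[:read_length]


# Assumed adapter read-through sequence (verify against your adapter letter).
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# Hypothetical 25 bp insert, for illustration only.
INSERT = "TTGACCTGAAGGCATCAGTTCAAGG"

for read_length in (20, 36, 50):
    read = simulate_read(INSERT, ADAPTER, read_length)
    adapter_bases = max(0, len(read) - len(INSERT))
    print(f"{read_length} bp read: {read} ({adapter_bases} adapter bases)")
```

With this insert, a 20 bp read is pure insert, while 36 bp and 50 bp reads end in 11 and 25 adapter bases; the longer the reads relative to the inserts, the more reads carry 3' adapter.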

So my advice would be to find out how your data was processed before you got the reads. In particular, ask the people who did the sequencing whether or not they trimmed the adapters before giving you the reads. If they already did adapter trimming, then those "non-clipped" reads are probably free of adapter sequence and can be aligned directly. The fact that so many reads were not clipped suggests to me that this is the case.
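One rough way to check this yourself, sketched in Python as a heuristic rather than a definitive test: untrimmed reads from a run usually all share one length, while trimmed reads vary, and counting reads that contain the start of the adapter gives a crude estimate of contamination. The 13-base prefix and the demo records are assumptions for illustration; substitute the adapter sequence from your own adapter letter.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record (assumes plain 4-line FASTQ)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def summarize(lines: Iterable[str], adapter_prefix: str) -> Tuple[Counter, float]:
    """Return (read-length histogram, fraction of reads containing adapter_prefix)."""
    lengths: Counter = Counter()
    hits = 0
    total = 0
    for seq in fastq_sequences(lines):
        total += 1
        lengths[len(seq)] += 1
        if adapter_prefix in seq:
            hits += 1
    return lengths, (hits / total if total else 0.0)


# Hypothetical in-memory records; on real data, pass an open file handle instead.
demo = [
    "@r1", "ACGTACGTAGATCGGAAGAGCACAC", "+", "I" * 25,
    "@r2", "TTGACCTGAAGGCATCAGTTCAAGG", "+", "I" * 25,
]
lengths, frac = summarize(demo, "AGATCGGAAGAGC")  # assumed adapter start
print(dict(lengths), frac)
```

A single read length plus a small adapter fraction would be consistent with untrimmed reads that have little read-through; a spread of lengths hints that someone already trimmed them.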

I am not that familiar with Galaxy, but I think it uses the FASTX-Toolkit; someone correct me if I am wrong. I use a tool called Cutadapt to trim contaminating adapter sequence.
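Here is a minimal sketch of the kind of 3' trimming a tool like Cutadapt performs, including the case where a read ends partway into the adapter. It is a simplified illustration, not Cutadapt's algorithm: Cutadapt also tolerates mismatches and has many more options, while this function only does exact matching. The adapter prefix is an assumption.

```python
def trim_3prime(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter from seq, including a partial adapter at the read end.

    Simplified, exact-match sketch: unlike Cutadapt, it allows no mismatches
    or indels.
    """
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Case 2: the read ends partway into the adapter. Try the longest adapter
    # prefix that could fit at the end first, down to min_overlap bases.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    # Case 3: no adapter evidence; leave the read unchanged.
    return seq


ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumed adapter start; take yours from the adapter letter

for read in ("ACGTACGTAGATCGGAAGAGCACAC", "ACGTACGTAGATC", "ACGTACGTAG"):
    print(read, "->", trim_3prime(read, ADAPTER_PREFIX))
```

The `min_overlap` cutoff is the usual trade-off: set it too low and reads that happen to end in a few adapter-like bases get trimmed by chance; set it too high and short adapter remnants stay in the reads.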

Also, see the attached PDF; the full TruSeq adapter sequences are in there. The barcode you listed appears to be part of the Index 2 adapter.
Attached file: Illumina Adapter Sequence Letter 2011-01-11.pdf (105.8 KB)

05-26-2011, 01:04 PM   #5
nsl
Member
Location: CA

Join Date: Jan 2011
Posts: 28

Dear chadn737,

Thank you for the PDF and the explanation. The sequencing facility told me that they did not remove the adapters. I am puzzled because your explanation seems logical.

Unfortunately, the folks at the sequencing center have only confused me more; it feels like a case of the blind leading the blind.

Thanks again for your advice!

05-26-2011, 01:05 PM   #6
chadn737
Senior Member
Location: US

Join Date: Jan 2009
Posts: 392

OK, then you do need to trim the adapters, but use the entire 63 bp sequence listed in that file. That should improve the results.

05-27-2011, 02:08 PM   #7
nsl
Member
Location: CA

Join Date: Jan 2011
Posts: 28

Dear chadn737,


Quote:
Originally Posted by chadn737
...Now, I know from experience that this is not the case, that with longer read lengths a certain percentage of my reads have adapter sequence at the 3' end that I then have to trim....
What read length, and roughly what percentage, were you referring to here?
Also, how do you know you have to trim them?

Thanks

All times are GMT -8.