**husamia** · 07-08-2011, 09:03 AM

I have a suggestion and a question. I am in a similar situation; my case is not straightforward, and I am not sure about yours.
If the adapter is consistently placed, for example if XXXX is always in the same position in every read, then a script makes sense, given a FASTA file of known adapters; see (http://www.bioperl.org/wiki/Removing...ncing_adapters). I use a tool that comes with commercial software from SoftGenetics to remove adapters; it can detect them automatically or take a file of adapter sequences. However, if you are describing random concatenation, then XXXX depends entirely on your alignment, and I don't see a way to remove the adapters without aligning first.
Also, I thought concatenation applies to the ends of the reads, i.e.

>1
XXXXX_AdapterA
>or 2
Adapter2_XXXXX

This has been my case with miRNA-seq, at least on Illumina. You may want to consult the documentation for your sequencing platform.

**Giorgio C** · 07-10-2011, 06:47 AM

Thank you for your reply. Yes, the problem is that the adaptors and the miRNAs don't always occur in a regular pattern. Every read is different from the others. A read may contain a single miRNA between two adaptors, or two or three miRNAs, up to a maximum of four. But the miRNAs are not always cleanly flanked by their adaptors, so I can't use the most common adaptor-removal scripts to obtain only the miRNAs. If alignment is suitable for this situation, could you please explain how to do it and outline a short pipeline? Thanks a lot.

Cheers,

Giorgio

**chadn737** · 07-10-2011, 12:25 PM

I have used a tool called cutadapt for trimming adapter sequences. It works really well when you have partial adapter sequence in your reads:

http://code.google.com/p/cutadapt/
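
To illustrate what "partial adapter sequence" means here, the following is a minimal Python sketch, not cutadapt's actual algorithm. It assumes a 3' adapter and exact matching only (cutadapt also tolerates mismatches, which this does not); the function name, the `min_overlap` parameter, and the adapter sequence in the usage lines are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter from a read, including a truncated adapter at the read end.

    Exact matching only; this is an illustration, not cutadapt's algorithm.
    """
    # Case 1: the whole adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so only an adapter
    # prefix is visible. Try the longest possible prefix first.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence of at least min_overlap bases: leave the read alone.
    return read


if __name__ == "__main__":
    adapter = "TCGTATGCCGTCTTC"  # hypothetical adapter sequence
    print(trim_3prime_adapter("ACGTACGTAC" + adapter + "AAA", adapter))
    print(trim_3prime_adapter("ACGTACGTAC" + "TCGTAT", adapter))
```

A small `min_overlap` removes more real adapter remnants but also trims more reads that happen to end in the same few bases by chance, so the value is a trade-off to tune.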

**husamia** · 07-11-2011, 09:16 AM

Originally posted by Giorgio C:

If alignment is suitable for this situation, could you please explain how to do it and outline a short pipeline?

It seems that your read lengths are variable, i.e. you may be using 454 instead of Illumina. In my case the read length is the same for all reads. That makes it easier to write a script, which needs some assumptions, because the same read length can be assumed throughout.
In my case every read is 40 bp long and my adapter is 15 bp, so my unknown miRNAs are about 25 bp. I doubt that I am capturing more than one miRNA or adapter per read, so I assume one adapter per read. I simply trimmed my reads based on the known adapter sequence, so I ended up with reads ranging from 25 to 40 bp. I also did quality trimming based on base-call quality, without letting read length drop below 25 bp. Then I aligned the trimmed reads and the untrimmed reads to the whole genome: ~40% of the untrimmed reads and ~90% of the trimmed reads aligned. That's my general pipeline. I am sure there are better ways than this, but I have a limited set of tools. If you can write a script, I hope this helps you make the basic assumptions needed to filter reads by certain criteria and see what you get.
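
The trimming and length-filter steps of this pipeline could be sketched roughly as follows in Python. This is only an illustration under the stated assumptions (one exact-match 3' adapter per read, minimum length 25); the adapter sequence, `trim_and_filter`, and the record format of `(name, sequence)` pairs are all hypothetical, and quality trimming and alignment are left out.

```python
ADAPTER = "TCGTATGCCGTCTTC"  # hypothetical 15-bp adapter, not from the thread
MIN_LEN = 25                 # shortest read kept after trimming


def trim_and_filter(reads, adapter=ADAPTER, min_len=MIN_LEN):
    """Yield (name, sequence, was_trimmed) for reads that remain >= min_len.

    `reads` is any iterable of (name, sequence) pairs.
    """
    for name, seq in reads:
        pos = seq.find(adapter)      # exact match only
        was_trimmed = pos != -1
        if was_trimmed:
            seq = seq[:pos]          # drop the adapter and everything after it
        if len(seq) >= min_len:
            yield name, seq, was_trimmed


if __name__ == "__main__":
    reads = [
        ("r1", "A" * 25 + ADAPTER),             # 40 bp read, adapter at the end
        ("r2", "C" * 40),                       # no adapter found
        ("r3", "G" * 10 + ADAPTER + "T" * 15),  # too short once trimmed
    ]
    for name, seq, was_trimmed in trim_and_filter(reads):
        print(name, len(seq), "trimmed" if was_trimmed else "untrimmed")
```

Keeping the `was_trimmed` flag makes it possible to align trimmed and untrimmed reads separately and compare their mapping rates, as described above.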

**Giorgio C** · 07-12-2011, 04:03 AM

Thanks for your reply. In fact, in my case these are 454 reads, and as you said, this pipeline is not applicable. I will try other ways.

Cheers,
Giorgio

**gringer** · 07-12-2011, 07:58 AM

Perl script?

A quick Perl script that just clips out the adaptor sequences may work:

Code:

perl -pe 's/(AdaptorA|AdaptorB)//g' <fileName>

For example:

Code:

$ perl -pe 's/(AdaptorA|AdaptorB)//g' file.txt 
>1
__miRNA_
>2
XXXXXX-_ miRNA__XXXXXX__miRNA_

**gringer** · 07-12-2011, 08:02 AM

Oh, wait, you wanted the sequence between the adaptors only. That's a bit more tricky, but might still come under the 'perl can do it easy' umbrella.

Code:

$ perl -pe 's/.*?((AdaptorA|AdaptorB)(.*?)(AdaptorA|AdaptorB))/$3/g' file.txt 
>1
__miRNA_
>2
_ miRNA__miRNA_
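
For reads that carry several miRNAs, it may be more convenient to get each insert back separately instead of joined on one line. The following Python sketch is one possible way to do that, assuming exact adaptor matches; `inserts_between_adaptors` is a hypothetical helper, and `AdaptorA`/`AdaptorB` are the same placeholders as above. Like the one-liner, it pairs adaptor hits left to right (1st with 2nd, 3rd with 4th), so sequence between one closing adaptor and the next opening adaptor is discarded.

```python
import re


def inserts_between_adaptors(read, adaptors):
    """Return the non-empty sequences flanked by an adaptor on both sides."""
    # Longest adaptor first, so an adaptor that is a prefix of another
    # does not shadow it in the alternation.
    ordered = sorted(adaptors, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(a) for a in ordered))
    hits = list(pattern.finditer(read))
    inserts = []
    # Pair hits as (0, 1), (2, 3), ...; an unpaired trailing hit is ignored.
    for i in range(0, len(hits) - 1, 2):
        segment = read[hits[i].end():hits[i + 1].start()]
        if segment:
            inserts.append(segment)
    return inserts


if __name__ == "__main__":
    read = "XXXXXX-AdaptorA_miRNA1_AdaptorB_XXXXXX_AdaptorA_miRNA2_AdaptorB"
    print(inserts_between_adaptors(read, ["AdaptorA", "AdaptorB"]))
```

Reads in which a miRNA is flanked by an adaptor on only one side yield nothing for that miRNA here; handling those would need a separate rule or an alignment-based approach.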

**Giorgio C** · 07-13-2011, 12:10 AM

Thank you very much, it's a very useful script!

How to extract sequences between adaptors ?