SEQanswers > Bioinformatics > RNA_seq read separation help

umamayil 06-18-2015 07:48 AM

RNA_seq read separation help
Hi to everyone,

I am a new member of this forum. I have 100 bp single-read Illumina FASTQ files. When we looked at the reads we saw some interesting sequences, and we want to separate those reads and write them to a separate FASTQ file for analysis. For example, we want to separate reads containing "ATTTTTTTTAGAAAAAAAA" (we saw around 2 million such reads out of 9 million). Can you please give me guidance on how to do this? A program or any Unix commands would be helpful. I am not a Unix person, so please give me the exact commands to execute.

Thanks a lot.

GenoMax 06-18-2015 08:06 AM

BBDuk (bbduk.sh) from the BBMap package can do this. If that sequence is at the start of the reads, then:

$ bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq restrictleft=19 k=19 literal=ATTTTTTTTAGAAAAAAAA
In this case, all reads starting with "ATTTTTTTTAGAAAAAAAA" will end up in "matched.fq" and all other reads will end up in "unmatched.fq". Specifically, the command means "look for 19-mers in the leftmost 19 bp of the read", which will require an exact prefix match, though you can relax that if you want.

So you could bin all the reads with your known sequence, then look at the remaining reads to see what they have in common. You can do the same thing with the tail of the read using "restrictright" instead, though you can't use both restrictions at the same time.
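For readers without BBMap installed, the prefix/suffix binning described above can be approximated with plain awk. This is only a rough sketch, not BBDuk: the function name `split_fastq_by_end` and its argument order are made up for illustration, it assumes uncompressed four-line FASTQ records with Unix line endings, and it does a literal string comparison, so it does not reproduce BBDuk's k-mer matching defaults (for example any reverse-complement handling).

```shell
#!/bin/sh
# split_fastq_by_end: hypothetical helper, NOT part of BBMap.
# Usage: split_fastq_by_end reads.fq SEQ left|right matched.fq unmatched.fq
# Assumes each read is exactly four lines: header, bases, "+", qualities.
split_fastq_by_end() {
    awk -v seq="$2" -v side="$3" -v m="$4" -v u="$5" '
        BEGIN { printf "" > m; printf "" > u }   # create both outputs even if one stays empty
        {
            rec[NR % 4] = $0                      # slots 1,2,3,0 = header, bases, "+", qualities
            if (NR % 4 == 0) {                    # a full record has been buffered
                bases = rec[2]
                n = length(seq)
                if (side == "left")
                    hit = (substr(bases, 1, n) == seq)
                else
                    hit = (length(bases) >= n && substr(bases, length(bases) - n + 1) == seq)
                out = hit ? m : u
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0] > out
            }
        }
    ' "$1"
}
```

A call such as `split_fastq_by_end reads.fq ATTTTTTTTAGAAAAAAAA right matched.fq unmatched.fq` would bin on the tail of each read; use `left` for the head.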

umamayil 06-18-2015 09:18 AM

Thanks. The sequence will be either in the middle or at the end. How do I separate the reads if the sequence of interest is in the middle?

Thanks again

GenoMax 06-18-2015 09:25 AM

Just remove the "restrictleft/right" directive and the entire read will be searched.
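As an illustration of searching the whole read, the sketch below shows the thread's command with the restrict directive dropped (in a comment, assuming the tool is `bbduk.sh` from BBMap), followed by a plain awk stand-in for cases where BBMap is not available. The helper names `split_fastq_by_motif` and `count_reads` are hypothetical. The awk version assumes uncompressed four-line FASTQ records and looks for the literal string only, so its counts may differ from BBDuk's if BBDuk also reports reverse-complement hits.

```shell
#!/bin/sh
# BBDuk form from the thread with restrictleft removed (assumes bbduk.sh is on PATH):
#   bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq k=19 literal=ATTTTTTTTAGAAAAAAAA

# split_fastq_by_motif: hypothetical awk stand-in, NOT part of BBMap.
# Usage: split_fastq_by_motif reads.fq SEQ matched.fq unmatched.fq
split_fastq_by_motif() {
    awk -v seq="$2" -v m="$3" -v u="$4" '
        BEGIN { printf "" > m; printf "" > u }   # create both outputs even if one stays empty
        {
            rec[NR % 4] = $0                      # slots 1,2,3,0 = header, bases, "+", qualities
            if (NR % 4 == 0) {
                # index() > 0 means SEQ occurs somewhere in the bases line
                out = (index(rec[2], seq) > 0) ? m : u
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0] > out
            }
        }
    ' "$1"
}

# count_reads: number of four-line records in a FASTQ file.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}
```

Running `count_reads matched.fq` afterwards gives a quick sanity check against the roughly 2 million of 9 million reads mentioned in the original question.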

umamayil 06-19-2015 06:29 AM


Thanks a lot. I will try the commands you have given to me.

Thanks again and have a nice weekend.


All times are GMT -8.