SEQanswers

06-18-2015, 06:48 AM   #1
umamayil
Junior Member
 
Location: newark, NJ

Join Date: Jun 2015
Posts: 3
RNA_seq read separation help

Hi everyone,

I am a new member of this forum. I have 100 bp single-read Illumina FASTQ files. When we looked at the reads, we saw some interesting sequences. We want to separate those reads and write them to a separate FASTQ file for analysis. For example, we want to separate the reads containing "ATTTTTTTTAGAAAAAAAA" (we saw around 2 million such reads out of 9 million). Can you please give me guidance on how to do this? A program or any Unix commands would be helpful. I am not a Unix person, so please give me the exact commands to execute.

Thanks a lot.
Mayil
06-18-2015, 07:06 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

bbduk.sh from the BBMap package can do this. If that sequence is at the start of the reads, then:

Code:
$ bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq restrictleft=19 k=19 literal=ATTTTTTTTAGAAAAAAAA
In this case, all reads starting with "ATTTTTTTTAGAAAAAAAA" will end up in "matched.fq" and all other reads will end up in "unmatched.fq". Specifically, the command means "look for 19-mers in the leftmost 19 bp of the read", which will require an exact prefix match, though you can relax that if you want.

So you could bin all the reads with your known sequence, then look at the remaining reads to see what they have in common. You can do the same thing with the tail of the read using "restrictright" instead, though you can't use both restrictions at the same time.
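
Before choosing between "restrictleft", "restrictright" or no restriction, it may help to check where the sequence actually sits in your reads. Below is a rough awk sketch for that, not part of BBMap. It assumes standard 4-line FASTQ records and only looks for exact forward-strand matches. The file names and the make_read helper are made up for the demo; point the awk command at your own file instead.

```shell
motif=ATTTTTTTTAGAAAAAAAA

# Hypothetical helper: print one FASTQ record with a dummy quality string
# of the same length as the sequence.
make_read() {
    printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}

# Tiny demo input: motif at the start, at the end, in the middle, and absent.
{
    make_read r1 "${motif}CCGG"
    make_read r2 "CCGG${motif}"
    make_read r3 "CC${motif}GG"
    make_read r4 "ACGTACGTACGTACGTACGTACG"
} > reads.fq

# Line 2 of every 4-line record is the sequence; tally where the motif occurs.
awk -v m="$motif" '
NR % 4 == 2 {
    n = length($0); k = length(m)
    if (index($0, m) == 1) at_start++                       # exact prefix
    if (n >= k && substr($0, n - k + 1) == m) at_end++      # exact suffix
    if (index($0, m) > 0) anywhere++                        # any position
}
END {
    printf "start=%d end=%d anywhere=%d total=%d\n", at_start, at_end, anywhere, NR / 4
}' reads.fq > motif_counts.txt

cat motif_counts.txt
```

If "start" is close to "anywhere", the restrictleft command above should catch nearly everything; if not, the restriction is probably too strict for your data.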
06-18-2015, 08:18 AM   #3
umamayil

Hi,
Thanks. The sequence will be either in the middle or at the end of the read. How do I separate the reads if the sequence of interest is in the middle?

Thanks again
Mayil
06-18-2015, 08:25 AM   #4
GenoMax

Just remove the "restrictleft"/"restrictright" directive and the entire read will be searched.
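
If you want to cross-check the result without BBMap, a plain awk split along the same lines could look like the sketch below. It assumes 4-line FASTQ records and exact forward-strand matches only; as far as I know bbduk also checks the reverse complement by default, so the counts may differ a little. The demo input and the make_read helper are hypothetical, and the output names simply mirror the "matched.fq" / "unmatched.fq" used earlier.

```shell
motif=ATTTTTTTTAGAAAAAAAA

# Hypothetical helper: print one FASTQ record with a dummy quality string
# of the same length as the sequence.
make_read() {
    printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}

# Tiny demo input; replace reads.fq with your own file.
{
    make_read r1 "${motif}CCGG"
    make_read r2 "CCGG${motif}"
    make_read r3 "CC${motif}GG"
    make_read r4 "ACGTACGTACGTACGTACGTACG"
} > reads.fq

# Collect each 4-line record, decide on its sequence line, then write the
# whole record to matched.fq or unmatched.fq.
awk -v m="$motif" '
{ rec = rec $0 "\n" }                        # accumulate the current record
NR % 4 == 2 { hit = (index($0, m) > 0) }     # sequence line: motif anywhere?
NR % 4 == 0 {                                # quality line closes the record
    out = hit ? "matched.fq" : "unmatched.fq"
    printf "%s", rec > out
    rec = ""
}' reads.fq

# Report how many reads went to each file.
echo "matched:   $(awk 'END { print NR / 4 }' matched.fq) reads"
echo "unmatched: $(awk 'END { print NR / 4 }' unmatched.fq) reads"
```

The awk version reads the file once and keeps only one record in memory, so it should cope with a 9 million read file, just more slowly than bbduk.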
06-19-2015, 05:29 AM   #5
umamayil

Hi,

Thanks a lot. I will try the commands you have given me.

Thanks again and have a nice weekend.

Mayil
All times are GMT -8.