SEQanswers > Applications Forums > RNA Sequencing

02-25-2015, 10:05 AM #21
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

That doesn't really have anything to do with this thread, so I suggest you post it in a new thread.

02-26-2015, 07:23 AM #22
JenBarb
Member
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Hi again Brian,
Apparently you have all of the tools I need for the problem I am dealing with in my data. Now I am wondering whether you have a tool that will extract a random subset of reads from a fasta file. For example, I have a fasta file of about 50,000 reads. I want to align them against a database that accepts only 3,000 reads at a time, so I would like to randomly pull out a subset of 3,000 reads from my file. I would rather not take the first 3,000 with head or the last 3,000 with tail. I could write a quick script for this, but thought I would ask you first.

Thanks again for all of your help.
Jen

02-26-2015, 08:51 AM #23
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi Jen,

You seem to be asking all the right questions!

reformat.sh in=reads.fasta out=sampled.fasta sample=3000

There are various other sampling options too, like a specific number of bases or a specific fraction of the total number of reads, but that's the one you want in this case.

-Brian
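For anyone who wants the "quick script" route instead, drawing an exact number of reads at random in a single pass can be sketched with reservoir sampling (Algorithm R). The shell function below is a minimal sketch, assuming a plain FASTA input; the name `sample_fasta` and its argument order are hypothetical, not part of BBTools, and this is not necessarily how reformat.sh implements `sample=`.

```shell
# Hypothetical helper (not part of BBTools): sample_fasta K SEED FILE
# Prints K FASTA records chosen uniformly at random using reservoir
# sampling, so the file is read once and only K records are held in memory.
# Multi-line sequences stay attached to their header.
sample_fasta() {
  local k="$1" seed="$2" file="$3"
  awk -v k="$k" -v seed="$seed" '
    BEGIN { srand(seed); n = 0 }
    # Offer one complete record to the reservoir.
    function keep(rec,   j) {
      n++
      if (n <= k) {
        res[n] = rec                 # fill the reservoir with the first K
      } else {
        j = int(rand() * n) + 1      # uniform integer in 1..n
        if (j <= k) res[j] = rec     # replace with probability K/n
      }
    }
    # A header line closes the previous record and starts a new one.
    /^>/ { if (cur != "") keep(cur); cur = $0; next }
    # Sequence lines are appended to the current record.
    { cur = cur "\n" $0 }
    END {
      if (cur != "") keep(cur)
      m = (n < k) ? n : k            # fewer than K records: print them all
      for (i = 1; i <= m; i++) print res[i]
    }
  ' "$file"
}
```

Usage: `sample_fasta 3000 42 reads.fasta > sampled.fasta`. A fixed seed makes the draw repeatable with the same awk implementation, and the records come out in reservoir order rather than in their original file order.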

02-27-2015, 07:03 AM #24
JenBarb
Member
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Yet another awesome script!
Thank you so much. If we ever get to publication, we will certainly cite your tool!
jen

02-27-2015, 11:21 AM #25
JenBarb
Member
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Another question, Brian. Have you ever seen a case where the reformat.sh script did not work properly? From a large fasta file, I made 4 subsets of read IDs, each saved as a names file, and I am trying to split the reads into 4 fasta files, one per names file. Three of these work fine and give the expected subset of reads, but one does not work at all. I cannot figure out what is going on, since I used the same steps to generate all 4. As a sanity check, I took a couple of reads from that names file and grepped for them in my big file, and grep pulled the reads out just fine. Any ideas? My command line:

filterbyname.sh in=combined_seqs.fa out=subset4.fa names=subset_names.txt include=t overwrite=true

Jen
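The include-by-name idea itself can be sketched as a two-pass awk script: load the names into a lookup table, then print every record whose header matches. This is a minimal sketch only; the function name `keep_named` is hypothetical, and requiring a name to equal the first whitespace-delimited word of the header is an assumption of this sketch. filterbyname.sh has its own matching rules, which this does not reproduce.

```shell
# Hypothetical helper (not filterbyname.sh): keep_named NAMES FASTA
# Prints the FASTA records whose header's first word (without the '>')
# appears as a whole line in NAMES. Assumes NAMES is non-empty; with an
# empty names file the FNR == NR trick would misread the FASTA as names.
keep_named() {
  local names="$1" fasta="$2"
  awk '
    # Pass 1 (names file): build the lookup table.
    FNR == NR { want[$0] = 1; next }
    # Pass 2 (FASTA): each header decides whether its record is printed.
    /^>/ {
      split(substr($0, 2), w, /[ \t]+/)
      keep = (w[1] in want)
    }
    # Print the header and sequence lines of kept records.
    keep
  ' "$names" "$fasta"
}
```

Usage: `keep_named subset_names.txt combined_seqs.fa > subset4.fa`. Because names are compared as exact strings, a names file with Windows line endings or stray `>` characters would silently match nothing here.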

02-27-2015, 01:03 PM #26
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi Jen,

Are you talking about reformat not working or filterbyname not working? Reformat is very well tested, used hundreds of times a day, and I have only heard of one bug in it in the last 5 months, which has been fixed. filterbyname is not used nearly as much, though I still have not encountered a situation in which it failed recently.

What is the format of your names file? Well, specifically, can you give an example of a fasta entry in the fasta file, and a line from the names.txt file, that you expect to match but don't - as well as the console output of the program? Or, if they're small and non-confidential, you can email them both to me and I'll investigate. I suspect it's a formatting issue.
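Since a formatting mismatch is the suspected cause, a quick diagnostic can narrow things down before sending files. The hypothetical `check_names` function below flags names-file lines that end in a carriage return (typical of a file saved on Windows), start with `>`, or carry trailing whitespace, and lists names that match neither a full FASTA header nor its first word. These checks are assumed common suspects for illustration, not a description of how filterbyname.sh matches names.

```shell
# Hypothetical helper: check_names NAMES FASTA
# Reports suspicious lines in NAMES and names with no matching FASTA header.
check_names() {
  local names="$1" fasta="$2"
  awk '
    # Pass 1 (FASTA): remember each header, whole and as its first word.
    FNR == NR {
      if (substr($0, 1, 1) == ">") {
        h = substr($0, 2)
        seen[h] = 1
        split(h, w, /[ \t]+/)
        seen[w[1]] = 1
      }
      next
    }
    # Pass 2 (names file): flag formatting problems, then look the name up.
    {
      name = $0
      if (name ~ /\r$/)    print "line " FNR ": ends with a carriage return (CRLF file?)"
      if (name ~ /^>/)     print "line " FNR ": starts with >"
      if (name ~ /[ \t]$/) print "line " FNR ": has trailing whitespace"
      if (!(name in seen)) { missing++; print "line " FNR ": not found: " name }
    }
    END { print (missing + 0) " name(s) not found" }
  ' "$fasta" "$names"
}
```

Usage: `check_names subset_names.txt combined_seqs.fa`. If it reports carriage returns, something like `tr -d '\r' < subset_names.txt > fixed_names.txt` removes them.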

03-02-2015, 07:42 AM #27
JenBarb
Member
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Hi Brian,
I was talking about the filterbyname.sh script. However, I just did a sanity check on a subset and it worked. Then I reran it on the full data set and it worked. Maybe I was just too tired on Friday and was missing something.

At any rate, it worked great and now I am moving along with my project.

Thank you again for your help!
Jen