SEQanswers

Old 02-06-2015, 12:39 PM   #1
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default extract forward and reverse reads?

Hello,
I have forward and reverse reads in a single fastq file from Ion Torrent PGM sequencing, and I would like to know whether anyone knows a way to extract the forward and reverse reads into two separate files.

And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

Thank you,
Jennifer
Old 02-06-2015, 12:53 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Hi Jennifer,

You can do that with BBTools:

reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

and

filterbyname.sh in=reads.fq out=filtered.fq names=names.txt


-Brian
Old 02-06-2015, 12:54 PM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Quote:
Originally Posted by JenBarb View Post
And also, does anyone know of a way to extract certain reads from a fastq file given a list of read IDs?

Thank you,
Jennifer
Heng Li's seqtk "subseq" option: https://github.com/lh3/seqtk
Old 02-06-2015, 01:01 PM   #4
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Thank you so much for a quick reply! I really appreciate it. I will try it now.
Jennifer
Old 02-09-2015, 05:57 AM   #5
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Hi Brian,
I am looking for the installation instructions on the page you sent and I can't find them. I also was looking for info about the two scripts that you sent. Is there a documentation page that describes what the scripts do and any argument options they take?

Thank you,
Jennifer
Old 02-09-2015, 06:05 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

There is no installation required for BBTools. You just uncompress the file. Then you can run the shell scripts (you may need to add execute permissions depending on what OS you are using). If you run the shell script by itself (e.g. $ reformat.sh) it will print information about all possible command line options.

Here is a thread with information about the reformat tool: http://seqanswers.com/forums/showthread.php?t=46174
Old 02-09-2015, 06:06 AM   #7
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Thank you!!
Old 02-09-2015, 08:01 AM   #8
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Hello again,
So I tried the reformat.sh script on my fastq file. I then took each output file (forward and reverse reads) and BLASTed the reads against a database of interest, and I am still finding that some reads align in the forward direction and some in the reverse direction. My understanding is that the program should have put only forward reads into one file and only reverse reads into the other, so the alignments from each file would be forward-only or reverse-only. I am not finding this to be true. Thoughts?
Old 02-09-2015, 09:28 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

In the original post you had talked about forward/reverse reads in a simple context (as if they are two reads from the two ends of a fragment).

reformat.sh will not separate reads that align in opposite orientations. It will only separate reads if they were interleaved in a single file (as long as they came from a single fragment).
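
To make the interleaved case concrete, here is a minimal Python sketch of de-interleaving. It is not how reformat.sh is implemented; it assumes a plain FASTQ with exactly four lines per record in which records alternate read 1, read 2, read 1, and so on. The function names read_fastq and deinterleave are made up for this illustration.

```python
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(in_handle, out1, out2):
    """Write alternating records to out1 (first of pair) and out2 (second of pair).

    Returns the number of pairs written.
    """
    pairs = 0
    records = read_fastq(in_handle)
    for first in records:
        # The record right after 'first' is assumed to be its mate.
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        out1.writelines(first)
        out2.writelines(second)
        pairs += 1
    return pairs
```

Note that the split depends only on the position of each record in the file. Nothing in it looks at the sequence, which is why this kind of splitting says nothing about the strand a read aligns to.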

You will need to parse the output from your alignment program (what program are you using?) to separate reads that align to +/- strands into two files. I am not sure if BBMap can write to separate alignment files based on the strand info.
Old 02-09-2015, 09:49 AM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Actually... there is a tool for that, "splitsam.sh", which is not part of the public distribution because I didn't think it would be of use to anyone. I've attached it to this post; just extract it and put it in the folder with the other shell scripts, then run it like this:

splitsam.sh mapped.sam forward.sam reverse.sam

You can also do that with samtools, by filtering on the 0x10 flag bit. In either case, they have to be mapped first, of course - you cannot determine which read goes to which strand from a fastq file.
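
As an illustration of filtering on the 0x10 flag bit, here is a minimal Python sketch that routes SAM records by strand. It is not the code inside splitsam.sh; the function name split_sam_by_strand is hypothetical, and the choices to copy header lines to both outputs and to skip unmapped reads (flag bit 0x4) are assumptions made for this example.

```python
REVERSE = 0x10   # SAM flag bit: read aligned to the reverse strand
UNMAPPED = 0x4   # SAM flag bit: read is unmapped, so it has no strand


def split_sam_by_strand(sam_lines, forward_out, reverse_out):
    """Route SAM alignment lines by strand; header lines go to both outputs.

    Unmapped reads are skipped. Returns (n_forward, n_reverse).
    """
    n_fwd = n_rev = 0
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            # Header line: keep it in both files so each stays a valid SAM.
            forward_out.write(line)
            reverse_out.write(line)
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            continue
        if flag & REVERSE:
            reverse_out.write(line)
            n_rev += 1
        else:
            forward_out.write(line)
            n_fwd += 1
    return n_fwd, n_rev
```

With samtools, the same selection would typically be "samtools view -h -F 0x14 mapped.sam" for mapped forward-strand reads and "samtools view -h -f 0x10 -F 0x4 mapped.sam" for reverse-strand reads, where -f requires flag bits and -F excludes them.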
Attached Files
File Type: gz splitsam.sh.gz (504 Bytes, 7 views)
Old 02-09-2015, 09:57 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Ask and ye shall receive

Roll that into BBMap download Brian!
Old 02-09-2015, 10:50 AM   #12
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Thank you so much for your help!
Old 02-09-2015, 11:49 AM   #13
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by GenoMax View Post
Roll that into BBMap download Brian!
OK... I don't like to release incomplete things so I made it faster, added some features, and then rolled it into the download.

Quote:
Originally Posted by JenBarb View Post
Thank you so much for your help!
You're welcome!
Old 02-09-2015, 11:57 AM   #14
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Thank you again, Brian and GenoMax for all of your help.

I now am trying the filterbyname script and it does not seem to be pulling out only those reads that match a particular read id found in my names.txt file. Is there something I am missing?

sh /data/barbj/bbmap/filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt
Old 02-09-2015, 11:59 AM   #15
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

By default, "filterbyname" discards reads with names in your name list, and keeps the rest. To include them and discard the others, do this:

filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t

Sorry for the confusion. I guess that default is kind of odd.
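
The exclude-by-default versus include=t behaviour can be sketched in a few lines of Python. This is only an illustration of the semantics, not filterbyname's actual code: the function name filter_by_name is hypothetical, and matching on the header text up to the first whitespace is an assumption about how names are compared.

```python
def filter_by_name(records, names, include=False):
    """Yield (header, seq, plus, qual) records filtered by read ID.

    include=False drops reads whose ID is listed (the default described above);
    include=True keeps only the listed reads.
    """
    wanted = set(names)
    for record in records:
        # Assumed read ID: header without the leading '@', up to the first whitespace.
        parts = record[0][1:].split()
        read_id = parts[0] if parts else ""
        if (read_id in wanted) == include:
            yield record
```

For example, with names = ["r1"], the default call yields every read except r1, while include=True yields only r1.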

Last edited by Brian Bushnell; 02-09-2015 at 12:02 PM.
Old 02-09-2015, 12:35 PM   #16
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Thank you! It worked beautifully.
Old 02-24-2015, 08:43 AM   #17
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Hello Again,
I thought I would ask you a follow-up question since you had tools that worked nicely for my other issues. Do you have a script that will pull out reads that match a certain sequence with a certain number of mismatches?

I have a sequence of about 18 bp that a subset of my reads contain somewhere within the read, and I would like to be able to pull those reads out, allowing for 1 or 2 mismatches.

THANKS,
JEN
Old 02-24-2015, 10:09 AM   #18
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Hi Jen,

You can use BBDuk for that:

bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq literal=ACGTACGTACGTACGTAC k=18 mm=f hdist=2

Make sure "k" is set to the exact length of the sequence. "hdist" controls the number of substitutions allowed. "outm" gets the reads that match. By default this also looks for the reverse-complement; you can disable that with "rcomp=f".
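
To show what matching with "hdist" substitutions and the reverse-complement check mean, here is a brute-force Python sketch. It is not how BBDuk works internally, and the function names are made up for this example; it slides a window of the query length along the read and counts mismatches, with no indels allowed.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming_within(a, b, max_dist):
    """True if equal-length strings a and b differ at no more than max_dist positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_dist:
                return False
    return True


def contains_with_mismatches(read, query, hdist=2, rcomp=True):
    """True if some window of the read matches query with at most hdist substitutions.

    With rcomp=True the reverse complement of query is also tried.
    """
    read = read.upper()
    queries = [query.upper()]
    if rcomp:
        queries.append(reverse_complement(queries[0]))
    k = len(query)
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        if any(hamming_within(window, q, hdist) for q in queries):
            return True
    return False
```

Reads for which contains_with_mismatches returns True correspond to what would go to "outm" in the command above, and the rest to "out".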
Old 02-24-2015, 10:16 AM   #19
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
Default

Wonderful! I will try it. These tools are amazing!
Jen
Old 02-25-2015, 11:01 AM   #20
mhkiani
Member
 
Location: Texas

Join Date: Oct 2013
Posts: 12
Default Finding RNA-editing

I have strand-specific RNA-seq data from different samples and I'm interested in finding possible RNA editing between samples. I used CLC Workbench to call variants and did the comparison, and the output contains SNP and MNP variants. My question is: how can I filter these variants to distinguish editing sites from SNPs?

Thanks