#1 |
Junior Member
Location: Madrid Join Date: May 2013
Posts: 1
Hi,
Does anyone have a Perl script to extract a subset of FASTQ sequences based on sequence length? Thanks very much!
#2 |
Member
Location: san antonio Join Date: Jul 2011
Posts: 32
Hi,
I've written a Python script that should do this job for you. Unzip the two attached .gz files:
Input.fastq.gz
Filter_fastq_by_Sequence_length.py.gz
Input.fastq is just a model input: it has 50 sequence reads of varying length (22 bp, 33 bp, 36 bp and 41 bp).
For help, run the following on the command line:
Code:
python Filter_fastq_by_Sequence_length.py -h
To filter by length:
Code:
python Filter_fastq_by_Sequence_length.py -i Input.fastq -l 22 -o Output.fastq
Once the script has run successfully, the Output.fastq file it creates will contain 2 reads of 22 bp each. Try lengths 33, 36, 41 and 0 to understand how the program works. Then you can run the script on your own input file and change the length. It should hopefully work. Let me know how it goes, and tell me in case you need any help.
-- Thanks
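For anyone who cannot get at the attachments, here is a minimal sketch of what a length filter of this kind might look like. It is not the attached script. The function name `filter_fastq_by_length` is made up for illustration, the sketch assumes plain 4-line FASTQ records (no wrapped sequence lines), and it keeps reads of exactly the requested length, which is what the 22 bp example above suggests.

```python
import io
from itertools import islice


def filter_fastq_by_length(in_handle, out_handle, length):
    """Copy 4-line FASTQ records whose sequence is exactly `length` bases long.

    Hypothetical helper: assumes unwrapped, 4-line FASTQ records.
    Returns the number of records written.
    """
    kept = 0
    while True:
        # One record = header, sequence, '+' separator, quality string.
        record = list(islice(in_handle, 4))
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        if len(record[1].rstrip("\r\n")) == length:
            out_handle.writelines(record)
            kept += 1
    return kept


# Tiny in-memory demo with made-up reads (two of length 10, one of length 4).
demo = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTTGGGGCC\n+\nIIIIIIIIII\n"
)
out = io.StringIO()
n_kept = filter_fastq_by_length(demo, out, 10)
print(n_kept)
```

With real files you would pass `open("Input.fastq")` and `open("Output.fastq", "w")` instead of the in-memory handles.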
#3 |
Member
Location: Charlottesville, VA Join Date: Apr 2010
Posts: 34
Hi,
Thanks for your Python script. However, when I tried to run it on my Mac (OS X), I got the following error message:
d-128-54-196:PythonApps yb8d$ python Filter_fastq_by_Sequence_length.py -i Input.fastq -l 22 Output.fastq
Using Following inputs
Input file is Input.fastq
Seq_length is 22
Output file is
Filtering in Progress......
Traceback (most recent call last):
  File "Filter_fastq_by_Sequence_length.py", line 58, in <module>
    filter_by_len(param[0],param[1],param[2])
  File "Filter_fastq_by_Sequence_length.py", line 6, in filter_by_len
    f=open(ofile,'w')
IOError: [Errno 2] No such file or directory: ''
Can you shed some light on what caused this error?
Best,
Wing
#4 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
You need an "-o" in front of "Output.fastq":
Code:
python Filter_fastq_by_Sequence_length.py -i Input.fastq -l 22 -o Output.fastq
#5 | |
Member
Location: san antonio Join Date: Jul 2011
Posts: 32
Hi Wing,
Devon's solution to the problem is right. Thanks. The script errored out because, without the "-o" switch, it was not able to recognize the output file name.
-- Muthu
#6 |
Member
Location: Charlottesville, VA Join Date: Apr 2010
Posts: 34
Hi Muthu et al.,
Thanks very much for the quick reply and for picking up my stupid omission of a main switch. After the fix, I am happy to report that everything works beautifully.
Wing
#7 |
Member
Location: USA Join Date: Nov 2012
Posts: 51
Hi everyone,
I have two FASTQ files of raw reads from an Ion PGM. Is it possible to get a count of how many Q20 reads they contain? And is it possible to extract those reads in FASTQ format? Also, can I extract the reads of 100 bases using the script above?
Thanks in advance for any help.
Regards,
Chayan
#8 |
Member
Location: san antonio Join Date: Jul 2011
Posts: 32
Chayan,
The script only lets you extract FASTQ sequences by length, not by quality. Hopefully you have figured that out by now. Sorry for the late reply.
Thanks
-- Muthu
#9 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
BBTools has a script called reformat.sh which allows extraction of reads with an average quality of at least X (maq=X) or a read length of at least Y (minlength=Y). It can also write a histogram of the read qualities (aqhist=) using linear and logarithmic averages. Requires Java.
reformat.sh in=reads.fq out=filtered.fq maq=20 minlength=100 aqhist=hist.txt
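To make the two criteria concrete, here is a small sketch of a "mean quality plus minimum length" check. It is only an illustration, not how reformat.sh works internally: the helper names `mean_phred` and `passes` are hypothetical, the quality string is assumed to be Phred+33 encoded, and the average is a plain arithmetic mean of the Phred scores, which may differ from the averaging that BBTools applies.

```python
def mean_phred(quality_line, offset=33):
    """Arithmetic mean of the Phred scores in a quality string.

    Assumes Phred+33 encoding by default (hypothetical helper).
    """
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores) if scores else 0.0


def passes(seq, qual, min_avg_q=20, min_len=100):
    """True if the read is long enough and its mean quality is high enough."""
    return len(seq) >= min_len and mean_phred(qual) >= min_avg_q


# 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2 under Phred+33.
print(mean_phred("IIII"))
print(passes("A" * 100, "5" * 100))
print(passes("A" * 100, "#" * 100))
```

Counting the reads for which `passes` returns True gives a rough "how many Q20 reads" figure under these assumptions.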
#10 |
Member
Location: USA Join Date: Nov 2012
Posts: 51
OK, thanks to both of you. Additionally, is there a tool or utility that allows k-mer based read extraction?
#11 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Depends on exactly what you have in mind, but I wrote a tool (BBDuk) that will filter reads based on the presence of specific kmers. For example:
bbduk.sh -Xmx1g in=reads.fq out=unmatched.fq outm=matched.fq ref=kmers.fa k=31
That will split the file reads.fq into two output files, one containing reads with kmers matching the reference, and one with the rest of the reads, using a kmer length of 31.
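The idea of splitting reads by kmer presence can be sketched in a few lines of Python. This is a toy illustration of the concept only, not BBDuk's algorithm: the function names `kmers` and `split_by_kmers` are hypothetical, matching is exact and forward-strand only, and everything is held in memory.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_by_kmers(reads, ref_seqs, k):
    """Split (name, sequence) pairs into matched and unmatched read names.

    A read counts as matched if it shares at least one exact kmer with any
    reference sequence. Forward strand only (an assumption of this sketch).
    """
    ref_kmers = set()
    for ref in ref_seqs:
        ref_kmers |= kmers(ref, k)
    matched, unmatched = [], []
    for name, seq in reads:
        (matched if kmers(seq, k) & ref_kmers else unmatched).append(name)
    return matched, unmatched


# Made-up data: r1 shares the 4-mer ACGT with the reference, r2 shares none.
reads = [("r1", "TTACGTTT"), ("r2", "GGGGGGGG")]
matched, unmatched = split_by_kmers(reads, ["ACGTACGT"], k=4)
print(matched, unmatched)
```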
#12 |
Member
Location: USA Join Date: Nov 2012
Posts: 51
OK, I understand, but I want a different kind of utility. I have metagenomic read files. Reads coming from a particular organism in such a file are likely to have similar k-mer frequencies (say, for tetramers), and I want to extract read subsets based on this criterion and then perform the assembly. Unfortunately I can't use any direct reference here, as I am looking for novel lineages. Is that clearer now?
#13 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Ahh, you want a binning tool. If you make a reference containing organisms that are somewhat closely related - say, at least 70% identity - you can use BBSplit. If not, well... there are various binning tools that use kmer frequency, or coverage, or both. But they don't tend to work well on short reads. I don't know of a single tool that will do a good job of solving this problem; I think it's generally addressed through a complicated pipeline involving a lot of labor.
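As a rough illustration of the kmer-frequency signal such binning tools rely on, the sketch below computes a normalized tetranucleotide profile for one sequence. It is not taken from any particular binning tool; the function name `tetramer_profile` is hypothetical, only the forward strand is counted, and kmers containing N or other non-ACGT characters are ignored. Note that a 100 bp read contains only 97 tetramers spread over 256 possible bins, which is one plausible reason the signal is weak on short reads.

```python
from collections import Counter
from itertools import product


def tetramer_profile(seq, k=4):
    """Frequencies of all 4**k ACGT kmers in seq, in lexicographic order.

    Hypothetical helper: forward strand only; kmers with non-ACGT
    characters are left out of both the counts and the total.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in all_kmers)
    return [counts[km] / total if total else 0.0 for km in all_kmers]


# Made-up sequence: its 5 tetramers are ACGT (twice), CGTA, GTAC and TACG.
profile = tetramer_profile("ACGTACGT")
print(len(profile))
```

Profiles like this could then be compared between contigs or long reads with any clustering method of your choice.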
Tags |
iontorrent, length, perl, script, sequence |