SEQanswers

12-30-2014, 02:23 PM   #1
tonybert
Illumina-Tag sequencing, filtering homopolymers/entropy-based filtering

Hi all, I have a general question for anyone who has been involved in tag sequencing of functional genes in environmental samples. I use a data processing pipeline based on usearch (Rob Edgar's methods) and QIIME to analyze functional genes related to the nitrogen cycle in soils and marine environments.

I am using usearch, specifically the fastq_stats option, to assess the quality values of my sequences over read length. After visual inspection of read length vs. Q value, I trim the sequences to a fixed position and use QIIME to filter reads using split_libraries_fastq.py with these quality options (-q 19 -r 3 -p 0.75 -n 0). I am using QIIME rather than usearch exclusively because I like to then split the cleaned file with split_fasta_on_sample_ids.py (I have lots of samples, some from different environments, that I would like to analyze independently).

Anyway, I have found that for some target genes of lower-abundance organisms, a substantial proportion of the sequences still in the cleaned files contain homopolymer runs such as AAAAAAA, or something similar. After some investigation, I found that there are ways to eliminate these sequences using DUST filters, such as those found in the PrinSeq package. However, I am wondering if anyone knows of methods, scripts, or filter options within QIIME or usearch that will remove these sequences. The reason I am concerned about these sequences is, I hope, obvious: a substantial number of them end up contributing to my OTU files, yet they show no hits to reference gene databases of my targets or even to GenBank!

So, overall, does anyone know whether entropy-based filters exist in QIIME or usearch for eliminating low-complexity sequences?
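
To make concrete what I mean by filtering on homopolymers or entropy, here is a minimal Python sketch. It is not a QIIME or usearch feature, and it is not the DUST algorithm used by PrinSeq. The function names, the k-mer size, and both thresholds (max_run and min_entropy) are assumptions picked for illustration and would need tuning for each target gene.

```python
import math
import re
from collections import Counter


def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def shannon_entropy(seq, k=3):
    """Return the Shannon entropy (in bits) of the k-mer composition of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # Sum the per-k-mer terms -p * log2(p).
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def is_low_complexity(seq, max_run=6, min_entropy=3.0, k=3):
    """Flag a read with a long homopolymer run or low k-mer entropy.

    max_run and min_entropy are illustrative guesses, not recommended values.
    """
    seq = seq.upper()
    return longest_homopolymer(seq) > max_run or shannon_entropy(seq, k) < min_entropy


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records, **thresholds):
    """Yield only the (header, sequence) pairs that pass the complexity check."""
    for header, seq in records:
        if not is_low_complexity(seq, **thresholds):
            yield header, seq
```

Something like `filter_records(read_fasta("seqs.fna"))` could then be run on the cleaned FASTA file before OTU picking (seqs.fna is a placeholder file name).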

Hope my question makes sense!

Thanks,

-Tony