FastX tool for removing duplicates

archie.chauhan

Junior Member

Join Date: Nov 2011

Posts: 9
#1

05-15-2012, 05:40 AM

Hi,
I have gone through various SEQanswers posts on duplicate removal but could not find the answer I need. I am a molecular biologist new to bioinformatics, so I have a few queries.
I have Illumina DNA 2x100 paired-end reads. FastQC analysis indicated a large number of duplicates, which seems to be correct. Since the dataset is too big, I wanted to remove the duplicates, so I used Galaxy. I first ran FASTQ Groomer, followed by FastX collapse, on the R1 and R2 reads separately. My plan of action was to first remove duplicates, then filter and trim my sequences, and finally assemble them using Velvet. As far as I know, Velvet requires shuffling (interleaving) of the paired-end reads prior to assembly. I therefore have a few questions about my approach:
1) The FastX collapse tool gives its own headers to the sequences, so it seems that the paired-end information is lost. Am I right, or have only the headers changed while the information is still there? If so, where is it?
2) I used the R1 and R2 reads separately for grooming and FastX collapse. Should I instead first shuffle my reads using Velvet and then use the FastX collapse tool on the shuffled sequences?
3) Or should I first join the paired-end data and then use the FastX tool? In that case, how do I do the shuffling with Velvet?
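
To make the third option concrete, here is a minimal Python sketch of pair-aware duplicate removal followed by interleaving. It is an illustration under stated assumptions, not what FastX collapse or Velvet does internally: it assumes R1 and R2 list the mates in the same order, treats a pair as a duplicate only when both mate sequences match an earlier pair exactly, and keeps the first copy. The function names `read_fastq` and `dedup_and_interleave` are made up for this example.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes the simple 4-lines-per-record layout (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def dedup_and_interleave(r1_lines, r2_lines):
    """Drop duplicate pairs and return interleaved FASTQ lines (R1, R2, R1, R2, ...).

    Assumption: a pair is a duplicate only if BOTH mate sequences are
    identical to those of a pair seen earlier. Original headers are kept,
    so the pairing information survives.
    """
    seen = set()
    out = []
    pairs = zip(read_fastq(r1_lines), read_fastq(r2_lines))
    for (h1, s1, q1), (h2, s2, q2) in pairs:
        key = (s1, s2)
        if key in seen:
            continue  # exact duplicate pair: skip both mates together
        seen.add(key)
        out.extend([h1, s1, "+", q1, h2, s2, "+", q2])
    return out


# Tiny in-memory example: pair "b" duplicates pair "a"; pair "c" differs in R1.
r1 = ["@a/1", "ACGT", "+", "IIII",
      "@b/1", "ACGT", "+", "IIII",
      "@c/1", "TTTT", "+", "IIII"]
r2 = ["@a/2", "GGCC", "+", "IIII",
      "@b/2", "GGCC", "+", "IIII",
      "@c/2", "GGCC", "+", "IIII"]

interleaved = dedup_and_interleave(r1, r2)
print("\n".join(interleaved))
```

With real files, the two inputs would be open file handles rather than lists. The set of seen sequence pairs is held in memory, so for a very large dataset storing a hash of each pair instead of the sequences themselves may be necessary. Because mates are removed together and written alternately, the output is already in interleaved order.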

I would appreciate it if someone could answer these queries.

Regards,
Archana
