
**john_mu** · 06-08-2010, 08:53 AM

If you just have the raw reads, you can use the "uniq" command in Linux to extract the unique reads (after sorting).

http://en.wikipedia.org/wiki/Uniq
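`uniq` only compares each line with the line immediately before it, which is why the reads have to be sorted first. The following minimal Python sketch shows that behaviour. `uniq_adjacent` is a hypothetical helper name used only for illustration.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Hypothetical helper mimicking plain `uniq`: collapse each run of
    identical *adjacent* lines to a single copy."""
    return [line for line, _run in groupby(lines)]


reads = ["ACGT", "TTGA", "ACGT", "ACGT"]

# Unsorted: the first ACGT is not adjacent to the later copies,
# so only the trailing run of two is collapsed.
print(uniq_adjacent(reads))          # ['ACGT', 'TTGA', 'ACGT']

# Sorted: every copy is adjacent, so each distinct read appears once.
print(uniq_adjacent(sorted(reads)))  # ['ACGT', 'TTGA']
```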

**gzentner** · 06-08-2010, 03:27 PM

Thanks John!

That sounds like it should be a useful command. I was just wondering if you could give me a little more detail.

I have the ChIP-seq data as FASTQ files, which I align using bowtie. Would I use the uniq command on the FASTQ file prior to alignment to generate another FASTQ containing only unique reads?

i.e., prior to alignment, run uniq -u on the FASTQ?

Thanks!

**john_mu** · 06-08-2010, 03:35 PM

No worries, but the method I suggested is a bit of a hack... It will require you to fiddle with the data a bit.

Firstly, do you need to preserve the read-quality information? If so, it is probably best to write your own Python or Perl script to do it. I'm pretty sure there are existing tools that do this, though... I just can't recall them off the top of my head.
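A script of the kind suggested here might look like the minimal sketch below. It assumes standard 4-line FASTQ records (no wrapped sequence lines), keeps the *first* record seen for each distinct sequence (not the highest-quality one), and holds every distinct sequence in memory. `read_fastq` and `dedup_fastq` are hypothetical names, not functions from any existing tool.

```python
import io


def read_fastq(handle):
    """Hypothetical helper: yield (header, sequence, plus, quality) tuples,
    assuming every record spans exactly four lines."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def dedup_fastq(in_handle, out_handle):
    """Hypothetical helper: write only the first record for each distinct
    sequence, keeping its header and quality line unchanged.
    Returns (records_read, records_written)."""
    seen = set()
    n_in = n_out = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        n_in += 1
        if seq in seen:
            continue  # identical sequence already written
        seen.add(seq)
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
        n_out += 1
    return n_in, n_out


example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nHHHH\n"
    "@r3\nTTGA\n+\nIIII\n"
)
out = io.StringIO()
print(dedup_fastq(example, out))  # (3, 2)
```

With real files, `in_handle` and `out_handle` would simply be opened text files instead of `io.StringIO` objects.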

-----

The method I suggested is to first extract the raw reads from the FASTQ file, using the instructions here:

http://www-stat.stanford.edu/~kinfai/SpliceMap/preprocess.html#fastq
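In a standard 4-line FASTQ file, the raw read is the second line of every record, so the extraction step amounts to keeping lines 2, 6, 10, and so on (the shell one-liner `awk 'NR % 4 == 2'` does the same). A minimal Python sketch, assuming no wrapped sequence lines; `fastq_to_raw` is a hypothetical name:

```python
def fastq_to_raw(fastq_lines):
    """Hypothetical helper: return the sequence line (line 2 of every
    4-line record). Assumes standard, unwrapped FASTQ."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    return lines[1::4]


fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n"
raw_reads = fastq_to_raw(fastq_text.splitlines())
print(raw_reads)  # ['ACGT', 'TTGA']

# One sequence per line with no names or qualities,
# which is the input layout bowtie's -r option expects.
raw_text = "\n".join(raw_reads) + "\n"
```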

Then sort the reads with http://en.wikipedia.org/wiki/Sort_(Unix)

sort input_file > output_file

Finally use "uniq"

uniq -u input_file > output_file
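One detail worth checking: `uniq -u` prints only lines that are *not* repeated, so a read that was sequenced twice is removed completely rather than reduced to one copy. If the goal is to keep one copy of each distinct read, plain `uniq` (or `sort -u`) is the usual choice. The sketch below contrasts the two. `uniq_u` is a hypothetical helper mimicking the flag.

```python
from itertools import groupby


def uniq_u(sorted_lines):
    """Hypothetical helper mimicking `uniq -u`: keep only lines whose
    run of adjacent copies has length 1."""
    return [line for line, run in groupby(sorted_lines) if len(list(run)) == 1]


reads = ["ACGT", "TTGA", "ACGT", "GGCC"]

# `sort | uniq` (or `sort -u`): one copy of every distinct read.
print(sorted(set(reads)))     # ['ACGT', 'GGCC', 'TTGA']

# `sort | uniq -u`: ACGT occurs twice, so every copy of it is discarded.
print(uniq_u(sorted(reads)))  # ['GGCC', 'TTGA']
```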

After you do this, you can align your reads using bowtie with the "-r" option for raw reads.

**lifeng.tian** · 06-10-2010, 09:48 PM

You can try fastx_collapser from http://hannonlab.cshl.edu/fastx_toolkit/
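As we understand it, fastx_collapser merges identical sequences into single FASTA records whose headers carry a rank and a read count (for example `>1-3`), so quality values are not kept. The sketch below reproduces that idea in Python. The exact header layout and the alphabetical tie-breaking are assumptions, and `collapse_to_fasta` is a hypothetical name, not part of the FASTX-Toolkit.

```python
from collections import Counter


def collapse_to_fasta(reads):
    """Hypothetical helper: collapse identical reads into FASTA records,
    most abundant first. Headers use an assumed "rank-count" layout;
    ties are broken alphabetically so the output is deterministic."""
    counts = Counter(reads)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    lines = []
    for rank, (seq, count) in enumerate(ordered, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


print(collapse_to_fasta(["ACGT", "TTGA", "ACGT", "ACGT"]), end="")
# >1-3
# ACGT
# >2-1
# TTGA
```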

**sridharacharya** · 09-17-2010, 07:37 AM

Re: Aligning only unique reads in Bowtie

I have a few questions about the best practices for dealing with multiple alignments of a single read and with identical reads in the data (from a biology standpoint):

I am curious how important it is to deal with identical reads. Does having many identical reads in the data mean something went wrong with the experiment?

What would be a reasonable maximum cutoff for the number of identical copies of a read, above which those reads should be excluded?
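The thread does not settle on a cutoff, but before choosing one it helps to measure how much duplication there is. This minimal sketch reports the fraction of reads that are extra copies of an already-seen sequence and flags sequences above a threshold. `duplicate_report` and the value `max_copies=3` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def duplicate_report(reads, max_copies):
    """Hypothetical helper. Returns (duplicate_fraction, over_cutoff):
    duplicate_fraction is the share of reads that are extra copies of a
    sequence already seen; over_cutoff maps each sequence seen more than
    `max_copies` times (a user-chosen threshold) to its count."""
    counts = Counter(reads)
    total = sum(counts.values())
    extra_copies = total - len(counts)
    duplicate_fraction = extra_copies / total if total else 0.0
    over_cutoff = {seq: n for seq, n in counts.items() if n > max_copies}
    return duplicate_fraction, over_cutoff


reads = ["ACGT"] * 5 + ["TTGA"] * 2 + ["GGCC"]
fraction, flagged = duplicate_report(reads, max_copies=3)  # illustrative cutoff
print(fraction)  # 0.625
print(flagged)   # {'ACGT': 5}
```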

In the other case, where a single read aligns to multiple places in the genome, what should the cutoff be for the number of alignments, above which those reads are excluded?
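For multi-mapping reads, bowtie itself offers `-m <int>`, which suppresses all alignments for a read that has more than `<int>` reportable alignments. `-m 1` is often used to keep only uniquely aligned reads. The same filter applied after alignment is sketched below. The `(read_id, chromosome, position)` input layout and the name `filter_multimappers` are assumptions for illustration.

```python
from collections import defaultdict


def filter_multimappers(hits, max_hits):
    """Hypothetical helper: keep alignments only for reads with at most
    `max_hits` alignments. `hits` is an assumed list of
    (read_id, chromosome, position) tuples, one per reported alignment."""
    by_read = defaultdict(list)
    for read_id, chrom, pos in hits:
        by_read[read_id].append((chrom, pos))
    return {
        read_id: locs
        for read_id, locs in by_read.items()
        if len(locs) <= max_hits
    }


hits = [
    ("r1", "chr1", 100),
    ("r2", "chr2", 500),
    ("r2", "chr5", 900),
    ("r3", "chr1", 300),
]
print(filter_multimappers(hits, max_hits=1))
# {'r1': [('chr1', 100)], 'r3': [('chr1', 300)]}
```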

