
**vivek_** · 10-23-2012, 06:40 AM

Just a guess without looking at your data:

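grep -h "^@" *.fastq | awk '{gsub(/\/.*$/,""); print}' | sort | uniq -c | awk '{if($1 == 2) print $2}' > headers

Will write to the file "headers" every header that occurs exactly twice once everything from the first "/" onward (e.g. /1 or /2) is stripped. The -h stops grep from prefixing each line with its file name, which would otherwise make every line distinct. Quality lines can also start with "@", so a few false hits are possible.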

while read -r seq; do grep -F -A 3 "$seq" *.fastq; done < headers

Will print the corresponding sequences.

I think these commands may be very slow, or even fail, on large files: the loop rescans every FASTQ file once per header.
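If the per-header grep loop turns out to be too slow, the same idea can be sketched as one awk call that reads the files twice: the first pass counts read IDs, the second prints the full records. This is only a sketch under some assumptions: strict 4-line FASTQ records, the read ID taken as the first whitespace-delimited token of the header, and only a trailing /1 or /2 stripped. The function name paired_records and the pass variable are made up for illustration.

```shell
# Hypothetical helper: print every 4-line FASTQ record whose read ID
# occurs exactly twice across the given files.
# Assumes strict 4-line records (no wrapped sequence lines).
paired_records() {
    awk '
        # Header lines are line 1 of each 4-line record; FNR restarts per file,
        # so quality lines that begin with "@" are never mistaken for headers.
        FNR % 4 == 1 {
            id = $1                    # first whitespace-delimited token
            sub(/\/[12]$/, "", id)     # drop a trailing /1 or /2 mate suffix
            if (pass == 1)
                count[id]++            # pass 1: tally each read ID
            else
                keep = (count[id] == 2)  # pass 2: decide for this record
        }
        # In pass 2, print the header and the three lines that follow it.
        pass == 2 && keep { print }
    ' pass=1 "$@" pass=2 "$@"
}
```

Something like `paired_records *.fastq > paired.fastq` would then replace both commands above. Memory use grows with the number of distinct read IDs, so very large files may still be a problem.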

**micans** · 10-24-2012, 01:01 AM

efficient command line tool for counting duplicated reads

tally will do what you ask. It compresses sequences in memory (approximately 3.5-fold, depending on read length), so it can handle fairly large files. It comes with reaper, and can be downloaded here: http://www.ebi.ac.uk/~stijn/reaper/s...per-12-205.tgz.
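For a small file, or as a quick cross-check, duplicated reads can also be counted with standard tools. To be clear, this is not tally and does not show its interface. It is a plain sort/uniq sketch that assumes strict 4-line FASTQ records, does no in-memory compression, and relies on sort for the heavy lifting. The function name count_read_sequences is made up for illustration.

```shell
# Hypothetical helper: tabulate how often each read sequence occurs.
# Output: one "<count> <sequence>" line per distinct sequence,
# most frequent first. Assumes strict 4-line FASTQ records.
count_read_sequences() {
    # Line 2 of every 4-line record is the sequence.
    awk 'FNR % 4 == 2' "$@" | sort | uniq -c | sort -k1,1nr
}
```

Piping the result through `wc -l` gives the number of distinct sequences, and `awk '$1 > 1'` keeps only those seen more than once.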


Count unique reads in a FASTQ file
