SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
duplicate reads in Illumina short, single end reads of RNAseq data inbarpl Bioinformatics 4 05-22-2012 08:36 AM
paired-end RNA-Seq of short transcripts fbarreto Illumina/Solexa 3 11-23-2011 11:43 AM
paired-end reads mapped to genome.. gene with only one direction of paired-end reads? danwiththeplan Bioinformatics 2 09-22-2011 02:06 AM
PubMed: Local De Novo Assembly of RAD Paired-End Contigs Using Short Sequencing Reads Newsbot! Literature Watch 0 05-05-2011 11:40 PM
Paired end Short read data SS1234 Bioinformatics 6 06-09-2010 01:16 PM

Reply
 
Thread Tools
Old 07-11-2013, 08:00 AM   #1
jakeenk
Junior Member
 
Location: canada

Join Date: Aug 2010
Posts: 3
Default Removing short reads from paired-end fastqs

Sometimes trimming adapters from two paired read files (with, say, cutadapt) results in unequal trimming for the members of any given pair. Therefore if you subsequently remove short inserts from both readfiles independently, it can throw the pairs out of sync as soon as it removes one but not the other member of a pair.

The following script "nixshorts_PE" will remove a read pair from two paired-end read fastqs when at least one of the two members are below a certain length. The same method can be used for removing short reads from a single-end file, with some adjustments. Just thought some of you might find this handy.

Please post improvements to this script if you think of them. Thanks!

#!/bin/bash

# This removes reads of a below a certain length from paired read files in fastq format (e.g., R1 and R2 from the same library)

# Usage: $ bash nixshorts_PE [input fastqR1] [input fastqR2] [minimum read length to keep]

# PROCESS:

#1. Start with inputs
R1fq=$1
R2fq=$2
minlen=$3

#2. Find all entries with read length less than minimum length and print line numbers, for both R1 and R2
awk -v min=$minlen '{if(NR%4==2) if(length($0)<min) print NR"\n"NR-1"\n"NR+1"\n"NR+2}' $R1fq > temp.lines1
awk -v min=$minlen '{if(NR%4==2) if(length($0)<min) print NR"\n"NR-1"\n"NR+1"\n"NR+2}' $R2fq >> temp.lines1

#3. Combine both line files into one, sort them numerically, and collapse redundant entries
sort -n temp.lines1 | uniq > temp.lines
rm temp.lines1

#4. Remove the line numbers recorded in "lines" from both fastqs
awk 'NR==FNR{l[$0];next;} !(FNR in l)' temp.lines $R1fq > $R1fq.$minlen
awk 'NR==FNR{l[$0];next;} !(FNR in l)' temp.lines $R2fq > $R2fq.$minlen
rm temp.lines

#5. Conclude
echo "Pairs shorter than $minlen bases removed from $R1fq and $R2fq"
jakeenk is offline   Reply With Quote
Reply

Tags
paired-end, paired-end sync, short read removal, trimming

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:56 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO