I am wondering if anyone knows of an easy way (or tool) to take paired-end Illumina FASTQ data and isolate only those read pairs in which one of the reads begins with a specific sequence (say a 20-mer or 30-mer), i.e. the sequence occupies the first 20 (or 30) nucleotides of that read.
I will have a 2x150 MiSeq library where some percentage of the total reads (the percentage that I want) will have one read in the pair that starts with a specific nucleotide sequence. Ideally I would like to collect all read pairs where one read in the pair starts with the specific sequence as well as have that sequence trimmed from the read containing it before I map the pairs to the genome.
Thanks.
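To make the request concrete, here is a rough Python sketch of the filtering I have in mind. It is only an illustration under some assumptions: the two FASTQ files list the pairs in the same order, each record is exactly four lines, and the sequence must match exactly (no mismatches or indels allowed). The function names and file paths are made up for this example.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield records as [header, sequence, plus, quality].

    Assumes strict 4-line records (no wrapped sequence lines).
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record


def trim_prefix(record, prefix):
    """Return a trimmed copy if the read starts with prefix, else None."""
    header, seq, plus, qual = record
    if seq.upper().startswith(prefix):
        n = len(prefix)
        # Trim the quality string too so it stays in step with the bases.
        return [header, seq[n:], plus, qual[n:]]
    return None


def filter_pairs(r1_records, r2_records, prefix):
    """Yield (read1, read2) pairs where one read starts with prefix.

    Read 1 is checked first; if it matches, read 2 is left untouched.
    Exact matching only: this is an assumption of the sketch.
    """
    prefix = prefix.upper()
    for rec1, rec2 in zip(r1_records, r2_records):
        trimmed1 = trim_prefix(rec1, prefix)
        if trimmed1 is not None:
            yield trimmed1, rec2
            continue
        trimmed2 = trim_prefix(rec2, prefix)
        if trimmed2 is not None:
            yield rec1, trimmed2


def run(r1_path, r2_path, prefix, out1_path, out2_path):
    """Filter two paired FASTQ files and write the kept pairs."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2, \
            open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in filter_pairs(read_fastq(r1), read_fastq(r2), prefix):
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
```

Something like run("sample_R1.fastq.gz", "sample_R2.fastq.gz", "ACGTACGTACGTACGTACGT", "kept_R1.fastq", "kept_R2.fastq") would then give two files ready for mapping (those file names and the 20-mer are placeholders). A real tool would presumably also need to tolerate sequencing errors in the matched sequence, which this sketch does not.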