Hi all,
I'm trying to trim 125 bp paired-end (PE) Illumina reads with cutadapt. The aim is to discard any read pair that contains adapter sequence and keep only PE reads with no adapter or barcode readthrough.
I've run cutadapt with -a AGATCGGAAGAGC -A AGATCGGAAGAGC -m 125 (so that anything trimmed below the full 125 bp is discarded), which works fine for the read 1s.
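For reference, a minimal sketch of how that first command could be assembled, written in Python so each argument is explicit. The file names are placeholders and the helper name is hypothetical; -o and -p are cutadapt's options for the read 1 and read 2 output files.

```python
import shlex

ADAPTER = "AGATCGGAAGAGC"  # common adapter prefix given to both -a and -A


def first_attempt_command(r1_in, r2_in, r1_out, r2_out, min_len=125):
    """Build the cutadapt argument list for the initial run (hypothetical helper)."""
    return [
        "cutadapt",
        "-a", ADAPTER,        # adapter searched in read 1
        "-A", ADAPTER,        # adapter searched in read 2
        "-m", str(min_len),   # minimum length after trimming
        "-o", r1_out,         # read 1 output
        "-p", r2_out,         # read 2 output
        r1_in, r2_in,
    ]


# Placeholder file names, not the real ones.
cmd = first_attempt_command(
    "R1.fastq.gz", "R2.fastq.gz", "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"
)
print(shlex.join(cmd))
```

The list form can be passed straight to subprocess.run if the run is scripted.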
Any adapter readthrough in read 2 will contain one of 24 sample barcodes before the common adapter. Because these barcodes vary between samples and the reads are to be used for SNP analysis, I need to remove them as well.
My first thought was to use -A NNNNNNNNAGATCGGAAGAGC, but that removes all read 2s in the data set. I then tried running -A with the reverse complement of each of the 24 barcodes placed before the common adapter (i.e. there were 24 -A options), but this removed 34% of the data, so I'm not sure that was correct. Finally, I tried adding the reverse complement of the restriction site of the enzyme used, which would be read before both the barcode and the common adapter, and that removed around 16% of reads.
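To make the second attempt concrete, here is a rough sketch of how the 24 barcode-specific -A options could be generated. The two barcodes shown are made-up placeholders, not the real ones, and the function names are hypothetical.

```python
COMMON_ADAPTER = "AGATCGGAAGAGC"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def barcode_adapter_args(barcodes, adapter=COMMON_ADAPTER):
    """One '-A <revcomp(barcode)><adapter>' pair per barcode."""
    args = []
    for barcode in barcodes:
        args += ["-A", reverse_complement(barcode) + adapter]
    return args


# Placeholder 8-base barcodes for illustration only.
example_barcodes = ["AACCGGTA", "TTGACCAA"]
print(" ".join(barcode_adapter_args(example_barcodes)))
```

With the full list of 24 barcodes this yields the 24 -A options described above.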
Could anyone advise on the best approach for this kind of analysis, or the correct way to do it? I don't feel I am there yet. The cutadapt manual talks about using NNNNNNNN (we have an 8-base barcode) when the barcode is embedded within the adapter, but ours is read before the adapter, and so, as tried above, -A NNNNNNNNAGATCGGAAGAGC removes all reads. I also tried embedding NNNNNNNN between the restriction site and the adapter to represent the variable barcode, but I am not sure whether this is correct.
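As an illustration of that last construction, the sketch below joins the reverse complement of a restriction site, eight N wildcards, and the common adapter. The site GGATCA is an arbitrary placeholder rather than the enzyme actually used, and the function name is hypothetical.

```python
COMMON_ADAPTER = "AGATCGGAAGAGC"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def site_barcode_adapter(restriction_site, barcode_len=8, adapter=COMMON_ADAPTER):
    """revcomp(restriction site) + N wildcards for the barcode + common adapter."""
    return reverse_complement(restriction_site) + "N" * barcode_len + adapter


# Placeholder restriction site for illustration only.
adapter_for_read2 = site_barcode_adapter("GGATCA")
print("-A", adapter_for_read2)
```

Whether this is the right sequence to pass to -A is exactly the open question.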
Thanks very much for any help.