09-19-2015, 12:21 PM   #4
HESmith

The data are standard MiSeq output, PE-300, with reads one and two interleaved in a single file. As you note, the data have already been demultiplexed (indicated by the index 'ACATCTTGACG' in the read identifier). The phiX control reads lack indices, so they will not be present in demultiplexed data.
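If it helps to work with the pairs directly, here is a minimal Python sketch for walking an interleaved FASTQ file two records at a time and pulling the index out of the read identifier. It assumes plain four-line records and the usual Illumina header layout in which the index is the last colon-separated field; the function names are mine, not from any library.

```python
from itertools import islice

def fastq_records(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    handle = iter(handle)
    while True:
        chunk = [line.rstrip("\n") for line in islice(handle, 4)]
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = chunk
        yield header, seq, qual

def read_pairs(handle):
    """Yield (read_one, read_two) tuples from an interleaved FASTQ stream."""
    records = fastq_records(handle)
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        yield r1, r2

def index_of(header):
    """Return the last colon-separated field of the header (assumed to be the index)."""
    return header.rsplit(":", 1)[-1]
```

Usage would be along the lines of `with open("reads.fastq") as fh: for r1, r2 in read_pairs(fh): ...`, where the file name is a placeholder.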

The forward primer largely matches your expectation (5 bp linker + GTGCCAGCMGCCGCGGTAA at the start of read one, although there's a penultimate C instead of A in the second example). However, the reverse primer sequence is not detectable in either example of read two. The caveat is that the quality of those reads is very low and their base composition is nearly 100% C/T, so it's hard to say for sure. It would be useful to check with high-quality examples of read two.
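To check this across more than a couple of reads, something like the following Python sketch could be used. It looks for the primer near the start of a read, treats IUPAC codes such as M as degenerate positions, and tolerates a small number of mismatches. The function names and the default offset and mismatch limits are my own assumptions, not values from any pipeline.

```python
# IUPAC nucleotide codes mapped to the bases they allow
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(window, primer):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, primer))

def find_primer(read, primer, max_offset=10, max_mismatches=1):
    """Return (offset, mismatch_count) for the best primer hit, or None.

    Only start positions 0..max_offset are tried, which covers a short
    linker in front of the primer.
    """
    read, primer = read.upper(), primer.upper()
    best = None
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)]
        if len(window) < len(primer):
            break  # read too short for a full-length match from here on
        mm = mismatches(window, primer)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return best
```

For read two, the same function can be called with whatever reverse primer sequence you expect; a result of None for most high-quality reads would suggest the primer really is absent.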

I would recommend evaluating the data with FastQC for quality metrics. You can use the bbduk.sh command from BBMap to trim by quality, length, or sequence string. By renaming, I assume you mean adding read groups, which can be done using Picard's AddOrReplaceReadGroups tool (it operates on SAM/BAM files).
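For a quick look at individual reads before running a full QC report, a rough Python sketch like this gives the mean quality and base composition of a single read. It assumes Phred+33 quality encoding, which is what current MiSeq output uses; the function names are mine.

```python
from collections import Counter

def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def base_fractions(sequence):
    """Fraction of each of A, C, G, T and N in a read."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) / len(sequence) for base in "ACGTN"}
```

A read two with a low mean score and C plus T fractions close to 1.0 would match the pattern described above.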