So while waiting for my own data to come off the HiSeq, I've been playing with publicly available data to make sure I can actually run a pipeline. One thing I've noticed in many datasets from different labs is a curious and distinctive pattern in the per-base sequence content of the first 10-11 bases. You can see it here, here, and in another one attached below. Trimming those bases from an individual dataset, though, doesn't appreciably change the percentage of reads that align. What causes this pattern?
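For anyone who wants to check the pattern on their own files without FastQC, here is a minimal sketch that tallies the fraction of A/C/G/T at each of the first few read positions, plus a helper that drops the first N bases before re-aligning. It assumes uncompressed FASTQ with exactly four lines per record. The function names (`read_fastq_sequences`, `per_base_content`, `trim_5prime`) are hypothetical and not from any particular tool.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record (assumes plain 4-line FASTQ)."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.strip()


def per_base_content(seqs: Iterable[str], n_positions: int = 12) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each of the first n_positions.

    Ns and other non-ACGT characters are ignored; positions no read
    reaches report 0.0 for every base.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for i, base in enumerate(seq[:n_positions].upper()):
            if base in "ACGT":
                counts[i][base] += 1
    content = []
    for c in counts:
        total = sum(c.values())
        content.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return content


def trim_5prime(seqs: Iterable[str], n: int) -> Iterator[str]:
    """Drop the first n bases of each read (hypothetical helper, no quality handling)."""
    for seq in seqs:
        yield seq[n:]
```

Usage would be something like `per_base_content(read_fastq_sequences("reads.fastq"))` (the filename is a placeholder). In an unbiased library each position should sit near the genome's base composition, so large swings confined to the first ~10 positions would show up directly in the returned fractions.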
Many thanks for any insights.