Hello,
As described below, I believe I may have found a bug when using the --read2pos argument of featureCounts. Additionally, I would appreciate some clarification on the use of the --readExtension5 and/or --readExtension3 arguments when combined with --read2pos, as I don't see the point of extending a read outwards as opposed to inwards.
To provide some context, I created a library of transposon-mediated insertions (see example below), digested them with Sau3AI, and ligated adapters (Adp) prior to deep sequencing (Illumina MiSeq).
Random E. coli clone:
NNN-ACTGCT-IR(L)-Km-IR(R)-GCTACA-NNN
IR: Inverted repeat; (L) and (R) are reverse complements of each other
Km: kanamycin resistance cassette
N: random nucleotide
Normally, one would expect one of the following ligations to occur:
1. Adp-NNN-ACTGCT-IR(L), OR;
2. IR(R)-GCTACA-NNN-Adp
However, due to the design of the experiment, adapters ligate either to the left-hand side of the knock-in (in the 5'-to-3' orientation) OR to the right-hand side of the knock-in, but in its opposite orientation (3'-to-5').
1. Adp-NNN-ACTGCT-IR(L), OR;
2. Adp-NNN-TGTAGC*-IR(R)*
* The reverse complements of the sequences shown in the randomly mutated E. coli clone.
Thus, after trimming the Adp and IR regions, and aligning using bowtie2, I have two scenarios for my (unstranded) paired-end reads:
In the first case (1), the location of each read's insert can be assigned by reducing the reads to their 3'-ends (i.e. --read2pos 3); whereas in the second case (2), the location of each read's insert can be assigned by reducing them to their 5'-ends (i.e. --read2pos 5).
/context.
The problem I am facing is that paired-end reads that overlap two (or more) genes/meta-features show up as "Unassigned Ambiguity", even though they are reduced to their 5'-most or 3'-most base. Is this a bug? Shouldn't it be impossible for a read to overlap more than one gene/meta-feature once it has been reduced to a single base?
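To make my expectation concrete, here is a minimal Python sketch of what I assume --read2pos does conceptually for a single read. This is not featureCounts' actual code: the function names, the coordinates, and the two gene intervals are made up for illustration, and I have not modelled how the two mates of a pair are combined into a fragment.

```python
from typing import List, Tuple

# Hypothetical annotation: (gene name, start, end), 1-based inclusive.
Feature = Tuple[str, int, int]


def reduce_to_end(start: int, end: int, strand: str, which: int) -> int:
    """Return the reference position of a read's 5'-most (which=5)
    or 3'-most (which=3) base, given its alignment span and strand."""
    if which not in (5, 3):
        raise ValueError("which must be 5 or 3")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the forward strand the 5' end is the leftmost base;
    # on the reverse strand it is the rightmost base.
    if which == 5:
        return start if strand == "+" else end
    return end if strand == "+" else start


def features_at(pos: int, features: List[Feature]) -> List[str]:
    """Return the names of all features that contain a single position."""
    return [name for name, f_start, f_end in features if f_start <= pos <= f_end]


# Two non-overlapping hypothetical genes and one read spanning both.
genes = [("geneA", 100, 500), ("geneB", 520, 900)]
read_start, read_end, read_strand = 480, 540, "+"

print(features_at(reduce_to_end(read_start, read_end, read_strand, 5), genes))  # ['geneA']
print(features_at(reduce_to_end(read_start, read_end, read_strand, 3), genes))  # ['geneB']
```

Under these assumptions, a single reduced read can hit more than one gene only if the genes themselves overlap at that position, which is why the "Unassigned Ambiguity" result surprised me.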
This brings me to my second question: as a workaround for the "Unassigned Ambiguity", I thought about extending the reads a variable amount inwards from their respective ends (e.g. 10 nt downstream of the 5'-most base and/or 10 nt upstream of the 3'-most base), so that I could increase the minOverlap value and thus mitigate "Unassigned Ambiguity." However, it seems that the only options are to extend the reads outwards. Could someone explain the logic behind extending reads outwards instead of inwards?
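For clarity, this is the kind of inward extension I have in mind, written as a small Python sketch. It is purely hypothetical: as far as I can tell featureCounts has no such option, and the function name and coordinates are my own.

```python
from typing import Tuple


def inward_window(start: int, end: int, strand: str, which: int, n: int) -> Tuple[int, int]:
    """Return a 1-based inclusive window covering the read's 5'-most
    (which=5) or 3'-most (which=3) base plus up to n bases towards the
    inside of the read. The window never extends past the read itself."""
    if which not in (5, 3):
        raise ValueError("which must be 5 or 3")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if n < 0:
        raise ValueError("n must be non-negative")
    # The chosen end is the leftmost base for (+, 5') and (-, 3'),
    # and the rightmost base for (+, 3') and (-, 5').
    anchored_left = (strand == "+") == (which == 5)
    if anchored_left:
        return start, min(start + n, end)
    return max(end - n, start), end


print(inward_window(480, 540, "+", 5, 10))  # (480, 490)
print(inward_window(480, 540, "+", 3, 10))  # (530, 540)
print(inward_window(480, 540, "-", 5, 10))  # (530, 540)
```

A window like this would give each read more than one base to overlap with a feature, which is what I would need in order to use a larger minOverlap value.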
Thanks!