Hi All,
I have RNA-seq data from about 20 samples (2x72 paired-end, Solexa), with about 20-25 million fragments per sample.
When I try to run Picard's MarkDuplicates, I get this error:
Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 2278214, Read name WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0, Mate Alignment start (195002931) must be <= reference sequence length (181748087) on reference chr2
Looking at the read pair that caused this error:
grep WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 accepted_hits.sam
WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 113 chr1 195002931 255 72M = 3420320 0 AGAAAAAAATCCACCACCACCACCACCACCAAAAGGAACTACCCCACTGTGATGTAGGGCTGTAGAGGGGGG ###?BBB??'>=/=>2>A/AA7BB9BBBDBEGFEDEDBEDBEEFFCFDEEEEFFEDGGFGGGGGGGGGGGGG NM:i:1
WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 177 chr2 3420320 255 72M = 195002931 0 TTTTTTTTTTCTTTGAGACAGGGTTTCTCTGTGTAGCCTTGGCTGTCCTGGAACTCACTCTGTAGACCAAGC GDEEEEDEEDGFEFGGGEGGGGGEGFGGGGGGGGGG?GGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGG NM:i:2
The problem is that I have fragments whose two ends map to different chromosomes. Here this causes an error because the first end maps to position 195002931 on chromosome 1, while chromosome 2, which the second end maps to, is only 181748087 bases long.
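To make the check concrete, here is a minimal Python sketch of the comparison that seems to be failing. It rests on a few assumptions: column 7 (the mate reference) is "=" in both records above, which in SAM means "same reference as this read", so the mate position ends up being compared against the read's own chromosome. The function name `find_out_of_range_mates` is made up for illustration and is not part of Picard. Only the chr2 length from the error message is supplied, and SEQ/QUAL are replaced by "*" to keep the example short.

```python
def find_out_of_range_mates(sam_lines, ref_lengths):
    """Yield (read_name, mate_ref, mate_pos) for records whose mate start
    lies beyond the end of the reference the mate is said to be on.

    ref_lengths maps reference name -> length; references that are not in
    the mapping are skipped rather than guessed.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, rname, rnext, pnext = fields[0], fields[2], fields[6], int(fields[7])
        # "=" in column 7 means the mate is on the same reference as this read.
        mate_ref = rname if rnext == "=" else rnext
        length = ref_lengths.get(mate_ref)
        if length is not None and pnext > length:
            yield name, mate_ref, pnext


NAME = "WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0"

# The two records from the post, with SEQ and QUAL abbreviated to "*".
records = [
    "\t".join([NAME, "113", "chr1", "195002931", "255", "72M", "=", "3420320", "0", "*", "*", "NM:i:1"]),
    "\t".join([NAME, "177", "chr2", "3420320", "255", "72M", "=", "195002931", "0", "*", "*", "NM:i:2"]),
]

# Only the chr2 length is known from the error message.
hits = list(find_out_of_range_mates(records, {"chr2": 181748087}))
for hit in hits:
    print(hit)
```

Under these assumptions only the second record is reported, which matches the read name and mate position in the error message.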
Is there a way to tell Picard to accept these alignments? It would be good if the SAM format included the chromosome of the mate as well. Picard does not discard other non-proper pairs.
Or should I simply leave out fragments whose two ends map to different chromosomes? How do you usually handle this?
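If leaving such fragments out turns out to be the way to go, one possible approach is sketched below in Python. Because column 7 is "=" in the records above, a single record does not reveal that its mate is on another chromosome, so the sketch groups records by read name and compares the chromosomes of the ends. It assumes the whole file fits in memory and that each end has a single reported alignment; `split_by_chromosome_agreement` and the `pairA` records are hypothetical and only there for illustration.

```python
from collections import defaultdict


def split_by_chromosome_agreement(sam_lines):
    """Split alignment lines into (same_chrom, diff_chrom) lists.

    A read name goes to diff_chrom when its records sit on more than one
    reference; everything else, including single-end records, goes to
    same_chrom. Header lines are ignored.
    """
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.split("\t")
        by_name[fields[0]].append((fields[2], line))  # (chromosome, full record)

    same_chrom, diff_chrom = [], []
    for recs in by_name.values():
        chroms = {chrom for chrom, _ in recs}
        target = same_chrom if len(chroms) == 1 else diff_chrom
        target.extend(rec for _, rec in recs)
    return same_chrom, diff_chrom


NAME = "WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0"

example = [
    # The two records from the post, with SEQ and QUAL abbreviated to "*".
    "\t".join([NAME, "113", "chr1", "195002931", "255", "72M", "=", "3420320", "0", "*", "*"]),
    "\t".join([NAME, "177", "chr2", "3420320", "255", "72M", "=", "195002931", "0", "*", "*"]),
    # A hypothetical pair with both ends on chr1, for contrast.
    "\t".join(["pairA", "99", "chr1", "1000", "255", "72M", "=", "1200", "272", "*", "*"]),
    "\t".join(["pairA", "147", "chr1", "1200", "255", "72M", "=", "1000", "-272", "*", "*"]),
]

same_chrom, diff_chrom = split_by_chromosome_agreement(example)
print(len(same_chrom), "records kept,", len(diff_chrom), "records set aside")
```

For files with 20-25 million fragments, a name-sorted input processed one read name at a time would avoid holding everything in memory.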
Thank you,
Boel