Hi!
I would like to analyse alternative exon usage in RNA-seq data (paired reads, unstranded) with DEXSeq, and for several days now I have been struggling to generate the counts with the dexseq_count.py script included in the DEXSeq package (latest version).
The problem is that I get the following warning for many reads:
UserWarning: Read XXX claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
For example, the first SAM file I tried to count contains 65,851,460 lines with 28,899,079 unique read IDs in total, and I get 1,844,364 warnings like the one quoted above.
I searched a lot and saw that many people have had the same problem, but none of the suggested solutions seems to work for me. I sorted my file with "samtools sort -n" and also ran "samtools fixmate" (because before that I got other warnings related to the bit flags set in the SAM files). I also tried sorting the file with "sort -sk -k 1,1" after "export LC_ALL=POSIX", as suggested elsewhere, but it didn't help.
In the end, many people reported that they went back and used a different aligner to regenerate the BAM/SAM files, but I only have access to the BAM files, not to the original data. I read that a "bamtofastq" function is available in "biobambam" and other tools; nevertheless, I am wondering whether there is a better way to solve my problem.
Here are two examples of the reads corresponding to the first two warnings that appear when running "dexseq_count.py":
Code:
HWI-ST858_57:1:1101:1228:14101#10@0 113 chr1 566918 255 76M chrM 6369 0 CTCCTNTATCTTAGGGGCCATNNATTTCATCACAACAATTATCAATATAAAACCCCCTGCCATAACCCAATACCAA
HWI-ST858_57:1:1101:1228:14101#10@0 113 chrM 6369 255 76M chr1 566918 0 CTCCTNTATCTTAGGGGCCATNNATTTCATCACAACAATTATCAATATAAAACCCCCTGCCATAACCCAATACCAA
Code:
HWI-ST858_57:1:1101:1228:20037#10@0 163 chr1 45243352 255 76M = 45244240 964 CCAAGACCCTGGTGAAGAATTGCATCGTGCTCATCGACAGCACACCGTACCGACAGTGGTA
HWI-ST858_57:1:1101:1228:20037#10@0 83 chr1 45244240 255 76M = 45243352 -964 GCTTCNTGCGTGCATCGCTTCNNGGCCGGGACAGTGTGGCCGAGCAGATGGCTATGTGCTA
HWI-ST858_57:1:1101:1228:20037#10@0 97 chr14 106444649 255 76M chr5 33162796 0 CAACTCTTTGCCCTCTAGCACATAGCCATCTGCTCGGCCACACTGTCCCGGCCNNGAAGCG
HWI-ST858_57:1:1101:1228:20037#10@0 81 chr5 33162796 255 76M chr14 106444649 0 GCTTCNTGCGTGCATCGCTTCNNGGCCGGGACAGTGTGGCCGAGCAGATGGCTATGTGCTA
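To see what the FLAG values in the records above actually claim, it may help to decode them and check whether each record's mate fields point at a record with the opposite first/second-in-pair bit. The following is a minimal sketch in plain Python. The bit meanings come from the SAM specification, but the helper names (decode_flag, parse_record, unmatched_mates) are made up for illustration, and the matching rule is a simplified assumption rather than the pairing code that dexseq_count.py actually uses. The example records are the four from the second block, with the sequence column dropped.

```python
from collections import defaultdict

# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse", 0x20: "mate_reverse",
    0x40: "first_in_pair", 0x80: "second_in_pair", 0x100: "secondary",
    0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_record(line):
    """Parse the first eight whitespace-separated SAM fields (hypothetical helper)."""
    f = line.split()
    rname = f[2]
    return {
        "qname": f[0], "flag": int(f[1]), "rname": rname, "pos": int(f[3]),
        # "=" in RNEXT means "same reference as RNAME".
        "rnext": rname if f[6] == "=" else f[6], "pnext": int(f[7]),
    }


def is_mate(a, b):
    """Assumed rule: opposite first/second bit and mutually consistent positions."""
    opposite_ends = (a["flag"] & 0xC0) != (b["flag"] & 0xC0)
    return (opposite_ends
            and (a["rnext"], a["pnext"]) == (b["rname"], b["pos"])
            and (b["rnext"], b["pnext"]) == (a["rname"], a["pos"]))


def unmatched_mates(lines):
    """Return records that claim a mapped mate but have no matching record."""
    by_name = defaultdict(list)
    for line in lines:
        if line.strip() and not line.startswith("@"):  # skip header lines
            rec = parse_record(line)
            by_name[rec["qname"]].append(rec)
    orphans = []
    for records in by_name.values():
        for rec in records:
            claims_mate = rec["flag"] & 0x1 and not rec["flag"] & 0x8
            if claims_mate and not any(is_mate(rec, o) for o in records if o is not rec):
                orphans.append(rec)
    return orphans


# The four records of the second example, without the sequence column.
EXAMPLE = [
    "HWI-ST858_57:1:1101:1228:20037#10@0 163 chr1 45243352 255 76M = 45244240 964",
    "HWI-ST858_57:1:1101:1228:20037#10@0 83 chr1 45244240 255 76M = 45243352 -964",
    "HWI-ST858_57:1:1101:1228:20037#10@0 97 chr14 106444649 255 76M chr5 33162796 0",
    "HWI-ST858_57:1:1101:1228:20037#10@0 81 chr5 33162796 255 76M chr14 106444649 0",
]

if __name__ == "__main__":
    for rec in unmatched_mates(EXAMPLE):
        print(rec["flag"], rec["rname"], rec["pos"], decode_flag(rec["flag"]))
```

Under this rule the 163/83 pair on chr1 matches up, while the records with flags 97 and 81 both carry the first-in-pair bit and only point at each other. The same holds for the two flag-113 records in the first example. That could be one reason the mate "could not be found" even in a name-sorted file, though this is only a guess from the two examples shown.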
Enomis