Hello!
I am getting a low mapping rate for SOLiD RNA-seq data (organism: bacteria), around 30-40%, although we usually get 70-80%. I extracted the unmapped reads and the reads with multiple hits (both groups are poorly aligned and are discarded from further analysis), and this is what I found:
1) average quality is the same as for good samples (~26 bases)
2) there is an enrichment of TTTTT in the unmapped reads and of various other k-mers in the multiple-hit reads (most of them consistent between samples)
3) GC content is higher (53-55%) for unmapped and multiple-hit reads than for mapped reads (40%)
4) if I look at the reads, they seem to consist of short stretches of repeated nucleotides:
>178_1751_207_F3
AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGT
ACAAGGAGGGGAGAT
>178_1751_758_F3
CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGG
GGGGACGGAGATGCG
>178_1752_2_F3
AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCT
TGCATCTAAATTTAT
I also tried assembling the reads with Trinity, but all of the resulting contigs map to our bacterium. Mapping against the human genome gave no hits, so it does not look like biological contamination. I also checked for adapters and trimmed the reads, which changed nothing.
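In case it helps anyone reproduce the checks above, here is a minimal Python sketch of how GC content, 5-mer counts, and the longest homopolymer run could be computed per read. It is only an illustration, not the exact pipeline I used: the function names (`parse_fasta`, `gc_fraction`, `kmer_counts`, `longest_homopolymer`) are made up for this example, and it assumes the reads are already in base space in FASTA format, as pasted above.

```python
from collections import Counter

# The three example reads from this post (FASTA, sequence wrapped over two lines).
EXAMPLE = """>178_1751_207_F3
AGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAAAAGAACCTGAAACCGTGTACGT
ACAAGGAGGGGAGAT
>178_1751_758_F3
CGAAAGGCGTAGTCGATGGGAAACAGGTTAATATTCCTGTACTTGGTGTTACTGCGAAGG
GGGGACGGAGATGCG
>178_1752_2_F3
AAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGGAACTCCT
TGCATCTAAATTTAT
"""


def parse_fasta(text):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_counts(seqs, k=5):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best, run, prev = 0, 0, None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


records = list(parse_fasta(EXAMPLE))
for name, seq in records:
    print(name, len(seq), round(gc_fraction(seq), 3), longest_homopolymer(seq))

# Most frequent 5-mers over the example reads.
print(kmer_counts(seq for _, seq in records).most_common(5))
```

Running the same functions over the mapped, unmapped, and multiple-hit read sets separately would allow the GC content and k-mer profiles of the three groups to be compared side by side.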