I'm analyzing mouse RNA-seq data. The libraries were prepared with the NuGEN Universal Plus mRNA-Seq kit and sequenced on an Illumina HiSeq 2500 (single-end, 50 bp). I mapped the reads to the mm10 (GRCm38) reference with STAR, and the mapping rate was low for some samples. When I checked the unmapped reads, I found that they map to the human genome, so I also mapped the same reads to the human genome. My commands and the mapping rates are summarized below.
STAR --genomeDir /path/to/Mus_musculus.GRCm38_dir --readFilesIn /path/to/read1 --outFileNamePrefix /path/to/output/dir/prefix --outSAMtype BAM SortedByCoordinate Unsorted --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 100000 --alignMatesGapMax 100000
STAR --genomeDir /path/to/ENSEMBL.homo_sapiens_dir --readFilesIn /path/to/read1 --outFileNamePrefix /path/to/output/dir/prefix --outSAMtype BAM SortedByCoordinate Unsorted --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 100000 --alignMatesGapMax 100000
Sample mouse_mapping_rate human_mapping_rate
mouse_sample1 28.63% 71.87%
mouse_sample2 72.53% 41.21%
mouse_sample3 75.03% 37.65%
mouse_sample4 50.51% 55.50%
mouse_sample5 68.56% 43.18%
mouse_sample6 3.30% 90.45%
mouse_sample7 75.45% 39.63%
mouse_sample8 69.98% 42.84%
mouse_sample9 33.56% 69.02%
mouse_sample10 23.53% 78.36%
mouse_sample11 54.74% 54.65%
mouse_sample12 58.27% 50.32%
mouse_sample13 50.98% 56.36%
mouse_sample14 77.27% 37.42%
mouse_sample15 48.44% 57.53%
mouse_sample16 58.93% 50.21%
mouse_sample17 78.94% 36.84%
mouse_sample18 65.56% 44.09%
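Because both STAR runs above use the same input file, the two rates can be combined by inclusion-exclusion: the fraction of reads that aligned to both genomes is at least mouse_rate + human_rate - 100%. The sketch below computes that lower bound per sample. It assumes both percentages are fractions of the same input reads, and the function name `min_dual_mapped` is hypothetical. A result of 0.00% only means no overlap is *forced* by the numbers, not that there is none.

```shell
#!/usr/bin/env bash
# Lower bound on the percentage of reads that aligned to BOTH genomes:
#   both >= mouse_rate + human_rate - 100   (clamped at 0)
# Assumption: both rates are percentages of the same input reads.
min_dual_mapped() {
  # stdin: "Sample mouse_mapping_rate human_mapping_rate" table with a header row
  awk 'NR > 1 {
    m = $2; h = $3
    sub(/%$/, "", m)              # strip the trailing percent sign
    sub(/%$/, "", h)
    both = m + h - 100            # inclusion-exclusion lower bound
    if (both < 0) both = 0
    printf "%s\t%.2f%%\n", $1, both
  }'
}

# Usage (file name hypothetical): min_dual_mapped < mapping_rates.txt
```

For example, mouse_sample17 gives 78.94 + 36.84 - 100 = 15.78%, so at least that share of its reads aligned to both references even though its mouse rate is among the highest.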
My questions are:
Does anyone know the overlap rate between the mouse and human transcriptomes?
Can human contamination interfere with the analysis of the mouse data?
Thanks a lot.
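One commonly used alternative to mapping against each genome separately (not what was done above) is competitive alignment: build a single STAR index from both genomes with species-prefixed chromosome names, so that each read is assigned to whichever species it fits best. Reads that fit both equally well become multimappers and can be dropped. The sketch below only tallies the result. The `mm_`/`hs_` prefixes, the file name and the function `count_by_species` are hypothetical; the prefixing and indexing steps are shown as comments and were not run here.

```shell
#!/usr/bin/env bash
# Hypothetical prior steps (not run here):
#   sed 's/^>/>mm_/' mouse.fa > mouse.prefixed.fa   # likewise hs_ for human; GTF column 1 too
#   STAR --runMode genomeGenerate --genomeDir combined_dir \
#        --genomeFastaFiles mouse.prefixed.fa human.prefixed.fa \
#        --sjdbGTFfile combined.gtf --sjdbOverhang 49   # 50 bp reads - 1

# Tally SAM records (stdin) per species from the reference name in column 3.
count_by_species() {
  awk -F'\t' '
    /^@/        { next }            # skip header lines if present
    $3 ~ /^mm_/ { mouse++; next }
    $3 ~ /^hs_/ { human++; next }
                { other++ }
    END { printf "mouse\t%d\nhuman\t%d\nother\t%d\n", mouse, human, other }'
}

# Keep primary, mapped records (-F 0x904 drops unmapped, secondary and
# supplementary) with STAR's unique-mapper MAPQ of 255, then tally:
#   samtools view -F 0x904 -q 255 combined_Aligned.sortedByCoord.out.bam | count_by_species
```

Filtering on MAPQ 255 discards reads that align equally well to conserved mouse and human sequence, which is where separate mapping double-counts.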