Hello, everyone,
I'm using the miRDeep2 package to identify miRNAs in my sequencing samples. The mapping step went smoothly. However, when I run miRDeep2.pl, my reference genome file fails the sanity check, as shown below:
---------------------------------------------------------
sanity_check_genome.pl /home/cobrass/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa
Error: problem with /home/cobrass/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa
Error in line 203.212: The sequence
GGTGACAAAGTTCCCGGCCAGTGYGTTTGCGGGTAACGACTGTCTTTGTGGCTCTCCACT
contains characters others than [acgtnACGTN]
Please check your file for the following issues:
I. Sequences are allowed only to comprise characters [ACGTNacgtn].
II. Identifiers are not allowed to have withespaces.
------------------------------------------------------------------------------
By the way, the genome sequence I downloaded from TAIR10 contains many non-ACGT symbols, which is what makes the script stop.
Is there a way to solve this problem? For example, could I manually replace the non-ACGTN characters with N (though I think this would introduce a bias)?
Or is there a command I can use to skip the sanity check? Thanks!
Cobrass
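
For reference, the substitution idea above could be sketched roughly as follows. The Y in the flagged sequence is an IUPAC ambiguity code (C or T), and the sketch masks every such character as N. It also trims each identifier at the first whitespace, which is the second issue the sanity check lists. The function names and file names are hypothetical and not part of miRDeep2; this is a minimal sketch that assumes a plain, uncompressed FASTA file.

```python
import re

# Matches any character the sanity check rejects in a sequence line.
NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def sanitize_fasta_line(line):
    """Return a cleaned copy of one FASTA line, without the trailing newline."""
    line = line.rstrip()
    if line.startswith(">"):
        # Identifier line: keep only the first whitespace-delimited token.
        parts = line.split()
        return parts[0] if parts else line
    # Sequence line: mask ambiguity codes (Y, R, K, M, ...) as N.
    return NON_ACGTN.sub("N", line)


def sanitize_fasta(in_path, out_path):
    """Write a cleaned copy of in_path to out_path (both names are hypothetical)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(sanitize_fasta_line(line) + "\n")


# Example call, assuming the input file is in the current directory:
# sanitize_fasta("genome.fa", "genome.clean.fa")
```

Because each character is replaced one-for-one, sequence lengths and coordinates stay the same. The cleaned file would be written next to the original rather than overwriting it, so the original download stays intact.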