Hello all
I am new to bioinformatics and Linux, and right now I am starting my "training" with some 454 RNA-seq data. The starting data are 454 RNA sequencing reads from 10 different individuals (3 runs each). So far I have converted my fasta/qual files to fastq and then merged the fastq files from the different runs of each individual into a single file per individual, before proceeding with the quality control analysis. All went smoothly and the outputs seem to be OK. Now I want to proceed with the assembly, and for this I plan to use MIRA as implemented in Geneious. I have uploaded the trimmed/clipped fastq files, and when I select them and try to run an assembly with the default parameters I get the following error message:
Fatal error (may be due to problems of the input data or parameters):
********************************************************************************
* Some read names were found more than once (see log above). This usually *
* hints to a serious problem with your input and should really, really be *
* fixed. You can choose to ignore this error with , but this will *
* almost certainly lead to problems with result files (ACE and CAF for sure, *
* maybe also SAM) and probably to other unexpected effects. *
I have already checked whether I could have merged some fastq files more than once, but I found no errors in my scripts. I tried assembling the files in pairs to see which ones are problematic, and I get this error message with a few of them. So now I need to find out where the problem is and whether these files really contain repeated read names (although this shouldn't be the case). I would like to find a script that extracts just the read names from these files, so I can compare them and check whether any read names are repeated between files, but I cannot find anything useful anywhere. Does anyone know how I can do this? Also, does anyone have a guess as to why MIRA is reporting this error?
Thanks in advance
Olalla
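One way to run the check described above is a small Python script. This is only a minimal sketch, not a tested tool: it assumes each fastq record occupies exactly four lines (no wrapped sequence or quality lines) and that the read name is the text between the leading "@" and the first whitespace on the header line. The function names `read_names` and `find_duplicates` are made up for this example.

```python
import sys
from collections import defaultdict


def read_names(path):
    """Yield the read name of every record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            # Header lines are lines 1, 5, 9, ... (index 0, 4, 8, ...).
            if index % 4 != 0 or not line.strip():
                continue
            if not line.startswith("@"):
                raise ValueError(f"{path}: line {index + 1} is not a FASTQ header")
            # Assumption: the name ends at the first whitespace character.
            fields = line[1:].split()
            yield fields[0] if fields else ""


def find_duplicates(paths):
    """Return {read name: [files it occurs in]} for names seen more than once."""
    seen = defaultdict(list)
    for path in paths:
        for name in read_names(path):
            seen[name].append(path)
    return {name: files for name, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Print one tab-separated line per duplicated name: name, then its files.
    for name, files in sorted(find_duplicates(sys.argv[1:]).items()):
        print(name, ",".join(files), sep="\t")
```

Saved under a name of your choice (for example `dup_names.py`, a hypothetical name) it could be run as `python dup_names.py file1.fastq file2.fastq`. A name listed with the same file twice is repeated within that file, while a name listed with two different files is shared between them.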