It's my first time dealing with SOLiD data. These are mate-pair reads sequenced on SOLiD. I found some problems after using FastQC to check the raw reads, which are SRA data downloaded from NCBI:
1. The qualities of R1 and R2 are imbalanced, and R1 in particular shows very poor quality. What might cause this? The only thing I did beforehand was convert the SRA files to FASTQ.
2. Why are there so many Ns in the R1 data? Is it because FastQC doesn't know how to handle the dot ('.') in colorspace reads?
3. Someone tried to align these reads to the reference genome, and very few reads mapped: at most 50%. We used the BFAST aligner and also tried BWA. Since the publication reports a good alignment rate for this dataset, I don't think bad sequencing can explain it.
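To check question 2 myself, a minimal sketch like the one below could count, per read position, how many reads carry a '.' versus an 'N'. It assumes the converted FASTQ keeps the reads in colorspace (a primer base followed by color digits 0-3, with '.' for a missing color call) and uses plain 4-line FASTQ records; the function name `missing_calls_per_position` is hypothetical. If the '.' counts in R1 line up with the N content FastQC reports, that would suggest FastQC is simply treating the dots as N.

```python
from collections import Counter


def missing_calls_per_position(lines):
    """Count '.' and 'N' characters at each read position.

    Assumes plain 4-line FASTQ records (header, sequence, '+', quality),
    so the sequence is the 2nd line of every record.
    """
    dots = Counter()
    ns = Counter()
    for i, line in enumerate(lines):
        if i % 4 != 1:  # skip header, '+' separator and quality lines
            continue
        seq = line.strip()
        for pos, ch in enumerate(seq):
            if ch == ".":  # missing color call in colorspace
                dots[pos] += 1
            elif ch in "Nn":  # ambiguous base
                ns[pos] += 1
    return dots, ns
```

Usage would be something like `with open("R1.fastq") as fh: dots, ns = missing_calls_per_position(fh)` (the file name is only a placeholder), then comparing `dots` and `ns` for R1 and R2.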
Could anyone give me some suggestions?