Yesterday I ran hundreds, if not thousands, of alignments of FASTQ files to the human genome, and I noticed something odd. Take a look at the output below:
1.3G SRR010942_2.recal.fastq.gz
259M SRR010942_2.sai
(12135599 sequences have been processed)
2.2G SRR016607_1.recal.fastq.gz
62M SRR016607_1.sai
(18504558 sequences have been processed.)
2.3G SRR016607_2.recal.fastq.gz
388M SRR016607_2.sai
(18504558 sequences have been processed.)
Look at the second and third files: they are almost the same size (2.2G vs 2.3G) and have exactly the same number of sequences processed, yet the resulting .sai files differ greatly in size (62M vs 388M). Why?
To generalize the question: is the size of a .sai file proportional to the size of the FASTQ file? Judging from my output, I would say no, but I don't know why.
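To put numbers on the comparison, here is a quick sketch that computes the .sai-to-FASTQ size ratio and the .sai bytes per read for the three files listed above. It assumes that G and M in the listing mean GiB and MiB, and the helper name `sai_stats` is just something I made up for illustration.

```python
# Values copied from the listing above: (name, fastq size in MiB, sai size in MiB, reads).
# Assumption: "G" and "M" in the listing mean GiB and MiB.
files = [
    ("SRR010942_2", 1.3 * 1024, 259, 12135599),
    ("SRR016607_1", 2.2 * 1024, 62, 18504558),
    ("SRR016607_2", 2.3 * 1024, 388, 18504558),
]


def sai_stats(fastq_mib, sai_mib, n_reads):
    """Return (sai/fastq size ratio, sai bytes per read). Hypothetical helper."""
    ratio = sai_mib / fastq_mib
    bytes_per_read = sai_mib * 1024 * 1024 / n_reads
    return ratio, bytes_per_read


for name, fastq_mib, sai_mib, n_reads in files:
    ratio, bytes_per_read = sai_stats(fastq_mib, sai_mib, n_reads)
    print(f"{name}: sai/fastq = {ratio:.3f}, sai bytes per read = {bytes_per_read:.1f}")
```

If the .sai size were proportional to the FASTQ size or to the read count, both numbers should come out roughly equal across the three files. For SRR016607_1 they are several times smaller than for the other two.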