Hello, I am involved in a project where an organism (a eukaryote) was sent for paired-end sequencing. Usually when we do that, we get back a pair of files such as XX_SN132_B_s_7_1_seq_GDR-14.txt and XX_SN132_B_s_7_2_seq_GDR-14.txt, one for the forward reads and one for the reverse reads. I then use the shuffle sequences script provided with Velvet to interleave them into a single file and start assembling.
But last week we received files named XX_SNXX_M_L001_GDR-24_R1.fastq.gz and XX_SNXX_M_L001_GDR-24_R2.fastq.gz. At first I thought it was no big deal: I extracted them, shuffled them and started the assembly. What I noticed is that, although the sequencing was paired-end with 500 bp inserts, VelvetOptimiser reported a library length of 250, which is shorter than the expected insert size. I then checked the reads and saw that the forward reads overlap their mates almost entirely, by 190-240 bp. The read length is 250 bp.
I am not sure I followed the correct procedure with the data this time. As far as I understand, overlapping paired-end reads are not normal.
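For reference, here is a rough Python sketch of the kind of overlap check I mean. It assumes the usual forward/reverse paired-end orientation, so the second read is reverse-complemented before it is compared with the first. The function names and the `min_overlap` and `max_mismatch_rate` parameters are made up for illustration and are not part of Velvet or any other tool. It also spells out the arithmetic: assuming both reads are full length and the fragment is at least one read long, the fragment length is read 1 length plus read 2 length minus the overlap.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def mate_overlap(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Longest overlap between the 3' end of read1 and the start of the
    reverse-complemented read2; returns 0 if none reaches min_overlap.

    Illustrative only: a simple ungapped comparison, longest candidate first.
    """
    rc2 = reverse_complement(read2)
    for length in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail = read1[-length:]
        head = rc2[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def implied_fragment_length(len1, len2, overlap):
    """Fragment length implied by an overlap, assuming full-length reads
    and a fragment at least as long as a single read."""
    return len1 + len2 - overlap


if __name__ == "__main__":
    # With 250 bp reads, the overlaps I see would correspond to these fragments.
    for overlap in (190, 240):
        print(overlap, implied_fragment_length(250, 250, overlap))
```

If those assumptions hold, overlaps of 190-240 bp with 250 bp reads would correspond to fragments of roughly 260-310 bp rather than 500 bp.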