Hello everyone.
I searched the forum for answers, but I still have doubts about read overlapping.
I read about it on some sites and forums on the subject.
So I used the PEAR program to merge my reads.
The sequencing was done on the Illumina HiSeq platform, the insert size was 800-900 bp, and the library was prepared with the TruSeq LT DNA kit. The reads are 100-101 bp long (I used FastQC to check the quality).
I ran PEAR first on the raw data and then on the data trimmed with Trimmomatic.
In both cases, the percentage of overlapping (assembled) reads was very low, about 1%.
So, is this good or bad?
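To make my question concrete, here is the rough calculation I have in mind. It is only a sketch: I am assuming that "insert size" means the full fragment length between the adapters, and the function name `expected_overlap` is just something I made up for illustration. The minimum overlap of 10 is the value PEAR reports in its log.

```python
def expected_overlap(read_len, fragment_len):
    """Bases shared by the two mates of a pair.

    Assumes fragment_len is the full fragment between the adapters.
    A negative value means there is a gap between the mates.
    """
    return min(fragment_len, 2 * read_len - fragment_len)


MIN_OVERLAP = 10  # "Minimum overlap" reported in the PEAR log
READ_LEN = 101    # upper end of the 100-101 bp reported by FastQC

for fragment_len in (150, 192, 800, 900):
    overlap = expected_overlap(READ_LEN, fragment_len)
    mergeable = overlap >= MIN_OVERLAP
    print(f"fragment {fragment_len} bp: overlap {overlap} bp, mergeable: {mergeable}")
```

If these assumptions hold, only fragments of about 192 bp or shorter could be merged with 101 bp reads, and an 800 bp fragment would leave a gap of roughly 600 bp between the mates.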
The PEAR output for the raw data and for the trimmed data is at the end of this post.
Thank you. I'm studying this as much as I can.
P.S.: Yes, I have already read http://seqanswers.com/forums/showthread.php?t=66830.
Raw data:
Code:
PEAR v0.9.8 [April 9, 2015]
Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593
Forward reads file.................: ../../sequences/MP_CGATGT_L001_R1_001.fastq
Reverse reads file.................: ../../sequences/MP_CGATGT_L001_R2_001.fastq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 50
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 9
Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....: DONE
  A: 0.266503
  C: 0.233513
  G: 0.233699
  T: 0.266286
  2441195 uncalled bases
Assemblying reads: 100%
Assembled reads ...................: 445,709 / 42,848,431 (1.040%)
Discarded reads ...................: 4,585 / 42,848,431 (0.011%)
Not assembled reads ...............: 42,398,137 / 42,848,431 (98.949%)
Assembled reads file...............: MP_CGATGT_L001.assembled.fastq
Discarded reads file...............: MP_CGATGT_L001.discarded.fastq
Unassembled forward reads file.....: MP_CGATGT_L001.unassembled.forward.fastq
Unassembled reverse reads file.....: MP_CGATGT_L001.unassembled.reverse.fastq
Trimmed data:
Code:
PEAR v0.9.8 [April 9, 2015]
Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593
Forward reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R1_001_p.fq
Reverse reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R2_001_p.fq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 50
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 9
Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....: DONE
  A: 0.267372
  C: 0.232951
  G: 0.231743
  T: 0.267933
  6664 uncalled bases
Assemblying reads: 100%
Assembled reads ...................: 380,009 / 32,256,001 (1.178%)
Discarded reads ...................: 0 / 32,256,001 (0.000%)
Not assembled reads ...............: 31,875,992 / 32,256,001 (98.822%)
Assembled reads file...............: MP_CGATGT_L001_trim.assembled.fastq
Discarded reads file...............: MP_CGATGT_L001_trim.discarded.fastq
Unassembled forward reads file.....: MP_CGATGT_L001_trim.unassembled.forward.fastq
Unassembled reverse reads file.....: MP_CGATGT_L001_trim.unassembled.reverse.fastq
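To double-check the percentages in the two summaries, I put together a small Python sketch that recomputes them from the read counts. The summary lines are copied from the PEAR output above; the helper name `parse_pear_summary` and the regular expression are my own and are not part of PEAR, so treat this as an illustration rather than a proper parser.

```python
import re

# Summary lines copied from the two PEAR runs.
RAW = """\
Assembled reads ...................: 445,709 / 42,848,431 (1.040%)
Discarded reads ...................: 4,585 / 42,848,431 (0.011%)
Not assembled reads ...............: 42,398,137 / 42,848,431 (98.949%)
"""
TRIMMED = """\
Assembled reads ...................: 380,009 / 32,256,001 (1.178%)
Discarded reads ...................: 0 / 32,256,001 (0.000%)
Not assembled reads ...............: 31,875,992 / 32,256,001 (98.822%)
"""

# label, dotted filler, then "count / total" with thousands separators
SUMMARY_LINE = re.compile(
    r"^(?P<label>[A-Za-z ]+?)\s*\.+: (?P<count>[\d,]+) / (?P<total>[\d,]+)"
)


def parse_pear_summary(text):
    """Return {label: (count, total)} for every 'count / total' line."""
    result = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            count = int(match.group("count").replace(",", ""))
            total = int(match.group("total").replace(",", ""))
            result[match.group("label")] = (count, total)
    return result


for name, text in (("raw", RAW), ("trimmed", TRIMMED)):
    for label, (count, total) in parse_pear_summary(text).items():
        print(f"{name:8s} {label:20s} {100 * count / total:.3f}%")
```

In both runs the three counts add up to the total number of read pairs, so the roughly 1% of assembled reads is what PEAR really reports and not something I misread.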