SEQanswers

Old 03-15-2016, 07:00 AM   #1
euduca
Junior Member
 
Location: Ilhéus

Join Date: Mar 2016
Posts: 1
Low percentage of overlapping

Hello everyone.
I searched the forum for answers, but I still have doubts about read overlapping.

I read about it on several sites and forums, and then used PEAR to merge my reads.

The sequencing was done on the Illumina HiSeq platform. The insert size was 800-900 bp and the library was prepared with the TruSeq LT DNA kit. The average read length is 100-101 bp (I used FastQC to check the quality).

I ran PEAR first on the raw data and then on the trimmed data (Trimmomatic).

In both cases, the percentage of overlapping reads was very low.
So, is this good or bad?

Raw data:

Code:
PEAR v0.9.8 [April 9, 2015]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: ../../sequences/MP_CGATGT_L001_R1_001.fastq
Reverse reads file.................: ../../sequences/MP_CGATGT_L001_R2_001.fastq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 50
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 9

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....: DONE
  A: 0.266503
  C: 0.233513
  G: 0.233699
  T: 0.266286
  2441195 uncalled bases
Assemblying reads: 100%

Assembled reads ...................: 445,709 / 42,848,431 (1.040%)
Discarded reads ...................: 4,585 / 42,848,431 (0.011%)
Not assembled reads ...............: 42,398,137 / 42,848,431 (98.949%)
Assembled reads file...............: MP_CGATGT_L001.assembled.fastq
Discarded reads file...............: MP_CGATGT_L001.discarded.fastq
Unassembled forward reads file.....: MP_CGATGT_L001.unassembled.forward.fastq
Unassembled reverse reads file.....: MP_CGATGT_L001.unassembled.reverse.fastq

Trimmed data:

Code:
PEAR v0.9.8 [April 9, 2015]

Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R1_001_p.fq
Reverse reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R2_001_p.fq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 999999
Minimum assembly length............: 50
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 9

Allocating memory..................: 200,000,000 bytes
Computing empirical frequencies....: DONE
  A: 0.267372
  C: 0.232951
  G: 0.231743
  T: 0.267933
  6664 uncalled bases
Assemblying reads: 100%

Assembled reads ...................: 380,009 / 32,256,001 (1.178%)
Discarded reads ...................: 0 / 32,256,001 (0.000%)
Not assembled reads ...............: 31,875,992 / 32,256,001 (98.822%)
Assembled reads file...............: MP_CGATGT_L001_trim.assembled.fastq
Discarded reads file...............: MP_CGATGT_L001_trim.discarded.fastq
Unassembled forward reads file.....: MP_CGATGT_L001_trim.unassembled.forward.fastq
Unassembled reverse reads file.....: MP_CGATGT_L001_trim.unassembled.reverse.fastq

Thank you. I'm studying this as much as I can.

P.S.: Yes, I have already read http://seqanswers.com/forums/showthread.php?t=66830.
Old 03-15-2016, 07:09 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,059

How do you expect reads (~100 bp) sampled from two ends of a fragment (that is 800-900 bp) to overlap/merge?

Perhaps you should be aligning these reads to a reference rather than trying to overlap them directly?

BTW: The reads that do overlap/merge most likely come from fragments where the insert is very short.
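
To put numbers on this, here is a minimal Python sketch of the arithmetic. It assumes the quoted 800-900 bp is the length of the sequenced fragment between the adapters, and that a pair needs at least 10 overlapping bases, matching the "Minimum overlap" value in the PEAR logs above. The function names are made up for illustration. PEAR also applies a statistical test (the p-value setting), which this sketch ignores.

```python
def expected_overlap(read_len_fwd, read_len_rev, fragment_len):
    """Bases shared by the two mates of a pair; a negative value is an unsequenced gap.

    Reads longer than the fragment are capped at the fragment length,
    since only the fragment itself can overlap.
    """
    fwd = min(read_len_fwd, fragment_len)
    rev = min(read_len_rev, fragment_len)
    return fwd + rev - fragment_len


def can_merge(read_len_fwd, read_len_rev, fragment_len, min_overlap=10):
    """True if the mates overlap by at least min_overlap bases (assumed threshold)."""
    return expected_overlap(read_len_fwd, read_len_rev, fragment_len) >= min_overlap


if __name__ == "__main__":
    # Hypothetical fragment lengths, with 100 bp reads as in the original post.
    for fragment_len in (150, 190, 250, 800, 900):
        overlap = expected_overlap(100, 100, fragment_len)
        print(fragment_len, overlap, can_merge(100, 100, fragment_len))
```

With 2 x 100 bp reads, only fragments of about 190 bp or shorter can reach a 10-base overlap. An 800 bp fragment leaves a gap of roughly 600 bp between the mates, so a merge rate near 1% is what you would expect from this library.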