  • Low percentage of overlapping

    Hello everyone.
    I searched the forum for answers, but I still have doubts about read overlapping (merging).

    After reading about it on several sites and forums, I used PEAR to merge my reads.

    The library was prepared with the TruSeq LT DNA kit and sequenced on an Illumina HiSeq, with an insert size of 800-900 bp. The reads are 100-101 bp long on average (I used FastQC for the quality information).

    I ran PEAR first on the raw data and then on the data trimmed with Trimmomatic.

    In both cases, the percentage of merged reads was very low.
    Is this good or bad?

    Raw data:

    Code:
    PEAR v0.9.8 [April 9, 2015]
    
    Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
    Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593
    
    Forward reads file.................: ../../sequences/MP_CGATGT_L001_R1_001.fastq
    Reverse reads file.................: ../../sequences/MP_CGATGT_L001_R2_001.fastq
    PHRED..............................: 33
    Using empirical frequencies........: YES
    Statistical method.................: OES
    Maximum assembly length............: 999999
    Minimum assembly length............: 50
    p-value............................: 0.010000
    Quality score threshold (trimming).: 0
    Minimum read size after trimming...: 1
    Maximal ratio of uncalled bases....: 1.000000
    Minimum overlap....................: 10
    Scoring method.....................: Scaled score
    Threads............................: 9
    
    Allocating memory..................: 200,000,000 bytes
    Computing empirical frequencies....: DONE
      A: 0.266503
      C: 0.233513
      G: 0.233699
      T: 0.266286
      2441195 uncalled bases
    Assemblying reads: 100%
    
    Assembled reads ...................: 445,709 / 42,848,431 (1.040%)
    Discarded reads ...................: 4,585 / 42,848,431 (0.011%)
    Not assembled reads ...............: 42,398,137 / 42,848,431 (98.949%)
    Assembled reads file...............: MP_CGATGT_L001.assembled.fastq
    Discarded reads file...............: MP_CGATGT_L001.discarded.fastq
    Unassembled forward reads file.....: MP_CGATGT_L001.unassembled.forward.fastq
    Unassembled reverse reads file.....: MP_CGATGT_L001.unassembled.reverse.fastq
    Trimmed data:

    Code:
    PEAR v0.9.8 [April 9, 2015]
    
    Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
    Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593
    
    Forward reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R1_001_p.fq
    Reverse reads file.................: ../../../trim/trimmomatic/MP_CGATGT_L001_R2_001_p.fq
    PHRED..............................: 33
    Using empirical frequencies........: YES
    Statistical method.................: OES
    Maximum assembly length............: 999999
    Minimum assembly length............: 50
    p-value............................: 0.010000
    Quality score threshold (trimming).: 0
    Minimum read size after trimming...: 1
    Maximal ratio of uncalled bases....: 1.000000
    Minimum overlap....................: 10
    Scoring method.....................: Scaled score
    Threads............................: 9
    
    Allocating memory..................: 200,000,000 bytes
    Computing empirical frequencies....: DONE
      A: 0.267372
      C: 0.232951
      G: 0.231743
      T: 0.267933
      6664 uncalled bases
    Assemblying reads: 100%
    Assembled reads ...................: 380,009 / 32,256,001 (1.178%)
    Discarded reads ...................: 0 / 32,256,001 (0.000%)
    Not assembled reads ...............: 31,875,992 / 32,256,001 (98.822%)
    Assembled reads file...............: MP_CGATGT_L001_trim.assembled.fastq
    Discarded reads file...............: MP_CGATGT_L001_trim.discarded.fastq
    Unassembled forward reads file.....: MP_CGATGT_L001_trim.unassembled.forward.fastq
    Unassembled reverse reads file.....: MP_CGATGT_L001_trim.unassembled.reverse.fastq
    Thank you. I am studying this topic as much as I can.

    P.S.: Yes, I have already read http://seqanswers.com/forums/showthread.php?t=66830.

  • #2
    How do you expect reads (~100 bp) sampled from two ends of a fragment (that is 800-900 bp) to overlap/merge?

    Perhaps you should be aligning these reads to a reference rather than trying to overlap them directly?

    BTW: The few reads that do overlap/merge likely come from fragments that are much shorter than the intended insert size.
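
To put rough numbers on this, here is a minimal Python sketch of the geometry. The function names are mine (they are not part of PEAR), and the model is simplified: it assumes both reads have the same length and ignores adapters, trimming, and sequencing errors. The read length (100 bp), insert size (800-900 bp), and minimum overlap (10) are taken from the post and the PEAR log above.

```python
def max_mergeable_fragment(read_len: int, min_overlap: int) -> int:
    """Longest fragment whose two end reads still share `min_overlap` bases.

    Simplified model: two reads of equal length, one from each end.
    """
    return 2 * read_len - min_overlap


def expected_overlap(fragment_len: int, read_len: int) -> int:
    """Number of bases covered by both reads.

    A negative value is the size of the unsequenced gap between the reads.
    The overlap cannot exceed the fragment itself, hence the min().
    """
    return min(2 * read_len - fragment_len, fragment_len)


if __name__ == "__main__":
    read_len = 100      # average read length reported in the post
    min_overlap = 10    # "Minimum overlap" in the PEAR log

    limit = max_mergeable_fragment(read_len, min_overlap)
    print(f"Fragments longer than {limit} bp cannot be merged")

    for fragment_len in (150, 800, 900):
        overlap = expected_overlap(fragment_len, read_len)
        print(f"fragment {fragment_len} bp -> overlap {overlap} bp")
```

Under these assumptions, an 800-900 bp fragment leaves a gap of 600-700 bp between the two reads, so only fragments of about 190 bp or shorter could merge at all. That would be consistent with roughly 1% of pairs being assembled.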
