  • Large difference (5%) in GC content of Illumina read sets derived from the same line.

    Hi all,

    I have four Illumina 101PE datasets, all sequenced from samples derived from a single indica rice line.
    Their relationships are as follows:
    Wild-type
    Generation_1 --> selfing --> Generation_2
    Mutagenized-type
    Generation_1 --> mutagenize --> Mutant_1 --> selfing --> Mutant_2

    A pool of individuals from each of G_1, G_2, M_1 and M_2 was sequenced.
    However, they were not all sequenced in the same experiment: G_1 and M_1 were sequenced together in one experiment, and G_2 and M_2 together in another.

    In theory, these four samples should show no significant differences in genetic composition, and could even be regarded as the same variety.
    But here is my problem: they show large discrepancies in GC content:
    G_1, 44.08%; M_1, 44.51%;
    G_2, 38.61%; M_2, 40.61%.

    As far as I can see, the GC% of my samples should be 42%~44%, based on the Nipponbare reference genome (43.7%).
    Later, I found that the GC% discrepancies were still there in my BAM files (generated by mapping the preprocessed reads of each sample to the reference).
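For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the overall GC% of a read set could be computed. It is not the method used in the post; the function names `read_fastq_sequences` and `gc_percent` are made up for illustration, and it assumes standard 4-line FASTQ records (unwrapped sequences) and ignores N calls.

```python
import gzip
from typing import Iterable, Iterator


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz).

    Assumption: sequences are not wrapped over several lines.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def gc_percent(sequences: Iterable[str]) -> float:
    """Return the overall GC% over all A/C/G/T bases (N and other symbols are ignored)."""
    gc = 0
    at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total


# Tiny in-memory demonstration: 6 G/C bases out of 10 called bases.
demo_reads = ["GGCC", "ATAT", "GCNN"]
print(gc_percent(demo_reads))
```

With real data one would call something like `gc_percent(read_fastq_sequences(path))` for each sample's read file and compare the results against the reference value quoted above.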

    Are these datasets normal or reasonable?
    What factors could be responsible for the GC% discrepancies?
    Can anyone give me some suggestions?

    Thanks a bunch!
    Last edited by lovenlong; 12-03-2013, 09:25 PM.

  • #2
    Any difference in the insert size of the libraries? The GC bias in PCR amplification might be different depending on insert size.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



    • #3
      Originally posted by SNPsaurus
      Any difference in the insert size of the libraries? The GC bias in PCR amplification might be different depending on insert size.
      The insert sizes of the libraries for my samples were nearly the same:
      G_1, 260±66;
      G_2, 280±59;
      M_1, 267±62;
      M_2, 283±48.

      Additionally, the samples themselves cannot be the reason for the GC% differences.
      All individuals used for pooling were derived from one pure rice line that had been selfed for at least 20 generations, and all were checked with 24 SSR markers before being selected for pooling.

      Thanks.



      • #4
        Have you asked your sequence provider if there was anything peculiar about the two sequencing runs as far as base signal intensities (assume this is illumina sequencing) were concerned?



        • #5
          Originally posted by GenoMax
          Have you asked your sequence provider if there was anything peculiar about the two sequencing runs as far as base signal intensities (assume this is illumina sequencing) were concerned?
          Hi,

          I'm waiting for their reply now, but it seems that they have not encountered this kind of thing before.

          I'm wondering whether a GC% discrepancy can also occur for the PhiX174 control in different HiSeq 2000 sequencing experiments.

          In the evaluation paper by Minoche et al. (Genome Biology 2011, 12:R112, doi:10.1186/gb-2011-12-11-r112), they found a higher-than-expected GC% in HiSeq datasets:
          "The GC content of the unfiltered HiSeq reads was higher than expected: 40% for Bv + PhiX data and 45.5% for At + PhiX. The B. vulgaris reference sequence has a %GC of 35% [8] and that of the A. thaliana genome is 36% (calculated from TAIR10 [9]). The fraction of PhiX reads (44.7% GC) accounts for only 1 to 2% of the data. For the PhiX sample sequenced on the GAIIx the %GC of 45.7% is much closer to the expected value of 44.7%."

          This looks really strange.
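A rough check of the numbers in that quote, as a sketch: if the read mix were simply a weighted average of the genome GC% and the PhiX GC%, the expected value can be computed directly. The helper name `expected_mixture_gc` is made up for this illustration, and the calculation assumes reads sample each genome uniformly and have equal length, which is an idealization.

```python
def expected_mixture_gc(genome_gc: float, phix_gc: float, phix_fraction: float) -> float:
    """Weighted-average GC% of a sample spiked with a given fraction of PhiX reads."""
    return phix_fraction * phix_gc + (1.0 - phix_fraction) * genome_gc


PHIX_GC = 44.7  # PhiX GC% as given in the quoted passage

# (label, reference GC%, observed GC% of unfiltered HiSeq reads), all taken from the quote
datasets = (("Bv + PhiX", 35.0, 40.0), ("At + PhiX", 36.0, 45.5))

for label, genome_gc, observed in datasets:
    for fraction in (0.01, 0.02):  # "1 to 2% of the data"
        expected = expected_mixture_gc(genome_gc, PHIX_GC, fraction)
        print(f"{label}: PhiX {fraction:.0%} -> expected {expected:.2f}% GC, observed {observed}% GC")
```

Under these assumptions a 1 to 2% PhiX spike-in shifts the expected GC% by only about 0.1 to 0.2 percentage points, so it cannot account for the 5 to 9.5 point excess reported in the quote.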

          Thanks!
          Last edited by lovenlong; 12-05-2013, 01:32 AM.



          • #6
            The datasets make sense to me because you're not sequencing exactly the same individuals each time. Sequencing bias is also a good explanation. M1 is most similar to G1 because mutagenesis doesn't affect many sites in the genome. I think somaclonal variation during the tissue culture stage can also affect genome content. As for G2 and M2, I would expect the genome to change due to selfing (more homozygous). This doesn't necessarily mean higher GC% as observed in your case.
