  • MiSeq PE run output files

    Hi,

    Can anyone explain why the two output files from a paired-end run differ in size? The read 2 file is about twice as large as the read 1 file, so I thought it included both read 1 and read 2. Yet the suffix of each read name (1:N:0:1 in the read 1 file, 2:N:0:1 in the read 2 file) indicates this is not the case. So why the difference?
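One quick way to confirm which read a file actually holds is to tally the read-number field of every header. The sketch below is a minimal Python example, assuming an uncompressed, strict four-line-per-record FASTQ stream and CASAVA 1.8-style headers where the token after the space looks like `1:N:0:1`. The helpers `fastq_records` and `read_number_tally` are hypothetical names, not part of any Illumina or Biopieces tool.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_number_tally(handle: TextIO) -> Counter:
    """Count records by the read-number field of the header comment.

    Assumes headers like '@M00123:...:2000 1:N:0:1', where the first
    colon-separated token after the space is the read number (1 or 2).
    Headers without that comment field are counted under '?'.
    """
    tally = Counter()
    for header, _seq, _qual in fastq_records(handle):
        parts = header.split(" ", 1)
        read_no = parts[1].split(":", 1)[0] if len(parts) == 2 else "?"
        tally[read_no] += 1
    return tally


if __name__ == "__main__":
    example = io.StringIO(
        "@M0:1:FC:1:1101:1000:2000 1:N:0:1\nACGT\n+\nIIII\n"
        "@M0:1:FC:1:1101:1001:2001 1:N:0:1\nCCGA\n+\nIIII\n"
    )
    print(read_number_tally(example))  # Counter({'1': 2})
```

For the gzipped files, the same function can be given a text-mode handle from `gzip.open(path, "rt")`. A file that really mixed both reads would show both `1` and `2` in the tally.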

  • #2
    Are you referring to the gzip compressed or the un-compressed fastq files?

    I haven't seen a case where one read file is twice as large as the other, but I have seen differences in file size for the gzipped files. Part of this is most likely because the compression algorithm can compress one file better than the other. It's also possible that, with adapter trimming turned on and depending on the quality of your data, you could have longer reads for read 2 because the data quality dropped enough that MiSeq Reporter couldn't properly identify the adapter and thus didn't trim it.

    Either way, I wouldn't be concerned about it.



    • #3
      No, this is not due to compression: the decompressed files also differ by about a factor of two. The number of records, as counted using Biopieces, is the same, and clean&trim reduces the read 2 file to about the same size as read 1. Most likely read 2 contains much longer records with lots of Ns, which is very surprising. I guess something is wrong with base calling.
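To check whether read 2 really carries longer records with lots of Ns, the record count, length range and N fraction can be summarized for each file and compared. This is a minimal Python sketch, assuming four-line FASTQ records on an uncompressed text stream; `length_and_n_summary` is a hypothetical name chosen for illustration.

```python
import io
from typing import Dict, TextIO


def length_and_n_summary(handle: TextIO) -> Dict[str, float]:
    """Summarize record count, read lengths and N content of a FASTQ stream."""
    n_records = total_bases = total_n = max_len = 0
    min_len = None
    while True:
        if not handle.readline():  # header line; empty string means EOF
            break
        seq = handle.readline().strip().upper()
        handle.readline()  # '+' line
        handle.readline()  # quality line
        n_records += 1
        total_bases += len(seq)
        total_n += seq.count("N")
        max_len = max(max_len, len(seq))
        min_len = len(seq) if min_len is None else min(min_len, len(seq))
    return {
        "records": n_records,
        "mean_length": total_bases / n_records if n_records else 0.0,
        "min_length": min_len or 0,
        "max_length": max_len,
        "n_fraction": total_n / total_bases if total_bases else 0.0,
    }


if __name__ == "__main__":
    demo = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACGTNNNN\n+\nIIIIIIII\n")
    print(length_and_n_summary(demo))
```

Run on both files, an equal record count but a larger mean or maximum length and a higher N fraction in read 2 would be consistent with the explanation above.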



      • #4
        Are all of the reads the same length? Did you disable adapter trimming? This may contribute to unequal file sizes.



        • #5
          It looks like this is indeed a read quality issue, as quality filtering/adapter removal evens out the file sizes, although the read 2 file now always ends up smaller. For instance, the original 4.2 GB and 8 GB files shrink to 3.8 GB and 3.6 GB, and this is a common trend regardless of library size or run length. I am bugging Illumina Tech Support about this. For example, read quality peaks at 100-120 cycles and declines sharply after that, even with a library size of around 600 bp.
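A per-cycle quality profile is the usual way to see where quality peaks and drops off. The minimal Python sketch below averages the Phred score at each position across all reads, assuming Phred+33 quality encoding (the Illumina 1.8+ convention) and four-line FASTQ records; `mean_quality_per_cycle` is a hypothetical helper, not an Illumina function.

```python
import io
from typing import List, TextIO


def mean_quality_per_cycle(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each cycle (read position) across all reads.

    Assumes Phred+33 encoding by default. Reads of different lengths are
    handled: each cycle is averaged only over the reads that reach it.
    """
    sums: List[int] = []
    counts: List[int] = []
    while True:
        if not handle.readline():  # header line; empty string means EOF
            break
        handle.readline()  # sequence line
        handle.readline()  # '+' line
        qual = handle.readline().rstrip("\n")
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read to reach this cycle
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    demo = io.StringIO("@a\nACG\n+\nII#\n@b\nAC\n+\nI5\n")
    print(mean_quality_per_cycle(demo))  # [40.0, 30.0, 2.0]
```

Plotting the returned list for read 1 and read 2 side by side would show whether the decline after roughly cycle 100-120 is confined to one read.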



          • #6
            Originally posted by yaximik
            I am bugging Illumina Tech Support about this. For example, read quality peaks at 100-120 cycles and declines sharply after that, even with a library size of around 600 bp.
            Is this a "low complexity"/amplicon-type library? If so, this is a known issue.

            Are you running MCS v.2.1.1.13 on your MiSeq?



            • #7
              No, these were all genomic libraries. I only recently upgraded to v.2.1.13.0; the majority of runs were done with whatever the previous version was.
              ILMN support thinks it is a matrix issue, so I was asked to do a full 2x250 bp PhiX run to get a baseline output and go from there. They also noticed that in the majority of runs the first nucleotide is C, which is consistent with most of the samples being ancient archaeological specimens. From Pääbo's work on Neanderthals it is known that DNA breaks preferentially either after or before G.
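The first-base bias that support noticed can be measured directly by counting the leading nucleotide of every read. This is a minimal Python sketch, assuming four-line FASTQ records on an uncompressed text stream; `first_base_fractions` is a hypothetical name.

```python
import io
from collections import Counter
from typing import Dict, TextIO


def first_base_fractions(handle: TextIO) -> Dict[str, float]:
    """Fraction of reads whose first base is A, C, G, T, N, etc."""
    counts = Counter()
    while True:
        if not handle.readline():  # header line; empty string means EOF
            break
        seq = handle.readline().strip().upper()
        handle.readline()  # '+' line
        handle.readline()  # quality line
        if seq:  # skip zero-length reads (e.g. fully trimmed)
            counts[seq[0]] += 1
    total = sum(counts.values())
    return {base: n / total for base, n in sorted(counts.items())}


if __name__ == "__main__":
    demo = io.StringIO(
        "@a\nCAT\n+\nIII\n@b\nCGG\n+\nIII\n@c\nAGT\n+\nIII\n@d\nCTT\n+\nIII\n"
    )
    print(first_base_fractions(demo))  # {'A': 0.25, 'C': 0.75}
```

Comparing these fractions between the ancient-DNA runs and the PhiX baseline run would show whether the C excess comes from the samples rather than the instrument.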

