  • BAM file size difference between 2 sequencing platforms

    Hello everyone,

    It is my first time posting, so I don't really know if I'm in the right place.
    I am new to sequencing and still a student, so please tell me if my question is a dumb one:

    I'm working on a project in which we have sequencing data for individuals from 2 platforms: one from Thermo Fisher and one from Illumina. However, the BAM file size per individual is 8 times larger for the Thermo Fisher data than for the Illumina data. How can that be explained? Could it be the sequencing method (Sanger/Ion Torrent for Thermo Fisher), the coverage, or the alignment software used? (For the Illumina dataset, the iSAAC pipeline was apparently used.)

    I really want to get a better understanding of all these things, so if you have any advice or documents that might help me, I'll be happy to take them.

    Thank you very much!!!
    Last edited by Ewiwi; 06-26-2017, 07:19 PM.

  • #2
    Originally posted by Ewiwi
    I'm working on a project in which we have sequencing data for individuals from 2 platforms: one from Thermo Fisher and one from Illumina. However, the BAM file size per individual is 8 times larger for the Thermo Fisher data than for the Illumina data. How can that be explained? Could it be the sequencing method (Sanger/Ion Torrent for Thermo Fisher), the coverage, or the alignment software used?
    Yes, it could be any of those! Normally a BAM file's size is most closely related to the coverage, but the contents can vary considerably based on all of those factors: some aligners produce multiple lines per read and add a bunch of optional fields to each line. The BAM generation settings also have a large impact: the file could be compressed or uncompressed, sorted or unsorted. And different platforms produce very different quality score profiles; much of the size of a compressed BAM comes from the quality scores.
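
    To illustrate the quality-score point, here is a minimal sketch using only the Python standard library. It compresses two simulated quality strings with DEFLATE (the algorithm used inside BAM's BGZF blocks). The two profiles, the helper names `simulate_quals` and `compressed_size`, and the chosen score values are all assumptions made up for the illustration; they are not real platform data.

```python
import random
import zlib


def compressed_size(quals: bytes) -> int:
    """Size in bytes after DEFLATE compression at level 6."""
    return len(zlib.compress(quals, 6))


def simulate_quals(n_bases: int, alphabet: list[int], seed: int = 0) -> bytes:
    """Draw n_bases Phred scores uniformly from `alphabet` (a toy model)."""
    rng = random.Random(seed)
    return bytes(rng.choice(alphabet) for _ in range(n_bases))


# Hypothetical profiles: one uses only 8 distinct scores,
# the other uses 40 distinct scores.
narrow = simulate_quals(200_000, [2, 14, 21, 27, 32, 36, 40, 41])
wide = simulate_quals(200_000, list(range(2, 42)))

narrow_size = compressed_size(narrow)
wide_size = compressed_size(wide)

print(f"narrow profile: {narrow_size} bytes compressed")
print(f"wide profile:   {wide_size} bytes compressed")
```

    Both inputs hold the same number of bases, yet the profile with more distinct scores compresses less well. Real quality strings are not uniformly random, so the actual ratio between two platforms will differ from this toy case.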



    • #3
      Thank you very much for your reply. I'm starting to understand better how BAM files are generated. So I guess these datasets can't really be compared in downstream applications, even if they were aligned to the same reference genome? Since the alignment methods may differ (and I'm almost sure they do), can I really compare them in the end?

      Thank you for helping a poor student



      • #4
        You can usually compare data qualitatively if it was aligned by the same version of the same aligner, using the same reference.
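
        One way to check the aligner and reference is to read the BAM header, which normally records them in its @PG and @SQ lines. The following is a minimal sketch using only the Python standard library; it relies on BAM being gzip-compatible (BGZF) and starting with the magic bytes `BAM\1` followed by the header text length. The function names and the toy in-memory file, including the program name `hypothetical_aligner`, are assumptions for the illustration.

```python
import gzip
import io
import struct


def read_bam_header_text(fileobj) -> str:
    """Return the SAM-style header text stored at the start of a BAM stream."""
    with gzip.GzipFile(fileobj=fileobj) as gz:
        if gz.read(4) != b"BAM\x01":
            raise ValueError("not a BAM stream")
        (l_text,) = struct.unpack("<I", gz.read(4))
        return gz.read(l_text).rstrip(b"\x00").decode("utf-8", "replace")


def summarize_header(text: str) -> dict:
    """Collect sort order, program names and reference names."""
    summary = {"sort_order": None, "programs": [], "references": []}
    for line in text.splitlines():
        # Header fields look like TAG:VALUE, separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if line.startswith("@HD"):
            summary["sort_order"] = fields.get("SO")
        elif line.startswith("@PG"):
            summary["programs"].append(fields.get("PN") or fields.get("ID"))
        elif line.startswith("@SQ"):
            summary["references"].append(fields.get("SN"))
    return summary


# Toy BAM built in memory (header only; the binary reference list is left empty).
toy_text = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@PG\tID:aln1\tPN:hypothetical_aligner\n"
)
raw = b"BAM\x01" + struct.pack("<I", len(toy_text)) + toy_text.encode() + struct.pack("<I", 0)
toy_bam = gzip.compress(raw)

info = summarize_header(read_bam_header_text(io.BytesIO(toy_bam)))
print(info)
```

        For a real file you would pass `open(path, "rb")` instead of the `BytesIO` object; `samtools view -H` prints the same header text. Comparing the summaries of two files shows quickly whether the aligner, its version line, or the reference names differ.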

        You cannot usefully compare data quantitatively if:

        1) The aligner was different.
        2) The reference was different.
        3) The platform was different.
        4) The preprocessing was different.
        5) The lysing was different.
        6) The amplification was different.
        7) The DNA quantity or concentration was different.
        8) Anything was different, other than the sample.

        You can usefully compare things if a single variable differs, for the purposes of evaluating the single variable's effect on results. It is pointless to compare things when there are many differences in methodology, unless you are following something like a rigorous fractional factorial design, which requires design prior to experimentation.
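
        As a rough idea of what such a design looks like, here is a minimal sketch of a two-level half-fraction (2^(k-1)) design in Python. The factor names are taken from the list above purely as an example, the levels -1/+1 stand for two alternatives of each factor, and the helper `half_fraction` is hypothetical.

```python
from itertools import product


def half_fraction(factors):
    """Two-level half-fraction design.

    Runs a full factorial on all factors but the last, and sets the last
    factor's level to the product of the others.
    """
    *base, last = factors
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        sign = 1
        for level in levels:
            sign *= level
        run = dict(zip(base, levels))
        run[last] = sign
        runs.append(run)
    return runs


design = half_fraction(["aligner", "reference", "platform"])
for run in design:
    print(run)
```

        With three factors this gives 4 runs instead of the 8 of a full factorial. Each factor appears at both levels equally often and the columns are mutually orthogonal, but in this small design each main effect is aliased with a two-factor interaction, which is part of why the design has to be chosen before the experiment.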



        • #5
          Thank you very much for your quick reply. I'll keep that in mind!
