  • BAM file size difference between 2 sequencing platforms

    Hello everyone,

    This is my first time posting, so I'm not sure whether I'm in the right place.
    I am new to sequencing and still a student, so please tell me if this is a dumb question:

    I'm working on a project in which we have sequencing data for individuals from two platforms: one from Thermo Fisher and one from Illumina. However, the BAM file size per individual is 8 times larger for the Thermo Fisher data than for the Illumina data. How can that be explained? Could it be the sequencing method (Sanger/Ion Torrent for Thermo Fisher), the coverage, or the alignment software used? (For the Illumina dataset, the iSAAC pipeline was apparently used.)

    I really want to get a better understanding of all this, so if you have any advice or documents that might help, I'd be happy to have them.

    Thank you very much!!!
    Last edited by Ewiwi; 06-26-2017, 07:19 PM.

  • #2
    Originally posted by Ewiwi
    I'm working on a project in which we have sequencing data for individuals from two platforms: one from Thermo Fisher and one from Illumina. However, the BAM file size per individual is 8 times larger for the Thermo Fisher data than for the Illumina data. How can that be explained? Could it be the sequencing method (Sanger/Ion Torrent for Thermo Fisher), the coverage, or the alignment software used?
    Yes, it could be any of those! Normally a BAM file's size is most closely related to the coverage, but the contents can vary considerably based on all of those factors: some aligners produce multiple lines per read and add a bunch of optional fields to each line. The BAM generation settings also have a large impact: the file could be compressed or uncompressed, sorted or unsorted. And different platforms produce very different quality score profiles; much of the size of a compressed BAM comes from the quality scores.
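
    To make the quality score point concrete, here is a minimal sketch using only the Python standard library. It compresses two synthetic quality strings of the same length with zlib (BAM's BGZF blocks use the same DEFLATE compression). Both profiles are made up for illustration (8 distinct values vs. 40 distinct values, drawn uniformly) and are not measurements from any real platform; the function names are hypothetical.

```python
import random
import zlib


def compressed_size(quals: bytes) -> int:
    """Return the zlib-compressed size in bytes of a quality string."""
    return len(zlib.compress(quals, 6))


def synthetic_quals(n_bases: int, alphabet: list, seed: int = 0) -> bytes:
    """Draw n_bases Phred scores uniformly from `alphabet` (a made-up profile)."""
    rng = random.Random(seed)
    return bytes(rng.choice(alphabet) for _ in range(n_bases))


n = 200_000
# Hypothetical profile with only 8 distinct quality values.
binned = synthetic_quals(n, [2, 14, 21, 27, 32, 36, 40, 41])
# Hypothetical profile using 40 distinct quality values.
full = synthetic_quals(n, list(range(2, 42)))

print("8 distinct values :", compressed_size(binned), "bytes")
print("40 distinct values:", compressed_size(full), "bytes")
```

    Same number of bases, but the more varied profile compresses worse. Real quality strings are not uniform random, so treat this only as an illustration of the direction of the effect.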

    • #3
      Thank you very much for your reply. I'm starting to understand better how BAM files are generated. So I guess these datasets can't really be compared in downstream applications, even if they were aligned to the same reference genome? Since the alignment methods may differ (and I'm almost sure they do), can I really compare them in the end?

      Thank you for helping a poor student.

      • #4
        You can usually compare data qualitatively if it was aligned by the same version of the same aligner, using the same reference.
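
        One way to check this is to look at the BAM headers. The sketch below uses only the Python standard library and assumes the files follow the SAM/BAM specification: a BGZF (gzip-compatible) stream that starts with the magic bytes `BAM\1`, a 4-byte little-endian header length, and then the text header. The function names are hypothetical. Note that `@PG` lines are optional, so the header only tells you what the tools chose to record.

```python
import gzip
import struct


def read_bam_header(path: str) -> str:
    """Return the SAM-style text header stored at the start of a BAM file."""
    # BGZF is a series of gzip members, so the gzip module can read it.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))
        return fh.read(l_text).rstrip(b"\x00").decode("utf-8", "replace")


def header_summary(text: str) -> dict:
    """Collect sort order, recorded programs, and reference sequences."""
    summary = {"sort_order": None, "programs": [], "references": []}
    for line in text.splitlines():
        tag, *fields = line.split("\t")
        kv = dict(f.split(":", 1) for f in fields if ":" in f)
        if tag == "@HD":
            summary["sort_order"] = kv.get("SO")
        elif tag == "@PG":
            # PN is the program name, VN its version; fall back to the ID.
            summary["programs"].append((kv.get("PN") or kv.get("ID"), kv.get("VN")))
        elif tag == "@SQ":
            summary["references"].append((kv.get("SN"), kv.get("LN")))
    return summary


def comparable(a: dict, b: dict) -> bool:
    """True when both headers report the same programs and references."""
    return a["programs"] == b["programs"] and a["references"] == b["references"]
```

        Usage would be something like `comparable(header_summary(read_bam_header("a.bam")), header_summary(read_bam_header("b.bam")))`, where the two file names are placeholders. A `False` result means the aligner, its version, or the reference differs according to the headers.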

        You cannot usefully compare data quantitatively if:

        1) The aligner was different.
        2) The reference was different.
        3) The platform was different.
        4) The preprocessing was different.
        5) The lysing was different.
        6) The amplification was different.
        7) The DNA quantity or concentration was different.
        8) Anything was different, other than the sample.

        You can usefully compare things if a single variable differs, for the purposes of evaluating the single variable's effect on results. It is pointless to compare things when there are many differences in methodology, unless you are following something like a rigorous fractional factorial design, which requires design prior to experimentation.
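
        For a rough idea of what such a design looks like, here is a minimal sketch of a two-level half-fraction design in plain Python. It builds a full factorial on all but the last factor and sets the last factor from the product of the others, which halves the number of runs while keeping every factor balanced. The factor names and levels are hypothetical placeholders, not a recommendation for any particular study.

```python
from itertools import product


def half_fraction(factors: dict) -> list:
    """Build a 2^(k-1) design from k two-level factors.

    The first k-1 factors get a full factorial in coded levels (-1, +1);
    the last factor's coded level is the product of the others.
    """
    names = list(factors)
    *base, last = names
    runs = []
    for signs in product((-1, 1), repeat=len(base)):
        last_sign = 1
        for s in signs:
            last_sign *= s
        coded = dict(zip(base, signs))
        coded[last] = last_sign
        # Map coded levels back to the (low, high) labels of each factor.
        runs.append({n: factors[n][0] if coded[n] < 0 else factors[n][1] for n in names})
    return runs


# Hypothetical factors, each with a (low, high) pair of levels.
factors = {
    "platform": ("platform_A", "platform_B"),
    "aligner": ("aligner_X", "aligner_Y"),
    "reference": ("ref_1", "ref_2"),
    "preprocessing": ("none", "trimmed"),
}

for run in half_fraction(factors):
    print(run)
```

        With four factors this gives 8 runs instead of 16. The point is that the combinations are chosen before any sequencing is done, which is why this cannot be retrofitted onto two datasets that already differ in many ways.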

        • #5
          Thank you very much for your quick reply. I'll keep that in mind!
