  • different number of reads fastq file - bam file

    Hello,

    I am working with PE 150 bp Illumina HiSeq X Ten sequencing data.
    I have counted the reads in my trimmed, paired forward + reverse FASTQ files in three ways: with FastQC, by counting lines with zcat, and from the Trimmomatic results. The three counts agree.

    I then mapped the paired forward + reverse reads to the reference genome with bwa-mem and ran bamtools on the output to get the total number of reads and the number of mapped reads. Both numbers are higher than the number of reads in the FASTQ files (forward + reverse).

    Could this be because a mapped read can have more than one alignment, so that the total reported for the BAM file is not the number of input reads but the number of alignments (including multiple alignments of the same read) plus the number of unmapped reads?

    Thanks,

    Marie
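
For reference, the line-count check described in the post can be sketched in Python. This is a minimal sketch, assuming a gzip-compressed FASTQ file with exactly four lines per record (no wrapped sequence lines); the function name `count_fastq_reads` is hypothetical and not part of any tool mentioned in the thread.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # A well-formed, unwrapped FASTQ file has a line count divisible by 4.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Summing the result for the forward and the reverse file gives the number of input reads that the BAM totals should be compared against.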

  • #2
    It depends on the command options that you used to run bwa-mem. By default, a multi-mapping read is reported once, with one hit chosen arbitrarily, while a chimeric read is reported as several records. The additional (supplementary) records of a chimeric read carry the bit flag 0x800 (2048). They can be listed with the SAMtools command 'samtools view -f 2048 aligned.bam'. Note that '-f 256' selects secondary alignments (0x100) instead.
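
To see how the flag bits relate to the read counts, the records can be tallied by category. The following is a rough sketch, not part of SAMtools or bamtools: it reads SAM text lines (for example the output of 'samtools view aligned.bam'), and the names `classify` and `tally_sam` are hypothetical. It assumes the flag is the second tab-separated field, as defined in the SAM format.

```python
from collections import Counter

UNMAPPED = 0x4          # read is unmapped
SECONDARY = 0x100       # secondary alignment (256)
SUPPLEMENTARY = 0x800   # supplementary alignment (2048)


def classify(flag):
    """Assign a SAM flag to one of four record categories."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    if flag & UNMAPPED:
        return "primary_unmapped"
    return "primary_mapped"


def tally_sam(lines):
    """Count SAM records per category, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        counts[classify(flag)] += 1
    return counts
```

If no records were filtered out after mapping, the sum of `primary_mapped` and `primary_unmapped` would be expected to match the number of reads in the FASTQ files, while the secondary and supplementary records account for the excess. The function accepts any iterable of lines, so `tally_sam(sys.stdin)` works when the SAMtools output is piped into a script.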


    • #3
      Thanks for your answer.
      I used the default options in bwa mem. I ran the samtools command and could not find any 0x800 flag.
      However, I read the following in the BWA manual: "The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different part of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard’s markDuplicates does not work with split alignments. One may consider to use option -M to flag shorter split hits as secondary."

      Could this explain the higher number of reads after mapping?

      Thanks,

      Marie


      • #4
        I am analysing WGBS data (Illumina HiSeq) from bovine fat tissue for differential methylation. I used TrimGalore for adaptor removal and a quality check on the pilot sample. All the adaptors were removed and the quality was good (both reads survived for 98% of the pairs). Then I used Bismark for unique alignment to the bisulfite-converted reference genome and got a mapping efficiency of 57.8%.
        Is a mapping efficiency of 57.8% good enough to proceed? What is the gold standard for mapping efficiency in WGBS? Please guide me.
        Thanks,
        Naveed.
