  • Three reads with the same name in the BAM file

    Hi all,

    I am working with a paired-end BAM file and am getting many warnings like this:

    Code:
    WARNING: Could not find pair for HWI-ST430:177:2:1:4979:15503#0
    WARNING: Could not find pair for HWI-ST430:177:2:1:5127:13427#0
    WARNING: Could not find pair for HWI-ST430:177:2:1:6521:21452#0
    I checked the reads named in the warnings and found that each of those names appears three times in the BAM file. For example:

    Code:
    HWI-ST430:177:2:1:4979:15503#0	65	chr32	26100696	60	79M21S	chr5	36697147	0	ACTTTGCAATTTAAGTTTTACTTACTTTTTAACTAATATACATGCCTAAAATTTACAAAAACAATAATAAAAACAACAGAACACTGGAAACATTTTTAAA	>;=<>=<<=======<====;===;=======<=>>>>>><=>>==>>>>=>>>>==>?>=<<==>?>>>?>?==><=?>><=<>>>?>?=>??>?===>	BD:Z:FFHFCIKKIHG@EEEHF??DGGEDGGE???DEEGGEFFFFGDHHHHGGE??FF?DGDG???EDGFGFGGF@@@FEHFEIEGFEEIJJIHBHGLJDD@EF@	MD:Z:79	PG:Z:MarkDuplicates	RG:Z:Basenji	BI:Z:FFIECHGIHFEAFEEHEAAFFHDFFHDAAAFEEIHFGGHGGGHHGHHHFBBGFBGGGHBBBFGHGGFGGFBBBGHIGHJGHGHFKJJJJEIKLJGHBGFB	NM:i:0	AS:i:79	XS:i:19
    HWI-ST430:177:2:1:4979:15503#0	129	chr5	36697147	60	72M28S	chr32	26100696	0	ATTTGCCCCTGGGCTATTTTTTTCCTNCCATGTAAGATTCCGTTTTAAAAATGTTTCCAGTGTTCTGTTGTTTTTATTATTGTTTTTGTAAATTTTAGGC	===<=<<<<====<=>========<<!<<<=><<=>>>>>=5=>>>>>>>>>>=>>>==>=>=>>>>=?>=>>>>>>>>=?>=>>>?>>>??>??>;<=>	SA:Z:chr32,26100739,-,36M64S,60,0;	BD:Z:FFG@JKKFFHIIEHIGFF?????EGGEEEGHHEGEEDGFEGEGF??DE???FHEF?EGGHIFFGFEIFGGFG@@@EGGEGGGFHAAAHGJHBJJDDEHHI	MD:Z:26T37T7	PG:Z:MarkDuplicates	RG:Z:Basenji	BI:Z:FFFBHHHFFHGGDGHGGEAAAAADFGEEEIHHGHFFFGFEGHHFBBGFBBBGHGFBEGIIIFGFEFHGFHHGCCCHIGHIGHHGDDDIIKIFKJGHGHGH	NM:i:2	AS:i:65	XS:i:21
    HWI-ST430:177:2:1:4979:15503#0	401	chr32	26100739	60	36M64H	=	26100696	-79	GCCTAAAATTTACAAAAACAATAATAAAAACAACAG	===<=>>=>>===>===<=>===========>;===	SA:Z:chr5,36697147,+,72M28S,60,2;	BD:Z:IHHE??FF?EGEF???FEFFFDFGE@@AHHIJFIFF	MD:Z:36	PG:Z:MarkDuplicates	RG:Z:Basenji	BI:Z:HGHGBBFFAEGFFAAAEFFEGFEGFABBFGHGGHFF	NM:i:0	AS:i:36	XS:i:22
    The BAM file contains HiSeq reads aligned to the reference genome with bwa. Picard was used to remove duplicates, and base realignment was done with GATK.


    My confusion is:
    1. Why are there three reads with the same name that seem to have no relation to each other?
    2. Maybe the first two are treated as mates and the third as a single read. If so, can I just ignore it?

    Could anyone help me? Many thanks for your help!

  • #2
    The 3rd read's flag value, 401, has the "not primary alignment" bit (0x100) set.

    The 2nd read has an "SA" tag:
    SA is: Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list: ( rname , pos , strand , CIGAR , mapQ , NM [[...]+. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the [...] element points to the primary line.

    It points to the 3rd read via the location (chr32, 26100739).

    So it looks like your software supports reads whose parts map to different locations.
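
    As a rough illustration of the flag reading above, here is a minimal Python sketch that decodes the three flag values from the question (65, 129, 401). The bit meanings follow the SAM specification; the table and function names (`SAM_FLAG_BITS`, `decode_flag`) are made up for this example.

```python
# SAM flag bits, as named in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The three records that share one read name in the question.
for flag in (65, 129, 401):
    print(flag, decode_flag(flag))
```

    Flag 65 decodes to a paired first-in-pair read, 129 to a paired second-in-pair read, and 401 to a second-in-pair read on the reverse strand that is not the primary alignment.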


    • #3
      Originally posted by Richard Finney
      The 3rd read's flag value, 401, has the "not primary alignment" bit (0x100) set.

      The 2nd read has an "SA" tag:
      SA is: Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list: ( rname , pos , strand , CIGAR , mapQ , NM [[...]+. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the [...] element points to the primary line.

      It points to the 3rd read via the location (chr32, 26100739).

      So it looks like your software supports reads whose parts map to different locations.
      Thank you for your reply!

      I read your reply carefully, but I still find it difficult to understand.

      Could you explain the three reads in simpler terms? Or how can I resolve the warnings "Could not find pair for HWI-ST430:177:2:1:4979:15503#0"?

      Thank you very much!
      Last edited by Alphabets; 03-28-2016, 07:51 PM.


      • #4
        What is your goal?

        What program is reporting the warning?

        Check the manual for your alignment software and check the notes on when it produces an "SA" tag.

        The first read is one mate of the pair.
        The next two represent the other mate with two entries; that is, it is a "chimeric" read [I think].

        Ignoring it could be the right thing to do, depending on your goals.

        If you are looking for chimeric reads or possible errors in the reference, then you have struck gold.
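
        To make the "two entries for one mate" reading concrete, here is a small Python sketch that parses the SA tag of the second record and checks that it points at the third record. The helper name `parse_sa` is hypothetical, and the field order (rname, pos, strand, CIGAR, mapQ, NM) is the one quoted from the SAM specification earlier in the thread.

```python
def parse_sa(value):
    """Split an SA tag value (the part after 'SA:Z:') into its alignments."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # the list ends with a trailing ';'
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        hits.append({
            "rname": rname,
            "pos": int(pos),
            "strand": strand,
            "cigar": cigar,
            "mapq": int(mapq),
            "nm": int(nm),
        })
    return hits


# SA tag of the 2nd record (flag 129) from the question.
sa_of_second = parse_sa("chr32,26100739,-,36M64S,60,0;")

# Position and flag of the 3rd record (flag 401) from the question.
third = {"rname": "chr32", "pos": 26100739, "flag": 401}

hit = sa_of_second[0]
same_locus = (hit["rname"], hit["pos"]) == (third["rname"], third["pos"])
# Bit 0x10 in the flag means the record is on the reverse strand.
same_strand = (hit["strand"] == "-") == bool(third["flag"] & 0x10)
print(same_locus, same_strand)
```

        Both checks come out true for the records in the question, which is consistent with the second and third lines being two pieces of the same mate.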


        • #5
          Originally posted by Richard Finney
          What is your goal?

          What program is reporting the warning?

          I want to call STRs from the BAM file with lobSTR.

          I ran lobSTR on the paired-end BAM file and it produced many warnings like that.

          I downloaded the BAM file from the web and don't know much more about it.

          When I run lobSTR treating it as a single-end BAM file, there are no warnings.
          lobSTR takes different parameters for single-end and paired-end BAM files.

          So, any other suggestions? Thanks!
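
          If ignoring the extra line is acceptable, one possible approach is to drop every record whose flag has the "not primary" (0x100) or "supplementary" (0x800) bit set, so that each read name is left with at most two lines. The Python sketch below does this on SAM text lines; the function name `primary_only` and the shortened example records are made up for illustration, and whether this makes the lobSTR warnings go away is an untested assumption.

```python
# Bits marking a record as secondary (0x100) or supplementary (0x800).
NOT_PRIMARY = 0x100 | 0x800


def primary_only(sam_lines):
    """Yield header lines and primary alignment records, dropping the rest."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines untouched
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd SAM column
        if not flag & NOT_PRIMARY:
            yield line


# Shortened stand-ins for the three records in the question
# (only the first four columns are shown here).
example = [
    "@HD\tVN:1.4\n",
    "HWI-ST430:177:2:1:4979:15503#0\t65\tchr32\t26100696\n",
    "HWI-ST430:177:2:1:4979:15503#0\t129\tchr5\t36697147\n",
    "HWI-ST430:177:2:1:4979:15503#0\t401\tchr32\t26100739\n",
]

kept = list(primary_only(example))
kept_flags = [int(l.split("\t")[1]) for l in kept if not l.startswith("@")]
print(kept_flags)
```

          Only the records with flags 65 and 129 remain. On a real BAM file the same filter can be applied with `samtools view -b -F 0x900`.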
          Last edited by Alphabets; 03-28-2016, 10:27 PM.
