  • Weird Bowtie Alignment Result (unaligned remaining in output file)

    Hello everyone.

    I'm getting strange Bowtie output: reads that don't match the genome are still included in my SAM output file. I only noticed this when running another process downstream, where the number of alignments going in matched the number of reads input to Bowtie, even though the inputs did not all align to the genome.

    Code:
    brandon@brandon:~/brachy/data/wt$ bowtie -t -p 4 --sam-nohead --sam-nosq -f -S -k 1 -v 0 ~/brachy/genome/brachygen wt.collapse.fq wt.collapse.sam
    Time loading forward index: 00:00:01
    Time for 0-mismatch search: 00:00:10
    # reads processed: 2009754
    # reads with at least one reported alignment: 1636731 (81.44%)
    # reads that failed to align: 373023 (18.56%)
    Reported 1636731 alignments to 1 output stream(s)
    Time searching: 00:00:13
    Overall time: 00:00:13
    All of the 373023 reads that failed to align to the genome remain in the output file. They are not given a chromosome assignment but a "*", and the start and stop positions are reported as zero. See below:

    Code:
    813501-1	0	Bd1	17008793	255	24M	*	0	0	TGGAAAAGATTCTGGATCCTGTGC	IIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:24	NM:i:0
    813504-1	4	*	0	0	*	*	0	0	GAACGGAATTACAGAACAAAATA	IIIIIIIIIIIIIIIIIIIIIII	XM:i:0
    813506-1	4	*	0	0	*	*	0	0	GATAACCGTAGTAATTCTAGAGCTGATACGTGC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	XM:i:0
    813505-1	16	Bd1	30522658	255	24M	*	0	0	GCCACTGATTCCACCTGTAACCAA	IIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:24	NM:i:0
    813503-1	16	Bd4	34965205	255	25M	*	0	0	CCCACTAAGTGTAGTTAATTTTAGG	IIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:25	NM:i:0
    813507-1	0	Bd2	11654350	255	24M	*	0	0	ACCCTGCCGATGGGACTCAGTGGA	IIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:24	NM:i:0
    813510-1	4	*	0	0	*	*	0	0	CGCTATCAGATGAGCCTAGGTCGGATTA	IIIIIIIIIIIIIIIIIIIIIIIIIIII	XM:i:0
    813508-1	0	Bd3	3399363	255	24M	*	0	0	GCGGCTGATTCTGAATAATACCAA	IIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:24	NM:i:0
    813509-1	16	Bd5	24802257	255	21M	*	0	0	ACGCCTTTGCTCAGGTGCCAT	IIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:21	NM:i:0
    813512-1	0	Bd4	36841227	255	24M	*	0	0	ATGGTCGAAATATTACATGACGCA	IIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:24	NM:i:0
    When I discovered this, I used gawk to filter out all records with a chromosome assignment of "*"; the number of records remaining matched the number of alignments Bowtie reported.

    Does anyone understand this problem? If so, is there a way to exclude these results from the output file other than using gawk or another utility to filter them out? I've used Bowtie quite a bit and I've never experienced a problem like this.

    Thanks.

  • #2
    I think this is a feature rather than a problem.
    All my reads are usually written out to the SAM output by Bowtie, and the next step in my pipeline is to weed out reads that did not map by looking for a "*" in the third column (which would contain the name of the ref sequence where the read aligned, if it did) by gawk or some such utility.
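The filtering step described above can also be sketched in Python instead of gawk. This is only a minimal sketch, not anything Bowtie provides: the function name `filter_mapped` is made up for illustration, and the sample records simply copy the layout of the output posted earlier in the thread. It assumes tab-separated SAM text and drops any record whose FLAG (column 2) has the 0x4 "unmapped" bit set or whose reference name (column 3) is "*".

```python
def filter_mapped(sam_lines):
    """Yield SAM header lines and the records of reads that aligned."""
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line  # header line, keep unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference name, "*" if none
        if flag & 0x4 or rname == "*":
            continue  # read did not align
        yield line


# Three records in the same layout as the output posted above.
sample = [
    "\t".join(["813501-1", "0", "Bd1", "17008793", "255", "24M", "*", "0", "0",
               "TGGAAAAGATTCTGGATCCTGTGC", "I" * 24, "XA:i:0", "MD:Z:24", "NM:i:0"]),
    "\t".join(["813504-1", "4", "*", "0", "0", "*", "*", "0", "0",
               "GAACGGAATTACAGAACAAAATA", "I" * 23, "XM:i:0"]),
    "\t".join(["813505-1", "16", "Bd1", "30522658", "255", "24M", "*", "0", "0",
               "GCCACTGATTCCACCTGTAACCAA", "I" * 24, "XA:i:0", "MD:Z:24", "NM:i:0"]),
]
kept = list(filter_mapped(sample))
print(len(kept))
```

For a real file, the same generator could be fed an open file handle and its output written line by line to a new file. Checking the FLAG bit as well as the "*" guards against relying on a single column.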

    • #3
      Thanks.

      I used gawk before to filter them out:

      gawk '$3!="*"' in.sam > out.sam

      But since Bowtie didn't find a region of alignment to the genome for these 'features', I'm unsure why they would still be included in the alignment file.
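One way to read the posted records: the second column is the SAM FLAG field, a bit field in which 0x4 marks a read as unmapped and 0x10 marks a reverse-strand alignment. Under that reading, the lines with FLAG 4 are records of reads that did not align, not alignments. A small sketch of decoding the three FLAG values seen in this thread follows; the helper name `describe_flag` is hypothetical, and it only covers these two bits.

```python
UNMAPPED = 0x4   # SAM FLAG bit: read is unmapped
REVERSE = 0x10   # SAM FLAG bit: read aligned to the reverse strand


def describe_flag(flag):
    """Return a short description for the FLAG values seen in the thread."""
    if flag & UNMAPPED:
        return "unmapped"
    return "reverse strand" if flag & REVERSE else "forward strand"


for value in (0, 4, 16):
    print(value, describe_flag(value))
```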
