Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Alignment produces BAM file with sorted reads, but cannot see alignment

    Hi all,

    This is my first time performing NGS alignment. I am using command line tools. What I did so far, was to first use cat to combine all chromosomes of my reference genome. I then indexed the genome with bwa (-a bwtsw). I then de-multiplexed and sorted into separate FASTQ files by barcode using FASTX Toolkit's Barcode Splitter. Next, I used FASTX trimmer to remove 75bp of barcode, primer, and transposon sequence (I am over trimming, but I started with 300 bases). I then aligned with bwa, trying both the mem and samse algorithms. Then, with samtools, I converted to BAM format with the view command, sorted, and indexed. In viewer programs such as Tablet, IGV, and BamView, I either saw no reads/blank or, in the case of Tablet, I saw that it had categorized reads by chromosome, but clicking on a chromosome showed nothing. This is true for the output of both algorithms. Not sure if it is worth noting, but in Tablet, the mismatches column for each chromosome all show a "?". What could go wrong along the way?

    Thanks for your help.

  • #2
    Did you check to see if chromosome names of your reference and the names that are in your BAM files are still matching (after all that processing)? If you look at the headers of the BAM file that should show you the names there. e.g. chr1 needs to stay chr1, it can't become anything else. Hopefully you did not have spaces in the names.

    Note: This may sound stupid but have you "zoomed in" far enough in IGV. You won't see the alignments until a certain point.
    Last edited by GenoMax; 04-13-2015, 10:31 AM.

    Comment


    • #3
      Can you dump the reads using "samtools view" ?

      What is the command you used in samtools?

      What does the bam header look like?

      Comment


      • #4
        The commands I used in samtools are:
        Code:
        samtools view -u -S alignment.sam > alignment.bam
        samtools sort alignment.bam alignment.bam.sorted
        samtools index alignment.bam.sorted.bam
        The headers appear to match:

        This is the genome fastq file:
        >Chr1 CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02
        [Chr1 sequence goes here]
        >Chr2 CHROMOSOME dumped from ADB: Jun/20/09 14:54; last updated: 2009-02-02
        [Chr2 sequence goes here]
        [etc.]

        This are the headers I get for both the unsorted and sorted bam files:
        Code:
        samtools view -H alignment.bam 
        @SQ	SN:Chr1	LN:30427671
        @SQ	SN:Chr2	LN:19698289
        @SQ	SN:Chr3	LN:23459830
        @SQ	SN:Chr4	LN:18585056
        @SQ	SN:Chr5	LN:26975502
        @SQ	SN:chloroplast	LN:154478
        @SQ	SN:mitochondria	LN:366924
        Code:
        samtools view -H alignment.bam.sorted.bam
        @HD	VN:1.3	SO:coordinate
        @SQ	SN:Chr1	LN:30427671
        @SQ	SN:Chr2	LN:19698289
        @SQ	SN:Chr3	LN:23459830
        @SQ	SN:Chr4	LN:18585056
        @SQ	SN:Chr5	LN:26975502
        @SQ	SN:chloroplast	LN:154478
        @SQ	SN:mitochondria	LN:366924
        It does appear that I can dump the reads using samtools view.


        I think it may be worth noting that I just found out that I can view the reads if I open the SAM file, but not the BAM file. Maybe something went wrong during the conversion?

        Comment


        • #5
          BAM is a compressed binary format, it's not human readable. Perhaps your problem is that "Chr1" should be "chr1" (and so on).

          Comment


          • #6
            dpryan,
            I got the headers from the BAM file through the "view" command in samtools. The chromosome names seem to match the reference exactly (e.g. Chr1). Will making them lowercase make a difference? Wouldn't I have to redo the alignment if I want to change to lowercase?

            Comment


            • #7
              I think "CHROMOSOME dumped from ADB: Jun/20/09 14:53; last updated: 2009-02-02" part in the fasta header is throwing IGV off.

              Can you try with a genome file that has this part removed from chromosome names? Names should be just Chr1, Chr2 etc.

              Comment


              • #8
                I just deleted the ends from the fastq genome file to get >Chr1, >Chr2, etc. Reloading this in IGV didn't make a difference. Do I need to reindex/realign?

                Comment


                • #9
                  You might have to reindex the fasta file (actually, you might have to just remake the genome in IGV/tablet/etc., but this is quick). You do not need to realign (you'd only need to do that if you changed the actual sequence).

                  Comment


                  • #10
                    Hi guys,
                    As it turns out, I realized that I can view the reads in the sorted BAM file. However, that's only if 1) I zoom in enough and 2) I know exactly where to scroll to within the chromosome. Is there a way to instantly locate reads? IGV and other viewers I've tried do not show any preview of where reads are and I'm just mindlessly scrolling until I find reads appear in the window.

                    Thanks.

                    Comment


                    • #11
                      How many reads did you have and what was the alignment %? Do you know roughly what fold coverage to expect?

                      If this was a re-sequencing experiment you should have uniform coverage across the genome so reads should be easy to locate. If this was an RNA- or ChIP-seq experiment then you may have to find areas where there are genes/motifs that are enriched.

                      Comment


                      • #12
                        About 42,000 reads were aligned. I am sequencing to find transposon tags. They should only appear about 1-2 times in the genome. Is there a better downstream analysis program to do what I need to do? I just need to know a summary of the alignments: so, where a particular sequence(s) aligned (e.g. chromosome), the # of reads of that particular sequence(s), and the sequence itself.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        25 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        27 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        24 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        52 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X