  • Getting data into Golden Helix Genome Browser

    Hello, I'm trying to get my sorted BAM files to load into Golden Helix's genome browser. I'm currently using version 2.1.0. A lot of my data comes into the browser with the message "Did not successfully write coverage data", among other issues.

    This is the process I've followed:

    1. Download data from NCBI in FASTQ/FASTA format

    2. Download the reference data from iGenomes

    3. Unzip the iGenomes data in Linux

    4. Build the alignment: bowtie2-align -x "name of built index" ".fastq" -S ".sam"

    5. Convert the SAM file to a BAM file: samtools view -Sb "SAM file" > "BAM file"

    6. Sort (if not already sorted, for Cufflinks): samtools sort -n "BAM file"

    The data will then go through tools like Cufflinks/TopHat. The files are about 7 GB, but they don't display in the browser.


    What can I do to fix this?

  • #2
    Is there a specific reason to use Golden Helix? Why not use IGV from the Broad Institute if you just want to look at your BAM files? You also need to index your sorted BAM file (that would be step 7 in your workflow).



    • #3
      samtools sort -n
      According to the command you've posted, you're sorting the BAM file by read name.
      To view the BAM file in any genome browser, you'll want to sort it by coordinates. Just samtools sort, without the -n.

      Code:
      samtools sort
      As pointed out by @GenoMax, you'll want to index the sorted BAM file, at least for IGV and the other genome browsers I know.
      I'll second @GenoMax's vote for IGV. Always choose open source software over commercial software, even freeware. That being said, GenomeBrowse may have features that justify picking it over IGV, but IGV is a pretty nifty program.
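A quick way to confirm which order a BAM file is in is to read the `@HD` line of its header: `samtools sort` records `SO:coordinate`, and `samtools sort -n` records `SO:queryname`. The sketch below, using a hypothetical helper name `bam_sort_order`, reads header text on stdin (for example from `samtools view -H`) and prints that value, or `unknown` when the header carries no `SO` tag.

```shell
# bam_sort_order (hypothetical helper): print the SO: value from the @HD line
# of a SAM header read on stdin, or "unknown" if no SO tag is present.
bam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) {
        # The sort-order tag looks like SO:coordinate or SO:queryname.
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
      }
    }
    END { if (!found) print "unknown" }
  '
}
```

Run it as `samtools view -H your_file_sorted.bam | bam_sort_order`. Anything other than `coordinate` suggests the file still needs a plain `samtools sort` (no `-n`) before a genome browser can use it.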



      • #4
        Replying to GenoMax (#2):
        I want to compare things in the liver across multiple SRA runs. I will try sorting it and report back.



        • #5
          Replying to blancha (#3):
          Considering my current method, can you give me a quick rundown on how to load it into IGV? I'm not getting anything.

          What is the process from FASTA, FASTQ, or whatever format to IGV?



          • #6
            Your "process" up to step 5 is ok.

            6. samtools sort (no -n) your_file.bam your_file_sorted
            7. samtools index your_file_sorted.bam

            Read the IGV user guide in case you are not able to figure things out by starting IGV and pointing it to the directory containing the sorted bam and the bai index file. Remember to select the correct genome build before loading the sorted bam file.
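Steps 1 to 7 can be chained into a single shell function. This is a minimal sketch, assuming single-end FASTQ reads, a Bowtie2 index that is already built, and samtools 1.x, where `sort` takes its output file through `-o` rather than the positional prefix used in older releases (the `sort your_file.bam your_file_sorted` form above). It calls the `bowtie2` wrapper script rather than `bowtie2-align` directly. The function name `align_sort_index` and its three arguments are hypothetical.

```shell
# align_sort_index (hypothetical helper): FASTQ -> coordinate-sorted, indexed BAM.
# Assumes bowtie2 and samtools 1.x are on PATH and the Bowtie2 index is built.
align_sort_index() {
  if [ "$#" -ne 3 ]; then
    echo "usage: align_sort_index <bowtie2_index_prefix> <reads.fastq> <out_prefix>" >&2
    return 2
  fi
  local index="$1" reads="$2" out="$3"

  # Step 4: align single-end reads (-U) and write SAM.
  bowtie2 -x "$index" -U "$reads" -S "${out}.sam" || return 1

  # Step 5: convert SAM to BAM (samtools 1.x detects SAM input, so no -S).
  samtools view -b -o "${out}.bam" "${out}.sam" || return 1

  # Step 6: sort by coordinate (no -n); genome browsers need coordinate order.
  samtools sort -o "${out}.sorted.bam" "${out}.bam" || return 1

  # Step 7: index; this writes ${out}.sorted.bam.bai next to the sorted BAM.
  samtools index "${out}.sorted.bam" || return 1
}
```

For example, `align_sort_index /path/to/index/genome reads.fastq liver` should leave `liver.sorted.bam` and `liver.sorted.bam.bai` side by side, which is the pair IGV expects when you load the BAM. For paired-end data, `-U` would be replaced by `-1`/`-2`.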



            • #7
              Replying to GenoMax (#6):
              I gave it a run with a mouse genome and the .bam, and got this:

              As you can see, the file is over a gigabyte, so there is data in there.



              • #8
                Did you miss the note up in the main browser window that says "Zoom in to see alignments"?

                By default, IGV shows you the entire genome. You have to select a chromosome (or type a gene name in the "go" box) to select a region. Even then, you may have to click the "+" sign in the top-right corner before you start seeing actual reads aligned to the genome. You can keep zooming in until you see individual bases.
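If zooming in shows nothing, it is worth checking which reference sequences received any reads at all. On an indexed BAM, `samtools idxstats` prints one tab-separated line per reference (name, length, mapped reads, unmapped reads), plus a final `*` line for unplaced reads. A minimal sketch, using a hypothetical helper `top_covered_refs`, that keeps only references with mapped reads, most-covered first:

```shell
# top_covered_refs (hypothetical helper): read `samtools idxstats` output on
# stdin (name, length, mapped, unmapped; tab-separated) and print
# "name<TAB>mapped" for references with at least one mapped read,
# sorted by mapped-read count, highest first. The "*" row is skipped.
top_covered_refs() {
  awk -F'\t' '$1 != "*" && $3 > 0 { print $1 "\t" $3 }' |
    sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `samtools idxstats your_file_sorted.bam | top_covered_refs | head`. Empty output would mean nothing aligned; otherwise the names it prints are the ones to navigate to in IGV, and they must match the names used by the genome selected in IGV.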



                • #9
                  Replying to GenoMax (#8):
                  I went all the way in and didn't see anything. Nothing is showing up.



                  • #10
                    My guess is that the chromosome names in your BAM file do not match those of the reference selected in IGV (e.g. chr2 vs 2). That is assuming you have selected the correct reference (I see Mouse 129S1 etc. in the screenshot above).

                    Where did you get your reference genome from? If you are using a non-standard genome, you can load your own reference sequence and display your data against it.

                    Can you post the header from your bam?

                    Code:
                    $ samtools view -H your_bam
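To check the naming guess from the header itself: the `SN:` tags on the `@SQ` lines hold the reference names. This minimal sketch (hypothetical helper `ref_naming_style`) reads header text on stdin and reports whether the names are `chr`-prefixed (UCSC style), bare (as in NCBI/Ensembl builds), a mix, or absent. It only looks at the prefix; it does not check that the names belong to any particular assembly.

```shell
# ref_naming_style (hypothetical helper): classify the @SQ SN: names in a SAM
# header read on stdin as "chr-prefixed", "bare", "mixed", or "none".
ref_naming_style() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
          # Count UCSC-style (chr...) versus bare names.
          if ($i ~ /^SN:chr/) chr++; else bare++
        }
      }
    }
    END {
      if (chr && bare) print "mixed"
      else if (chr)    print "chr-prefixed"
      else if (bare)   print "bare"
      else             print "none"
    }
  '
}
```

Usage: `samtools view -H your_bam | ref_naming_style`. A result of `bare` against a `chr`-named genome in IGV would explain an empty view.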



                    • #11
                      Replying to GenoMax (#10):
                      Thanks for the help. I got the reference from iGenomes. I'm trying to get this mouse data to show.



                      • #12
                        Which version did you download from iGenomes? This looks like NCBI or Ensembl, since the UCSC versions have "chr" in front of the chromosome number. This is certainly not the 129S1 mouse genome you had selected in the screenshot above.

                        I suggest that you load the sequence and the annotation from your iGenomes download into IGV, so everything matches and you can display the data. See the "Loading a genome" section of the IGV user guide.
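When loading the iGenomes FASTA as your own genome in IGV, every reference named in the BAM header should also appear in that FASTA. A FASTA index (`.fai`, created with `samtools faidx genome.fa`) lists one sequence name per line in its first column, which makes the comparison cheap. A minimal sketch, with the hypothetical helper `missing_in_fai`:

```shell
# missing_in_fai (hypothetical helper): $1 = SAM header file (from
# `samtools view -H`), $2 = FASTA index (.fai; first column = sequence name).
# Prints each @SQ SN: name absent from the .fai; prints nothing if all match.
missing_in_fai() {
  if [ "$#" -ne 2 ] || [ ! -s "$2" ]; then
    echo "usage: missing_in_fai <header.sam> <genome.fa.fai> (index must be non-empty)" >&2
    return 2
  fi
  awk -F'\t' '
    FNR == NR { names[$1] = 1; next }   # first file read: the .fai names
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/ && !(substr($i, 4) in names)) print substr($i, 4)
    }
  ' "$2" "$1"
}
```

Usage: `samtools view -H your_file_sorted.bam > header.sam`, then `missing_in_fai header.sam genome.fa.fai`. No output means every BAM reference name is present in the FASTA.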



                        • #13
                          Replying to GenoMax (#12):
                          I downloaded the NCBI GRCm38 version. Should I use something else? Which one would you use?



                          • #14
                            In IGV, just pick the mm10 genome instead.
                            It's the same genome as GRCm38.
                            I think you'll have to reload the BAM file after you've selected the correct genome.

                            Then, zoom into a location where you know you will have coverage.
                            For RNA-Seq for example, you could pick a housekeeping gene, like GAPDH.
                            Just type GAPDH in the search box, and click on the Go button.



                            • #15
                              @blancha: mm10 may not work if the one included in IGV is the UCSC version, which has "chr" in front of all chromosome numbers.

                              @Milestailsprowe: If the above does not work, create a new "genome" by pointing IGV to the iGenomes FASTA (/path_to/WholeGenomeFasta/genome.fa) and the corresponding GTF file (/path_to/Annotation/Genes/genes.gtf). Then open your BAM file in IGV.
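The alternative to loading a custom genome is to rename the references in the BAM so they match a UCSC-style genome such as IGV's mm10. A minimal sketch, assuming bare names `1` to `19`, `X`, `Y` and `MT` that should become `chr1` to `chr19`, `chrX`, `chrY` and `chrM`; other sequences (unplaced scaffolds, for instance) are left alone and will still not match. The helper `add_chr_prefix` is hypothetical. It only edits header text; `samtools reheader` then swaps that header into the BAM. This works because BAM records refer to references by their position in the header, so the order and lengths must stay the same.

```shell
# add_chr_prefix (hypothetical helper): read a SAM header on stdin and write it
# to stdout with @SQ names 1..N, X and Y prefixed by "chr", and MT renamed to
# chrM. Line order and all other fields/lines are kept unchanged.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        if ($i == "SN:MT") $i = "SN:chrM"
        else if ($i ~ /^SN:([0-9]+|X|Y)$/) $i = "SN:chr" substr($i, 4)
      }
    }
    { print }'
}
```

Usage: `samtools view -H in.bam | add_chr_prefix > chr_header.sam`, then `samtools reheader chr_header.sam in.bam > in.chr.bam` and `samtools index in.chr.bam` before loading `in.chr.bam` against mm10.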
