
  • Getting data into Golden Helix Genome Browser

    Hello, I'm trying to load my sorted BAM files into Golden Helix's genome browser. I'm currently using version 2.1.0. A lot of my data comes into the browser with "Did not successfully write coverage data", among other issues.

    I've followed this process:

    1. Download data from NCBI in FASTQ/FASTA format

    2. Download data from iGenomes

    3. Unzip the iGenomes data in Linux

    4. Build the alignment: bowtie2-align -x "name of built index" ".fastq" -S ".sam"

    5. Convert the SAM file to a BAM file with samtools view -Sb "SAM file" > "BAM file"

    6. samtools sort -n "BAM file" (if not already sorted, for Cufflinks)

    The data will go through tools like Cufflinks/TopHat. The files are also about 7 GB each, but they don't work in the browser.


    What can I do to fix this?

  • #2
    Is there a specific reason to use Golden Helix? Why not use IGV from Broad, if you just want to look at your bam files. You also need to index your sorted bam file (would be step 7 in your workflow).

    • #3
      samtools sort -n
      According to the command you've posted, you're sorting the BAM file by read name.
      To view the BAM file in any genome browser, you'll want to sort it by coordinates. Just samtools sort, without the -n.

      Code:
      samtools sort
      As pointed out by @GenoMax, you'll want to index the sorted BAM file, at least for IGV, and the genome browsers I know.
      I'll second @GenoMax's vote for IGV. Always choose open source software over commercial software, even freeware. That being said, GenomeBrowse may have features that justify picking it over IGV, but IGV is a pretty nifty program.
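A quick way to check which order a BAM file is in is to look at the SO: tag on the @HD line of its header. The sketch below is only an illustration: bam_sort_order is a made-up helper name, and it assumes the header was written by a samtools version that records the sort order (older releases may leave the tag missing or stale, so treat the answer as a hint).

```shell
# Print the sort order recorded in a SAM/BAM header read from standard input.
# The @HD line may carry an SO: tag: coordinate, queryname, unsorted or unknown.
# bam_sort_order is a hypothetical helper name, not a samtools command.
bam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Example call (your_file.bam is a placeholder):
#   samtools view -H your_file.bam | bam_sort_order
```

A file sorted with -n should report queryname, while a file ready for a genome browser should report coordinate.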

      • #4
        Originally posted by GenoMax
        Is there a specific reason to use Golden Helix? Why not use IGV from Broad, if you just want to look at your bam files. You also need to index your sorted bam file (would be step 7 in your workflow).
        I want to compare things in the liver across multiple SRA runs. I will try sorting the file and report back.

        • #5
          Originally posted by blancha
          According to the command you've posted, you're sorting the BAM file by read name.
          To view the BAM file in any genome browser, you'll want to sort it by coordinates. Just samtools sort, without the -n.

          Code:
          samtools sort
          As pointed out by @GenoMax, you'll want to index the sorted BAM file, at least for IGV, and the genome browsers I know.
          I'll second @GenoMax's vote for IGV. Always choose open source software over commercial software, even freeware. That being said, GenomeBrowse may have features that justify picking it over IGV, but IGV is a pretty nifty program.
          Considering my current method, can you give me a quick rundown on how to load it into IGV? I'm not getting anything.

          What is the process from FASTA or FASTQ (or whatever format) to IGV?

          • #6
            Your "process" up to step 5 is ok.

            6. samtools sort (no -n) your_file.bam your_file_sorted
            7. samtools index your_file_sorted.bam

            If you can't figure things out by starting IGV and pointing it to the directory containing the sorted BAM and its .bai index file, read the IGV user guide. Remember to select the correct genome build before loading the sorted BAM file.
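For anyone on a newer samtools, steps 6 and 7 might look like the sketch below. It assumes samtools 1.x, where the output file is given with -o rather than as a prefix after the input; the function name and file names are placeholders, not part of samtools.

```shell
# Coordinate-sort a BAM file, then index it so a genome browser can load it.
# Assumes samtools 1.x is on PATH; sort_and_index_bam and the file names are
# illustrative only.
sort_and_index_bam() {
    in_bam=$1
    out_bam=${in_bam%.bam}_sorted.bam

    # No -n here: the default sort order is by coordinate.
    samtools sort -o "$out_bam" "$in_bam" || return 1

    # Writes the index next to the sorted BAM as "$out_bam.bai".
    samtools index "$out_bam"
}

# Example call (placeholder file name):
#   sort_and_index_bam your_file.bam
```

Keeping the .bai file in the same directory as the sorted BAM lets the browser find the index without extra configuration.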

            • #7
              Originally posted by GenoMax
              Your "process" up to step 5 is ok.

              6. samtools sort (no -n) your_file.bam your_file_sorted
              7. samtools index your_file_sorted.bam

              Read the IGV user guide in case you are not able to figure things out by starting IGV and pointing it to the directory containing the sorted bam and the bai index file. Remember to select the correct genome build before loading the sorted bam file.
              I gave it a run with a mouse genome and the .bam file, and got this:

              [screenshot]

              As you can see, the file is over a gigabyte, so there is data in there.

              • #8
                Did you miss the note up in the main browser window that says "Zoom in to see alignments"?

                By default IGV shows you the entire genome. You have to select a chromosome (or type a gene name in the "Go" box) to select a region. Even then you may have to click on the "+" sign in the top right corner before you start seeing actual reads aligned to the genome. You can keep going until you see individual bases.

                • #9
                  Originally posted by GenoMax
                  Did you miss the note up in the main browser window that says "Zoom in to see alignments"?

                  By default IGV shows you the entire genome. You have to select a chromosome (or type a gene name in the "Go" box) to select a region. Even then you may have to click on the "+" sign in the top right corner before you start seeing actual reads aligned to the genome. You can keep going until you see individual bases.
                  I went all the way in and didn't see anything. Nothing is showing up.

                  • #10
                    My guess is that your BAM file has chromosome names that do not match what is provided by IGV in terms of the reference (e.g. chr2 vs 2). That is assuming you have selected the correct reference (I see Mouse 129S1 etc in the screenshot above).

                    Where did you get your reference genome from? If you are using a non-standard genome then you can load your own reference sequence in and use it to display data against.

                    Can you post the header from your bam?

                    Code:
                    $ samtools view -H your_bam
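If the full header is long, it may be enough to pull out just the reference names from the @SQ lines and compare them with the chromosome names the browser's genome uses (chr2 vs 2). A minimal sketch, where bam_reference_names is a hypothetical helper and your_bam is a placeholder:

```shell
# Print the reference (chromosome) names from a SAM/BAM header on stdin.
# Each @SQ line describes one reference sequence; its SN: tag holds the name.
# bam_reference_names is a hypothetical helper name, not a samtools command.
bam_reference_names() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) print substr($i, 4)
        }
    '
}

# Example call (your_bam is a placeholder):
#   samtools view -H your_bam | bam_reference_names
```

Names such as 1, 2, X point to an NCBI or Ensembl style reference, while chr1, chr2, chrX point to a UCSC style one.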

                    • #11
                      Originally posted by GenoMax
                      My guess is that your BAM file has chromosome names that do not match what is provided by IGV in terms of the reference (e.g. chr2 vs 2). That is assuming you have selected the correct reference (I see Mouse 129S1 etc in the screenshot above).

                      Where did you get your reference genome from? If you are using a non-standard genome then you can load your own reference sequence in and use it to display data against.

                      Can you post the header from your bam?

                      Code:
                      $ samtools view -H your_bam
                      Thanks for the help. I got the reference from iGenomes. I'm trying to get this mouse data to show.

                      • #12
                        Which version did you download from iGenomes? This looks like NCBI or Ensembl since UCSC versions have the word chr in front of the chromosome number. This is certainly not the 129S1 mouse genome as you had selected in the screenshot above.

                        I suggest using the sequence and the annotation from your iGenomes download in IGV so that everything matches and you can display the data. See the "Loading a genome" section of the IGV user guide.

                        • #13
                          Originally posted by GenoMax
                          Which version did you download from iGenomes? This looks like NCBI or Ensembl since UCSC versions have the word chr in front of the chromosome number. This is certainly not the 129S1 mouse genome as you had selected in the screenshot above.

                          I am going to suggest that you use the sequence and the annotation in your iGenomes download in IGV so everything matches and you can display the data. See "Loading a genome" section.
                          I downloaded the NCBI GRCm38 version. Should I use something else? Which one would you use?

                          • #14
                            In IGV, just pick the mm10 genome instead.
                            It's the same genome as GRCm38.
                            I think you'll have to reload the BAM file after you've selected the correct genome.

                            Then, zoom into a location where you know you will have coverage.
                            For RNA-Seq for example, you could pick a housekeeping gene, like GAPDH.
                            Just type GAPDH in the search box, and click on the Go button.

                            • #15
                              @blancha: mm10 may not work if the one included in IGV is the UCSC version, which has "chr" in front of all chromosome names.

                              @Milestailsprowe: If the above does not work, create a new "genome" in IGV by pointing to the iGenomes FASTA file (/path_to/WholeGenomeFasta/genome.fa) and use the corresponding GTF file (/path_to/Annotation/Genes/genes.gtf). Then open your BAM file in IGV.
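Before loading anything, it can help to confirm that every reference name in the BAM header also exists in the FASTA you plan to use as the genome. The sketch below is one way to do that with awk; header_names_missing_from_fasta is a made-up helper name and the paths are placeholders.

```shell
# Print every @SQ name from a SAM/BAM header (stdin) that has no matching
# ">" record in the FASTA file given as the first argument.
# header_names_missing_from_fasta is a hypothetical helper name.
header_names_missing_from_fasta() {
    awk -F'\t' '
        NR == FNR {                       # first input: the FASTA file
            if ($0 ~ /^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)  # keep text up to the first whitespace
                in_fasta[name] = 1
            }
            next
        }
        $1 == "@SQ" {                     # second input: the header on stdin
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && !(substr($i, 4) in in_fasta))
                    print substr($i, 4)
        }
    ' "$1" -
}

# Example call (placeholder paths):
#   samtools view -H your_sorted.bam | header_names_missing_from_fasta /path_to/genome.fa
```

No output means every name in the BAM has a counterpart in the FASTA; any names that are printed are the ones the browser would not be able to place.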
