  • iGenomes reference genome not accurate?

    I grabbed the mm9 reference genome from iGenomes. When running cuffcompare, I noticed that genes.gtf refers to chr#_random sequences, whereas the chromosome folder contains only the chr# files and nothing for _random. I worked around this by downloading the chr#_random files from the UCSC mm9 build, so now everything matches the genes.gtf file.

    However, the mm9 build came with a pre-built Bowtie2 index to use when aligning to the genome (I used TopHat). My concern is that the Bowtie2 index was not created with the chr#_random files. Is there a way to check this?
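
    One possible way to check, offered as a sketch rather than a definitive recipe: `bowtie2-inspect -n` prints the names of the sequences stored in an index, so the output can be scanned for names ending in `_random`. The helper names below and the index basename in the usage comment are assumptions for illustration, and the wrapper needs Bowtie2 on the PATH.

```python
import subprocess


def parse_index_names(inspect_output):
    """Return the sequence name (first token) from each non-empty line."""
    names = []
    for line in inspect_output.splitlines():
        line = line.strip()
        if line:
            names.append(line.split()[0])
    return names


def random_contigs(names):
    """Keep only names that end in '_random' (e.g. chr1_random)."""
    return [name for name in names if name.endswith("_random")]


def index_names(index_basename):
    """Run `bowtie2-inspect -n` on an index basename and parse its output.

    Assumption: bowtie2-inspect is installed and on the PATH.
    """
    result = subprocess.run(
        ["bowtie2-inspect", "-n", index_basename],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_index_names(result.stdout)


# Hypothetical usage (the basename is an assumption, adjust to your layout):
# names = index_names("Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index/genome")
# print(random_contigs(names) or "no *_random sequences in this index")
```

    An empty result from `random_contigs` would suggest the index was built without the _random sequences.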

    Does anyone know of a better place to grab an accurate mm9 build with bowtie2 index?

    If you think the bowtie2 index might be questionable, how can I index all the chromosomes at once to create a version I trust?
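
    As a rough sketch of indexing everything in one go: `bowtie2-build` accepts a comma-separated list of FASTA files followed by an output basename, so all the chr#.fa and chr#_random.fa files can be passed together. The function names, directory, and basename `mm9_with_random` are assumptions for illustration.

```python
import glob
import os
import subprocess


def bowtie2_build_command(fasta_files, index_basename):
    """Build the argument list for one bowtie2-build call over all FASTA files."""
    if not fasta_files:
        raise ValueError("no FASTA files given")
    # bowtie2-build takes the reference files as one comma-separated argument.
    return ["bowtie2-build", ",".join(sorted(fasta_files)), index_basename]


def build_index(chrom_dir, index_basename):
    """Index every *.fa file found in chrom_dir.

    Assumption: bowtie2-build is installed and on the PATH.
    """
    fastas = glob.glob(os.path.join(chrom_dir, "*.fa"))
    subprocess.run(bowtie2_build_command(fastas, index_basename), check=True)


# Hypothetical usage (paths are assumptions):
# build_index("Mus_musculus/UCSC/mm9/Sequence/Chromosomes", "mm9_with_random")
```

    Sorting the file list is only there to make the command reproducible between runs.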

    Have you guys run into similar issues?

    Thanks a lot for your help.

  • #2
    Did you get the mm9 from cufflinks igenomes site?

    Try the iGenomes mm9 directly from Illumina: http://support.illumina.com/sequenci...e/igenome.html
    Last edited by GenoMax; 10-26-2014, 05:30 PM.



    • #3
      I am 90% sure that's where I grabbed it from but I'll download it again just to verify.



      • #4
        Downloaded it again:

        Here's the chromosome list:
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrY.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr5.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr3.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr2.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr6.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr16.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr15.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr12.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrM.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr1.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr4.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr9.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr18.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr10.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr14.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chrX.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr11.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr13.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr19.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr8.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr17.fa
        Mus_musculus/UCSC/mm9/Sequence/Chromosomes/chr7.fa

        It's missing the _random.fa files for each chromosome, which are referenced in the genes.gtf file...
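
        A mismatch like this can be listed programmatically. The sketch below collects the sequence names from column 1 of a GTF file and reports those with no matching .fa file. It assumes one sequence per file, named after the sequence, as in the listing above; the function names are hypothetical.

```python
import os


def gtf_seqnames(gtf_lines):
    """Collect the distinct sequence names from column 1 of a GTF file."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names


def fasta_chrom_names(filenames):
    """Derive sequence names from file names such as .../chr1.fa -> chr1."""
    return {os.path.basename(f)[:-3] for f in filenames if f.endswith(".fa")}


def missing_from_fasta(gtf_lines, filenames):
    """Sequence names used in the GTF that have no matching .fa file."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_chrom_names(filenames))


# Hypothetical usage (paths are assumptions):
# import glob
# with open("genes.gtf") as handle:
#     print(missing_from_fasta(handle, glob.glob("Chromosomes/*.fa")))
```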



        • #5
          If you are interested in the "chr*_random" sequences that are not uniquely placed on the chromosome then you should build a genome file/index on your own.



          • #6
            *_random sequences could be unique sequences in heterochromatin, or part of a large segmental duplication (segdup) whose flanking sequence cannot be localized or placed. We include these sequences when mapping to reduce mapping artifacts, not because we are particularly interested in them. I have always been curious why Illumina excludes them.



            • #7
              I downloaded the _random chromosomes from the UCSC website and built a new Bowtie2 index that includes them. I ran TopHat with the -G genes.gtf option and the new index.

              However....
              Here's an error I got from Cufflinks:
              GFF warning: merging adjacent/overlapping segments (many of these)
              Kept 32976 ref transcripts out of 33802
              826 duplicate reference transcripts discarded.

              Here's a similar error I got from Cuffcompare:
              Kept 33035 transfrags out of 33262
              227 redundant cufflinks transfrags discarded.
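
              One possible diagnostic, which does not reproduce the check Cufflinks itself performs: look for transcript_id values that appear on more than one sequence or strand in the GTF, since those are candidates for the same transcript being annotated at several loci. This is only a sketch, the function name is hypothetical, and it assumes the usual GTF attribute syntax `transcript_id "..."`.

```python
import re
from collections import defaultdict

# Assumption: attributes follow the usual GTF form, e.g. transcript_id "NM_1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_on_multiple_loci(gtf_lines):
    """Map each transcript_id seen on more than one (seqname, strand) pair
    to the sorted list of those pairs."""
    loci = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            loci[match.group(1)].add((fields[0], fields[6]))
    return {tid: sorted(places) for tid, places in loci.items() if len(places) > 1}


# Hypothetical usage:
# with open("genes.gtf") as handle:
#     for tid, places in transcripts_on_multiple_loci(handle).items():
#         print(tid, places)
```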

              So now my GTF file isn't accurate? I got it from the Illumina iGenome mm9 build. I'm starting to think downloading anything from Illumina is more trouble than it's worth.

              If I were going to download the unmasked genome, build an index, and download an accurate GTF file for that genome, where would be the best place to go? Or is there a way to verify my GTF file against my index?

              What has worked for you in this situation?
