  • Super Large Reference Genome

    I am working on a project in which I am analyzing RNA-seq data from fused interspecific cell types, specifically mouse cells and rat cells, and then performing. Being confident that a given read came from the mouse genome or the rat genome is crucial, so the optimal reference genome would be the union of mm9.fa and rn4.fa, but the combined size is too large to build with bowtie/tophat. Is there any way to build this reference genome? Why is there a set limit on the size of a reference genome? Any help would be greatly appreciated. I know there are workarounds, such as aligning to one genome and then the other and looking at the differences and overlap, but this is not optimal.

    Cheers,

  • #2
    Are you trying to use a single concatenated sequence? If so, why not use a multi-entry FASTA file containing both the rat and the mouse chromosomes?
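A minimal sketch of the multi-entry FASTA idea, assuming plain uncompressed FASTA input. The function name `merge_fasta` and the `mm9_`/`rn4_` name prefixes are illustrative choices, not something bowtie or tophat requires; the prefixes only keep mouse `chr1` and rat `chr1` distinguishable after alignment. Note that this alone does not get around any aligner-side limit on total index size.

```python
import io

def merge_fasta(sources, out):
    """Write one multi-entry FASTA from several inputs.

    sources: iterable of (tag, file-like) pairs; each sequence name is
    prefixed with its tag so identical names from two species stay distinct.
    """
    for tag, handle in sources:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                out.write(f">{tag}_{line[1:]}\n")  # rename header
            else:
                out.write(line + "\n")  # sequence lines pass through

# Toy stand-ins for mm9.fa and rn4.fa; real files would be opened with open().
mouse = io.StringIO(">chr1\nACGT\n>chr2\nGGCC\n")
rat = io.StringIO(">chr1\nTTAA\n")
merged = io.StringIO()
merge_fasta([("mm9", mouse), ("rn4", rat)], merged)
print(merged.getvalue())
```

Each chromosome stays its own entry, so no single reference sequence grows beyond the length of the largest chromosome.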

    The SAM/BAM format itself has a limit of 2^31 - 1 base pairs for each reference sequence, or about 2Gbp (2 billion base pairs). In theory this could be raised to 2^32 - 1 or about 4Gbp but it would cause trouble for Java tools. However, you are much more likely to hit a limitation in the current BAM indexing scheme (BAI files) of 512Mbp (or half a billion base pairs), which is a problem for some organisms - but not for mice, rats or humans!
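The two per-sequence limits mentioned above can be checked with a few lines of arithmetic. This is only a sketch: the constant names and the function `check_reference_lengths` are made up for illustration, the 512 Mbp figure is taken as 2^29 bases, and the sequence lengths are toy values, not real chromosome sizes.

```python
SAM_MAX_REF_LEN = 2**31 - 1  # per-sequence SAM/BAM limit quoted above (~2 Gbp)
BAI_MAX_REF_LEN = 2**29      # 512 Mbp BAI limit quoted above, taken as 2^29

def check_reference_lengths(lengths):
    """Return {name: [issues]} for sequences that exceed either limit."""
    problems = {}
    for name, length in lengths.items():
        issues = []
        if length > SAM_MAX_REF_LEN:
            issues.append("exceeds SAM/BAM 2^31 - 1 limit")
        if length > BAI_MAX_REF_LEN:
            issues.append("exceeds BAI 512 Mbp limit")
        if issues:
            problems[name] = issues
    return problems

# Hypothetical lengths chosen only to exercise each branch.
toy_lengths = {
    "short_chr": 200_000_000,
    "long_chr": 600_000_000,
    "concatenated": 5_000_000_000,
}
report = check_reference_lengths(toy_lengths)
for name, issues in report.items():
    print(name, "->", "; ".join(issues))
```

Both limits apply to each reference sequence individually, which is why a multi-entry FASTA avoids them while a single concatenated sequence does not.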

    Perhaps there is some other limiting factor in bowtie/tophat as well?



    • #3
      Yes, bowtie indices use 32-bit unsigned integers, which limits them to about 4 Gbp (2^32 positions).
      Going up to 64 bit integers would double the memory requirement and probably also slow down the alignment process.
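To make the arithmetic concrete, here is a rough sketch. It does not model bowtie's actual index layout; it only shows what a 32-bit unsigned offset can address and why widening each stored offset from 4 to 8 bytes doubles the size of any array of offsets. The function names and the example entry count are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned integer can hold

def fits_in_uint32(total_bases):
    """True if every position in a reference of this size is addressable."""
    return total_bases <= UINT32_MAX

def offset_array_bytes(n_entries, bytes_per_entry):
    """Memory for an array of n_entries offsets of the given width."""
    return n_entries * bytes_per_entry

print(fits_in_uint32(3_000_000_000))  # a single ~3 Gbp reference
print(fits_in_uint32(5_000_000_000))  # a hypothetical ~5 Gbp combined reference

n = 1_000_000  # illustrative number of stored offsets
print(offset_array_bytes(n, 4), offset_array_bytes(n, 8))
```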

      You could extend bowtie to allow larger genomes - the SeqAn library it uses should even make this a pretty straightforward endeavour. Better ask the authors how to go about it though.



      • #4
        Thanks ffinkernagel, that's really helpful. I'll start looking at the SeqAn library and try to get in contact with the authors.
