  • What is the reference sequence? (Can I find it in a .bai index?)

    Working with the new 1000 Genomes (pilot2) release, there are .bai files associated with the new .bam files. I've used 'samtools pileup' to create the pileup files, but the reference column is populated with 'N'. I am trying to determine the reference base (without simply looking at the NCBI human reference and comparing positions), but can't work out how. Is there a way to a) use the .bai file in the process of making the pileup so the reference base column is filled correctly, or b) extract the reference directly from the .bai file?

    Sorry if I've missed something obvious, but I can't find anything in the samtools documentation that answers my question.

    Thanks,
    Jonathan

  • #2
    Going to try setting the -f flag when running samtools pileup.


    • #3
      Hi,

      have you managed to solve this problem at all? I am seeing exactly the same and can't quite figure out where I've gone wrong.

      Thanks,

      Jacky


      • #4
        I ended up downloading the NCBI human reference build 36 in FASTA format (split by chromosome), and then using the -f flag when creating the pileup. The reference column seems to be correct after doing this.

        samtools pileup -f ref.fasta alignment.bam > alignment.pu

        Hope that helps!
        Jonathan
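
A quick way to confirm whether the reference column is actually filled in is to count how many pileup rows still carry N in the third column. The sketch below assumes the pileup is tab-separated with the reference base in column 3; the function name and the example rows are made up for illustration and are not taken from the thread.

```python
def reference_column_summary(pileup_lines):
    """Return (total_rows, rows_with_N) for the reference-base column.

    Assumes tab-separated pileup rows with the reference base in the
    third column. Rows with fewer than three fields are skipped.
    """
    total = 0
    n_count = 0
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        total += 1
        if fields[2].upper() == "N":
            n_count += 1
    return total, n_count


# Made-up rows for illustration only (not real 1000 Genomes data).
example = [
    "X\t100\tN\t3\t...\tIII\n",
    "X\t101\tA\t3\t.,.\tIII\n",
    "X\t102\tN\t2\t..\tII\n",
]
total, n_count = reference_column_summary(example)
print(f"{n_count} of {total} positions have N as the reference base")
```

On a real file, passing an open file handle (for example `open("alignment.pu")`) in place of the list should work, since the function only iterates over lines. If every row reports N, the reference was most likely not picked up at all.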


        • #5
          Hmm, that's curious. I have tried that with RefSeq as a reference but I still don't see the reference base. Maybe it's an issue with the format of the FASTA header (contains a colon in my case).

          Thanks for your fast reply anyhow!
          Jacky
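
The header-format suspicion can be tested directly. As far as I know, samtools looks up reference sequences by name, using the first whitespace-delimited word of each FASTA header, so a name that differs from the one in the BAM header (for example chrX versus X) may leave the reference column as N. The sketch below is only an illustration under that assumption: it compares FASTA header names against the SN fields of the @SQ header lines, which are assumed to have been saved as text beforehand (for instance from `samtools view -H`). All function names and example values are hypothetical.

```python
def fasta_names(fasta_lines):
    """Return sequence names: the first word after '>' on each header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def bam_header_names(header_lines):
    """Return the SN values found on @SQ lines of a SAM/BAM text header."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def names_missing_from_fasta(fasta_lines, header_lines):
    """List BAM reference names that have no identically named FASTA record."""
    known = set(fasta_names(fasta_lines))
    return [name for name in bam_header_names(header_lines) if name not in known]


# Made-up inputs for illustration only.
fasta = [">chrX some description\n", "ACGT\n"]
header = ["@HD\tVN:1.0\n", "@SQ\tSN:X\tLN:1000\n"]
missing = names_missing_from_fasta(fasta, header)
print(missing)
```

An empty list means every name in the BAM header has a matching FASTA record. Any names printed are candidates for renaming in the FASTA headers, or for switching to a reference file that uses the same naming scheme as the alignment.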


          • #6
            Still can't get correct reference sequence column

            Hi,

            I still can't get a pileup file where the reference sequence shows bases instead of "N"s. I'd like to create pileup files of sequences from the 1000 Genomes Project aligned with NCBI human reference sequences. I am using the -f flag (indicating that the reference sequence is in FASTA format) and also need to use the -c flag (indicating that the pileup file should have the consensus sequence for the original .bam file). In the main samtools-0.1.7a folder on a GNU/Linux computer, I've typed many variants of the following:

            ./samtools pileup -cf /ifs/scratch/.../humanReferenceGenome/UCSCBuild36/chrX.fa /ifs/scratch/.../fatherAlignment/NA12891.chromX.ILLUMINA.bwa.CEU.high_coverage.20100517.bam > NA12891.ChrX.UCSC36.pileup

            The resulting pileup file contains the NA12891 consensus sequence. I've tried using a number of reference sequences, including builds 36.1, 36.2, 36.3 and 37 of the NCBI reference genome and build 36 of the UCSC reference genome, in the hope that one of these reference sequences would also appear in the pileup file. I would very much appreciate any suggestions.

            Thanks,
            Rebecca
