  • saving samtools mpileup output from a cluster

    Hi,

    I am trying to use bwa and samtools to see whether the CEU transcriptome contains multiple splice variants of a couple of genes.

    I got the CEU transcriptome from here:



    (I used the fastq files), and I went on to use bwa to

    1) align the fastq reads to my reference file with the possible splices
    2) create a SAM file from (1)

    I have now converted the SAM files to BAM files, and also sorted and indexed them.

    What I want to do now is the following: see a (m)pileup of my reference splices against the bam files (161 bam files in total).

    My problem is that I am running everything on a cluster, so I cannot run samtools mpileup to give me an interactive view of the alignment.

    What I'd like to do is to get out of samtools some output text file that tells me, for every bam file, if there is anything aligning to my splices, and some sort of read depth/other quality score/p-value/whatever.

    Any idea how to do that? I am running out of ideas (I taught myself bwa and samtools over the last 3 days, so I feel I am reaching the limit of my intuition).

  • #2
    You can generate a consensus like this:

    Code:
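    # Pile up reads over the listed intervals (-l), skipping reads with mapping quality < 5 (-q)
    # and bases with base quality < 15 (-Q), then call the consensus with bcftools and save it as BCF.
    # The trailing & runs the pipeline in the background; let it finish before the next step.
    /samtools-0.1.18/samtools mpileup -q 5 -Q 15 -l [Interval_File] -uABf [reference_sequence.fa] [aligned_file.bam] | /samtools-0.1.18/bcftools/bcftools view -bcg - > [intermediate_file.bcf] &

    # Convert the finished BCF file to readable text.
    /samtools-0.1.18/bcftools/bcftools view [intermediate_file.bcf] > [consensus.txt]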
    Where your [Interval_File] must be in a format where positions are denoted as:

    chr1 3301721
    chr1 3313108
    chr1 3319339

    and intervals are denoted as:

    chr1 2985720 2985880
    chr1 3102667 3103058
    chr1 3160629 3160721
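As a rough sketch of how such a file could be generated rather than typed by hand: the function below turns NAME:START-END arguments into interval lines in the layout shown above. The function name `write_intervals`, the output file name `intervals.txt`, and the example regions are made up for illustration. The names in the first column would need to match the sequence names in the reference FASTA that the reads were aligned to.

```shell
#!/bin/sh
# Sketch: write an interval file for `samtools mpileup -l`.
# Each argument is NAME:START-END (hypothetical names and coordinates);
# the output is one "NAME START END" line per region.
write_intervals() {
    for region in "$@"; do
        name=${region%%:*}     # text before the first ':'
        range=${region#*:}     # text after the first ':'
        start=${range%-*}      # text before the '-'
        end=${range#*-}        # text after the '-'
        printf '%s %s %s\n' "$name" "$start" "$end"
    done
}

# Example with made-up regions; writes two lines to intervals.txt.
write_intervals chr6:2985720-2985880 chr19:3102667-3103058 > intervals.txt
```

The resulting file can then be passed as [Interval_File] in the mpileup command above.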

    • #3
      Originally posted by Heisman
      Where your [Interval_File] must be in a format where positions are denoted as:

      chr1 3301721
      chr1 3313108
      chr1 3319339

      and intervals are denoted as:

      chr1 2985720 2985880
      chr1 3102667 3103058
      chr1 3160629 3160721
      Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

      chr6 3301721
      chr19 3301721
      chr6 2985720 2985880
      chr19 2985720 2985880

      (the positions/intervals I just used are random), or should it have a position/interval for each splice?

      Many thanks!

      • #4
        Originally posted by Fedster
        Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

        chr6 3301721
        chr19 3301721
        chr6 2985720 2985880
        chr19 2985720 2985880

        (the positions/intervals I just used are random), or should it have a position/interval for each splice?

        Many thanks!
        You can do either; making one for each splice site region would yield a smaller output file that would be a lot easier to look through visually.
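If looking through the output by eye gets tedious, the plain-text pileup (mpileup run without -u or -g) could also be condensed into one line per reference sequence. This is only a sketch: it assumes the single-BAM pileup layout (name, position, reference base, depth, bases, qualities), and the function name `summarize_pileup` and the file names in the commented usage are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize plain-text mpileup output read from stdin.
# Assumes column 1 is the reference name and column 4 is the read depth.
# Prints per reference name: positions reported, positions with depth > 0, mean depth.
summarize_pileup() {
    awk '
        { n[$1]++; sum[$1] += $4; if ($4 > 0) covered[$1]++ }
        END {
            for (name in n)
                printf "%s\t%d\t%d\t%.2f\n", name, n[name], covered[name], sum[name] / n[name]
        }
    ' | sort
}

# Hypothetical usage, one summary file per BAM file:
# for bam in *.bam; do
#     samtools mpileup -l intervals.txt -f reference.fa "$bam" | summarize_pileup > "${bam%.bam}.depth.txt"
# done
```

A reference sequence that never shows up in a summary had no reads piled up on it in that BAM file.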

        • #5
          Originally posted by Heisman
          You can do either; making one for each splice site region would yield a smaller output file that would be a lot easier to look through visually.
          Thanks a lot! Final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the bam files at once? (I just want to know whether, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

          Again, many thanks!

          • #6
            Originally posted by Fedster
            Thanks a lot! Final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the bam files at once? (I just want to know whether, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

            Again, many thanks!
            You don't need to change the suffix of your files.

            You can specify multiple bam files, see this: http://samtools.sourceforge.net/samtools.shtml

            Alternatively, you could merge all of the bam files together and then run mpileup on the merged bam file. I don't know if one would be faster than the other.
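The two options could be sketched roughly as follows. The function names `pileup_all` and `pileup_merged`, the `SAMTOOLS` variable, and all file names are made up for illustration. Only the `mpileup`, `merge`, and `index` subcommands are assumed to exist. Note that when several BAM files are given to one mpileup call, the text output carries one depth/bases/qualities group per file.

```shell
#!/bin/sh
# Sketch of both options; nothing runs until one of the functions is called.
# SAMTOOLS can point at a specific binary (default: samtools found on PATH).
SAMTOOLS=${SAMTOOLS:-samtools}

# Option 1: pass every BAM file to a single mpileup call.
# usage: pileup_all REFERENCE.fa INTERVALS.txt BAM [BAM ...]
pileup_all() {
    ref=$1; intervals=$2; shift 2
    "$SAMTOOLS" mpileup -l "$intervals" -f "$ref" "$@"
}

# Option 2: merge the sorted BAM files, index the result, then pile up once.
# usage: pileup_merged REFERENCE.fa INTERVALS.txt MERGED.bam BAM [BAM ...]
pileup_merged() {
    ref=$1; intervals=$2; merged=$3; shift 3
    "$SAMTOOLS" merge "$merged" "$@" &&
    "$SAMTOOLS" index "$merged" &&
    "$SAMTOOLS" mpileup -l "$intervals" -f "$ref" "$merged"
}

# Hypothetical usage:
# pileup_all reference.fa intervals.txt *.bam > all_samples.pileup.txt
```

Which of the two is faster would have to be timed on the actual data.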
