  • Mpileup output

    Hello!
    I'm using "mpileup -p reference.fasta my_file.bam" to generate a file that gives me the coverage for each base. The command produces a file with 6 columns, but only columns 1, 2 and 4 (scaffold, position and coverage, respectively) are useful to me in this case. I can easily delete the unneeded columns afterwards with awk, but to save disk space (around 50% per file) I would like mpileup to write the report without them in the first place. Is there a parameter that outputs only the 1st, 2nd and 4th columns? (Sorry about my poor English.)

  • #2
    Pipe the mpileup output to awk (cut would work too). As each line arrives, it cuts out the parts you want and outputs only those, so the full six-column file is never written to disk.



    • #3
      'cut' is much simpler than 'awk' (in my opinion, of course). To use 'cut', just do:

      mpileup -p reference.fasta my_file.bam | cut -f 1,2,4
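For comparison, the awk form of the same column selection might look like the sketch below. The sample pileup line is made up for illustration and stands in for real mpileup output; both filters assume the input is tab-separated.

```shell
# Hypothetical sample line in pileup layout:
# scaffold, position, reference base, coverage, read bases, base qualities
sample=$(printf 'scaffold_0\t1\tA\t4\t....\tIIII')

# cut keeps fields 1, 2 and 4 (tab is its default delimiter)
with_cut=$(printf '%s\n' "$sample" | cut -f 1,2,4)

# awk equivalent: split and join on tabs, print the same three fields
with_awk=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $4 }')

printf '%s\n%s\n' "$with_cut" "$with_awk"
```

Both commands should print the same three-column line, so the choice between them is mostly a matter of taste.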



      • #4
        Hello

        I piped the output to awk and it worked very well; thank you for the two answers.
        Although the files were generated as I wanted, I noticed a small problem in the results.
        The coverage column shows some numbers that differ from the graphical view of the assembly.
        I'm using Tablet (http://bioinf.scri.ac.uk/tablet/) and IGV (http://www.broadinstitute.org/igv/) to visualize the reads. Comparing both viewers with the mpileup output, some positions show a different coverage. For example, positions 1-4 have exactly the same coverage in Tablet, IGV and mpileup, but positions 5-8 show a coverage one higher in Tablet and IGV than in mpileup (is that clear?).
        Is there an error in Tablet and IGV, or in the mpileup output? Or is mpileup disregarding some reads because of a quality problem or something similar, resulting in the different coverage values?
        Thank you



        • #5
          The two programs might differ in how they treat anomalous reads, or reads with zero mapq. For instance, mpileup ignores anomalous read pairs by default, and you can change that with the command-line option -A. I bet IGV counts them all.
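A minimal sketch of how -A could be combined with the column selection from earlier in the thread. It assumes the samtools build of mpileup is on the PATH and that the reference is passed with -f; the function name and the file names in the example call are placeholders, not anything from the original posts.

```shell
# Sketch only: assumes samtools is installed and the input files exist.
# coverage_with_orphans is a hypothetical helper name.
coverage_with_orphans() {
    ref=$1
    bam=$2
    # -A: do not skip anomalous read pairs; -f: reference FASTA
    samtools mpileup -A -f "$ref" "$bam" | cut -f 1,2,4
}

# Example call (not run here):
# coverage_with_orphans reference.fasta my_file.bam > coverage.tsv
```

If counts still disagree with the viewer after adding -A, the mapping-quality and base-quality thresholds (-q and -Q) may be worth checking as well, since they also decide which bases are counted.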



          • #6
            Hello

            I tried using -A and it worked very well. Thank you for the answer.
            One more question:
            When there is a gap in the assembly, mpileup jumps directly to the next position that has coverage:
            scaffold_0 1 4
            scaffold_0 2 4
            scaffold_0 3 4
            scaffold_0 7 8
            scaffold_0 8 8
            scaffold_0 9 8

            I need the positions with 0 coverage to be included in the mpileup output as well. Something like this:
            scaffold_0 1 4
            scaffold_0 2 4
            scaffold_0 3 4
            scaffold_0 4 0
            scaffold_0 5 0
            scaffold_0 6 0
            scaffold_0 7 8
            scaffold_0 8 8
            scaffold_0 9 8



            • #7
              hi bfantinatti,

              did you manage to get positions with 0 coverage in your mpileup output?

              cheers,

              D.



              • #8
                Yes

                Hello dnusol, yes I did. Sorry, I forgot to post the solution here. I got it on another forum that deals with bash questions.
                The solution was to apply the following code to the three-column file (scaffold, position, coverage):

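                awk '($2-p2)>1{            # gap since the previous position?
                for(i=p2+1;i<$2;i++)       # for each missing position
                print $1,i,0               # print scaffold, position, coverage 0
                }
                {p2=$2}1' file             # remember the position; 1 prints the line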

                This adds the missing lines, keeping the second column sequential and putting 0 in the third column of each added line.
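The awk command above remembers only the previous position, so when the file holds several scaffolds a gap at the start of a later scaffold may be missed or filled from the wrong starting point. A possible variant that restarts the counter for each scaffold is sketched below. The function name is hypothetical, and it assumes tab-separated input with scaffold, position and coverage in columns 1 to 3, sorted by scaffold and position.

```shell
# fill_zero_coverage is a hypothetical helper; it reads three tab-separated
# columns (scaffold, position, coverage) on stdin.
fill_zero_coverage() {
    awk 'BEGIN { FS = OFS = "\t" }
    $1 != scaffold { scaffold = $1; prev = 0 }   # new scaffold: restart at position 0
    {
        # emit a zero-coverage line for every position skipped since the last one
        for (i = prev + 1; i < $2; i++) print $1, i, 0
        prev = $2
        print
    }'
}

# Made-up sample input with a gap inside scaffold_0 and at the start of scaffold_1
printf 'scaffold_0\t1\t4\nscaffold_0\t2\t4\nscaffold_0\t5\t8\nscaffold_1\t3\t2\n' | fill_zero_coverage
```

Neither version can add zero-coverage positions after the last covered base of a scaffold, because the scaffold length is not in the input. If your samtools version supports it, `samtools depth -a` is another option that reports zero-coverage positions directly.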
