Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Corona Lite and coverage histogram

    Hello to all! I have a little expirience in bioinformatics but I want to create coverage histogram after mapping. I have: F3.unique.25.2.coverage.unique and R3.unique.25.2.coverage.unique for several runs, also I have F3.unique.csfasta.ma.25.2 and R3.unique.csfasta.ma.25.2 files, but I haven't a some ideas what I can do next Please say to me what i need to do?

  • #2
    It depends on what tools you have available, what organism you are sequencing and what you want to accomplish. Those 'coverage' files should be in the format of one line per base with each line being the coverage count. Thus you could import these into a spreadsheet program and draw graphs. Or toss the data into a genome browser that is loaded with all sorts of good stuff about your organism. Or ... well, there are lots of possibilities. The ma (match) files can be used for SNP calling, InDel detection, or transcriptome counting. What you need to do depends on what you want to get done.

    Comment


    • #3
      Originally posted by nedoluzhko View Post
      Hello to all! I have a little expirience in bioinformatics but I want to create coverage histogram after mapping. I have: F3.unique.25.2.coverage.unique and R3.unique.25.2.coverage.unique for several runs, also I have F3.unique.csfasta.ma.25.2 and R3.unique.csfasta.ma.25.2 files, but I haven't a some ideas what I can do next Please say to me what i need to do?
      Parse the unique.csfasta.ma.25.2. That file contains the unique mapped
      reads and the coordinates of the alignment:

      Use that to keep track of how much coverage you have per each base.
      Once you are done, dump the data in a fasta file:

      >chr1
      0
      2
      .
      .

      Each line represent the coverage you have for that genome at that position.

      Once you have that file you can display that easily. I would use either gnuplot
      or circos.

      -drd
      -drd

      Comment


      • #4
        SOLiD number of reads mapped in whole exome target locations

        Hi,
        I wrote a simple perl code to get the number of reads were mapped (SOLiD Corona_lite) in the target location. The perl code is working perfectly, but its taking long time to run with whole exome target locations (more than 5 days with 17million reads). If anybody can you help me to improve this code or easy way to find out the number of reads were mapped into the target location, then it will be great?

        I copied the example input files for the perl program,,

        *********example of whole exome target location SureSelect_All_Exon_G3362_with_names.v2.bed**********
        chr exon_start exon_end totalBase ID
        1 20138 20258 120 gb|BC048429
        1 58932 59892 960 entg|OR4F5:ccds|CCDS30547.1
        1 860956 861196 240 entg|SAMD11:ccds|CCDS2.2
        1 864267 864387 120 entg|SAMD11:ccds|CCDS2.2
        1 864490 864730 240 entg|SAMD11:ccds|CCDS2.2
        2 9016029 9016269 240 entg|MBOAT2:ccds|CCDS1660.1
        2 9061053 9061173 120 entg|MBOAT2:ccds|CCDS1660.1
        3 15591480 15591600 120 entg|HACL1:ccds|CCDS2627.1
        3 15596368 15596608 240 entg|HACL1:ccds|CCDS2627.1
        3 15599383 15599503 120 entg|HACL1:ccds|CCDS2627.1
        4 47648088 47648304 216 entg|CNGA1:ccds|CCDS43226.1
        4 47713576 47713696 120 entg|NIPAL1:ccds|CCDS3479.1
        4 47721794 47722154 360 entg|NIPAL1:ccds|CCDS3479.1
        ************************************************

        ******head of SureSelect_20091218_4_F3.unique.csfasta.ma.50.6*****************
        >1_48_337_F3,3_137353925.5
        T00120201000200020311112100111100230002023120222022
        >1_49_100_F3,2_-86591224.5
        T31202300221201300132312132112130032010203103221031
        >1_50_326_F3,20_-43849935.6
        T01311332010321311110222301111321200112222200011020
        >1_51_297_F3,17_-26283180.4
        T33003203132010100002232223221303000113022300222002
        >1_54_263_F3,5_-102923763.3
        T31120203312212121300220132222022210312001202012020
        ******************************************************

        ********targetMaps.pl**********
        #usr/bin/perl
        #AGRV[0] is the whole exome target locations file
        open(FILE,$ARGV[0]) || die "unable to open file $ARGV[0]";
        while (<FILE>)
        {
        $line = $_;
        chomp $line;
        $a[$i] =$line ;
        $i++;
        }

        open (FILE2,$ARGV[1]) || die " unable to open the file $ARGV[1]";
        #ARGV[1] is the .unique.csfasta.ma file
        $count = 0; while (<FILE2>)
        {
        $line = $_;
        chomp $line;
        #>1303_25_81_F3,12_2661894.5
        if($line =~ /^>\S+\_\S+\_\S+\_\S+\,(\S+)\_(\S+)\.\d+/)
        {
        $chr = $1;
        $pos = $2;
        if($pos =~ /^-(\d+)/){$pos = $1};
        # print "$chr\t$pos\n";
        for($j=0; $j<=$i;$j++)
        {
        $capture = $a[$j];
        @d = split (/\t/, $capture);
        if(($chr eq $d[0]) && ($pos >= $d[1] && $pos <= $d[2]))
        {
        # print "$chr\t$pos\n";
        # print "$d[0]\t$d[1]\t$d[2]\n";
        $count++;
        }
        }
        }

        }
        print "\n Total reads mapped to target locations : $count \n";
        *******************************

        Perl program run :
        perl targetMaps.pl SureSelect_All_Exon_G3362_with_names.v2.bed SureSelect_20091218_4_F3.unique.csfasta.ma.50.6

        Thanks in advance.

        Thanks,
        Shibu
        Last edited by shibujohn; 03-31-2010, 03:07 AM.

        Comment


        • #5
          17million reads compared to even a 1000 feature target file yields 17 billion comparisons in your loop in the worst case. 10000 features = 170 billion comparisons, etc.

          You could implement a binning approach (see Kent et al 2002, the SAMTools paper, etc. for details). Alternatively, you could use intersectBed or coverageBed in BEDTools or use Galaxy.

          This should run in a minute or so with BEDTools.

          Comment


          • #6
            Thanks for your information. I am using the BEDTools to get this information,

            Comment


            • #7
              Help with .csfasta.ma parsing

              Hi,
              I understand from this thread that you're pretty familiar with .csfasta.ma format.
              I'd like to convert it to a simple .bed.
              Can someone help me with that. I see how you get the position in the perl script, but how do I understand where that position is in the reference?

              Any help would be appreciated.

              Thanks!

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM
              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              30 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              32 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-04-2024, 09:00 AM
              0 responses
              53 views
              0 likes
              Last Post seqadmin  
              Working...
              X