Seqanswers Leaderboard Ad

**westerman** · 10-22-2009, 10:59 AM

It depends on what tools you have available, what organism you are sequencing and what you want to accomplish. Those 'coverage' files should be in the format of one line per base with each line being the coverage count. Thus you could import these into a spreadsheet program and draw graphs. Or toss the data into a genome browser that is loaded with all sorts of good stuff about your organism. Or ... well, there are lots of possibilities. The ma (match) files can be used for SNP calling, InDel detection, or transcriptome counting. What you need to do depends on what you want to get done.

**drio** · 10-22-2009, 07:38 PM

Originally posted by nedoluzhko View Post

Hello to all! I have a little expirience in bioinformatics but I want to create coverage histogram after mapping. I have: F3.unique.25.2.coverage.unique and R3.unique.25.2.coverage.unique for several runs, also I have F3.unique.csfasta.ma.25.2 and R3.unique.csfasta.ma.25.2 files, but I haven't a some ideas what I can do next

Please say to me what i need to do?

Parse the unique.csfasta.ma.25.2. That file contains the unique mapped
reads and the coordinates of the alignment:

Use that to keep track of how much coverage you have per each base.
Once you are done, dump the data in a fasta file:

>chr1
0
2
.
.

Each line represent the coverage you have for that genome at that position.

Once you have that file you can display that easily. I would use either gnuplot
or circos.

-drd

**shibujohn** · 03-31-2010, 03:02 AM

SOLiD number of reads mapped in whole exome target locations

Hi,
I wrote a simple perl code to get the number of reads were mapped (SOLiD Corona_lite) in the target location. The perl code is working perfectly, but its taking long time to run with whole exome target locations (more than 5 days with 17million reads). If anybody can you help me to improve this code or easy way to find out the number of reads were mapped into the target location, then it will be great?

I copied the example input files for the perl program,,

*********example of whole exome target location SureSelect_All_Exon_G3362_with_names.v2.bed**********
chr exon_start exon_end totalBase ID
1 20138 20258 120 gb|BC048429
1 58932 59892 960 entg|OR4F5:ccds|CCDS30547.1
1 860956 861196 240 entg|SAMD11:ccds|CCDS2.2
1 864267 864387 120 entg|SAMD11:ccds|CCDS2.2
1 864490 864730 240 entg|SAMD11:ccds|CCDS2.2
2 9016029 9016269 240 entg|MBOAT2:ccds|CCDS1660.1
2 9061053 9061173 120 entg|MBOAT2:ccds|CCDS1660.1
3 15591480 15591600 120 entg|HACL1:ccds|CCDS2627.1
3 15596368 15596608 240 entg|HACL1:ccds|CCDS2627.1
3 15599383 15599503 120 entg|HACL1:ccds|CCDS2627.1
4 47648088 47648304 216 entg|CNGA1:ccds|CCDS43226.1
4 47713576 47713696 120 entg|NIPAL1:ccds|CCDS3479.1
4 47721794 47722154 360 entg|NIPAL1:ccds|CCDS3479.1
************************************************

******head of SureSelect_20091218_4_F3.unique.csfasta.ma.50.6*****************
>1_48_337_F3,3_137353925.5
T00120201000200020311112100111100230002023120222022
>1_49_100_F3,2_-86591224.5
T31202300221201300132312132112130032010203103221031
>1_50_326_F3,20_-43849935.6
T01311332010321311110222301111321200112222200011020
>1_51_297_F3,17_-26283180.4
T33003203132010100002232223221303000113022300222002
>1_54_263_F3,5_-102923763.3
T31120203312212121300220132222022210312001202012020
******************************************************

********targetMaps.pl**********
#usr/bin/perl
#AGRV[0] is the whole exome target locations file
open(FILE,$ARGV[0]) || die "unable to open file $ARGV[0]";
while (<FILE>)
{
$line = $_;
chomp $line;
$a[$i] =$line ;
$i++;
}

open (FILE2,$ARGV[1]) || die " unable to open the file $ARGV[1]";
#ARGV[1] is the .unique.csfasta.ma file
$count = 0; while (<FILE2>)
{
$line = $_;
chomp $line;
#>1303_25_81_F3,12_2661894.5
if($line =~ /^>\S+\_\S+\_\S+\_\S+\,(\S+)\_(\S+)\.\d+/)
{
$chr = $1;
$pos = $2;
if($pos =~ /^-(\d+)/){$pos = $1};
# print "$chr\t$pos\n";
for($j=0; $j<=$i;$j++)
{
$capture = $a[$j];
@d = split (/\t/, $capture);
if(($chr eq $d[0]) && ($pos >= $d[1] && $pos <= $d[2]))
{
# print "$chr\t$pos\n";
# print "$d[0]\t$d[1]\t$d[2]\n";
$count++;
}
}
}

}
print "\n Total reads mapped to target locations : $count \n";
*******************************

Perl program run :
perl targetMaps.pl SureSelect_All_Exon_G3362_with_names.v2.bed SureSelect_20091218_4_F3.unique.csfasta.ma.50.6

Thanks in advance.

Thanks,
Shibu

**quinlana** · 03-31-2010, 05:18 AM

17million reads compared to even a 1000 feature target file yields 17 billion comparisons in your loop in the worst case. 10000 features = 170 billion comparisons, etc.

You could implement a binning approach (see Kent et al 2002, the SAMTools paper, etc. for details). Alternatively, you could use intersectBed or coverageBed in BEDTools or use Galaxy.

This should run in a minute or so with BEDTools.

**shibujohn** · 04-01-2010, 11:48 AM

Thanks for your information. I am using the BEDTools to get this information,

**ilivyatan** · 02-12-2012, 12:15 AM

Help with .csfasta.ma parsing

Hi,
I understand from this thread that you're pretty familiar with .csfasta.ma format.
I'd like to convert it to a simple .bed.
Can someone help me with that. I see how you get the position in the perl script, but how do I understand where that position is in the reference?

Any help would be appreciated.

Thanks!

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Corona Lite and coverage histogram

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News