SEQanswers

Old 01-28-2010, 05:40 AM   #1
bioenvisage
Member
 
Location: it

Join Date: Oct 2009
Posts: 40
Perl script

Can anyone help me with a Perl script for removing the repeats in the reads? As an example, the format of the sequences is pasted below:

HWI-EAS373:2:100:1792:1509#0/1 AAAAAAAAAAAAAAAAAACAACAAAAAAACAAAACAAAAACAAAACCAACACC ]_a`_Z_IT`b_\[_Ya\[\[]S\[RHUR^a^YY_V]aa^[TaW\Y\W_`^][aYR_BBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1509#0/2 ACACACACATGGTCCACCATATTTTTTTACTTGGTTGTA aRaPZ__\__]VG[]RMGX\_Z_aa_P_NQ[_\VTFZTOa`R_[Q]ZZZXaBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1691#0/1 ACACACACAGTGTAGCTGGGGAGCAGGGATCCATTGATC abaa^]Waa]b_`Vb_b`aa[^`aa_aaXD^H]`]QWYa`ZaZaH]`TMS]`^BBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1691#0/2 GGCTTTTTTGGTATCCTTTTCTCATGTTAGATGATGGGAGCATTTTTCTTCAGTgggatggatggtctggtagggc a^aY`_aaVa`UUabWaWa_bab_`a`b`aaOb``YN[a]GR`a`a`ba]_[J[XYBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:198#0/1 CGGCATTCCTTTTATTATAGCCCCTCTAGCTAGTTACAGTAGATAGGAACGtgcatgaatctntaaatggntgnan aZ]`]ab``aaab`a`]`YT`a^`aa`UZ\^X_Y]^Z^aYY[TYV[\XVLYBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:198#0/2 agCTGATCTAGCGTCGTCTGCAACAACAACCGCGGGGGCGTCatcaacggcaagtgcggctcagcctcgggtgttg HOT_TTGYZGV_]GUQ_XNGSQZ\QIYTXT\_RKQGGL]O\ZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1793:876#0/1 CCNCTGCCTCTACCTCCACGCCCTCGGCCTCTGCCACGCCCGCGGCCTGTATCTccagtgctctactcgcacanan `WDV^`a``a`aa^a`_aa[a]aY[`a\][a`\^``\`\]\^^S]Z[ZXW]ZP\SQBBBBBBBBBBBBBBBBBBBB NF

Last edited by bioenvisage; 01-28-2010 at 05:43 AM.
Old 01-28-2010, 07:43 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

I'm having trouble making out what the data should look like. Try wrapping the example in code tags:
[ code ] sequence data [ /code ]
Old 01-28-2010, 08:14 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

By "repeats in the reads", do you mean:
(a) Within a read, the repetition of a single nucleotide or simple sequence?
(b) For sets of reads which are identical (presumed PCR duplicates), reporting only one?
(c) Something else?

SAMtools will do (b).
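
For what it's worth, SAMtools removes duplicates from aligned reads. If the reads are still in the text format posted above, a cruder filter on exact sequence identity might look like the sketch below. It is not what SAMtools does. It assumes whitespace-separated columns (read ID, sequence, quality, flag), and the sub name `unique_reads` and the example records are made up for illustration.

```perl
use strict;
use warnings;

# Keep only the first record seen for each distinct read sequence.
# Assumption: columns are whitespace-separated and the sequence is column 2.
sub unique_reads {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        my (undef, $seq) = split ' ', $line;
        next unless defined $seq;                   # skip blank lines
        push @kept, $line unless $seen{ uc $seq }++; # compare case-insensitively
    }
    return @kept;
}

# Hypothetical records, not taken from the original data.
my @example = (
    "read1/1 ACGTACGT hhhhhhhh NF",
    "read2/1 ACGTACGT hhhhhhhh NF",
    "read3/1 TTGACCAA hhhhhhhh NF",
);
print "$_\n" for unique_reads(@example);
```

This treats each read on its own, so it ignores pairing (/1 and /2) and any sequencing errors that make true duplicates differ by a base.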
Old 01-28-2010, 10:14 AM   #4
Dave S.
Junior Member
 
Location: SF Bay Area

Join Date: Jan 2010
Posts: 4

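If you just want to collapse homopolymer runs of at least some minimum length down to a single base, use something like:
Code:
use strict;
use warnings;

my $min  = 4;          # shortest run that counts as a homopolymer
my $more = $min - 1;   # extra copies required after the first base

while (<>)
{
    # Replace each run of $min or more identical bases with one copy (/i: any case).
    # This scans the whole line, so feed it sequence-only input.
    s/([ACGT])\1{$more,}/$1/gi;
    print;
}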
You are probably better off determining where and when they occur before wiping them out, e.g. see http://www.bioperl.org/wiki/Finding_...hes_in_contigs

If your sequencing method is generating spurious homopolymers you will need a much more sophisticated approach to determining which ones are real.
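
To see where the runs occur before wiping anything out, one option is to report each run's base, position and length first. The following is only a rough sketch: the sub name `find_homopolymers` is made up, positions are reported 1-based, and the example sequence is the first read from the original post.

```perl
use strict;
use warnings;

# Return one [base, 1-based start, length] entry for every run of at
# least $min identical bases in $seq (upper or lower case).
sub find_homopolymers {
    my ($seq, $min) = @_;
    my $more = $min - 1;    # extra copies required after the first base
    my @runs;
    while ($seq =~ /([ACGT])\1{$more,}/gi) {
        # @- and @+ hold the start and end offsets of the last match
        push @runs, [ uc $1, $-[0] + 1, $+[0] - $-[0] ];
    }
    return @runs;
}

my $seq = "AAAAAAAAAAAAAAAAAACAACAAAAAAACAAAACAAAAACAAAACCAACACC";
for my $run (find_homopolymers($seq, 4)) {
    my ($base, $start, $length) = @$run;
    print join("\t", $base, $start, $length), "\n";
}
```

With a table like this you can decide on a per-read basis whether a run looks plausible before collapsing or masking it.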
Old 01-28-2010, 12:25 PM   #5
bioenvisage
Member
 
Location: it

Join Date: Oct 2009
Posts: 40

Hi krobison, I mean repeats within a read: both runs of a single nucleotide and simple repeats.
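
For that case (single-nucleotide runs plus simple repeats such as ACACAC), one possible approach is to collapse each tandem repeat to a single copy of its unit and drop the matching quality characters so the two strings stay in step. This is a sketch under assumptions, not a tested pipeline: the sub name `collapse_repeats`, the unit-length limit and the copy threshold are all illustrative choices, and it assumes the sequence and quality strings have the same length. The example record is made up.

```perl
use strict;
use warnings;

# Collapse tandem repeats (unit of 1..$max_unit bases, at least
# $min_copies copies) to one copy of the unit, and remove the
# corresponding characters from the quality string.
sub collapse_repeats {
    my ($seq, $qual, $min_copies, $max_unit) = @_;
    die "sequence and quality lengths differ\n"
        unless length($seq) == length($qual);

    my $more = $min_copies - 1;    # extra copies required after the first
    my ($new_seq, $new_qual, $pos) = ('', '', 0);

    # The non-greedy unit makes the regex prefer the shortest repeat unit.
    while ($seq =~ /([ACGT]{1,$max_unit}?)\1{$more,}/gi) {
        my ($start, $end, $unit_len) = ($-[0], $+[0], length $1);
        # Keep everything since the last match plus the first unit copy.
        my $keep = $start - $pos + $unit_len;
        $new_seq  .= substr($seq,  $pos, $keep);
        $new_qual .= substr($qual, $pos, $keep);
        $pos = $end;               # skip the remaining copies
    }
    $new_seq  .= substr($seq,  $pos);
    $new_qual .= substr($qual, $pos);
    return ($new_seq, $new_qual);
}

# Hypothetical record with equal-length sequence and quality strings.
my ($id, $seq, $qual, $flag) =
    split ' ', "read1/1 ACACACACATGGTTTTTTGA abcdefghijklmnopqrst NF";
my ($s, $q) = collapse_repeats($seq, $qual, 4, 3);
print join("\t", $id, $s, $q, $flag), "\n";
```

Whether collapsing is the right treatment depends on the downstream analysis; masking the repeats or discarding low-complexity reads altogether are alternatives.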