#1
Member
Location: it Join Date: Oct 2009
Posts: 40
Can anyone help me with a Perl script for removing the repeats in the reads? For example, I will paste the format of the sequences below:
HWI-EAS373:2:100:1792:1509#0/1 AAAAAAAAAAAAAAAAAACAACAAAAAAACAAAACAAAAACAAAACCAACACC ]_a`_Z_IT`b_\[_Ya\[\[]S\[RHUR^a^YY_V]aa^[TaW\Y\W_`^][aYR_BBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1509#0/2 ACACACACATGGTCCACCATATTTTTTTACTTGGTTGTA aRaPZ__\__]VG[]RMGX\_Z_aa_P_NQ[_\VTFZTOa`R_[Q]ZZZXaBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1691#0/1 ACACACACAGTGTAGCTGGGGAGCAGGGATCCATTGATC abaa^]Waa]b_`Vb_b`aa[^`aa_aaXD^H]`]QWYa`ZaZaH]`TMS]`^BBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:1691#0/2 GGCTTTTTTGGTATCCTTTTCTCATGTTAGATGATGGGAGCATTTTTCTTCAGTgggatggatggtctggtagggc a^aY`_aaVa`UUabWaWa_bab_`a`b`aaOb``YN[a]GR`a`a`ba]_[J[XYBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:198#0/1 CGGCATTCCTTTTATTATAGCCCCTCTAGCTAGTTACAGTAGATAGGAACGtgcatgaatctntaaatggntgnan aZ]`]ab``aaab`a`]`YT`a^`aa`UZ\^X_Y]^Z^aYY[TYV[\XVLYBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1792:198#0/2 agCTGATCTAGCGTCGTCTGCAACAACAACCGCGGGGGCGTCatcaacggcaagtgcggctcagcctcgggtgttg HOT_TTGYZGV_]GUQ_XNGSQZ\QIYTXT\_RKQGGL]O\ZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NF
HWI-EAS373:2:100:1793:876#0/1 CCNCTGCCTCTACCTCCACGCCCTCGGCCTCTGCCACGCCCGCGGCCTGTATCTccagtgctctactcgcacanan `WDV^`a``a`aa^a`_aa[a]aY[`a\][a`\^``\`\]\^^S]Z[ZXW]ZP\SQBBBBBBBBBBBBBBBBBBBB NF

Last edited by bioenvisage; 01-28-2010 at 05:43 AM.
#2
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
I'm having trouble making out what the data should look like. Try wrapping the example in code tags:
[ code ] sequence data [ /code ]
#3
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
By repeats in the reads, do you mean:
(a) Within a read, the repetition of a single nucleotide or simple sequence
(b) For sets of reads which are identical (presumed PCR duplicates), report only one
(c) Something else

SAMTools will do (b).
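For anyone who wants to see what (b) amounts to, here is a rough Perl sketch. It is not what SAMTools does (that works on aligned reads); it simply keeps the first record for each exact sequence, ignoring case. The whitespace-separated layout (id, sequence, quality, flag) is assumed from the example in post #1, and `unique_by_sequence` and the three sample records are made up for illustration.

```perl
use strict;
use warnings;

# Keep the first record seen for each distinct sequence (column 2).
# Case is ignored; lines without a second column are dropped.
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep {
        my (undef, $seq) = split ' ', $_;
        defined $seq && !$seen{ uc $seq }++;
    } @records;
}

# Hypothetical records in the assumed layout: id, sequence, quality, flag.
my @reads = (
    "read1 ACGTACGT IIIIIIII NF",
    "read2 acgtacgt HHHHHHHH NF",
    "read3 TTGACCAA IIIIIIII NF",
);
print "$_\n" for unique_by_sequence(@reads);
```

Holding every sequence in a hash is fine for a small file, but memory use grows with the number of distinct reads.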
#4
Junior Member
Location: SF Bay Area Join Date: Jan 2010
Posts: 4
If you just want to remove homopolymers of DNA of some arbitrary length, use something like:
Code:
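# Each run of at least $min identical uppercase bases (G, A, T or C)
# anywhere on the line is collapsed to a single base.
$min = 4;
while (<>) {
    s/(G){$min,}|(A){$min,}|(T){$min,}|(C){$min,}/$1$2$3$4/g;
    print;
}

If your sequencing method is generating spurious homopolymers, you will need a much more sophisticated approach to determine which ones are real.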
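The substitution above runs over the whole input line and only sees uppercase bases. A possible variant, sketched below, works on the sequence alone, ignores case, and also drops the quality characters belonging to the removed bases when the quality string has the same length as the sequence. The function name `collapse_homopolymers` and the sample record are invented for illustration, and the id/sequence/quality/flag layout is assumed from post #1.

```perl
use strict;
use warnings;

# Collapse each run of $min or more identical A/C/G/T bases (either case)
# to its first base. When a quality string of the same length is supplied,
# the quality characters of the dropped bases are removed as well.
sub collapse_homopolymers {
    my ($seq, $qual, $min) = @_;
    $min = 4 unless defined $min;
    my $sync = defined $qual && length($qual) == length($seq);
    my ($out_seq, $out_qual) = ('', '');

    # Walk the sequence one run of identical characters at a time.
    while ($seq =~ /((.)\2*)/gi) {
        my $run   = $1;
        my $start = pos($seq) - length($run);
        my $keep  = (length($run) >= $min && $run =~ /^[ACGT]/i)
                  ? 1
                  : length($run);
        $out_seq  .= substr($run, 0, $keep);
        $out_qual .= substr($qual, $start, $keep) if $sync;
    }
    return ($out_seq, $sync ? $out_qual : $qual);
}

# Hypothetical record: id, sequence, quality, flag.
my ($id, $seq, $qual, $flag) = split ' ', "read1 AAAAACGTTTTG IIIIIHHGGGGF NF";
my ($new_seq, $new_qual) = collapse_homopolymers($seq, $qual, 4);
print join(' ', $id, $new_seq, $new_qual, $flag), "\n";
```

Runs of N are left alone here, and if the quality string does not match the sequence length it is passed through unchanged.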
#5
Member
Location: it Join Date: Oct 2009
Posts: 40
Hi krobison, I mean repeats within a read: the repetition of a single nucleotide and also simple sequence repeats.
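For the simple-repeat case, one possible approach is a backreference that looks for a short unit repeated in tandem. The sketch below is only an illustration under assumed settings: the name `collapse_simple_repeats`, the unit length of 2 to 6 bases, and the threshold of 3 copies are all arbitrary choices, not anything given in the thread. Each tandem run is reduced to one copy of its unit.

```perl
use strict;
use warnings;

# Collapse tandem repeats of a short unit (2..$max_unit bases) that occur
# at least $min_copies times in a row, keeping one copy of the unit.
# $min_copies should be 2 or more.
sub collapse_simple_repeats {
    my ($seq, $min_copies, $max_unit) = @_;
    $min_copies = 3 unless defined $min_copies;
    $max_unit   = 6 unless defined $max_unit;
    my $extra = $min_copies - 1;    # copies required after the first unit
    my $re    = qr/([ACGT]{2,$max_unit}?)\1{$extra,}/i;
    (my $out = $seq) =~ s/$re/$1/g;
    return $out;
}

# Start of read HWI-EAS373:2:100:1792:1509#0/2 from the first post.
print collapse_simple_repeats('ACACACACATGGTCCACC', 3, 6), "\n";
```

Long single-nucleotide runs also match here (as a unit such as "AA"), so they end up as two bases rather than one. As with homopolymers, the quality string would need trimming to match if it is used downstream.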