SEQanswers

Old 10-22-2010, 04:00 AM   #261
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by carole_smadja View Post
Dear All,

I have generated a mapping assembly in fasta format and I now need to convert it into the sam format accepted by samtools.
Would anybody know a fasta2sam converter I could get access to?

Thank you very much for your assistance.

Carole
That isn't possible - FASTA files don't have enough information to describe an assembly, they can only be used to store the raw reads (without qualities or mapping information) or the contigs (without qualities or mapping information).

What assembly tool did you use? Does it have any other output files?

Last edited by maubp; 10-22-2010 at 04:12 AM. Reason: fixed typo
Old 10-22-2010, 04:14 AM   #262
carole_smadja
Junior Member
 
Location: Montpellier

Join Date: Oct 2010
Posts: 5

I carried out a NimbleGen array capture experiment, followed by 454 sequencing. I first used gsMapper to get a mapping assembly (output: ACE and FASTA alignments). However, I then performed a series of subsequent manipulations: a second assembly using SSAHA2 (FASTA output), some curation steps, and a division of the initial alignment into segments of invariable depth of coverage (still as FASTA). What would you recommend?

Thanks
Carole
Old 10-22-2010, 04:19 AM   #263
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by carole_smadja View Post
I carried out a NimbleGen array capture experiment, followed by 454 sequencing. I first used gsMapper to get a mapping assembly (output: ACE and FASTA alignments). However, I then performed a series of subsequent manipulations: a second assembly using SSAHA2 (FASTA output), some curation steps, and a division of the initial alignment into segments of invariable depth of coverage (still as FASTA). What would you recommend?

Thanks
Carole
Search for ACE to SAM/BAM conversion. It is possible, but you will also need the original SFF files (or FASTQ or QUAL) for the read qualities.
Old 10-24-2010, 08:11 PM   #264
xmluo
Junior Member
 
Location: Canada

Join Date: Nov 2009
Posts: 1

I used maq2sam-long to convert maq output to SAM format, but all pairing information is missing from the results: MRNM is "*", and MPOS and ISIZE are 0. Can you recommend how to recover this information? Thanks, Mei
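For reference, those three columns (called RNEXT, PNEXT and TLEN in later versions of the SAM spec) are derived from the mate's record; samtools fixmate fills them in for name-sorted input. The Python sketch below only illustrates the derivation, assuming both mates are mapped, have already been paired up (e.g. by read name) and are held as plain dicts with hypothetical keys `rname`, `pos` (1-based) and `cigar`. It does not update the FLAG bits (paired, proper pair, mate strand).

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec)
REF_CONSUMING = set("MDN=X")


def ref_span(cigar):
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)


def fill_mate_fields(r1, r2):
    """Set rnext/pnext/tlen on two mapped mates of one pair (in place)."""
    for a, b in ((r1, r2), (r2, r1)):
        a["rnext"] = "=" if a["rname"] == b["rname"] else b["rname"]
        a["pnext"] = b["pos"]
        a["tlen"] = 0  # stays 0 when the mates are on different references
    if r1["rname"] == r2["rname"]:
        end1 = r1["pos"] + ref_span(r1["cigar"]) - 1
        end2 = r2["pos"] + ref_span(r2["cigar"]) - 1
        # Leftmost mapped base to rightmost mapped base of the pair
        tlen = max(end1, end2) - min(r1["pos"], r2["pos"]) + 1
        # The leftmost mate gets a positive TLEN, the other a negative one
        left, right = (r1, r2) if r1["pos"] <= r2["pos"] else (r2, r1)
        left["tlen"], right["tlen"] = tlen, -tlen
    return r1, r2
```

For two 50M mates at positions 100 and 300 on the same chromosome, this gives TLEN 250 and -250, with each record's PNEXT pointing at the other.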
Old 11-25-2010, 05:10 AM   #265
maximilianh
Member
 
Location: UK

Join Date: Oct 2009
Posts: 15

The MACS distribution includes three scripts: elandexport2bed.py, elandmulti2bed.py and elandresult2bed.py

These might be more up-to-date than your scripts and easier to use in some cases.
Old 12-08-2010, 01:45 PM   #266
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116

What is the safest way to reheader a BAM file generated by an alignment to the human_g1k_v37.fasta genome, i.e.
Quote:
cat GRCh37/human_g1k_v37.fasta.fai
1 249250621 52 60 61
2 243199373 253404903 60 61
3 198022430 500657651 60 61
4 191154276 701980507 60 61
5 180915260 896320740 60 61
6 171115067 1080251307 60 61
7 159138663 1254218344 60 61
8 146364022 1416009371 60 61
9 141213431 1564812846 60 61
10 135534747 1708379889 60 61
11 135006516 1846173603 60 61
12 133851895 1983430282 60 61
13 115169878 2119513096 60 61
14 107349540 2236602526 60 61
15 102531392 2345741279 60 61
16 90354753 2449981581 60 61
17 81195210 2541842300 60 61
18 78077248 2624390817 60 61
19 59128983 2703769406 60 61
20 63025520 2763883926 60 61
21 48129895 2827959925 60 61
22 51304566 2876892038 60 61
X 155270560 2929051733 60 61
Y 59373566 3086910193 60 61
MT 16569 3147273397 70 71
GL000207.1 4262 3147290265 60 61
GL000226.1 15008 3147294661 60 61
GL000229.1 19913 3147309982 60 61
GL000231.1 27386 3147330289 60 61
GL000210.1 27682 3147358194 60 61
GL000239.1 33824 3147386400 60 61
GL000235.1 34474 3147420850 60 61
GL000201.1 36148 3147455961 60 61
GL000247.1 36422 3147492774 60 61
GL000245.1 36651 3147529866 60 61
GL000197.1 37175 3147567190 60 61
GL000203.1 37498 3147605047 60 61
GL000246.1 38154 3147643232 60 61
GL000249.1 38502 3147682084 60 61
GL000196.1 38914 3147721290 60 61
GL000248.1 39786 3147760915 60 61
GL000244.1 39929 3147801427 60 61
GL000238.1 39939 3147842084 60 61
GL000202.1 40103 3147882751 60 61
GL000234.1 40531 3147923585 60 61
GL000232.1 40652 3147964854 60 61
GL000206.1 41001 3148006246 60 61
GL000240.1 41933 3148047993 60 61
GL000236.1 41934 3148090687 60 61
GL000241.1 42152 3148133382 60 61
GL000243.1 43341 3148176299 60 61
GL000242.1 43523 3148220425 60 61
GL000230.1 43691 3148264736 60 61
GL000237.1 45867 3148309218 60 61
GL000233.1 45941 3148355912 60 61
GL000204.1 81310 3148402681 60 61
GL000198.1 90085 3148485409 60 61
GL000208.1 92689 3148577058 60 61
GL000191.1 106433 3148671355 60 61
GL000227.1 128374 3148779625 60 61
GL000228.1 129120 3148910202 60 61
GL000214.1 137718 3149041537 60 61
GL000221.1 155397 3149181614 60 61
GL000209.1 159169 3149339664 60 61
GL000218.1 161147 3149501549 60 61
GL000220.1 161802 3149665445 60 61
GL000213.1 164239 3149830007 60 61
GL000211.1 166566 3149997047 60 61
GL000199.1 169874 3150166453 60 61
GL000217.1 172149 3150339222 60 61
GL000216.1 172294 3150514304 60 61
GL000215.1 172545 3150689533 60 61
GL000205.1 174588 3150865017 60 61
GL000219.1 179198 3151042578 60 61
GL000224.1 179693 3151224826 60 61
GL000223.1 180455 3151407577 60 61
GL000195.1 182896 3151591103 60 61
GL000212.1 186858 3151777111 60 61
GL000222.1 186861 3151967147 60 61
GL000200.1 187035 3152157186 60 61
GL000193.1 189789 3152347402 60 61
GL000194.1 191469 3152540418 60 61
GL000225.1 211173 3152735142 60 61
GL000192.1 547496 3152949898 60 61
to something that would be more compatible with the UCSC-centric Bioconductor stack, i.e.

Quote:
cat hg19.fasta.fai
chr1 249250621 6 50 51
chr2 243199373 254235646 50 51
chr3 198022430 502299013 50 51
chr4 191154276 704281898 50 51
chr5 180915260 899259266 50 51
chr6 171115067 1083792838 50 51
chr7 159138663 1258330213 50 51
chr8 146364022 1420651656 50 51
chr9 141213431 1569942965 50 51
chr10 135534747 1713980672 50 51
chr11 135006516 1852226121 50 51
chr12 133851895 1989932775 50 51
chr13 115169878 2126461715 50 51
chr14 107349540 2243934998 50 51
chr15 102531392 2353431536 50 51
chr16 90354753 2458013563 50 51
chr17 81195210 2550175419 50 51
chr18 78077248 2632994541 50 51
chr19 59128983 2712633341 50 51
chr20 63025520 2772944911 50 51
chr21 48129895 2837230949 50 51
chr22 51304566 2886323449 50 51
chrX 155270560 2938654113 50 51
chrY 59373566 3097030091 50 51
chrM 16571 3157591135 50 51
So I want to toss the unscaffolded and haplotyped sequences and rename the rest.
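As a rough sketch of the header side of this, assuming the header has been dumped as text (e.g. with `samtools view -H`): keep only the @SQ lines for 1-22, X and Y and rename them. MT is deliberately left out of the hypothetical rename table because the two listings above give different lengths (16569 for MT vs 16571 for chrM), i.e. they are not the same mitochondrial sequence. Note that a BAM record stores its reference as an index into the @SQ list rather than as a name, so swapping in such a header with samtools reheader only makes sense if the kept sequences stay in the same order at the start of the list (true for 1-22, X, Y in both listings) and no remaining read, or mate, points at a dropped sequence. The names `B37_TO_UCSC` and `rename_sq_lines` are made up for illustration.

```python
# Hypothetical rename table: b37 names -> UCSC hg19 names for 1-22, X and Y.
# MT is left out on purpose (16569 bp in b37 vs 16571 bp chrM in hg19).
B37_TO_UCSC = {name: "chr" + name
               for name in [str(i) for i in range(1, 23)] + ["X", "Y"]}


def rename_sq_lines(header_lines, rename):
    """Drop @SQ lines whose SN is not in `rename` and rename the rest.

    Other header lines (@HD, @RG, @PG, ...) pass through unchanged.
    """
    out = []
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            fields = line.split("\t")
            name = next(f[3:] for f in fields if f.startswith("SN:"))
            if name not in rename:
                continue  # e.g. MT and the GL000xxx.1 contigs
            line = "\t".join("SN:" + rename[name] if f.startswith("SN:") else f
                             for f in fields)
        out.append(line)
    return out
```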
__________________
--
Jeremy Leipzig
Bioinformatics Programmer
Old 12-08-2010, 09:51 PM   #267
ashokrags
Junior Member
 
Location: Boston, MA, USA

Join Date: Dec 2010
Posts: 8

Perhaps try the reheader command in samtools. I think you can filter out the reads you want using view and then apply reheader.
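The filtering step can be sketched at the SAM text level too (e.g. on the output of `samtools view -h`): drop records on discarded sequences and rename RNAME and RNEXT on the rest. This is a hypothetical helper, not part of samtools, and it makes one design choice worth stating: a record whose mate lies on a discarded sequence is also dropped, so the whole pair disappears and the remaining mate fields stay consistent, at the cost of losing those reads. Unmapped reads with RNAME `*` are kept unchanged.

```python
# Same hypothetical table as for the header: 1-22, X, Y -> chr1-chr22, chrX, chrY.
B37_TO_UCSC = {name: "chr" + name
               for name in [str(i) for i in range(1, 23)] + ["X", "Y"]}


def filter_and_rename(record, rename):
    """Return the SAM record with RNAME/RNEXT renamed, or None to drop it."""
    fields = record.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]
    if rname != "*":              # '*' = unplaced unmapped read: keep as is
        if rname not in rename:
            return None           # aligned to a discarded sequence
        fields[2] = rename[rname]
    if rnext not in ("=", "*"):   # mate on a different, named sequence
        if rnext not in rename:
            return None           # mate on a discarded sequence: drop too
        fields[6] = rename[rnext]
    return "\t".join(fields)
```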
Old 12-20-2010, 09:47 AM   #268
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82

Does anyone know what has happened to indel lines in the new samtools mpileup format? In the old pileup, each indel is followed by an additional line carrying some further info - when I run mpileup (samtools 0.1.12a) I don't get any such lines. The old pileup also has a flag ("-i") for outputting only indel variants - there does not seem to be such an option for mpileup.
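As a stopgap, indel positions can still be picked out of mpileup's plain-text output. In the read-bases column, an insertion after a base is written as `+`, its length and the inserted bases (e.g. `+2AC`), and a deletion as `-`, its length and the deleted reference bases. The Python sketch below assumes single-sample output (bases in column 5) and skips the mapping-quality character that follows each `^` read-start marker, since that character can itself be `+` or `-`. It only reports what the reads show at a position; it is not a replacement for the indel-calling lines of the old pileup.

```python
def pileup_indels(bases):
    """Return the indel tokens (e.g. '+2AC', '-1T') in a pileup read-bases string."""
    indels = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: the next character encodes the mapping quality
            i += 2
        elif c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            indels.append(bases[i:j + length])  # sign, length and bases
            i = j + length
        else:
            i += 1
    return indels


def has_indel(pileup_line):
    """True if a single-sample mpileup line shows any insertion or deletion."""
    return bool(pileup_indels(pileup_line.rstrip("\n").split("\t")[4]))
```

Keeping only the lines for which `has_indel` is true gives a rough equivalent of the old `-i` filter.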
Old 02-12-2011, 12:41 PM   #269
andrewm
Junior Member
 
Location: new york

Join Date: Nov 2010
Posts: 8
samtools index and alternative alignments

I would like to index/search the alternative alignments found by bwa with samtools.

Currently it seems that samtools index does not index the alternative alignments stored in the XA tag. It would be great if this worked (maybe as an option).

Another possibility would be for bwa to output alternative alignments as another sam line.

Are either of these possible or planned for future releases?
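Until then, the hits can be pulled out by hand: bwa packs the alternative alignments into a single XA:Z tag as semicolon-terminated `chr,pos,CIGAR,NM` entries, where the sign on pos gives the strand. The Python sketch below (function names are hypothetical) parses them into tuples that could then be written out, for example as BED intervals. Turning them into full SAM lines would also require reverse-complementing SEQ and QUAL for hits on the opposite strand, which this sketch does not attempt.

```python
def parse_xa(xa_value):
    """Split a bwa XA:Z value into (chrom, pos, strand, cigar, nm) tuples."""
    hits = []
    for entry in xa_value.split(";"):
        if not entry:
            continue  # the value ends with ';', leaving an empty last piece
        chrom, pos, cigar, nm = entry.split(",")
        strand = "-" if pos.startswith("-") else "+"
        hits.append((chrom, abs(int(pos)), strand, cigar, int(nm)))
    return hits


def xa_hits(sam_line):
    """Return the parsed XA hits of a SAM record ([] if it has no XA tag)."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XA:Z:"):
            return parse_xa(field[5:])
    return []
```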

Andrew

Last edited by andrewm; 10-15-2012 at 07:44 AM.
Old 02-14-2011, 12:50 AM   #270
seqsyd
Junior Member
 
Location: london

Join Date: Jan 2011
Posts: 3

Can anyone tell me how to install SAMtools on my Windows operating system? Help please, thanks.
Old 02-14-2011, 03:07 AM   #271
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by seqsyd View Post
Can anyone tell me how to install SAMtools on my Windows operating system? Help please, thanks.
Have you tried the provided Windows binaries on sourceforge? i.e. samtools-0.1.12a_i386-win32.zip

[I'd suggest getting hold of a Linux (or Mac OS X) machine if you can, since not that many of the NGS tools will work on Windows]
Old 04-15-2011, 06:43 AM   #272
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31
optional fields in SAM format

Could anybody advise me on the best way to parse optional fields in a SAM file? I'm interested in the tags X0 and XA, and the problem is that they do not have a fixed position in the output line (e.g., sometimes 13th or 14th, 18th, 19th or 20th). Certainly, I can do it myself in Perl, but such parsing would take too much time and resources.
What I'm looking for is:
1) either a way to control output tags (their number and order)
2) or (better) some bioperl module, like blast parser, that could parse it effectively.
thanks!
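Every SAM optional field has the form TAG:TYPE:VALUE and they start at column 12, so instead of relying on column positions, each record's tags can be collected into a dictionary with one split. A minimal Python sketch (not a BioPerl module; the function name is made up), converting only the `i` and `f` types and keeping all other types as raw strings:

```python
def optional_tags(sam_line):
    """Map each optional tag (columns 12 onward) to its value."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)  # VALUE may itself contain ':'
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # A, Z, H and B values kept as strings
    return tags
```

With that, `optional_tags(line).get("X0")` and `optional_tags(line).get("XA")` work regardless of where the tags appear.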
Old 04-15-2011, 06:48 AM   #273
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

For perl, very easy and efficient:

perl -ne 'print "$1\n" if /X0:i:(\d+)/'
Old 04-15-2011, 06:52 AM   #274
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31

Quote:
Originally Posted by lh3 View Post
For perl, very easy and efficient:

perl -ne 'print "$1\n" if /X0:i:(\d+)/'
It's exactly what I want to avoid - checking each line with IF condition.
Old 04-15-2011, 06:57 AM   #275
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

This is fast. If present, the ordering of the tags is fixed. You can use

if /X0:i:(\d+).*XA:Z:(\S+)/

to grep out everything in a single pass. You need to read and split each line anyway, and regex matching should be of similar speed, if not faster. The bad implementation is to loop through the optional tags and then try to find a match.
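The same single-pass idea in Python, as a sketch: one compiled regex per line, relying (as stated above for bwa output) on X0 appearing before XA. Records that carry X0 but no XA do not match this pattern and return None, so a separate check would be needed if those matter. The pattern and function names are hypothetical.

```python
import re

# X0 (number of best hits) followed, somewhere later, by XA (alternative hits)
X0_THEN_XA = re.compile(r"\tX0:i:(\d+).*\tXA:Z:(\S+)")


def x0_and_xa(sam_line):
    """Return (X0, XA) for a record that has both tags, else None."""
    m = X0_THEN_XA.search(sam_line)
    return (int(m.group(1)), m.group(2)) if m else None
```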

Last edited by lh3; 04-15-2011 at 06:59 AM.
Old 04-15-2011, 10:57 AM   #276
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31

Thank you, Heng!
Old 04-21-2011, 08:39 AM   #277
Strand SI
The Avadis NGS Team
 
Location: All over the World

Join Date: Feb 2011
Posts: 26

I converted some ELAND export files to SAM and then tried to make BAM files from them, but samtools complains that the @SQ headers are missing.
Does the export2sam.pl tool not produce a correct SAM file?
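If the SAM file really has no @SQ lines, they can be rebuilt from a FASTA index of the reference the reads were aligned to: `samtools faidx ref.fa` writes ref.fa.fai, whose first two columns are sequence name and length, and samtools view can also take that file via its `-t` option when converting SAM to BAM. As an alternative, here is a Python sketch (hypothetical function name) that builds header lines to prepend to the SAM file; it splits on any whitespace so it also copes with space-separated copies of a .fai listing.

```python
def sq_lines_from_fai(fai_text):
    """Build @SQ header lines from a .fai index (column 1 = name, column 2 = length)."""
    lines = []
    for row in fai_text.splitlines():
        if row.strip():
            name, length = row.split()[:2]
            lines.append(f"@SQ\tSN:{name}\tLN:{int(length)}")
    return lines
```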
Old 06-10-2011, 03:37 PM   #278
oudacontrol
Junior Member
 
Location: Bay Area

Join Date: Jan 2011
Posts: 3

I am trying to view a portion of a sorted, indexed bam file and I am getting the error:
[sam_header_read2] 26 sequences loaded.
[main_samview] random alignment retrieval only works for indexed BAM files.

It is already sorted and indexed. I have no idea how to resolve this.
Old 06-20-2011, 01:56 PM   #279
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

@oudacontrol: You should post the command you used. It seems like the command may have been mis-formatted, but there's no way to tell if you don't include it.
Old 08-03-2011, 05:48 AM   #280
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

Dear all,

I had the same problem as oudacontrol above and Jeckow earlier in this thread.

Trying to get samtools view to extract one chromosome from a sorted and indexed bam file fails:
Code:
[sam_header_read2] 84 sequences loaded.
[main_samview] random alignment retrieval only works for indexed BAM files.
It took me a long while to figure out what caused it, so I thought I'd post my simple solution for poor sobs like me...

Code:
samtools sort in.bam in.sorted
Code:
samtools index in.sorted.bam
Code:
samtools view -bh -t human_g1k_v37.fa.fai -o in.sorted.chr9.bam in.sorted.bam 9
As it turned out, I should not have used the -t option. Without it, view finished as it should have, leaving me with a nice chr9-only BAM file! (At least, it's a 279M file; I'll have to check later whether Trackster or IGV will load it.)

Cheers,
Bruins
All times are GMT -8.