SEQanswers

02-12-2010, 04:47 AM   #201
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24

I modified a few lines of export2sam.pl to create a new script, sorted2sam.pl.
sorted2sam.pl works very well for processing S_N_sorted.txt (ELAND alignment) files for single reads.
I have not tested it on paired-end reads; it may not work well for those.

If anyone wants the script, please send an email to jifeng.tang@keygene.com

Thanks to everyone who helped me.

By the way, I am going to the Illumina user symposium on April 27-29 in Spain. I may meet some of you there.
02-14-2010, 06:37 PM   #202
luisczul
Member
 
Location: Canada

Join Date: May 2009
Posts: 10
Maq2Sam not working properly?

Hello,

I ran the Maq2Sam script on my mapping files, and the pileup file generated from the SAM file is not even close (in size and number of bases assembled) to the maq pileup file.

Does anybody know how to fix this? Or is there another way to get from maq .map files to samtools?

Thanks,
02-16-2010, 07:22 PM   #203
wuhoucdc
Member
 
Location: Nashville

Join Date: Oct 2009
Posts: 14

Hi lh3,

Does SAMtools call SV currently? Thanks

Wuhoucdc
02-17-2010, 06:05 AM   #204
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

Try breakdancer.
02-22-2010, 02:25 AM   #205
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24

A question about the output of bwa.

Here is the output I got:
HWI-EAS307:1:54:758:902#0 20 19641_CLSZ1904.b1_P20.ab1_CLSZ_L._sativa_library_forward_335 301 20 36M * 0 0 CAAATCGGTGTGTTTTCACTGGTCGTGCTCGTTCCG aabaaaaaaaaababaa`aaaabaabaabbabaaaa XT:A:U NM:i:1 X0:i:1 X1:i:2 XM:i:1 XO:i:0 XG:i:0 MD:Z:35T0 XA:Z:13134_QGB27J17.yg.ab1_QGB_L._sativa_library_forward_448,-58,36M,2;7061_CLS_S3_Contig6993_CLS_S3_L._sativa_library_forward_968,-404,36M,2;

I do not understand the flag value 20. I used "samse" to process single reads.
"XT:A:U" indicates that the read mapped uniquely to the reference, so why do I still get an XA tag with alternative alignment information?
This confuses me. Could someone help me with this? Thank you very much.
02-23-2010, 05:26 AM   #206
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

This read is mapped to the junction between two adjacent reference sequences, so it gets an "unmapped" flag.
02-24-2010, 12:41 AM   #207
NSTbioinformatics
Member
 
Location: netherlands

Join Date: Apr 2009
Posts: 24

I think "4" is the "unmapped" flag, is that right?

What is the difference between flag 4 and 20?

Thank you very much.
02-24-2010, 12:57 AM   #208
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by NSTbioinformatics
I think "4" is the "unmapped" flag, is that right?

What is the difference between flag 4 and 20?

Thank you very much.
Use the "-X" option in "samtools view"; it will probably help your interpretation of the FLAG field.
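
For reference, the FLAG is a bitwise sum, so 20 = 4 + 16: the "unmapped" bit plus the "reverse strand" bit. A minimal Python sketch of the decoding follows. The bit values come from the SAM specification; the short names and the `decode_flag` helper are labels made up for this illustration.

```python
# Bit values as defined in the SAM specification; the names are informal labels.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


flag_20 = decode_flag(20)  # the value from the bwa record in question
flag_4 = decode_flag(4)
print(flag_20)
print(flag_4)
```

So both 4 and 20 carry the "unmapped" bit; 20 additionally records the strand of the attempted alignment.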
02-26-2010, 11:19 AM   #209
ylc
Junior Member
 
Location: United States

Join Date: Feb 2010
Posts: 2
Pick chromosome before bam sort?

A newbie question:
I can view by chromosome after a .bam file is sorted and indexed. Is it possible to extract a single chromosome from the BAM file first and then do the sorting and indexing? That would save time, since I'm only interested in certain chromosomes and have many samples.

Thanks.
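
One possible approach, sketched in Python under the assumption that the unsorted alignments can be streamed as SAM text (for example from `samtools view -h`): keep the header lines plus the records whose reference name matches, then sort and index only that subset. The function name and the toy records are hypothetical.

```python
def filter_sam_by_reference(lines, wanted):
    """Yield SAM header lines and alignments whose RNAME (column 3) is in `wanted`."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] in wanted:
            yield line


# Toy input standing in for an unsorted SAM stream.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:1000\n",
    "r1\t0\tchr1\t10\t30\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr2\t20\t30\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_sam_by_reference(example, {"chr2"}))
for line in kept:
    print(line, end="")
```

A single linear pass like this needs no index, so it works on unsorted data; the filtered output would still have to be converted back to BAM before sorting.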
02-28-2010, 07:55 PM   #210
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124

Hi,

I have a few queries about samtools.

1. I am using ELAND mapping output and have started using export2sam.pl. All the PF reads from the export file are being used for downstream analysis.
2. How are uniquely mapped and multiply mapped reads handled by the pileup command?
3. The extended CIGAR column always shows 35M (i.e. the length of the read). How is the mismatch information incorporated? Column 15 of the export file contains the match descriptor information.

Thanks.
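
On point 3: in SAM, the CIGAR "M" operation covers both matches and mismatches, so 35M is expected for an ungapped 35 bp read. Mismatch positions, when an aligner or converter records them, live in optional tags such as MD (for example the MD:Z:35T0 in the bwa record earlier in this thread). Below is a rough Python sketch of reading an MD string, assuming it follows the MD syntax in the SAM specification. `md_mismatches` is a hypothetical helper, and whether export2sam.pl writes an MD tag is not confirmed here.

```python
import re

# One token is either a run of matches, a deletion (^ plus reference bases),
# or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_mismatches(md):
    """Return (offset, reference_base) pairs for each mismatch in an MD string.

    Offsets are 0-based and counted along the reference, starting at POS.
    """
    mismatches = []
    offset = 0
    for match_len, deletion, ref_base in MD_TOKEN.findall(md):
        if match_len:
            offset += int(match_len)
        elif deletion:
            offset += len(deletion) - 1  # deleted reference bases, minus the '^'
        else:
            mismatches.append((offset, ref_base))
            offset += 1
    return mismatches


bwa_example = md_mismatches("35T0")  # MD value from the bwa record above
print(bwa_example)
```

For that record the single mismatch sits at the last base of the 36 bp alignment, where the reference has a T.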

Last edited by seq_GA; 02-28-2010 at 07:57 PM.
03-08-2010, 09:01 PM   #211
Solyris
Junior Member
 
Location: Singapore

Join Date: Mar 2010
Posts: 1

Hi,

I am quite new to NGS data and I work with commercial software from CLCbio called Genomic Workbench, which offers a mapping algorithm of its own.

I want to convert the SAM output from this software to BAM so that I can use samtools functions like pileup.

I get the following error when I run the command on Ubuntu:

>./samtools view -huS -o DATA/test.bam DATA/s_2_1_sequence_SS200_LAwMM.sam
[samopen] SAM header is present: 24 sequences.
Parse error at line 113: CIGAR and sequence length are inconsistent
Aborted

I read somewhere in this thread that samtools currently does not allow SAM file processing without the reference sequence, so is that what is causing the problem? If so, can anyone point me to a way to generate the correct reference sequence file? I tried reading through the manual, but nothing there tells me how the reference file should be formatted. I am working with the whole human reference genome as 24 gbk files from NCBI.

Any help is appreciated.

Thanks
Sol
03-09-2010, 12:57 PM   #212
drio
Senior Member
 
Location: 4117'49"N / 24'42"E

Join Date: Oct 2008
Posts: 323

Quote:
Originally Posted by Solyris
Hi,

I am quite new to NGS data and I work with commercial software from CLCbio called Genomic Workbench, which offers a mapping algorithm of its own.

I want to convert the SAM output from this software to BAM so that I can use samtools functions like pileup.

I get the following error when I run the command on Ubuntu:

>./samtools view -huS -o DATA/test.bam DATA/s_2_1_sequence_SS200_LAwMM.sam
[samopen] SAM header is present: 24 sequences.
Parse error at line 113: CIGAR and sequence length are inconsistent
Aborted

I read somewhere in this thread that samtools currently does not allow SAM file processing without the reference sequence, so is that what is causing the problem? If so, can anyone point me to a way to generate the correct reference sequence file? I tried reading through the manual, but nothing there tells me how the reference file should be formatted. I am working with the whole human reference genome as 24 gbk files from NCBI.

Any help is appreciated.

Thanks
Sol
samtools performs some sanity checks on the CIGAR string, and it is telling you something is not right. Have you looked at that particular alignment to confirm whether the CIGAR is correct?
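
A quick way to find the offending records, sketched in Python: sum the lengths of the CIGAR operations that consume read bases (M, I, S, =, X) and compare the total with the length of SEQ. This only approximates the kind of check samtools performs and is not its actual code. The function names and toy records are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that use up bases of the read


def cigar_query_length(cigar):
    """Number of read bases implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def find_inconsistent(lines):
    """Yield (line_number, read_name, cigar_length, seq_length) for mismatching records."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        cigar, seq = fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # nothing to compare
        implied = cigar_query_length(cigar)
        if implied != len(seq):
            yield lineno, fields[0], implied, len(seq)


# Toy records: r2 claims 35 bases in CIGAR but carries a 36 bp sequence.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t10\t30\t36M\t*\t0\t0\t" + "A" * 36 + "\t" + "I" * 36 + "\n",
    "r2\t0\tchr1\t20\t30\t35M\t*\t0\t0\t" + "A" * 36 + "\t" + "I" * 36 + "\n",
]
problems = list(find_inconsistent(example))
print(problems)
```

The line number reported counts header lines too, which should make it comparable to the "line 113" in the error message.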
__________________
-drd
03-15-2010, 08:09 AM   #213
GoneSouth
Member
 
Location: Vienna

Join Date: Aug 2008
Posts: 11
Why do deletions in the pileup file have a quality attached?

Hi guys,

Does anyone know why deletions in the pileup file have a quality attached? How can a deletion have a quality?
And how is it calculated?

For example:

YHet 23690 N 1 a-1n Q
YHet 23691 N 1 * [
YHet 23692 N 1 c [


or

YHet 25409 N 5 AAA-2NNa-2nnA-2NN VTW`a
YHet 25410 N 5 A$A$*** USR`a
YHet 25411 N 3 *** SG`


best ro
04-05-2010, 11:47 AM   #214
jeffhsu3
Junior Member
 
Location: Cleveland, OH

Join Date: Jan 2010
Posts: 5

If an insertion or deletion occurs at the end of the pileup read bases string, it doesn't seem to have the extra character after the '\+[0-9]+[ACGTNacgtn]+' pattern.

For example:
chr1 2263 C 4 ,$.$.,+1t CC9C FFFF.

Am I missing something? The pattern is described here: pileup format, and it mentions the in/del pattern '\+[0-9]+[ACGTNacgtn]+' but there appears to be an extra character in the examples given on the page:

seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<

That extra character appears to be missing if the in/del occurs at the end of the read bases string. Including that extra character as part of the insertion/deletion makes the read bases match the read count.
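
One reading that seems consistent with both examples: the number after '+' or '-' gives the exact length of the inserted or deleted sequence, so a greedy '[ACGTNacgtn]+' swallows too much and the "extra" character is simply the next read's base. Here is a minimal Python sketch under that assumption. `count_pileup_bases` is a hypothetical helper, not part of samtools.

```python
def count_pileup_bases(read_bases):
    """Return (number_of_reads, indels) for a pileup read-bases column."""
    i = 0
    n_reads = 0
    indels = []
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            i += 2  # read start: '^' plus one mapping-quality character
        elif c == "$":
            i += 1  # read end marker, not a base
        elif c in "+-":
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            length = int(read_bases[i + 1:j])
            # Consume exactly `length` characters as the indel sequence.
            indels.append(c + read_bases[j:j + length])
            i = j + length
        else:
            n_reads += 1  # a base, '.', ',' or a '*' placeholder
            i += 1
    return n_reads, indels


end_case = count_pileup_bases(",$.$.,+1t")
doc_case = count_pileup_bases(".$......+2AG.+2AG.+2AGGG")
print(end_case)
print(doc_case)
```

With this parsing the first line gives 4 reads and the second gives 11, matching the depth column in both cases.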

Last edited by jeffhsu3; 04-05-2010 at 12:03 PM. Reason: Made more clear and added examples.
04-12-2010, 02:02 AM   #215
jdiezperezj
Junior Member
 
Location: spain

Join Date: Mar 2010
Posts: 3

So, is it already possible to convert SOAP aligner output to SAM or BAM format?
Best.
Javi

Quote:
Originally Posted by lh3
To corthay:

You are quick. I am planning a new bwa release as I realized that I could improve it a little without much work (PS: the new version is released now). Wgsim, wgsim_eval.pl and converters for soap and bowtie are available from SVN only:

svn co https://samtools.svn.sourceforge.net...s/dev/samtools samtools
04-13-2010, 11:06 AM   #216
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
FLAGS for fusion detection

Let's say I have RNA-Seq data (paired-end) and I want to find out whether the mates are mapped > 1 Mb apart on the same chromosome or map to 2 different chromosomes. How do I determine that from the FLAGS?
04-13-2010, 01:04 PM   #217
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by RockChalkJayhawk
Let's say I have RNA-Seq data (paired-end) and I want to find out whether the mates are mapped > 1 Mb apart on the same chromosome or map to 2 different chromosomes. How do I determine that from the FLAGS?
You can use the MRNM and MPOS fields in the SAM file.
04-13-2010, 01:14 PM   #218
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by nilshomer
You can use the MRNM and MPOS fields in the SAM file.
So in that case, either my MRNM does not equal "=", OR MRNM equals "=" and the difference between POS and MPOS is > 1 million.

Is this correct?

Last edited by RockChalkJayhawk; 04-13-2010 at 01:17 PM. Reason: Incorrect assumption
04-13-2010, 01:24 PM   #219
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by RockChalkJayhawk
So in that case, either my MRNM does not equal "=", OR MRNM equals "=" and the difference between POS and MPOS is > 1 million.

Is this correct?
Perfect!
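
The rule agreed above could be sketched in Python roughly as follows. Assumptions: tab-separated SAM text with MRNM in column 7 and MPOS in column 8, and the FLAG is used only to skip records where the read or its mate is unmapped. `is_distant_pair` and the toy records are hypothetical.

```python
def is_distant_pair(sam_line, min_distance=1_000_000):
    """True if the mate maps to another reference or more than min_distance away."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
    mrnm, mpos = fields[6], int(fields[7])
    # Require a paired read (0x1) with both the read (0x4) and mate (0x8) mapped.
    if not flag & 0x1 or flag & 0x4 or flag & 0x8:
        return False
    if rname == "*" or mrnm == "*":
        return False
    # MRNM is "=" when the mate is on the same reference as the read.
    if mrnm != "=" and mrnm != rname:
        return True
    return abs(pos - mpos) > min_distance


# Toy records: different chromosome, close pair, distant pair, unmapped mate.
records = [
    "r1\t97\tchr1\t100\t60\t4M\tchr5\t5000\t0\tACGT\tIIII",
    "r2\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r3\t97\tchr1\t100\t60\t4M\t=\t2000100\t0\tACGT\tIIII",
    "r4\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",
]
results = [is_distant_pair(r) for r in records]
print(results)
```

The FLAG by itself does not encode the distance between mates, which is why the position fields are needed here.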
04-13-2010, 01:26 PM   #220
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191

Quote:
Originally Posted by nilshomer
Perfect!
Thanks Nils! You're the best!