SEQanswers

Old 04-15-2009, 07:27 AM   #1
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
maq match command

Dear all,

I have a text file generated from the maq match command, aligning paired-end short reads to a reference genome. Any ideas how to filter out poor-quality reads from this file, i.e. those reads that have been mapped to more than one location in the genome?

How are people generally dealing with multiple hits from a single read in the genome?

Thanks for any help

L
Old 04-15-2009, 10:31 AM   #2
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

It all depends!

Which text file are you referring to? Pileup? Dumped hits? If you specify how exactly you generated it, that will help others help you ...

Also, for the map and downstream steps (consensus, SNP calling), maq places a read that maps equally well to multiple locations in the reference (and satisfies the specified cutoffs) at one of those locations, chosen at random. This may or may not suit your needs, but as far as I know, there's no way to change it. You should be able to determine which reads map multiple times (and thus exclude them in a second round of mapping) by parsing the dumped hits file, which is specified with the '-H' option to the map/match command ...

Hope that helps
~Joe
Old 04-16-2009, 03:12 AM   #3
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

The file I was referring to was generated using the maq match command. I have re-run the maq match command using the -H and -u options. Do you think the multiple matches should be removed and the maq match command run again, or would it be sufficient to remove the multiple matches and move on to do a pileup?

At this stage I shall focus on correctly paired reads (flagged 18), remove multiple hits (flag of 0) and also low mapping quality scores (<30).
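
A minimal sketch of the filter described above, in Python. It assumes the text produced by maq mapview is whitespace-separated, with the paired flag in the sixth column and the mapping quality in the seventh; that layout is an assumption to check against the maq documentation for your version. The function name `filter_mapview` and the column constants are made up for illustration.

```python
from typing import Iterable, Iterator

# Assumed 0-based column positions in the mapview text output.
# Verify these against the maq manual before relying on them.
PAIRED_FLAG_COL = 5
MAPQ_COL = 6


def filter_mapview(lines: Iterable[str],
                   wanted_flag: int = 18,
                   min_mapq: int = 30) -> Iterator[str]:
    """Yield lines whose paired flag equals wanted_flag and whose
    mapping quality is at least min_mapq."""
    for line in lines:
        fields = line.split()
        if len(fields) <= MAPQ_COL:
            continue  # blank or truncated line
        try:
            flag = int(fields[PAIRED_FLAG_COL])
            mapq = int(fields[MAPQ_COL])
        except ValueError:
            continue  # non-numeric flag or quality; skip the line
        if flag == wanted_flag and mapq >= min_mapq:
            yield line
```

To run it on a real file, pass an open text handle, for example `filter_mapview(open("out.txt"))`, where `out.txt` is a hypothetical name for the saved mapview output. Because the cutoff is 30, reads with mapping quality 0 are dropped as a side effect.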

Any other suggestions or comments on how people would go about cleaning their ChIP-seq data?

Cheers

L
Old 04-16-2009, 09:32 AM   #4
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Sounds like you're talking about the mapview output ... generated from the "mapview" command, using the binary map file generated by the map/match command. I'm not as familiar with that file - for instance, I didn't know that there was a flag for multiply mapped reads in mapview's output - but it sounds like you've got a good strategy for parsing that file and filtering your pairs.
Old 04-17-2009, 03:09 AM   #5
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

Thanks, Joe.

How did you convert the file created using the -H option of the ./maq match command? The -H option was meant to output the multiple hits, and it created what looks like a binary file. The ./maq mapview conversion does not work on it as it does for out.map. Is there a way to convert this binary file to text?

Cheers
L
Old 04-17-2009, 05:05 AM   #6
Owen
Junior Member
 
Location: Leicester, UK

Join Date: Nov 2008
Posts: 1

Layla, the file created with the -H option is not actually a binary file but a gzipped text file with information about the multiply mapped reads. I had the same confusion and finally figured this out!
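
A small Python sketch of how one might confirm this and read the file, assuming nothing about its contents beyond it being gzip-compressed text. It checks for the two gzip magic bytes and then reads the file line by line. The function names are made up for illustration, and the layout of each line is deliberately left unparsed.

```python
import gzip
from typing import Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_text_lines(path: str) -> Iterator[str]:
    """Yield lines from a file, decompressing it first if it is gzipped."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Printing the first few lines from `read_text_lines` on the -H output is a quick way to see what the file holds before writing a parser for it.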
Old 04-20-2009, 08:47 AM   #7
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

I found the same thing Owen describes: it's gzipped!
As for multiply mapped reads in the mapview result, reads with a mapping quality of 0 are the ones mapped to multiple locations, so using -q 1 should do the trick for excluding multiply mapped reads.
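
To see how many reads a cutoff of 1 would remove before committing to it, one option is to tally the mapview lines by mapping quality. This is a rough Python sketch, assuming the mapping quality is the seventh whitespace-separated column of the mapview text output (an assumption to verify against the maq manual). It imitates the effect of the cutoff and is not maq's own implementation; `tally_by_mapq` is a made-up name.

```python
from collections import Counter
from typing import Iterable

MAPQ_COL = 6  # assumed 0-based position of the mapping quality column


def tally_by_mapq(lines: Iterable[str], min_mapq: int = 1) -> Counter:
    """Count lines at or above the cutoff ("kept") and below it ("dropped")."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= MAPQ_COL:
            continue  # blank or truncated line
        try:
            mapq = int(fields[MAPQ_COL])
        except ValueError:
            continue  # non-numeric quality; skip the line
        counts["kept" if mapq >= min_mapq else "dropped"] += 1
    return counts
```

With the default cutoff, the "dropped" count is the number of reads with mapping quality 0.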

Thoughts?
Old 04-22-2009, 03:42 PM   #8
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58

I think -q 1 sounds like a valid option. Or you can simply grep for reads with the 0 flag and remove them before downstream processing.

Whilst on this note, has anyone used SISSR instead of Maq? And if so, any thoughts on what to do with the data after SISSR gives 80,000 potential binding sites with p-values < 0.001, high tag counts and fold changes?

The data never simplifies!!!!

L
All times are GMT -8.