SEQanswers

Old 12-03-2012, 10:38 PM   #1
masylichu
Member
 
Location: Beijing, China

Join Date: Oct 2010
Posts: 30
How to merge the annotations

Hello,

I have lincRNA annotation files from UCSC NR*, GENCODE, and other published lincRNA collections, and I want to merge them into one larger lincRNA collection. What pipeline can do this?
Old 12-03-2012, 11:32 PM   #2
pallevillesen
Member
 
Location: Bioinformatics Research Center, Aarhus University, Denmark

Join Date: May 2012
Posts: 19

cat file1 >combinedset.txt
cat file2 >>combinedset.txt
cat file3 >>combinedset.txt

If you need to reformat:
# Columns 1, 2, 3
awk -v 'OFS=\t' '{ print $1, $2, $3 }' file1 > combinedset.txt
# Columns 3, 4, 5
awk -v 'OFS=\t' '{ print $3, $4, $5 }' file2 >> combinedset.txt
# Columns 1, 2, 3; convert column 2 from 1-based to 0-based
awk -v 'OFS=\t' '{ print $1, int($2) - 1, $3 }' file3 >> combinedset.txt
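
As a small worked example of the reformatting above, here is a sketch on toy data. The file contents and column layouts are assumptions made up for illustration; real annotation files will differ and should be inspected first.

```shell
# Hypothetical toy inputs (tab-separated); layouts are assumptions, not real data.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\tlincA\n'    > "$tmp/file1"   # chrom, start, end in columns 1-3
printf 'lincB\t+\tchr2\t300\t400\n' > "$tmp/file2"   # chrom, start, end in columns 3-5
printf 'chr3\t501\t600\tlincC\n'    > "$tmp/file3"   # column 2 is a 1-based start

# Bring all three files to chrom/start/end with 0-based starts.
awk -v 'OFS=\t' '{ print $1, $2, $3 }'     "$tmp/file1" >  "$tmp/combinedset.txt"
awk -v 'OFS=\t' '{ print $3, $4, $5 }'     "$tmp/file2" >> "$tmp/combinedset.txt"
awk -v 'OFS=\t' '{ print $1, $2 - 1, $3 }' "$tmp/file3" >> "$tmp/combinedset.txt"

cat "$tmp/combinedset.txt"
```

The third record should come out with a start of 500 rather than 501, which is a quick way to confirm the coordinate shift was applied to the right file.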
Old 12-05-2012, 10:46 PM   #3
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48

Quote:
Originally Posted by masylichu View Post
Hello,

I have lincRNA annotation files from UCSC NR*, GENCODE, and other published lincRNA collections, and I want to merge them into one larger lincRNA collection. What pipeline can do this?

Could you post the web links to those lincRNA annotation files? I would like them too.

Last edited by zinky; 12-05-2012 at 10:49 PM.
Old 12-05-2012, 11:12 PM   #4
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

When merging, I assume you might need to check for entries that are duplicated between the separate annotations. Is that the case?

If not, then 'catting' them together is the right thing to do (assuming you're using a *nix-based system, or Cygwin on Windows). Just a dorky note: you can do those cats in one line:

Code:
cat file1 file2 file3 > combinedset.txt
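You could also do the reformatting in one line. This needs bash, since it uses process substitution; note that cut splits on tabs only, and that this version does not shift file3 from 1-based to 0-based: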

Code:
cat <(cut -f1,2,3 file1) <(cut -f3,4,5 file2) <(cut -f1,2,3 file3) > combinedset.txt
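
For the duplicate check raised above, one option is to tag each record with the collection it came from before concatenating. This is only a sketch: the file names `ucsc.bed` and `gencode.bed` and their contents are hypothetical, and it treats two records as the same only when chromosome, start, and end match exactly.

```shell
# Hypothetical toy inputs; names and contents are assumptions for illustration.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\n'               > "$tmp/ucsc.bed"
printf 'chr1\t100\t200\nchr2\t50\t80\n' > "$tmp/gencode.bed"

# Append a fourth column naming the source file of each record.
for f in "$tmp/ucsc.bed" "$tmp/gencode.bed"; do
    label=$(basename "$f" .bed)
    awk -v 'OFS=\t' -v src="$label" '{ print $1, $2, $3, src }' "$f"
done > "$tmp/tagged.bed"

# Drop repeats within a source, then report intervals seen in more than one source.
sort -u "$tmp/tagged.bed" | cut -f1-3 | sort | uniq -d > "$tmp/shared.txt"
cat "$tmp/shared.txt"
```

Keeping the source column in the merged file also makes it possible to trace any entry back to its original collection later.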
Old 12-06-2012, 12:48 AM   #5
pallevillesen
Member
 
Location: Bioinformatics Research Center, Aarhus University, Denmark

Join Date: May 2012
Posts: 19

OK, if you end up with something like:

chr1 1002 9005 lincRNA1 . + (BED FORMAT)

then you can run:

sort -k1,1 -k2,2n combinedfile.bed | uniq > combined.sorted.collapsed.bed

The output is sorted by chromosome and then by start position, and lines that are identical in every column appear only once.
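
Because `uniq` only collapses lines that match in every column, the same interval listed under different names in two collections would survive the step above. A minimal sketch of one alternative, assuming 6-column BED and assuming that chromosome, start, end, and strand together define a duplicate (that choice, and the toy names below, are assumptions rather than anything stated in this thread):

```shell
# Hypothetical toy input: one exact repeat and one renamed copy of the same interval.
tmp=$(mktemp -d)
{
    printf 'chr1\t1002\t9005\tlincRNA1\t.\t+\n'
    printf 'chr1\t1002\t9005\taltName1\t.\t+\n'
    printf 'chr1\t1002\t9005\tlincRNA1\t.\t+\n'
    printf 'chr1\t500\t800\tlincRNA2\t.\t-\n'
} > "$tmp/combinedfile.bed"

# Exact-line deduplication: identical lines collapse, the renamed copy remains.
sort -k1,1 -k2,2n "$tmp/combinedfile.bed" | uniq > "$tmp/exact.bed"

# Keep only the first record seen for each chrom/start/end/strand combination.
sort -k1,1 -k2,2n "$tmp/combinedfile.bed" \
    | awk -F '\t' '!seen[$1, $2, $3, $6]++' > "$tmp/by_coords.bed"

grep -c . "$tmp/exact.bed"
grep -c . "$tmp/by_coords.bed"
```

Which name is kept for a shared interval depends on sort order, so it may be worth deciding up front which collection's identifiers should win. Neither approach handles intervals that overlap without being identical.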
All times are GMT -8.