Strategy for semi-automatic functional annotation and curation:

RIKEN has been working on the mouse full-length cDNA encyclopedia project since 1995. In Phase I of the project, we have focused on the collection and sequencing of more than one million mouse cDNAs. In Phase II, we have re-arrayed the non-redundant clones and produced full-length sequences for them. Functional annotation of the full-length mouse cDNAs, and deposition of their sequence data together with the annotation in the public databases, will contribute to the progress of science.

In order to assign functional annotation to uncharacterized cDNAs, we have been developing a semi-automatic annotation tool which refers to the results from the following:

1. homology searches, including searches against databases for orthologs in other organisms (human, rat, Drosophila, C. elegans, yeast),
2. searches for well-known protein motifs using Pfam and PROSITE, and
3. other data, such as expression data and protein-protein interaction data, as applicable.
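
As a rough illustration of how results from the three sources listed above might be gathered per clone, here is a minimal Python sketch. It is not the RIKEN tool: the class names, the source labels ("homology", "motif", "other") and the priority order among them are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical priority order, for illustration only: lower number wins.
SOURCE_PRIORITY = {"homology": 0, "motif": 1, "other": 2}


@dataclass
class Evidence:
    source: str       # one of the keys of SOURCE_PRIORITY
    description: str  # proposed annotation text from that source


@dataclass
class CloneRecord:
    clone_id: str
    evidence: List[Evidence] = field(default_factory=list)

    def add(self, source: str, description: str) -> None:
        """Attach one piece of evidence to this clone."""
        if source not in SOURCE_PRIORITY:
            raise ValueError(f"unknown evidence source: {source}")
        self.evidence.append(Evidence(source, description))

    def proposed_annotation(self) -> Optional[Evidence]:
        """Return the highest-priority evidence, or None if there is none."""
        if not self.evidence:
            return None
        return min(self.evidence, key=lambda e: SOURCE_PRIORITY[e.source])
```

A record with no evidence yields no proposal, which is one simple way to mark a clone as still uncharacterized.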

We use the term "functional annotation of genes" to refer to the assignment of attributes to genes. The attributes include Gene Ontology terms, which are drawn from the controlled vocabularies of the Gene Ontology Consortium and classified into three categories (molecular function, biological process and cellular component), as well as loci on chromosomes, related disorders and so on.

However, our semi-automatic methods have limits, as do the similar methods used by other databases such as UniGene. For example, curation by biologists is always necessary when annotating genes for which BLAST searches return only weak matches, as judged by E-value.
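
The triage implied here, sending clones with only weak BLAST matches to a biologist, could be sketched as follows. This is a hedged example rather than the actual pipeline: the cutoff of 1e-10 and the function names are assumptions, since no threshold is stated in this text. A smaller E-value indicates a more significant match.

```python
from typing import List, Optional

# Assumed cutoff for illustration only; no threshold is given in the text.
EVALUE_CUTOFF = 1e-10


def best_evalue(hit_evalues: List[float]) -> Optional[float]:
    """Return the smallest (most significant) E-value, or None if no hits."""
    return min(hit_evalues) if hit_evalues else None


def needs_curation(hit_evalues: List[float],
                   cutoff: float = EVALUE_CUTOFF) -> bool:
    """Flag a clone for manual curation when it has no hits at all,
    or when even its best hit is weaker than the cutoff."""
    best = best_evalue(hit_evalues)
    return best is None or best > cutoff
```

For example, `needs_curation([0.5, 1e-3])` flags the clone for a biologist, while `needs_curation([1e-50, 0.1])` does not, because one hit is strong enough for automatic annotation under the assumed cutoff.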

Based on these issues, we believe we should discuss what is necessary for the functional annotation of the mouse full-length cDNAs. Points to be discussed include what biologists need to curate and the rules of functional annotation. During the proposed meeting, we then want to annotate the mouse full-length cDNAs as adequately as possible, together with experts in bioinformatics, genome science, biology and other fields.

Therefore, we have decided to hold a meeting for annotating our mouse full-length cDNAs, named the FANTOM (Functional ANnoTation Of Mouse) meeting.