Biocompass: get your bioinformatics direction

Submitted by mattions on Mon, 05/12/2008 - 22:38

When I was at the predoc course in Heidelberg, I saw that a lot of my friends, real experimental biologists, were a little confused about which tool to use for a bioinformatics job and where to get it.

On top of that, there was another problem: the sheer number of bioinformatics programs available out there, and how to choose between them.

So I came up with the idea of Biocompass, which I will try to explain.

Biocompass will be a web portal where scientists can ask questions about specific bioinformatics topics and other users can try to answer them, pointing out where it is worth looking.

Let me try to explain it with some examples:

Sarah is an experimental biologist and she has a DNA sequence. She wants to know whether this sequence is a gene and, if so, whether its function is known. However, she doesn't know how to find out.

She comes to the Biocompass web portal and posts the question there. The system checks whether similar questions have already been asked; if there are any, it shows the first 5 hits that are most similar to Sarah's problem.

If Sarah is satisfied with one of them, she accepts the answer, and the system stores her original question in the database for future reference and to improve the precision of the system.
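A minimal sketch of how the similar-question lookup could work, assuming a plain word-overlap (Jaccard) score rather than real semantic analysis. The function names, the threshold and the sample questions are all made up for illustration; nothing here is an existing Biocompass implementation.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_questions(new_question, stored_questions, limit=5, threshold=0.2):
    """Return up to `limit` stored questions most similar to the new one.

    `threshold` is an arbitrary cut-off (an assumption) below which a
    stored question is not considered similar at all.
    """
    new_tokens = tokenize(new_question)
    scored = []
    for question in stored_questions:
        score = jaccard(new_tokens, tokenize(question))
        if score >= threshold:
            scored.append((score, question))
    # Highest score first, then keep only the first `limit` hits.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:limit]]

# Hypothetical questions already stored in the database.
stored = [
    "How do I find out if a DNA sequence is a known gene?",
    "Which tool can align two protein sequences?",
    "How do I build a phylogenetic tree?",
]
hits = similar_questions("Is my DNA sequence a gene with a known function?", stored)
```

With these sample questions only the first stored question passes the threshold, so Sarah would see a single hit. A real system would probably want stemming and stop-word removal at the very least.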

Otherwise, Sarah decides to post the question to the other people subscribed to the portal.

The system will ask her to put the question into one of the available categories, for example genetics, molecular dynamics, phylogenetics, simulation, ...

She will be guided through an easy but systematic process to end up with a really well-organized entry.

The system will propose a set of tags already present in the database to attach to the question, to allow a more fine-grained search later on.
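The tag proposal could be as simple as the sketch below, which assumes that a tag is worth suggesting when every word of the tag occurs in the question text. The function name and the tag list are hypothetical.

```python
import re

def suggest_tags(question, known_tags):
    """Return the known tags whose words all occur in the question text."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    suggestions = []
    for tag in known_tags:
        tag_words = re.findall(r"[a-z0-9]+", tag.lower())
        # A multi-word tag is suggested only if each of its words is present.
        if tag_words and all(word in words for word in tag_words):
            suggestions.append(tag)
    return suggestions

# Hypothetical tags already present in the database.
known_tags = ["dna", "gene", "protein", "sequence alignment", "gene function"]
question = "I have a DNA sequence. Is it a gene, and is its function known?"
suggested = suggest_tags(question, known_tags)
```

Sarah would then confirm or drop each suggested tag, so the matching can afford to be a little generous.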

Now comes another important innovation in Biocompass: filtering. Nowadays there is too much information to process in a reasonable amount of time. That's why we need a system to filter and select only the information that we really want to know.

Andrea is a bioinformatician working on genetic alignment. She has subscribed to Biocompass and has decided to follow the genetics category only. This means that she receives updates only when there is a new question in the genetics category, and ignores all the others; they are not her field and she cannot be helpful there.

Andrea sees the new question come in. She follows new questions through an RSS feed, which she prefers to email.
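The category filter behind Andrea's feed might look roughly like this. It is only a sketch: the `Question` and `Subscriber` classes and the category names are assumptions, and producing the actual RSS or email output is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    title: str
    category: str

@dataclass
class Subscriber:
    name: str
    # Categories this user has chosen to follow.
    categories: set = field(default_factory=set)

def updates_for(subscriber, new_questions):
    """Keep only the new questions in categories the subscriber follows."""
    return [q for q in new_questions if q.category in subscriber.categories]

andrea = Subscriber("Andrea", {"genetics"})
incoming = [
    Question("Is my DNA sequence a known gene?", "genetics"),
    Question("Which force field should I use for my protein?", "molecular dynamics"),
]
feed = updates_for(andrea, incoming)
```

Here only Sarah's question ends up in Andrea's feed; the molecular dynamics question is delivered to whoever follows that category instead.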

She knows the answer: a BLAST search can give a hint about the DNA sequence. She suggests using BLAST as a starting point for the investigation, linking to the BLAST page entry.

This is an internal page in Biocompass with a short description of the software, a link to the official website, and the number of users who have found the tool useful or not useful for solving this problem.

The rating is given by other users who have found the software good for solving this kind of problem, which gives an idea of how good and useful the software itself is.
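One possible shape for such a tool page, with the votes counted per kind of problem. This is a sketch under my own assumptions: the class, the field names, the problem label and the placeholder URL are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ToolPage:
    name: str
    description: str
    website: str
    # Votes per problem: {"problem label": [useful_count, not_useful_count]}
    votes: dict = field(default_factory=dict)

    def vote(self, problem, useful):
        """Record one user's verdict on this tool for a given problem."""
        counts = self.votes.setdefault(problem, [0, 0])
        counts[0 if useful else 1] += 1

    def usefulness(self, problem):
        """Fraction of voters who found the tool useful, or None if no votes."""
        useful, not_useful = self.votes.get(problem, [0, 0])
        total = useful + not_useful
        return useful / total if total else None

blast = ToolPage(
    name="BLAST",
    description="Searches a sequence against a database of known sequences.",
    website="https://example.org/blast",  # placeholder, not the official link
)
for verdict in (True, True, True, False):
    blast.vote("identify a DNA sequence", verdict)
rating = blast.usefulness("identify a DNA sequence")
```

Keeping the votes per problem rather than per tool is the point: a tool can be excellent for one kind of question and useless for another.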

The system also shows a bunch of related software that can be useful for this kind of question (ClustalW, BLAT, ...)

Sarah gets the update from Biocompass with Andrea's answer. She can now start investigating her DNA sequence.

Here we are. Biocompass has acted as a real compass, giving the direction in which to start. This time the needle was pointing in the BLAST direction; next time it is going to point somewhere else.

It's definitely a problem

But you could always just hassle Matias on Google Talk until he gives you the answer ;)

As Matias knows too well, I have this kind of problem quite a bit, and I would consider myself fairly well-informed about informatics relative to most other experimental biologists - meaning that, as you say, most people wouldn't have a clue where to start. It's nice to be able to do your own informatics on your data, but an informatics friend is really a prerequisite - something like this would be an informatics friend for everyone :)

While we're at it, these are the (er, some) other top problems I have as a rookie informatician:

- Not being able to find the bloody program/database on the Sanger network without a world of pain, even when I know what it is I need
- Crap instructions on how to install stuff for people without root privileges
- Parameters/methods etc defined in purely computing terms and not biological ones

quite familiar..

I can definitely relate to the 3 problems you're having, Steve.. I've pretty much given up trying to find the tools on the Sanger site, and unless they are on the farm in the usr/local/... directories, I just install them myself. Which is definitely a hassle without the root privileges :)

The Biocompass idea itself is really useful; in practice it is all about the execution that makes it usable. Perhaps a natural way of thinking about it is how to add more value to a forum, where you just ask a question and get answers from people who are in a position to reply. You're trying to automatically filter questions to relevant groups of threads and to relevant people, and also search for similar questions that have already been asked. Semantic analysis would be a cool tool to try, but even simple word-based tagging could work quite well.

Great idea

I also think this is a very nice idea. It brings extra value to end users compared with, for example, wikis (more organised information, and therefore easier to query and easier for the user to understand). I think one incentive to make people post stuff on such a site would be to get their contributions clearly recognised and connected to a real person. This also makes the information more trustworthy (for example they would log in using OpenID).

Another thought

It's a really nice thought to have both the people who pose the questions and the people who answer them help in adding extra information about the questions (this question was related to this and this category or tag, etc). It might work much better in the scientific community than for a larger cloud of people.

Interpreting the queries so as to connect them with existing questions is the technically nasty part (semantic queries).

Handy semantic search library

Related to the issue of semantic search, here's a useful open source library (made by IBM), available as an Apache Incubator project:
http://incubator.apache.org/uima/