The bioinformatics survey that we talked about before is over.
Some of the initial charts are already available:
Number of entries:
The data will be released soon, so it will be possible to analyze them and get at least an initial picture of the bioinformaticians in the world.
I've just discovered on Bioinformatics Zen a survey to collect data about bioinformaticians around the world and to see what's going on.
The survey has been up since the 1st of July and will run until the 1st of August.
Anybody who would like to analyze the data can contribute after that date, and the analyzed data should be available from (hopefully) the 1st of September.
The survey is here.
Fill it in if you have 5 mins. :)
This entry in the Science magazine news blog outlines a pretty strange motive for someone to go for a career in science: The Power of Poop.
[...] For marine scientist Ellen Prager (left), the moment came 25 years ago when she took a summer job helping researchers who spent a lot of time swimming behind parrotfish with bags in hand to collect poop as it plopped out of their bottoms. "If that was science, I thought to myself, I could do it too," Prager told the audience.
Well Matias and I have been spotting these for a while so I thought this could be an online wall of fame/shame for the worst puns ever to appear in titles of papers in otherwise respectable journals. This was prompted by my discovery today of this - the culprit/genius punster being my supervisor in this case!
http://www.ncbi.nlm.nih.gov/pubmed/9843198
Groan! I also feel that this deserves a special mention.
http://www.ncbi.nlm.nih.gov/pubmed/12509281
Getting a joke like that past the editors is basically the goal of my life.
Anyone else have any favourites?
Following on from Matias' post, here's an example of an annoying problem faced by lab people, and a cool way to fix it.
The process of stitching together various bits of DNA to make plasmids for use in experiments occupies a large amount of time for many molecular biologists, especially in the early stages of a project. The most established way to do this is to use restriction enzymes and ligases in vitro to chop DNA up and stick it back together, and then stuff it into bacteria to select intact plasmids and amplify them. The fidelity of the process is generally good, but mutations do arise sometimes. I just found one such mutation in my plasmid - four base pairs of a critical recombinase recognition sequence deleted. Worse, this mutation is also present in the lab stock of the parent plasmid. Now here's the big difficulty in using restriction enzymes - they cut at a defined sequence, and if you need to modify part of the plasmid without convenient restriction sites (like this one), you're stuffed.
Or perhaps not. Recently (at least more recently than restriction enzymes) a new set of methods dubbed 'recombineering' has been developed. These take advantage of the fact that homologous recombination can occur in bacteria. So if you make a construct with some homology either side of the region you need to fix, and with a correct version in the middle, you can introduce this into the bacteria carrying the mutant plasmid and get replacement of the broken bit. I took advantage of the fact I was fixing a Cre recombinase site (lox site) and replaced the broken site with an antibiotic resistance gene (bsd) flanked by two working lox sites - meaning I can select for the replacement first, then use Cre to delete the marker and restore a single, functional site. Amazingly it actually worked - yay for small victories.
Most lab bacterial strains are recombination defective (recA mutation), to give better plasmid stability, so to use this technique you need to reintroduce the recombination function to the bacteria. Of course this carries the risk of causing other random plasmid mutations (due to recombination between small bits of homologous sequence) - so let's hope I didn't just make things worse :)
Highly technical diagram below - the lox site represented by a triangle. There are a lot of other uses for this technique, more info here http://www.nature.com/nrg/journal/v2/n10/abs/nrg1001-769a.html

SJP
I have a sequence analysis application which has to keep several thousand boolean matrices of dimensions up to ~200x2000 in memory. These matrices account for the majority of the application's memory footprint and essentially limit its use with very large sequence sets and with estimating more complicated models from those sequences.
Our current boolean matrix implementation is a boolean array accessed through a simple 2D matrix API. However, this carries a large memory overhead, because booleans take one byte each when stored in arrays (the JVM specification notes that Oracle's implementation uses 8 bits per element; as local variables they are handled as 4-byte ints). In other words, 7/8 of the memory used by these matrices is wasted. How does one save it?
Well, as Leo helpfully pointed out to me yesterday, you can store the individual bits as part of integers and read them using bit shifting and mod 2. Brilliant! So I made an implementation of this idea yesterday evening, and from a quick and dirty test it looks as though I even got it right the first time!
Note that you can also do this with bytes or shorts, but I decided to just use integers because then I don't have to worry about casting the indices in the get and set operations. The few bits wasted at the end of the last int in the array are not a major issue in my use case either (negligible memory waste compared to wasting 7/8 of all bytes spent).
Enjoy the source code:
import java.io.Serializable;

public class BitMatrix2D implements Serializable {
    private static final long serialVersionUID = 29187494607829404L;

    private final int rows;
    private final int columns;
    private int[] data;

    public BitMatrix2D(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        /* x >> 5 = x / 32, only faster; adding 31 first rounds up,
           so a partially filled last int is still allocated */
        this.data = new int[Math.max(1, (rows * columns + 31) >> 5)];
    }

    public int rows() {
        return rows;
    }

    public int columns() {
        return columns;
    }

    public boolean get(int row, int col) {
        if (row < 0 || row >= rows)
            throw new IndexOutOfBoundsException("Row index out of bounds:" + row);
        else if (col < 0 || col >= columns)
            throw new IndexOutOfBoundsException("Column index out of bounds:" + col);
        /* position of the bit in the flattened, row-major matrix */
        int i = row * columns + col;
        return ((data[i >> 5] >> (i % 32)) & 1) != 0;
    }

    public void set(int row, int col, boolean v) {
        if (row < 0 || row >= rows)
            throw new IndexOutOfBoundsException("Row index out of bounds:" + row);
        else if (col < 0 || col >= columns)
            throw new IndexOutOfBoundsException("Column index out of bounds:" + col);
        int i = row * columns + col;
        int idiv32 = i >> 5;
        /* mask with only the target bit set */
        int modBit = 1 << (i % 32);
        data[idiv32] = v ? data[idiv32] | modBit : data[idiv32] & ~modBit;
    }
}
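To make the packing arithmetic concrete, here is a minimal standalone sketch of the same idea on a flat bit index. The class and method names (BitPackingSketch, wordsNeeded, getBit, setBit) are made up for this illustration, and it assumes the JVM stores one byte per boolean array element; array object headers are ignored.

```java
// Hypothetical helper class for illustration only; not part of the application.
final class BitPackingSketch {

    /** Number of 32-bit ints needed to hold the given number of bits, rounded up. */
    static int wordsNeeded(int bits) {
        return (bits + 31) >>> 5;
    }

    /** Reads bit i: pick the int with i / 32, then shift by i mod 32 and mask. */
    static boolean getBit(int[] data, int i) {
        return ((data[i >>> 5] >>> (i & 31)) & 1) != 0;
    }

    /** Sets or clears bit i without touching the other 31 bits of its int. */
    static void setBit(int[] data, int i, boolean v) {
        int mask = 1 << (i & 31);
        if (v) {
            data[i >>> 5] |= mask;
        } else {
            data[i >>> 5] &= ~mask;
        }
    }

    public static void main(String[] args) {
        int rows = 200;
        int columns = 2000;
        int bits = rows * columns;

        int[] packed = new int[wordsNeeded(bits)];
        setBit(packed, bits - 1, true); // the very last cell of the matrix

        // Assumes one byte per boolean element; headers not counted.
        System.out.println("boolean[] payload: " + bits + " bytes");
        System.out.println("int[] payload:     " + 4L * packed.length + " bytes");
        System.out.println("last bit set:      " + getBit(packed, bits - 1));
    }
}
```

For a 200x2000 matrix this works out to 400,000 bytes as a boolean array versus 50,000 bytes packed into ints, which is the 7/8 saving mentioned above. The rounding in wordsNeeded is the part that is easy to get wrong: a matrix with 33 cells needs two ints, not one.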
Fragment-based protein fold prediction has been the most successful strategy in the latest CASP7 protein fold assessment experiment. Methods relying on fragment libraries also show great promise in designing new protein folds, as Steve pointed out with his "Jetpacks, Enzymes And Jetpacks!" post (the Nature paper with the computationally designed and then optimised fold). However, what's been missing is a good quality public fragment library for any protein fold prediction group to experiment with (the CASP winning lot are naturally not giving theirs out).
This week's PLoS Computational Biology describes exactly such a fragment library: "Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments" by Baeten et al. This is exciting stuff and could well heat up the competition in CASP8!
Today was the Cambridge Computational Biology Institute Annual Symposium. A day of talks by investigators across Cambridge, it gave a nice slice of quite different kinds of research being done here.
The highlight of the day for me was Duncan Odom's talk on conservation of transcriptional regulation. He talked about older work on conservation of regulation between human and mouse, about more recent work on a mouse with a human chromosome, as well as plans to extend the comparative approach to a range of mammals - cool stuff.
Also, Sarah Teichmann talked about evolution of homo- and heterodimers, and Máté Lengyel discussed computational models of learning (Bayesian model selection and Markov decision processes), and showed experiments comparing them to human learning.
Crossposted from YANNB
Modeling in biology is a kind of Cinderella branch of the field. It is not as central as it is in physics, and there is a lot of skepticism about it, especially from the wet lab guys. Biology started as a descriptive subject and only later moved to a quantitative approach.
Quantity. That’s exactly what you need if you want to do some modeling, which at the end of the day is crunching some numbers using a computer.
Let me just make a comparison with the engineering field. These guys usually:
* think about an idea
* model it to test whether it’s worth building and whether it will hold up
* build it in reality
If, for example, you’re building a house, you hit a button and the program makes all the calculations to see whether the house is safe and will last, or whether it will just collapse under its own weight. In effect you’re testing your idea by modeling it in a virtual space.
In biology you have the same kind of approach:
* think about a question
* design the experiment to try to answer the question
* do the experiment
Modeling should be part of the design step, letting you know whether your experiment is likely to discover something, so you can save time and know which parameters to focus on or which proteins seem to play a key role. It should help the biologist design better experiments.
For example, say you have a signaling cascade involving something like 15 proteins. If you have a tool that predicts that the most interesting reaction involves protein 2 and protein 3, you can focus your attention there and avoid scanning all the other proteins first.
To do that we of course need a rock-solid modeling framework and, on the other hand, a really easy and fast way to use it.
We are quite far from that, but it looks to me like an intriguing prospect.