Let scientists tell us about sciences...

"Let scientists tell us about sciences. But government involves questions about the good for man, and justice, and what things are worth having at what price; and on these a scientific training gives a man's opinion no added value. Let the doctor tell me I shall die unless I do so-and-so; but whether life is worth having on those terms is no more a question for him than for any other man. "

CS Lewis

 

 

Google Trick for Science Search

While specialised search engines will always have their place, I find myself drifting more and more toward google for the thorough, 'all document' search. Here is a simple example - I want to know if SCN2A (the gene encoding the Nav1.1 channel) has been expressed in HeLa cells. My first destination is PubMed:

- SCN1A HeLa → 1 result, a 'false' positive

Next, I'll try google:

- SCN1A HeLa → 6070 results

Far too many to search through. Most of the results appear to be mentions of one or both words on the same web page.

If I think about what I am actually search for it is: "We expressed SCN1A in HeLa cells…". I could search for that exact expression (0 results), but I know that there are a hundred different ways of saying the same thing. I would like to search for all those web pages where SCN1A is in the same sentence as HeLa cells. There is at least one commercial solution out there - the excellent DevonAgent, which I suggest is worth a look. But a similar, though limited, functionality is available in Google.

- "SCN1A * HeLa" → 1 result, a 'false' positive

(Enter with the quotation marks). This produces all the webpages with SCN2A preceding HeLa in the same sentence; of course HeLa may precede SCN2A, so I also need to enter the search phrase:

- "HeLa * SCN1A" → 0 results

This gives me some confidence that SCN1A has probably not been expressed in HeLa cells; at least not documented to have been expressed. I realise that this method may result in some misses - when HeLa and SCN1A are split by a single period for instance. But it is a far better method than the standard google search and has many uses beyond science. 

I hope for the day when one can supply a search modification in the google search bar along the lines of 'search for these two words, and only return those webpages where they are within [x] words of each other'.

Python vs. Mathematica - Numerical Solving

I've been thinking about moving my modelling from Mathematica (expensive, poorly managed activation, expensive) to Python/Scipy (free, friendly community, many scientific modules). In theory, all of the tools that I employ in Mathematica are available through the Scipy and Numpy Python modules.

I enjoy using Python for a number of reasons:

  • It's free
  • I am able to use TextMate, a fantastic text editor available on the Mac avoiding the clunky notebook entry of Mathematica
  • The community is active and friendly
  • Many, many tools are available for scientific research e.g. the excellent SciPy, and Biopython to name two that are relevant to my work

Granted that the visualisation tools aren't quite as user-friendly as those available in Mathematica and the symbolic computing power of Mathematica is sometimes missed, the tools available are more than adequate for my needs.

To directly compare the two systems, I obtained a simple Hodgkin-Huxley model from the Mathematica demonstration site titled: Neural Impulse: The Action Potential in Action. I stripped it of its interactive elements and plotting functions to the bare bones:

Vmi = 0; Gna = 120; Gk = 36; Gl = 0.3; Ena = 115; Ek = -12; El = 10.6;
alpham[Vm_] := (0.1 (25 - Vm))/(Exp[(25 - Vm)/10] - 1)
betam[Vm_] := 4 Exp[-Vm/18]
alphan[Vm_] := (0.01 (10 - Vm))/(Exp[(10 - Vm)/10] - 1)
betan[Vm_] := 0.125 Exp[-Vm/80]
alphah[Vm_] := 0.07 Exp[-Vm/20]
betah[Vm_] := 1/(Exp[(30 - Vm)/10] + 1)
gna[t_] := (Gna) ((m[t])^3) h[t]
gk[t_] := (Gk) (n[t])^4
duration = 20;
strength = 3
Istim[t_, duration_, strength_] := If[0 < t < duration, strength, 0]
For[i = 1, i <= 500, i ++, 
  NDSolve[{Vm'[t] == 
     Istim[t, duration, strength] - 1 (gna[t] (Vm[t] - Ena)) - 
      1 (gk[t] (Vm[t] - Ek)) - (Gl (Vm[t] - El)), 
    m'[t] == (alpham[Vm[t]] (1 - m[t])) - (betam[Vm[t]] m[t]), 
    n'[t] == (alphan[Vm[t]] (1 - n[t])) - (betan[Vm[t]] n[t]), 
    h'[t] == (alphah[Vm[t]] (1 - h[t])) - (betah[Vm[t]] h[t]), Vm[0] == Vmi, 
    m[0] == alpham[Vmi]/(alpham[Vmi] + betam[Vmi]), 
    n[0] == alphan[Vmi]/(alphan[Vmi] + betan[Vmi]), 
    h[0] == alphah[Vmi]/(alphah[Vmi] + betah[Vmi])}, {Vm, m, n, h}, {t, 0, 
    50}]] 

Time to completion: 9 seconds

Then I coded the same model in Python (source available here):

import pylab
import numpy
from scipy.integrate import odeint
import math
from numpy import array,linspace

G_L = 0.3
E_L = 10.6
G_Na = 120
E_Na = 115
G_K = 36
E_K = -12
e = math.e # Euler's constant

v0 = 0.0
alpham = (0.1*(25-v0)) / (e**((25-v0)/10)-1)
betam = 4 * e**(-v0/18)
alphah = 0.07 * e**(-v0 / 20)
betah = 1 / (e**((30-v0)/10)+1)
alphan = (0.01 * (10 - v0)) / (e**((10 - v0) / 10)-1)
betan = 0.125 * e**(-v0/80)

m0 = alpham/(alpham + betam)
n0 = alphan/(alphan + betan)
h0 = alphah/(alphah + betah)

def i_stim(t):
if t < 20: return 4
else: return 0.0

def alpha_m(x):
return (0.1 *(25-x))/(e**((25-x)/10)-1)

def beta_m(x):
return 4*e**(-x/18)

def alpha_h(x):
return 0.07 * e**(-x / 20)

def beta_h(x):
return 1.0 / (e**((30-x)/10)+1)

def alpha_n(x):
return (0.01 * (10-x)) / (e**((10-x)/10)-1)

def beta_n(x):
return 0.125 * e**(-x/80)

def diff_eqns(x,t):
"""Differential equations for the HH model"""
dm_dt = alpha_m(x[3]) * (1.0 - x[0]) - beta_m(x[3]) * x[0]
dh_dt = alpha_h(x[3]) * (1.0 - x[1]) - beta_h(x[3]) * x[1]
dn_dt = alpha_n(x[3]) * (1.0 - x[2]) - beta_n(x[3]) * x[2]
dv_dt = i_stim(t) - (G_L * (x[3] - E_L)) - (x[0]**3 * x[1] * G_Na * (x[3] - E_Na)) - (x[2]**4 * G_K * (x[3] - E_K))
return array([dm_dt, dh_dt, dn_dt, dv_dt])

t = linspace(0,50,1000)
i=1
while i < 500:
odeint(diff_eqns, array([m0,h0,n0,v0]), t)
i = i+1

Time to completion:  223 s

That is significantly slower than equivalent code in Mathematica. I enjoyed writing it in Python, but it was not difficult in Mathematica and considering the 10+ x speed up it does make me wonder whether I should make the switch. I have read a little about speeding up numerical solving in Python with the use of Fortran and C. But considering everything else that I have to learn it might just be better to stick with the tried and true. What do others think? Is there a simple way to speed up the Python code? I'll update the benchmarks as the code improves.

(Mac 10.6.2, Python 2.6.4 with 64 bit Numpy, SciPy; Mathematica 7.01)

Update: Now that I am running Mathematica 8, I thought I would repeat this very inaccurate benchmarking: time to completion 7s in Mathematica 8. A modest, but welcome improvement.

If you are new to Mathematica, the best help I have had after the included documentation is Mathematica Cookbook by Sal Mangano; well worth the small price.