A mouldy myth

What were they thinking?

Someone at my home institution, the University of the Western Cape, has decided that the way to attract students is to wave a picture of mouldy bread at them. Presumably they don’t think that having top class postgraduate programmes at places like BCB or PLAAS or SANBI is worth advertising. Nope, instead we should talk about mouldy bread. Or rather, a myth about Alexander Fleming and mouldy bread. Thus the modified (Gimped, in fact) image to the left.

The original proclaims “What if [Fleming] never looked twice at something as ordinary as stale bread?”. Well, I don’t think we really know what Alexander Fleming thought of stale bread. What we do know is that stale bread had nothing to do with his (re)discovery of the antibacterial action of Penicillium moulds. Instead:

“Returning from holiday on September 3, 1928, Fleming began to sort through petri dishes containing colonies of Staphylococcus, bacteria that cause boils, sore throats and abscesses. He noticed something unusual on one dish. It was dotted with colonies, save for one area where a blob of mold was growing. The zone immediately around the mold—later identified as a rare strain of Penicillium notatum—was clear, as if the mold had secreted something that inhibited bacterial growth.” [source]

So it was a lazy attitude towards cleaning the lab — not mouldy bread — that led to Fleming’s discovery. That’s the first thing this blurb got wrong. What offends me more, however, is the clichéd image of the heroic scientist’s discovery sparking a paradigm shift. In reality, the antibacterial effect of Penicillium was known before Fleming, with a range of scientists and traditional knowledges describing the antibacterial effects of mould, or of Penicillium specifically. Just four years before Fleming’s discovery, Andre Gratia and Sara Dath discovered the antibacterial effect of a species of Penicillium, also as a result of contamination of a bacterial culture.

What made Fleming’s discovery significant was not the moment of discovery and subsequent insight, but rather what he did afterwards: instead of merely publishing a paper and moving on to another topic, he spent years trying to get other scientists — chemists especially — interested in the new substance’s potential. It was just over a decade later that Howard Florey’s team assembled a strange collection of baths and milk churns as part of the first penicillin production line. Before Florey, however, there was Dr Cecil Paine, a student of Fleming’s, who used a crude penicillin extract to successfully treat an eye infection in 1931. (Paine was later a colleague of Florey’s.) And Ernst Chain, the scientist in Florey’s lab who led the penicillin research, allegedly extracted the compound from a sample of mould that had been sub-cultured from Fleming’s original isolate. Florey’s science also drew on the clinical trials conducted by his wife, Ethel Florey. So the links between Fleming, the Floreys, Chain and the practical use of penicillin drew on a rich culture of openness and experimentation. In addition, as efforts got under way to industrialise penicillin production, Sara Dath’s work in collecting a “monumental number of moulds and bacteria” proved useful. Science was then, and is now, a product of a process of collective enquiry and effort, and while “scientific interest” is key, reducing the story of penicillin to a Scot staring at stale bread does violence to history. And UWC could do better than to peddle this mouldy myth.


Cluster building at UWC

Motse, Eugene (hidden) and Saeed assembling a Dell R710 they’re going to use in their cluster

Last year the Centre for High Performance Computing (CHPC) ran a student cluster building competition for the first time, alongside their national meeting. The winning team progressed to the International Student Cluster Challenge in Leipzig and won top honours there. Observing the teams at work last year convinced me this is something we need to introduce to UWC, so this year, when David Macleod from the CHPC’s ACE Lab announced the second installment of the competition, I contacted Computer Science to make sure we had a team. From that side Reg Dodds is facilitating things, and their sysadmin, Daniel Leenderts, is offering a helping hand. The team is being mentored by Motse Lehata, and includes Warren Jacobus, Saeed Natha, Nicole Thomas and Eugene de Beste.

On Tuesday Long and I wandered over to CS to observe and assist with the unpacking and installation of the practice cluster that Dell had sponsored. I’ll be thin on the technical details in case they don’t want them shared, but it provides enough hardware for installing and testing an operating system and applications for benchmarking. I’m hoping to use this cluster building as an opportunity to get students (and faculty) interested in building cyberinfrastructure as an area for research and maybe even future careers. After all, right now I’ve got the distinct impression that the small number of people I know that run the (mostly Linux) servers that power South African e-Research infrastructure ended up in that career path largely by accident. With big international projects like the SKA and H3Africa coming on stream in the next few years, we’re going to need a much larger pool of expertise in scientific computing, High Performance Computing, scientific workflows (my personal research bugbear), data curation, storage and re-use, and so on. Right now, as far as I can see, there is no decent curriculum out there to train these people, something that I’m trying to address in my small way as part of the H3ABionet, and there is no clear track through the educational institutions into the research infrastructure (as opposed to pure research) side of things. It’s gotta change!

e-Research Africa 2013

Quite by accident I ended up attending (and speaking at) the e-Research Africa 2013 conference. This was held in Cape Town, and largely organised, I gather, by Ed Rybicki and Sakkie Janse van Rensburg, from UCT. Ed is the Academic Liaison to the UCT Research Portal project, and Sakkie is the Executive Director of ICTS (basically Campus IT services) at UCT. Sakkie was at University of the Free State previously (which in my mind is currently most notable for providing employment to Albert van Eck, one of the more experienced HPC admins I know).

The conference started with a keynote from Paul Bonnington, the Director of e-Research at Monash University, and what struck me about Paul’s presentation was the careful attention given to the human and institutional factors that go into e-Research productivity. The topic was “eResearch: Building the Scientific Instruments of the 21st Century – 10 Lessons Learned”, and it set the tone for the conference with a few key messages:

  1. e-Research infrastructure is built for an unknown future. Paul gave the example of PlyC lysin, a novel bacteria-killing compound, whose structural data was captured in 2008 and stored in Monash’s myTardis repository. This data was only analysed in 2011: i.e. careful capture and preservation of data from previous experiments was key to a major discovery. Contrast this with research and teaching pipelines that focus on single end points (papers or graduates). Which leads me to:
  2. e-Research infrastructure development should follow a spiral model. For those not familiar with spiral models, they’re a process model that Barry Boehm came up with in the 1980s and they’re specifically designed to manage successive iterations of requirements gathering, risk assessment, development and planning and…
  3. The role of the University is to be the enduring home for the e-Research process.

Think about this a bit: if research output is no longer (simply) papers, but also includes data and code, what allows research to have long term value? Long term, past research maintains value because it is kept accessible by a structure of support that provides it to present researchers. This is, in Paul’s vision, the university, but it’s also a set of people, technologies and processes. So it’s the data and code repositories, it’s the curation effort that ensures that data is stored in accessible ways and according to meaningful schema, it’s the metadata that allows us to find prior work. And value for whom? At the biggest picture level, society, but in a more immediate sense, value for researchers. Thus three more things:

  1. That “unknown future” is best known by people actually doing academic research. So their input in the “spiral” process is vital. In personal terms, I’m more than ever convinced that UWC needs an “e-Research Reference Group”, drawn from interested academic staff from different departments, that can outline requirements for future e-Research infrastructure.1
  2. Academics are, of course, not infrastructure builders. Infrastructure builders come in different forms – library people, IT people, etc – but in order to build effective e-Research infrastructure, they need to be partners with academics. In other words, there needs to be a common goal: research output. This is different to traditional “IT support”. In my little bubble at SANBI I’ve worked this way over the years: I’ll often partner with individuals or small groups to get work done, with them providing the “domain knowledge” and me grounding the process in computing realities (and hopefully adding a bit of software engineering wisdom etc).
  3. This partnership implies that there needs to be a growth path that recognises and rewards the work of these infrastructure-building partners.2 Paul referred to this as a “third track” in the university, distinct from both academic staff and non-academic support staff. (Ok this is a bit self-interested because I’ve been one of those “non-academic support staff (that participates in research)” for years.)

Ed’s written a blog post about the conference, and there were loads of interesting bits and pieces, such as Yvonne Sing Min’s work on building both a database (the “Vault”) and web front end to allow UCT researchers to have a central toolset for managing their research profiles (something similar to what we’re doing for H3ABionet with the NetCapDB), and Hein de Jager mentioning that they’re using Backblaze storage pods at UCT (gotta go see those!), and Andre le Roux’s presentation on redesigning infrastructure to accommodate research, with its focus on people, process and technology. I fear that my talk on scientific workflow systems might have been pitched at the wrong level, but it happened regardless. The presentations are online; unfortunately they don’t yet include the presentations from day 4 (the workshop day), so Dr Musa Mhlanga’s fascinating talk on using high throughput microscopy for studying biological pathways is missing. I (and other people) tweeted a bit from the conference, using the #eresearch2013 hashtag.

Besides the talks, there was some good networking, since admins / ops people from SANBI, UWC ICS, University of Stellenbosch and UCT were all present at various times. We had a lunchtime meeting (along with Inus from CHPC) to launch an HPC Forum, which basically means that we have a mailing list and also a set of physical meetings to share experience and knowledge with regard to running High Performance Computing sites. If you’re interested in this, drop me a mail.


1. As an illustration of investing in this unknown future, in “Where Wizards Stay Up Late: The Origins Of The Internet”, Hafner and Lyon report on J. C. R. Licklider’s request to buy a computer for BBN:

[Licklider] believed the future of scientific research was going to be linked to high-speed computers, and he thought computing was a good field for BBN to enter. He had been at BBN for less than a year when he told Beranek he’d like to buy a computer. By way of persuasion, Lick stressed that the computer he had in mind was a very modern machine—its programs and data were punched on paper tape rather than the conventional stacks of IBM cards.

“What will it cost?” Beranek asked him.
“Around $25,000.”
“That’s a lot of money,” Beranek replied. “What are you going to do with it?”
“I don’t know.”
Licklider was convinced the company would be able to get contracts from the government to do basic research using computers. The $25,000, he assured Beranek, wouldn’t be wasted.

None of the company’s three principals knew much about computers. Beranek knew that Lick, by contrast, was almost evangelistic in his belief that computers would change not only the way people thought about problems but the way problems were solved. Beranek’s faith in Licklider won the day. “I decided it was worth the risk to spend $25,000 on an unknown machine for an unknown purpose,” Beranek said.

2. For a little rant on how difficult hiring computational people to support biologists is, see C. Titus Brown’s “Dear Abby” blog post.


Gotchas in dual-mail-server setup

At SANBI we do spam filtering on a dedicated machine, where we run qpsmtpd with various plugins. This machine faces the big scary Internet, and any mail that passes its filters is delivered to our main mailserver, where the mailboxes live. Some years ago I wrote a plugin for qpsmtpd that does recipient checking, i.e. it connects to the main mailserver and uses the RCPT TO command to check if the mail can be delivered. I discovered a significant gotcha with this approach: any mail passing the spam filter was being accepted. I.e. I’d accidentally created an open relay (but only for non-spam-filter-triggering mail). So this post is just a note to self (and others that might make this mistake): your final mail server should treat the spam filtering proxy as an external mailserver, i.e. relaying should not be permitted. I did this by changing the mynetworks setting in the main mailserver’s Postfix configuration to exclude the spam filtering server’s IP. (Note that exclusions must be before inclusions in this statement, so !<spam filter IP> had to come before <spam filter IP’s network>.)
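For illustration, this is roughly what that mynetworks change looks like in the main mailserver’s Postfix main.cf (the addresses here are made-up documentation addresses, not our real ones):

    # /etc/postfix/main.cf on the main mailserver (illustrative addresses)
    # The !exclusion must appear before the network that contains it, otherwise
    # the spam filter is still matched as a trusted client and allowed to relay.
    mynetworks = 127.0.0.0/8 !192.0.2.10 192.0.2.0/24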

Now things are working again, and hopefully we’ll be out of the blocklists soon. However, I took the opportunity to look at what’s out there as filtering SMTP proxies, and it seems that Haraka is interesting. Haraka is Node.js based, so it’s an event-based server written (largely) in JavaScript. Kind of like Python’s Twisted. So maybe in the future we’ll switch to Haraka: that is, if we don’t just migrate all our mail to Gmail.

POSTSCRIPT: I forgot that we use our spam filter machine as a mailserver for external clients (when authenticated with SMTP AUTH), so my plan didn’t work. Turns out that what I actually needed was to enable the check_rcpt plugin together with my own plugin, because check_rcpt checks for mail relaying.
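In config terms, the fix amounts to listing both plugins in qpsmtpd’s config/plugins file. A sketch (sanbi_rcpt_check is a stand-in name for my own plugin, and the ordering reflects my understanding of how the hooks chain):

    # qpsmtpd config/plugins (sketch, not our production config)
    # Plugins run in the order listed. My plugin rejects recipients that
    # don't exist on the main mailserver and otherwise declines, so that
    # check_rcpt still gets to apply its relaying rules afterwards.
    sanbi_rcpt_check
    check_rcpt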

PPS: The correct response from a plugin if you think the message is kosher is DECLINED, not OK. OK means we’re sure the message is OK, whereas DECLINED means pass it to the next plugin. Drat!
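For the record, a minimal sketch of what that looks like in a qpsmtpd rcpt hook (this is not our actual plugin, and recipient_exists_on_main_server is a hypothetical helper standing in for the RCPT TO probe against the main mailserver):

    # Sketch of a qpsmtpd rcpt hook; return codes such as OK, DECLINED and
    # DENY come from Qpsmtpd::Constants and are available to plugins.
    sub hook_rcpt {
        my ($self, $transaction, $recipient) = @_;

        if (recipient_exists_on_main_server($recipient)) {
            # The recipient looks fine, but return DECLINED so that later
            # plugins (such as check_rcpt) still get to run their checks.
            return (DECLINED);
        }

        # We know this recipient doesn't exist, so reject it outright.
        return (DENY, "No such user");
    }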