As modern technologies advance, data generation within biomedical sciences has become faster, cheaper and more accessible to researchers in Africa. But to transform that data into information and knowledge, African researchers need access to tools and infrastructure. H3ABioNet is collaborating with UCT eResearch to build an Open Data science platform as part of the National Institutes of Health (NIH) Data for Africa Initiative.
“As technologies for biomedical sciences advance, we have moved to a data-rich, information-poor paradigm,” says Professor Nicola Mulder, head of the Computational Biology Division at UCT. “Poor infrastructure has traditionally meant African scientists face major limitations in their ability to analyse the data they gather.”
Through H3ABioNet, a large Pan-African bioinformatics network of 27 institutions in 17 countries, Mulder and colleagues have been working to build capacity for genomics research, including training, infrastructure development and building research tools, workflows and data pipelines.
As part of this effort, Mulder is now leading a project to build eLwazi, an Open Data science platform for African scientists. The project is funded by the NIH through its Common Fund’s Harnessing Data Science for Health Discovery and Innovation in Africa programme.
“Ulwazi is the isiXhosa word meaning knowledge or information,” says Mulder, “and Olwazi means big rock in Luganda, symbolising robustness and endurance. eLwazi is thus adapted from these two words.”
As Mulder’s research is by its nature data-intensive, she has collaborated with UCT eResearch in the past, particularly around the use of, and investment in, ilifu. This data-intensive research cloud was developed in response to the big data needs of the Square Kilometre Array (SKA) project, but was also designed to support data-intensive research in bioinformatics.
For this reason, Mulder invited UCT eResearch to be a consortium partner in her bid to build eLwazi, particularly with regard to data management and administrative access to this computing cluster, among other things.
eLwazi: an African tool for African science
“It is important that an Open Data platform like this considers the local context for the community it is meant to serve,” says Mulder. “We need to factor in the research infrastructure (or lack thereof), the reality of poor Internet connectivity, national data sharing legislation, and possibly most importantly, the perceptions and attitudes of the users, African scientists.”
The platform will hold data from across the continent. The data can either be hosted directly on the eLwazi platform or held locally at the individual research sites, with only the metadata on the platform. But eLwazi is much more than a repository: it will also contain a set of software tools and workflows that researchers can draw on to analyse the data.
Ethics and legislation around data sharing, as well as perceptions of it, can be quite complex. eLwazi works on the principle that African scientists should have the choice of where their data is stored and analysed. Where access to data is not open, eLwazi will facilitate the application and authentication process for access.
“eLwazi will be an inspirational example of Africa’s scientific renaissance, providing an enabling environment for analysis of biomedical data on the continent and promoting local innovation,” says Mulder.
“Our ability to provide research facilities to international standards, but relevant to the local context, is what ensures the sustainability of this platform. This approach has been successfully