Unlocking the Information That Drives Pharmaceutical Innovation: An Optimized Approach to Extracting Greater Value from Scientific Data

By Frank Brown, Ph.D., Chief Science Officer, Accelrys
The pharmaceutical industry is currently facing a perfect storm of competing challenges. As patent protections on today’s blockbuster drugs begin to expire, organizations need to go back to the R&D drawing board to pursue new revenue streams. At the same time, demand for more customized treatments is creating an increasingly complex research landscape, one that involves a greater understanding of diseases at the molecular level. On top of all this, the entry of more agile competitors, combined with tough regulations and the need to do more with less, is piling on additional pressure.

According to a 2008 report by Deloitte Consulting¹, "Research and development will continue to be the [pharmaceutical] industry’s core value generator. However, it is R&D configured for smaller genotyped market segments, creating targeted treatments, focusing on the patient over the disease life cycle, and working intimately with the other entities in the health care network, both competitors and collaborators. It will be a virtual R&D process in a network of disease-specific organizations and patient groups."

To address this new market reality, pharmaceutical companies need to take a close look at how they can modernize their innovation efforts. Just as organizations have turned to technology to streamline the supply chain or sales and marketing activities, they now must tap into opportunities for optimizing R&D, scale-up and early production processes.

The Data Management Challenge

Scientists, managers, production engineers and other key stakeholders involved in pharmaceutical research are often sitting on a vast wealth of data that can help speed the cycle of innovation and improve time-to-market, but they need better ways to both "unlock" and leverage this information in a timely fashion.

Drug discovery typically involves the identification, communication and analysis of massive quantities of complex scientific data. Much of this data is locked up in silos – in a diverse array of formats (such as text, images and models), as well as within proprietary systems and equipment from assorted vendors (such as suppliers of test equipment or computational chemistry systems). Information may be structured or unstructured, involve both current and historical research, and include anything from laboratory notes or output from analytical devices through molecular simulations and predictive models. To make matters more complicated, the data may also be handled in an ad hoc, manual manner by individuals, organizational departments and contract locations around the globe.

Researchers need more efficient ways to access, aggregate and mine complex scientific data across many disciplines – otherwise, productivity and decision making will suffer, slowing the pace of innovation.

Limitations of Traditional Solutions

Over the past decade, enterprise solutions like supply chain management and CRM systems have helped streamline manufacturing and sales and marketing activities via automated workflows and collaborative information sharing. Implementing a similar, enterprise-level approach to R&D and early production activities such as drug discovery, clinical testing, and scale-up presents a compelling opportunity for pharmaceutical organizations. In fact, industry experts agree² that optimized innovation processes can lead to between 10 and 85 cents additional return for every dollar invested in R&D.

But retrofitting traditional business intelligence, data management or product lifecycle management tools is not the answer. These "one size fits all" technologies were built for transactional data, which is generally structured and numerical in nature, and are not able to deliver advanced scientific analysis and drilldown capabilities.

On the other side of the coin, point tools designed for the scientific market, such as Electronic Lab Notebooks (ELN), often only solve part of the data management problem. By focusing on specific disciplines such as bioinformatics or cheminformatics, these types of solutions can lead to the isolation of research information in software from one vendor or another, requiring IT intervention to integrate and transfer data between multiple applications. Saving information has limited value unless data can be searched in an unconstrained way across the enterprise, and ELNs are too limited in scope to enable end-to-end process optimization.

An Optimized Approach to Scientific Information Management

To fully leverage the vast quantities and types of data within their enterprises, pharmaceutical organizations require a platform that facilitates the exploration and integration of scientific information across many disciplines. This platform must be able to access and aggregate both structured and unstructured data from multiple research areas, enable advanced scientific analytics, and offer flexibility for users to view information in the manner most suited to their needs, which may range from web portals to sophisticated 3D visualization. Finally, for users to extract the greatest value from their data, the platform must also be able to deliver precisely the right information, at the right time and in the right format through interactive reports and dashboards.

Today, technology advances such as service-oriented architecture (SOA) are enabling open, standards-based solutions that can support the creation of workflows incorporating data in a variety of formats, as well as services that originate in diverse systems and applications. This type of technology backbone offers a foundation for an optimized approach to scientific information management that includes the following cornerstones:

Figure 1: A complex scientific process is configured into an automated, end-to-end workflow. This example uses Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA) and Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) for structure-based design to estimate the binding free energy of a ligand to a protein target. In this approach, components are used to read and manipulate three-dimensional molecular input data (proteins and ligands), perform CHARMm force-field based simulations, and report free energy of binding in a high-throughput workflow.

1) Integration. Linking a powerful and diverse range of functionalities on a single platform promises to revolutionize research efficiency. Through the integration of disparate data formats, applications and algorithms from multiple research areas, systems and sources, users can create automated workflows that streamline highly complex experiments and other research activities. For an example of a workflow that connects the various steps in a complicated methodology, see Fig. 1. This level of integration goes far beyond what organizations can achieve with ELNs, which tend to focus on specific disciplines or sub-sets of the innovation process, such as experimental design or regulatory compliance. Moving further downstream, scale-up and early production activities also come into play. What if a compound doesn’t perform the same way in the plant as it has in the lab? Production cycles will add even more data sets to already existing sources, so tight integration between analytical data from the development laboratory, a production control system and real-time process data is also extremely important.

2) Analysis. Scientists, chemists, process engineers and other stakeholders also need to be able to transform raw data into the knowledge required to make better decisions, faster. Thus, an ability to analyze advanced scientific information such as spectroscopic, chromatographic, or complex chemistry data is key. An extensive array of statistical methods should be available, ranging from simple statistical indicators to advanced modeling methods. Users must also be able to "drill down" to the information behind high-level analysis when questions or problems require more in-depth investigation. Automation is again a critical component here, in order to avoid delays or productivity losses that happen when IT must be called upon to build and deploy required analytical capabilities.

3) Information Delivery. To derive optimal value from research information, users must finally have the flexibility to choose a delivery method that best suits their needs. In some cases, this may be a web-based client that generates simple reports for non-experts, and in other cases, it will be a customizable interface that helps scientists make specific research decisions using dashboards and visual models. User-centric views of information can also provide high-level insight into broad organizational status or deliver targeted intelligence about a particular research project or experiment. However the information is delivered, it needs to be in a familiar format to reduce training and increase productivity.
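The three cornerstones can be sketched as a tiny chained workflow. The Python below is only an illustration under stated assumptions, not the platform's actual components: the function names, ligand labels and energy values are hypothetical placeholders, and the scoring step shows only the final end-state subtraction used in MM-PBSA/MM-GBSA-style estimates (binding free energy = complex energy minus the sum of the protein and free-ligand energies). A real calculation would derive those energies from force-field simulations rather than a hard-coded table.

```python
# Hypothetical stand-in for components that read data from several sources.
# Energies are illustrative placeholders in kcal/mol, not real results.
def read_energies():
    return [
        {"ligand": "lig-A", "complex": -5120.4, "protein": -5050.1, "ligand_free": -42.3},
        {"ligand": "lig-B", "complex": -5105.0, "protein": -5050.1, "ligand_free": -38.9},
        {"ligand": "lig-C", "complex": -5131.5, "protein": -5050.1, "ligand_free": -47.4},
    ]


def score(records):
    """Analysis step: add an estimated binding free energy to each record."""
    scored = []
    for rec in records:
        dg_bind = rec["complex"] - (rec["protein"] + rec["ligand_free"])
        scored.append({**rec, "dg_bind": dg_bind})
    return scored


def rank(records):
    """Analysis step: most negative (most favorable) binding energy first."""
    return sorted(records, key=lambda rec: rec["dg_bind"])


def render_report(records):
    """Delivery step: a plain-text summary a non-expert could read."""
    return "\n".join(
        f"{rec['ligand']}: {rec['dg_bind']:.1f} kcal/mol" for rec in records
    )


def run_workflow(source, steps):
    """Integration: feed the output of each component into the next one."""
    data = source()
    for step in steps:
        data = step(data)
    return data


report = run_workflow(read_energies, [score, rank, render_report])
print(report)
```

Because each step is an ordinary function that takes and returns data, a step can be swapped (for example, a different scoring method or an HTML report) without touching the others, which is the practical point of configuring a process as a workflow.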

Real-World Case Study: Experimental Workflows

Figure 2: A web-based interface for managing lead optimization programs that may be used to track experiment requests and monitor progress through interactive dashboards, as well as integrate research data to both visualize and report on results.

The "flat world" is now a reality in pharmaceutical research, where work is spread across global project teams, departments and contract locations. This drives a strong need for a decision support and tracking system that centrally manages experimental workflows – a system made possible through SOA-enabled web services. Fig. 2 illustrates a web-based interface for managing lead optimization programs that enables users to track experiment requests and monitor progress through interactive dashboards, as well as integrate research data to both visualize and report on results. A collaborative environment like the one shown in this example offers a consistent, unified view of research information, so that scientists and other key project stakeholders can make better and faster decisions, and improve the overall efficiency of the process. With a central view of experimental workflows, researchers can avoid duplication of experiments, identify bottlenecks, improve communication across global project teams and departments, simplify analysis and reporting and more.
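As a rough sketch of what a central view makes possible, the Python below flags duplicate experiment requests and counts open work per assay to hint at bottlenecks. Everything in it is assumed for illustration: the ExperimentRequest fields, the status values, and the sample compound, assay and site names are hypothetical and do not describe the system shown in Fig. 2.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRequest:
    # Hypothetical record layout for a centrally tracked request.
    request_id: str
    compound: str
    assay: str
    site: str
    status: str  # assumed values: "requested", "in_progress", "complete"


def find_duplicates(requests):
    """Group requests by (compound, assay); keep groups requested more than once."""
    groups = defaultdict(list)
    for req in requests:
        groups[(req.compound, req.assay)].append(req.request_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def open_work_by_assay(requests):
    """Count requests that are not yet complete, per assay."""
    return Counter(req.assay for req in requests if req.status != "complete")


# Illustrative sample data only.
requests = [
    ExperimentRequest("R1", "CMPD-001", "solubility", "site-1", "complete"),
    ExperimentRequest("R2", "CMPD-001", "solubility", "site-2", "requested"),
    ExperimentRequest("R3", "CMPD-002", "binding", "site-1", "in_progress"),
    ExperimentRequest("R4", "CMPD-003", "binding", "site-3", "requested"),
    ExperimentRequest("R5", "CMPD-004", "binding", "site-2", "requested"),
]

duplicates = find_duplicates(requests)
backlog = open_work_by_assay(requests)

for (compound, assay), ids in duplicates.items():
    print(f"Possible duplicate: {compound} / {assay} requested as {', '.join(ids)}")
for assay, count in backlog.most_common():
    print(f"{assay}: {count} open request(s)")
```

Neither check is possible when each site keeps its own request list, which is why the shared view matters more than the specific logic shown here.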

Real-World Case Study: Mining Unstructured Data

Pharmaceutical companies need access to a vast amount of unstructured content ranging from external patents and journal articles to internal company documents like laboratory notes. This combined content represents a valuable resource, but the sheer bulk of it impedes the realization of maximum benefit. Diverse formats make it extremely difficult and time consuming to pinpoint and access the knowledge required.

Figure 3: Protocol showing a modular approach to extracting chemical names from unstructured text documents. Documents from a variety of sources, and in multiple formats, can be processed to identify candidate chemical names, which can then be converted to structures for viewing, reporting, querying or storage.

A modular approach to identifying and extracting information, such as chemical structures, from unstructured text documents is illustrated in Fig. 3. In this example, documents from various sources are read into the workflow, including PDF, HTML, and Microsoft Word files. The contents of the documents are passed through a component that scans to find text strings representing chemical names using a combination of linguistic rules and an internal dictionary. Next, a name-to-structure converter is applied to weed out false-positive findings. The converted names, along with the document in which they were found, can then be further processed to meet the particular requirements of the discovery task, for example:

– Process one or more documents to determine the coverage of structures from a set of patents of interest

– Process a large set of documents to construct a database that allows users to search by query

In both of these cases, additional content (such as biological models, disease processes, etc.) can be extracted from the documents for further analysis. This modular approach to chemical text mining has many benefits, including the ability to efficiently carry out targeted analysis of a few documents or widespread exploratory research.
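The two-stage idea described for Fig. 3 (find candidate names, then use name-to-structure conversion to discard false positives) can be sketched in a few lines of Python. This is a toy under loud assumptions: the suffix rule is a deliberately crude stand-in for real linguistic rules, the four-entry dictionary doubles as a stand-in converter, and the document IDs and texts are invented. It is not the protocol shown in Fig. 3.

```python
import re

# Hypothetical internal dictionary mapping names to SMILES strings.
# A real system would call a full name-to-structure converter instead.
NAME_TO_SMILES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
}

# Toy linguistic rule: words ending in a few common chemical suffixes.
SUFFIX_RULE = re.compile(r"\b[a-z]+(?:ol|ine|ene|ane|ate)\b", re.IGNORECASE)


def find_candidates(text):
    """Return candidate names found by dictionary lookup or by the suffix rule."""
    words = re.findall(r"[A-Za-z]+", text)
    candidates = {w.lower() for w in words if w.lower() in NAME_TO_SMILES}
    candidates |= {m.group(0).lower() for m in SUFFIX_RULE.finditer(text)}
    return candidates


def to_structure(name):
    """Stand-in converter: returns a SMILES string, or None if conversion fails."""
    return NAME_TO_SMILES.get(name)


def extract_structures(documents):
    """Map each document ID to the candidate names that converted to a structure."""
    results = {}
    for doc_id, text in documents.items():
        hits = {}
        for name in sorted(find_candidates(text)):
            smiles = to_structure(name)
            if smiles is not None:  # a failed conversion is treated as a false positive
                hits[name] = smiles
        results[doc_id] = hits
    return results


# Invented sample documents; real inputs would be text pulled from PDF, HTML or Word files.
documents = {
    "patent-001": "The formulation combines aspirin and caffeine in ethanol.",
    "note-017": "We examine the machine line and determine the protocol.",
}

results = extract_structures(documents)
for doc_id, hits in results.items():
    print(doc_id, hits)
```

In the second document the suffix rule fires on ordinary words such as "examine" and "protocol", and the conversion step is what removes them. Keeping the per-document results also supports both uses listed above: checking which structures a set of patents covers, or loading the results into a searchable store.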

Conclusion

To stay competitive in a swiftly changing industry, pharmaceutical organizations must modernize their R&D and early production processes in order to extract greater value from their research intellectual property. The complex nature of this scientific information requires a highly specialized solution.

An open, standards-based scientific information management platform that combines the cornerstones of integration, analysis and information delivery can help scientists, engineers, senior managers and even non-expert executives gain greater visibility into their research efforts, speed innovation and achieve faster time-to-market and bottom-line results.

References

¹ The Changing Face of R&D in the Future Pharmaceutical Landscape, Dr. Terri Cooper, Ralph Marcello & Bee Animashaun, Deloitte Consulting LLP

² Knowledge Management: Reaping the Benefits, A.D. Little

About the author: Frank K. Brown, Ph.D., served as the Senior Research Fellow within the Office of the CIO at Johnson & Johnson, one of the world’s largest pharmaceutical companies. He is currently the Chief Science Officer for San Diego-based Accelrys.
