Healthcare Needs a New Information Model: Semantic Web and the Challenges in Healthcare Information Technology
This year, the US will spend $2.5 trillion on healthcare, and no one really understands where the money will go. All we know is that we will spend twice as much as the rest of the modern world and we apparently will get worse results. We also know that there are wide variations in how medicine is practiced within this country, and there seems to be no clear correlation between spending and quality.
Escalating healthcare costs and the current crisis in the healthcare system have something in common with the recent financial meltdown. Both crises are rooted in information challenges: The underlying systems have not been transparent, with too many opportunities to game the system, until eventually the inevitable crisis hits.
Much like subprime mortgage applications, the underlying healthcare data that might reveal the true status and risks of the system are buried on paper and in silos. As a result, the risk is misunderstood, mismanaged, mispriced and ultimately shifted to the next sucker, the last one always being the taxpayer.
Just as the mortgage crisis has its roots in policies designed to help more Americans buy a home, healthcare also is full of good intentions gone awry. The fact is, the system has become far too complex for our old information technologies and methods to handle. Without a new information model we will continue to fly blind.
New Models for Linked Data
In 1998, Tim Berners-Lee, the architect of the web standards that enabled the Internet to fundamentally change the way the world is wired, described his vision for the web:
The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web.
The original web enabled the documents of the world to be linked together, and now we can find the world’s smallest needle in the world’s largest haystack in milliseconds. The next generation of the web, as envisioned by Tim Berners-Lee, is about linking data and meaning in much the same way we have already linked documents. This web of linked data enables a marketplace of ideas competing to create smarter and more useful services – without needing loads of capital to redevelop or integrate the old systems.
Tim Berners-Lee continued in Scientific American in 2001:
The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.
The semantic web is a model for linking information and representing knowledge, and semantic technologies for linking data will be a vital tool in solving the healthcare information challenge. To understand why our current approaches to healthcare IT do not scale, and do not help us understand and optimize healthcare services while controlling healthcare spending, let’s go back to the beginning: how we store and characterize data in the first place.
Traditional Data Models
Traditional databases organize the world into lots of two-dimensional tables. Someone needs to design the schema for the world and then neatly fit the world into that schema. It is quick, simple, and has been how the world has stored and processed information for a generation.
Let’s take a simple everyday example of John and Mary, two friends from college. The simple facts of John and Mary are as follows:
John lives at 234 Main St.
John graduated from Harvard
John plays Rugby
Mary lives at 432 Post St.
Mary graduated from Yale
Mary doesn’t have a sport.
In the traditional information model, we can put this story into a tidy two-dimensional table:
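As a rough sketch, the six facts above could be stored in a single table like this, using Python's built-in sqlite3 module. The table name and column names are assumptions chosen for illustration; they do not come from any real system.

```python
import sqlite3

# Hypothetical schema: one row per person, one column per kind of fact.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT PRIMARY KEY, address TEXT, college TEXT, sport TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("John", "234 Main St.", "Harvard", "Rugby"),
        ("Mary", "432 Post St.", "Yale", None),  # Mary doesn't have a sport
    ],
)

rows = conn.execute(
    "SELECT name, address, college, sport FROM people ORDER BY name"
).fetchall()
for row in rows:
    print(row)
```

Each person gets exactly one address, one college, and one sport, which is what makes the table tidy and also what makes it hard to extend.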

But the real world is more complex than that. What happens when we gather more knowledge of the world? For example, we now would like to add the following facts:
John rents at 234 Main St.
Mary owns 234 Main St.
Or the facts that:
Mary attended Harvard
Mary is a fan of Rugby
We can attempt to fit more and more of the world into the relational database by adding more and more columns to our table, or we can add more tables that go into more detail about the various elements of our data. Then we can join these tables in various ways.
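To make the "more tables, more joins" path concrete, here is a minimal sketch, again with Python's sqlite3. The two tables and their columns are hypothetical; the point is only that each new kind of fact needs its own schema decision, and each new question needs another join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed design: one table per kind of relationship.
conn.executescript(
    """
    CREATE TABLE residences (person TEXT, address TEXT, relation TEXT);
    CREATE TABLE affiliations (person TEXT, college TEXT, relation TEXT);
    """
)
conn.executemany(
    "INSERT INTO residences VALUES (?, ?, ?)",
    [
        ("John", "234 Main St.", "rents"),
        ("Mary", "234 Main St.", "owns"),
        ("Mary", "432 Post St.", "lives at"),
    ],
)
conn.executemany(
    "INSERT INTO affiliations VALUES (?, ?, ?)",
    [
        ("John", "Harvard", "graduated from"),
        ("Mary", "Yale", "graduated from"),
        ("Mary", "Harvard", "attended"),
    ],
)

# Question: who owns the place John rents, and where did that person graduate?
# Answering it takes a self-join on residences plus a join to affiliations.
landlord_colleges = conn.execute(
    """
    SELECT owner.person, affiliations.college
    FROM residences AS tenant
    JOIN residences AS owner ON owner.address = tenant.address
    JOIN affiliations ON affiliations.person = owner.person
    WHERE tenant.person = 'John'
      AND tenant.relation = 'rents'
      AND owner.relation = 'owns'
      AND affiliations.relation = 'graduated from'
    """
).fetchall()
print(landlord_colleges)
```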
This model worked for a long time, but as the knowledge of the world becomes more complex, the approach becomes brittle. We need more and more effort and people to maintain and manage the ever more complex set of databases.
When relationships become even slightly more complex, the system starts to break. Now let’s try to add the following everyday human story to our database:
Mary is a friend of John
Harvard is a rival of Yale
Harvard and Yale both have Rugby teams
Harvard beat Yale
Even the simplest human stories start to strain the old data model. What if the information about Harvard and Yale attendance and their respective sports teams and schedules is stored in completely different databases in different systems in different locations? What do we do to tell our story then?
We have a couple of choices: We could hire a systems integrator to figure out how to connect the various systems, and build gateways so we can tap into the respective data and assemble a new dataset to tell our story. Another alternative is to start over and build yet another new system that is optimized for our expanded story about Mary and John. Either way, integrating and redeveloping databases and applications starts to get very time consuming and very expensive. And in the end, we have yet another silo of unconnected data.
Every business has this same issue of a multitude of systems designed for a multitude of functions: Inventory, payroll, accounting, operations, sales, customer service, legal, etc. The necessary data is in different systems designed for different processes by different vendors. The difficulty and cost of linking data in useful ways goes up exponentially as we add information to the system. The old model does not scale.
Semantic Web and Linked Data Graphs
The solution is not to design grander and grander database schemas for the world. The solution is in a different model of representing data and linking it together. This is the model of the semantic web, as originally envisioned by Tim Berners-Lee.
Rather than building and joining more tables, or replicating and moving data around from different systems, each data element can stay where it is. All that is required is that each data element has its own unique address, a way to describe what it is, and a way to link to other data elements expressing a relationship.

Two-dimensional tables of rows and columns are replaced by a linked data graph, with data elements linked to other data elements in a relationship. Each relationship is a triple composed of subject, predicate, and object, much like a sentence in human language. Mary owns 234 Main St. John rents at 234 Main St. This extra dimension of knowledge, the action that connects two data elements, enables vastly more complex knowledge to be represented without making our software more complex.
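A minimal sketch of the idea in plain Python: every data element gets its own address, and every fact is one subject-predicate-object triple. The `http://example.org/` namespace and the predicate names are assumptions made up for this illustration; a real deployment would use RDF tooling and agreed vocabularies rather than a Python set.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


BASE = "http://example.org/"  # assumed namespace, for illustration only


def uri(name: str) -> str:
    """Give a data element its own unique address."""
    return BASE + name


graph = {
    Triple(uri("John"), uri("rents"), uri("234MainSt")),
    Triple(uri("Mary"), uri("owns"), uri("234MainSt")),
    Triple(uri("Mary"), uri("friendOf"), uri("John")),
    Triple(uri("Harvard"), uri("rivalOf"), uri("Yale")),
    Triple(uri("Harvard"), uri("beat"), uri("Yale")),
}

# New knowledge is just one more triple; no schema has to change.
graph.add(Triple(uri("Mary"), uri("fanOf"), uri("Rugby")))

for triple in sorted(graph):
    print(triple.subject, triple.predicate, triple.obj)
```

The facts that strained the table (a friendship, a rivalry, a game result) all take the same three-part shape here.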
In this graph model, our data scales the way the Internet scales. Unlike the relational model, the more people who add information to a graph, the easier it gets to tell our story.
To make our data interoperable and to build applications, we do not need teams of database engineers to spend years designing and updating the database schema and then teams of systems integrators to spend years connecting the data from different sources. The method of pulling knowledge together for new applications is more like a search engine. Through search we can trace the relationships and reassemble the data we need at the moment we need it.
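Tracing relationships by search can be sketched as simple pattern matching over triples. This is a toy stand-in for a real graph query language such as SPARQL; the `match` helper and the short string names are hypothetical.

```python
# Each fact is a (subject, predicate, object) tuple.
graph = [
    ("John", "rents", "234 Main St."),
    ("Mary", "owns", "234 Main St."),
    ("Mary", "friend of", "John"),
    ("Mary", "graduated from", "Yale"),
    ("Harvard", "rival of", "Yale"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Step 1: which places does John rent?
rented = [o for (_, _, o) in match(graph, subject="John", predicate="rents")]

# Step 2: follow the link from each place to whoever owns it.
owners = [
    s for place in rented for (s, _, _) in match(graph, predicate="owns", obj=place)
]
print(owners)
```

The same two-step trace works no matter how many other kinds of facts are added to the graph, because the query never depends on a fixed schema.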
The semantic web model can profoundly increase productivity because it enables more creative energy to go into new applications without the bottlenecks of time and expense of needing to integrate all the data.
Consider the example of John and Mary, but at scale and with richer data sets and linkages about their academic, sports, financial, and personal interests. Might we be able to develop algorithms to predict whether or not John will pay his rent or Mary will pay her mortgage? Will John and Mary make a good couple? Will Harvard beat Yale next year? Business intelligence is all about modeling the world and trying to predict the future, and business intelligence requires linked data.
Now extrapolate our example to the challenge of healthcare information. There is an energetic debate in Washington DC about demanding electronic healthcare information systems, stimulated by the mandate from our president together with new funding to start to make it happen. Medical data has been in silos ranging from clinics, hospitals, specialists, primary care, home care, pharmacy, insurance, and many other sources. This data is not connected. The system is far too complex to integrate by traditional means, and if we try by traditional means we will have spent the stimulus money with very little to show for it. It’s time for a new approach to healthcare information.
Semantic Web and Healthcare Information Challenges
The semantic web was designed exactly for such a challenge as integrating the complex and fast changing universe of health information. Rather than building more silos of data, everyone can keep their data where it is, but let’s give data an address and start to use the common language of the semantic web to express the relationships between the data and start to link it together.
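As a small sketch of what linking across silos could look like: two sources each publish triples about the same patient address, and a combined view is simply the union of what they say. Everything here is hypothetical, including the patient URI, the predicate names, and the sample facts; in practice the sources would stay where they are and be queried in place rather than copied into one set.

```python
# Assumed shared address for one patient, used by both sources.
PATIENT = "http://example.org/patient/john"

# Hypothetical triples published by a clinic.
clinic = {
    (PATIENT, "hasDiagnosis", "high blood pressure"),
}

# Hypothetical triples published by a pharmacy.
pharmacy = {
    (PATIENT, "wasDispensed", "blood pressure medication"),
}

# Because both sources use the same address and the same triple shape,
# linking them needs no new schema and no custom gateway.
linked = clinic | pharmacy

facts_about_patient = sorted(
    (predicate, obj) for (subject, predicate, obj) in linked if subject == PATIENT
)
for predicate, obj in facts_about_patient:
    print(predicate, "->", obj)
```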
The biggest bottleneck in the health care information equation is getting meaningful data into the system in the first place. Good doctors sit with their patients and listen to their stories. They do not want to be spending that time filling in structured forms of data. Every patient story starts as unstructured data about what is going on in the patient’s life. Many different illnesses can present similar symptoms, and medicine is full of nuance in deciphering such complex issues with many variables.
In the example of John and Mary, as their story unfolds, we can add linkages to the data graph. Because human language also works as triples of subject, action, object, it is more and more possible for machines to understand the human story and to extract meaningful data to add as linkages to our graph.
I chose the story of John and Mary because for all we know, the most important data in driving cost may not be in the existing healthcare databases in the first place. We already know that lifestyle factors like diet and exercise drive obesity and high blood pressure, which drive complications and cost of care. But what medical system has anything to say about the fact that John has been binging on Big Macs and Big Gulps because he is stressed about his relationship with Mary and he can’t pay his rent? The linked data model will allow for vastly greater intelligence in the system and for the first time will enable more research on the root causes of most of our healthcare spending: our lifestyle and behavior. Not only is semantic technology a better way to link existing data, we can start to expand horizons beyond the existing silos.
We talk about the cost of changing things, but the only certainty in the health system is that we will spend $2.5 trillion on healthcare this year, and we will spend at least $2.6 trillion next year by doing nothing. The fact is, this is not a “lack of money” problem. There is more money in the system than ever before.
The problem is that we have been using the wrong model, a model that has long become brittle and cannot scale with the complexity of healthcare, not to mention the information approaches that we will need to understand and address the root causes of health and illness in America.
The risk that has become apparent in the healthcare IT debates at the health information technology policy committee meetings is that we will further delay or lock in the legacy approaches that have failed us in the past, rather than enabling new and better – and simpler and less expensive – models for representing and using the explosion of health information and knowledge.
The good news is that a better model exists: it has been the vision for the web for over a decade, and the World Wide Web Consortium has already agreed on the standards. We need these new models for the complex health information challenge, and we need them now.


Excellent! This certainly gives me something to chew on.
Comment by Brian Ahier — July 29, 2009 @ 5:05 pm
Great article – thank you. Interesting to look behind the data to the meaning of the information and how it’s related. I’m interested to see semantic web concepts develop & where they go from a practical implementation standpoint in the enterprise & healthcare.
Comment by Liza Sisler — July 29, 2009 @ 5:53 pm
It seems like a lot of the technical challenges for the semantic web have been going away over the last few years, especially since Oracle now has a triple store. Btw University of Texas Health Science Center is doing pretty much what you describe above. It’s pretty ambitious but it seems like they have made great progress
Comment by James O'Sullivan — July 29, 2009 @ 6:15 pm
[...] is a very interesting blog post by Steve Brown that gives us a novel way of approaching the massive challenge that is healthcare reform. While [...]
Pingback by Semantic Web - A New Model for Healthcare IT? — July 29, 2009 @ 7:16 pm
Nicely said. Great piece.
Comment by Bret Waters — July 29, 2009 @ 9:12 pm
[...] Article Steve Brown, brown2020, 29 July 2009 [...]
Pingback by ICMCC Website - Articles » Blog Archive » Healthcare Needs a New Information Model: Semantic Web and the Challenges in Healthcare Information Technology — July 30, 2009 @ 12:32 am
great technology – but implementation in healthcare will take a lot of time. Precisely because almost all of the constituents try to protect their own interests.
I would love to see an internet-style revolution, but because of the highly regulated environment I doubt we are going to see it soon…
Having said that, the sole chance seems to be a use that benefits one player in his internal management at once. Maybe that’s why the University of Texas Health Science Center seems to be doing just that…
Comment by Roman Rittweger — July 30, 2009 @ 5:18 am
Steve:
Great article and position.
Correct me if I’m wrong, but I’m assuming we are talking about RDF and OWL technologies here. A good book on the subject is “Practical RDF” by Shelley Powers. I’m all for this technology being used, but, sadly, in health care XML is still a somewhat new, foreign, and often unwanted concept. When going down the semantic road, having a standard and PUBLIC nomenclature is very important to long term interoperability and data usability.
Let me pose these questions for discussion: Can object (non-relational) databases such as CouchDB, AWS SimpleDB and BigTable be another way to crack this nut? What advantages does OWL provide over object DBs? Could they be used in combination? I don’t really have an answer..just asking the question.
Cheers,
Alan
Comment by Alan Viars — July 30, 2009 @ 11:22 am
steve,
excellent post — consider the benefits to a healthcare knowledge and relationship network that not only captures an individual patient’s medical info but also meta data by location, lifestyles, profession etc.
I’ve wondered why it is that rehab/detox departments at local hospitals don’t capture info from patients on the social networks those patients return to after cleaning up. many of these individuals are the face of the healthcare system to others considering treatment. insofar as lifestyle practices lead to a great many costly diseases, it would seem that prevention and care outside the hospital could save both money and lives.
there are significant privacy issues (does the system raise my risk assessment based on how my friends live?), but these would seem manageable. i hope that companies involved choose to collaborate on solutions using the kind of insight you propose.
cheers,
adrian
Comment by Adrian Chan — July 30, 2009 @ 12:30 pm
Alan: RDF is the standard that W3C and Tim Berners-Lee advocate, but more important than the specific standard is the model. Don’t ask people to move their data and create yet another silo. Instead make it linkable so that a graph can emerge, containing more knowledge the more it is used, kind of like the web.
Adrian: Privacy is an issue in either model, but I am more comfortable with a federated, distributed model with linked data rather than handing all the data over to a central authority or master database.
Comment by admin — July 30, 2009 @ 5:26 pm
Really excellent article with very clear examples
Comment by Hanumat G. Sastry — January 26, 2011 @ 4:40 am