This article aims to rigorously define three common words we use daily without much forethought: data, information, and knowledge.
Why? First, misunderstandings are rife when people use the same terms to mean different things. Second, the problem is more severe for organisations, especially in the tech industry.
Markets value high-tech organisations’ shares based on their knowledge capital and intellectual property rather than tangible assets, leading organisations to invest more in knowledge management.
Organisations also prefer to recruit experienced (and therefore more knowledgeable) staff over intelligent or highly educated candidates (with broad, up-to-date information) who lack practical field wisdom. This preference shows that information and knowledge are not equally valued.
According to T. H. Davenport and L. Prusak, firms spend heavily on initiatives that aim to harness intellectual capital yet deliver little business value because they treat data, information, and knowledge as interchangeable.
The authors' key ideas are explored in the sections that follow.
2. Data, Information, and Organisations
2.1 Data as a Description of Objects and Events
Data collected by organisations and stored on IT systems typically describe objects of relevance, such as clients, products, or transactions.
Data is usually captured and stored in discrete and structured forms that can be tabulated, categorised, enumerated, and measured on relevant scales.
Computer programs process structured data efficiently, which explains its preponderance (until recently) over unstructured data.
Since the advent of social media, however, computers have learned to sift effectively through vast amounts of unstructured data such as tweets, posts, and images.
This rich material enabled the analysis of complex user activity and preferences and the creation of recommendation algorithms like those used by Amazon, Instagram, and Netflix.
Data in this unprocessed state is also called raw data.
2.2 Surveys and Watercooler Stories
Unfortunately, informal feedback of this kind (the watercooler stories of this section's title) is not structured; therefore, its processing cannot be delegated to machines and instead requires human faculties of understanding.
To mitigate this problem, organizations typically use surveys with questions that can be answered on a predefined scale to determine what people think of a specific topic or situation.
Data points gathered from surveys can be combined and weighted to produce a single score that gauges respondents' sentiment on the surveyed subject.
Although organizations continue to use surveys regularly, sophisticated programs have been developed to perform sentiment analysis based on unstructured data. Algorithms can now predict your mood based on the stories you tell on social media.
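As a minimal sketch of the weighting idea described above (the questions, weights, and 1-to-5 answer scale are invented for illustration), combining survey responses into one sentiment score might look like:

```python
# Sketch: combine Likert-scale survey answers (1-5) into one weighted
# sentiment score between 0 and 1. Weights and scale are hypothetical.
def survey_score(answers, weights):
    """Return the weighted average of the answers, rescaled to 0-1."""
    if len(answers) != len(weights):
        raise ValueError("each answer needs a weight")
    total_weight = sum(weights)
    weighted_mean = sum(a * w for a, w in zip(answers, weights)) / total_weight
    # Answers are on a 1-5 scale; map the weighted mean onto 0-1.
    return (weighted_mean - 1) / 4

# One respondent's answers to three questions, with question weights.
print(survey_score([5, 4, 3], [0.5, 0.3, 0.2]))  # ≈ 0.825
```

Each data point is meaningless in isolation; the aggregate score is what lets an organisation compare sentiment across teams or over time.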
2.3 Information as Meaningful Data
Unlike raw data, which does not implicitly have meaning, purpose, or relevance, information can change the receiver’s perception and shape their views on specific topics or situations.
The message carried is the principal difference between information and raw data. In the next section, we will formalise this notion through the concept of entropy and information theory.
To illustrate the difference between raw data and information, consider the following example. Several sensors on a truck travelling through the desert continuously measure engine temperature, oil pressure, noise levels, and vibrations of a metal structure in the chassis.
A data point tells us little about how far the truck can travel before it requires an oil refill, nor can it predict the next breakdown. However, observations aggregated over a prolonged period and across many trucks enable data scientists to create statistical models with sufficient predictive power.
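The truck example can be sketched in a few lines. The sensor values, window size, and alert threshold below are all hypothetical; the point is only that an aggregated feature (here a rolling average) carries predictive value that a single raw reading does not:

```python
# Sketch: turn raw sensor data points into an aggregated feature.
# A single reading says little; a rolling average over many readings is
# the kind of input a predictive-maintenance model would consume.
from collections import deque

def rolling_mean(readings, window=5):
    """Yield the mean of the last `window` readings for each new reading."""
    buf = deque(maxlen=window)
    for r in readings:
        buf.append(r)
        yield sum(buf) / len(buf)

engine_temps = [88, 90, 91, 95, 99, 104, 108]  # degrees Celsius (made up)
smoothed = list(rolling_mean(engine_temps, window=3))
# Flag the truck for inspection if the smoothed trend crosses a threshold.
alerts = [t > 100 for t in smoothed]
print(smoothed[-1], alerts[-1])
```

A real pipeline would compute many such features (temperature, vibration, oil pressure) and feed them to a trained model, but the data-versus-information distinction is already visible here.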
2.4 Logical Depth
Logical depth refers to the complexity and depth of analysis required to extract meaningful insights from vast amounts of data. Organizations and individuals are often inundated with a deluge of information in today’s data-driven world. However, the true value lies not in the sheer volume of data but in the ability to derive actionable insights and make informed decisions based on that data.
When it comes to understanding complex problems or making important business decisions, it is crucial to dive deep into the data and uncover hidden patterns, correlations, and trends. This process requires employing techniques such as data mining, statistical analysis, machine learning, and visualization tools to extract and interpret the information effectively.
The concept of logical depth encompasses both the technical aspect of data analysis and the cognitive effort required to navigate through the complexity and noise.
Logical depth allows individuals and organizations to move beyond surface-level insights and uncover deeper connections that can drive strategic decision-making and problem-solving.
3. Data and Information in Communication Theory
3.1 Information Theory
Information theory examines engineering aspects of information exchange using mathematics. This perspective originated in the early 1920s at Bell Labs.
Hartley used the term information as a quantitative measure of the receiver’s ability to distinguish sequences of symbols.
However, the great leap in the information theory of communication had to wait for Claude Shannon’s paper in 1948, which we discuss in the next section.
3.2 The Stochastic Sender/Receiver Model
Shannon’s paper, A Mathematical Theory of Communication, was published in The Bell System Technical Journal in 1948.
The paper describes a communication model based on the sender/receiver paradigm. Its fundamental components are as follows:

- An information source produces the message to be transmitted.
- A transmitter encodes the message into a signal suitable for the channel.
- The channel carries the signal and may corrupt it with noise.
- A receiver decodes the incoming signal back into a message.
- A destination consumes the reconstructed message.
3.3 Data Quality and Interpretation
Data and Data Size — Data can be defined as a stream of bits with a size equal to the number of bits transmitted.
In the next section, we will discuss information content (also measured in bits), which, according to Shannon’s information theory, does not always equal the size of the transmitted data.
To give you a sense of what’s coming, imagine you receive constant notifications updating you on the value of Pi. In this case, the messages carry no information since Pi’s value will never change.
Data size can also be reduced by applying compression algorithms, which encode more frequently observed symbols with fewer bits, saving space.
Signal-to-Noise Ratio (SNR) — The signal-to-noise ratio (SNR) is a measure in communications engineering of a channel’s quality. There are different formulations of the SNR; one of the more intuitive is:

SNR = P_signal / P_noise, often expressed in decibels as SNR_dB = 10 · log₁₀(P_signal / P_noise)

The equation tells us that channel quality is governed by the ratio of the signal’s power to that of the noise: the more the noise power grows relative to the signal, the lower the quality.
Noise pollutes the signal, distorting it and sometimes changing it altogether. The quality of the data received is then questionable, and a sound receiver system must be able to recover the signal from the noise.
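A quick numerical sketch of the decibel form of the SNR, using made-up sample values and approximating power as the mean of squared samples:

```python
# Sketch: estimate the SNR of a sampled signal, in decibels.
# Power is approximated as the mean of the squared sample values.
import math

def power(samples):
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """SNR in dB: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power(signal) / power(noise))

# Hypothetical samples: a strong signal and weak background noise.
signal = [1.0, -1.0, 1.0, -1.0]   # power = 1.0
noise = [0.1, -0.1, 0.1, -0.1]    # power = 0.01
print(snr_db(signal, noise))       # ≈ 20 dB
```

A power ratio of 100 corresponds to 20 dB; each additional 10 dB means the signal is ten times stronger relative to the noise.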
Data Interpretation — Although sometimes faithfully presenting factual descriptions, data cannot be trusted implicitly. This mistrust is due to our limited information processing capabilities, inherent cognitive biases, and the various ways data can be manipulated for deception.
In this brilliant lecture on Why Leaders Lie: The Truth About Lying in International Politics, John Mearsheimer shows how data manipulation can lead to faulty conclusions.
While lying is not as rare as we might think, spinning and concealment are, in fact, much more frequent, and Dr Mearsheimer argues that society cannot function properly without these two.
3.4 Information, Uncertainty, and Entropy
Problem Description — Let’s take the example of a stochastic sender (modelled by a Markov process). The symbols shared belong to a closed set S = {s₁, s₂, …, sₙ}.
The probability of observing a specific symbol sᵢ is given by pᵢ, with the probabilities summing to 1.
Given this source, we would like to measure the amount of information it produces, but first, we must define the word information. We could start with the following desirable properties (the ones Shannon used) that any valuable measure H should satisfy:

1. H should be continuous in the probabilities pᵢ.
2. If all symbols are equally likely, H should increase with the number of symbols: more equally likely choices mean more uncertainty.
3. If a choice is broken down into successive choices, H should be the weighted sum of the individual values of H.
Shannon postulated that the only form this measure could take, if it were to satisfy the above assumptions, was:

H = − K Σᵢ pᵢ log pᵢ

where K is a positive constant that sets the unit of measure; with logarithms taken to base 2 and K = 1, H is measured in bits.
Entropy and Uncertainty — The equation above is similar to that proposed by Boltzmann for entropy in thermodynamics. Entropy measures a system’s disorder; a fully unordered system will have maximum entropy.
In information theory, the interpretation is slightly different; the more entropy a source has, the more uncertainty its messages will contain.
Information Content — Therefore, information can be described by its impact on the receiver’s end, roughly equivalent to how surprised the receiver will be when reading the transmission content.
The information is trivial and non-consequential if the reader is not surprised at all. If they are stunned, then they have observed a rare event.
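These ideas can be made concrete in a few lines. The sketch below computes Shannon entropy in bits and the "surprise" of a single event (its self-information, −log₂ p):

```python
# Sketch: Shannon entropy of a source and the surprisal of single events.
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def surprisal(p):
    """Self-information of one event in bits: rarer means more surprising."""
    return -math.log2(p)

# A fair coin is maximally uncertain: 1 bit per toss.
print(entropy([0.5, 0.5]))    # 1.0
# A certain event (e.g. a notification that Pi has not changed)
# carries no information at all.
print(entropy([1.0]))         # 0.0
# A rare event (p = 1/1024) is far more surprising than a common one.
print(surprisal(1 / 1024))    # 10.0
```

This matches the intuition in the text: the Pi notification has zero entropy and zero surprise, while a stunned receiver has just observed a high-surprisal, low-probability event.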
4. Organisational Knowledge
4.1 Defining Knowledge
Knowledge can be defined as a set of (mostly) coherent assumptions that a person gleans from past experiences and formal education on a (typically) specific topic.
This knowledge allows its owner to make sense of past and present experiences and predict future behaviour.
Knowledge is best evaluated by the quality and outcomes of the decisions it has influenced.
Takeuchi and Nonaka’s knowledge creation and management model assumes that knowledge is created in an individual’s mind, even below their awareness level.
How can knowledge be extracted, codified, spread, and transformed into an organisational asset if it is hidden deep in a person’s mind? This challenge constitutes the fundamental objective of knowledge management in organisations.
To explore this further, we must understand the difference between tacit and explicit knowledge.
4.2 Explicit vs Tacit Knowledge
Tacit knowledge resides in a skilled individual’s subjective insights, expert judgment, and intuition. It is a working mental model developed from past experiences, cannot be easily explained or codified, and often sits below the individual’s awareness level.
Explicit knowledge is systematic, codified, and can be easily communicated to peers and written down in books and computer programs.
Knowledge conversion between tacit and explicit requires articulation efforts.
Skilled chefs, artists, and carpenters pass on their knowledge through apprenticeships, a combination of formal education and practical insights.
4.3 Knowledge Creation
Nonaka understood the creation of new knowledge as transforming tacit knowledge in an employee’s mind into explicit knowledge through innovation.
According to Nonaka, this conversion of tacit into explicit occurs through the social interactions among members of a human group. Through imitation, knowledge is transferred from one individual to another.
Tacit knowledge can also be converted into explicit knowledge by creating contexts where individuals must justify their beliefs.
Davenport and Prusak listed four methods to turn information into knowledge: comparison, consequences, connections, and conversation.
Interestingly, in Takeuchi and Nonaka’s work, as well as Davenport and Prusak, knowledge is created first and foremost in the human mind with information at its disposal.
4.4 Knowledge Embedded in Processes
Hiking, scuba diving, surgery, and many other risk-prone activities are heavily constrained by rules and processes to ensure operational success.
The surgeon’s protective clothing, tool checklists, patient preparation, and handwashing rituals were perfected over many years, even generations, and have come to incorporate a massive body of knowledge that cannot be easily explained or communicated.
Every failure in an operating theatre provided a learning experience and an opportunity to improve existing processes and expand or reshape the frontiers of present knowledge.
You can only codify a fraction of this vast wisdom in books, journals, or magazines, and new surgeons can only be prepared by a combination of formal education and apprenticeship.
Sometimes, explaining why certain traditions and cultures maintain specific assumptions is challenging. Still, their survival over many generations signifies their usefulness and value.
5. Final Words
We conclude this article with a sobering thought on the nature of competition in modern times.
Historically, organisations retained their competitive advantage by aggressively protecting their trade secrets and defending their sources of raw materials.
With globalisation, technological advances cannot be easily protected; competitors quickly imitate and even improve original ideas, which swiftly become mainstream.
The strategic imperative of securing a competitive edge has forced organisations to turn to knowledge assets and intellectual capital as a source of constant innovation and improvement, keeping them at least one step ahead of the competition.
Paul Romer’s original quote vividly illustrates the point.
In a world with physical limits, discoveries of big ideas (for example, how to make high-temperature superconductors) and the discovery of millions of little ideas (better ways to sew a shirt) make persistent economic growth possible. Ideas are the instructions that let us combine limited physical resources in arrangements that are ever more valuable. — Paul Romer
The combination of influential discoveries and operational excellence has replaced cutting-edge technology alone as the prime source of competitive advantage.
- A Mathematical Theory of Communication — by C. E. Shannon
- Working Knowledge: How Organizations Manage What They Know — by T. H. Davenport and L. Prusak
- Strategic Management and Organisational Dynamics (4th edition) — by Ralph Stacey
- Six Frames for Thinking about Information — by Edward de Bono