# Data, Information, and Knowledge — How to Tell the Difference

## 1. Overview

This article aims to rigorously define three common words we use daily without much forethought: data, information, and knowledge.

Why? First, misunderstandings are rife when people use the same terms to mean different things. Second, the problem is a bit more severe for organisations, especially in the tech industry.

Markets value high-tech organisations’ shares based on their knowledge capital and intellectual property rather than tangible assets, leading organisations to invest more in knowledge management.

Organisations also prefer to recruit experienced (and therefore more knowledgeable) staff over intelligent or highly-educated people (with up-to-date and broad information levels) who lack practical field wisdom. This shows that information and knowledge are not equally valued.

According to T. H. Davenport and L. Prusak, firms spend large sums on initiatives that aim to harness intellectual capital but deliver little business value, because they treat data, information, and knowledge as if they were the same thing.

The authors focused on the following ideas:

• Data, information, and knowledge are not interchangeable concepts, despite their overlap in specific contexts.
• The success or failure of knowledge management initiatives depends heavily on an organisation’s ability to distinguish between essential and trivial knowledge and, therefore, which to keep and which to dismiss.
• Based on Nonaka and Takeuchi’s work in the field, the authors discuss transforming tacit into explicit knowledge to benefit organisational learning and increase intellectual wealth.

## 2. Data, Information, and Organisations

### 2.1 Data as a Description of Objects and Events

Data collected by organisations and stored on IT systems typically describe objects of relevance, such as clients, products, or transactions.

Data is usually captured and stored in discrete and structured forms that can be tabulated, categorised, enumerated, and measured on relevant scales.

Computer programs efficiently process structured data, hence the latter’s preponderance (until recently) over its non-structured sibling.

Since the advent of social media, computers have learned to sift effectively through vast amounts of unstructured data, such as tweets, posts, and images.

This rich tapestry allowed the analysis of complex user activity and preferences and the creation of algorithms for recommendation systems like those used on Amazon, Instagram, and Netflix.

Data in this unprocessed form is also called raw data.

### 2.2 Surveys and Watercooler Stories

Storytelling is an exceptionally powerful means of communication among the members of a team.

Unfortunately, it is not structured; therefore, its processing cannot be delegated to machines but will require human faculties of understanding.

To mitigate this problem, organisations typically use surveys with questions that can be answered on a predefined scale to determine what people think of a specific topic or situation.

Data points gathered from surveys can be combined and weighted to produce a single number, allowing you to gauge the respondents’ sentiment on the surveyed subject.
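As a minimal sketch of how such a combined score might be computed, the snippet below averages the answers to each question on a 1 to 5 scale and then takes a weighted mean across questions. The question names, weights, and responses are hypothetical and chosen only for illustration; real surveys may use quite different scoring schemes.

```python
from statistics import mean

# Hypothetical survey: every question is answered on a 1-5 scale.
responses = {
    "workload": [2, 3, 2, 4],
    "management": [4, 5, 4, 4],
    "tools": [3, 3, 4, 2],
}

# Hypothetical weights expressing how much each question matters.
weights = {"workload": 0.5, "management": 0.3, "tools": 0.2}


def sentiment_score(responses, weights):
    """Weighted mean of the per-question average answers."""
    total_weight = sum(weights[q] for q in responses)
    weighted_sum = sum(weights[q] * mean(a) for q, a in responses.items())
    return weighted_sum / total_weight


score = sentiment_score(responses, weights)
print(f"Overall sentiment: {score:.2f} out of 5")
```

The individual answers are raw data. The single score is easier to act on, but only because someone decided which questions matter and by how much.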

Although organisations continue to use surveys regularly, sophisticated programs have been developed to perform sentiment analysis on unstructured data. Algorithms can now estimate your mood from the stories you tell on social media.

### 2.3 Information

Unlike raw data, which does not implicitly have meaning, purpose, or relevance, information can change the receiver’s perception and shape their views on specific topics or situations.

The message carried is the principal difference between information and raw data. In the next section, we will formalise this notion through the concept of entropy and information theory.

To illustrate the difference between raw data and information, consider the following example. Several sensors on a truck travelling through the desert continuously measure engine temperature, oil pressure, noise levels, and vibrations of a metal structure in the chassis.

A single data point tells us very little about how far the truck can travel before it requires an oil refill, or about when the next breakdown will occur. However, observations aggregated over a prolonged period and across many trucks enable data scientists to build statistical models with useful predictive power.
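The step from isolated readings to a prediction can be sketched as follows. The example fits a straight line to a handful of oil-pressure readings and extrapolates to a refill threshold. The readings, the units, the threshold, and the assumption of a linear trend are all hypothetical; a real model would use far more data and more careful statistics.

```python
# Hypothetical oil-pressure readings (kPa), one per 100 km travelled.
distance_km = [0, 100, 200, 300, 400, 500]
oil_pressure = [300, 292, 285, 276, 268, 260]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(distance_km, oil_pressure)

# Hypothetical threshold below which the truck needs an oil refill.
MIN_PRESSURE = 200
refill_at_km = (MIN_PRESSURE - intercept) / slope

print(f"Pressure drops about {-slope * 100:.1f} kPa per 100 km")
print(f"Estimated refill point: {refill_at_km:.0f} km")
```

No single reading in the list says when the refill is due. The estimate only emerges once the readings are combined, which is the sense in which aggregation turns raw data into information.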

## 3. Data and Information in Communication Theory

### 3.1 Information Theory

Information theory examines engineering aspects of information exchange using mathematics. This perspective originated in the 1920s at Bell Labs.

Key contributors during this time were Harry Nyquist, who published Certain Factors Affecting Telegraph Speed in 1924, and Ralph Hartley, who published Transmission of Information in 1928.

Hartley used the term information as a quantitative measure of the receiver’s ability to distinguish sequences of symbols.

However, the great leap in the information theory of communication had to wait for Claude Shannon’s paper in 1948, which we discuss in the next section.

### 3.2 The Stochastic Sender/Receiver Model

Shannon’s paper, A Mathematical Theory of Communication, was published in The Bell System Technical Journal in 1948.

The paper describes a communication model based on the sender/receiver paradigm. Its fundamental principles are as follows:

• The sender is a stochastic system (modelled by a Markov process) able to convey its state to another system (the receiver) by sharing messages across a communication channel.
• The messages the sender generates consist of symbols $\{S_1, S_2, ..., S_n\}$ that can be grouped into words and sentences. The receiver cannot predict the messages it receives but can process them successfully after arrival.
• The notions of signal and noise, which together make up a received message, are central to communication theory and can be understood as follows. A message has meaning for the receiver, and that meaning is conveyed by the symbols used and their arrangement. Any distortion of this structure can alter the meaning, thereby introducing noise. The signal-to-noise ratio is a measure of the quality of the communication channel.
• A binary digit, or bit, is the smallest unit of information and takes one of two values, $\{0, 1\}$. A bit can convey a binary (Yes/No) answer to a specific question, such as the current state of the sender.
• Crucial to this classical notion of bits is determinism; the sender is in a single well-defined state at any given moment. On the other hand, quantum bits (or qubits) represent systems in a quantum superposition, where their states (although well-defined) can be superposed. Only after measurement does a quantum system yield a single definite state, which may persist indefinitely if left undisturbed.
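The principles above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-symbol alphabet, the transition probabilities, the one-bit-per-symbol encoding, and the bit-flipping channel are all hypothetical choices and are not taken from Shannon’s paper.

```python
import random

# Hypothetical two-symbol sender: probability of the next symbol given the current one.
TRANSITIONS = {
    "S1": {"S1": 0.9, "S2": 0.1},
    "S2": {"S1": 0.5, "S2": 0.5},
}


def emit(start, length, rng):
    """Generate a message by walking the Markov chain from `start`."""
    state, message = start, []
    for _ in range(length):
        message.append(state)
        symbols = list(TRANSITIONS[state])
        probs = [TRANSITIONS[state][s] for s in symbols]
        state = rng.choices(symbols, weights=probs, k=1)[0]
    return message


def transmit(bits, flip_probability, rng):
    """Noisy channel: each bit is flipped independently with the given probability."""
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]


rng = random.Random(42)  # fixed seed so that runs are repeatable
message = emit("S1", 20, rng)
bits = [0 if s == "S1" else 1 for s in message]  # one bit per symbol
received = transmit(bits, 0.1, rng)
errors = sum(a != b for a, b in zip(bits, received))

print("sent    :", bits)
print("received:", received)
print("bit errors introduced by noise:", errors)
```

The receiver cannot know the next symbol in advance, but it can read each one on arrival. Every flipped bit is noise, changing the message the receiver believes was sent.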

### 3.3 Data Quality and Interpretation

Data and Data Size — Data can be defined as a stream of bits with a size equal to the number of bits transmitted.

In the next section, we will discuss information content (also measured in bits) which, according to Shannon’s information theory, does not always equal the size of transmitted data.

To give you a sense of what’s coming, imagine you receive constant notifications updating you on the value of Pi. The messages, in this case, carry no information since Pi’s value will never change.

Data size can also be reduced by applying data compression algorithms, which encode frequently observed symbols with fewer bits and so save space.
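The gap between data size and information content can be seen by compressing two payloads of equal size. The sketch below uses `zlib` from Python’s standard library, chosen only for convenience; the payloads are hypothetical stand-ins for a repeated notification and for an unpredictable source.

```python
import os
import zlib

repetitive = b"3.14159" * 1000               # the same notification sent 1000 times
unpredictable = os.urandom(len(repetitive))  # random bytes of the same size

sizes = {}
for label, payload in [("repetitive", repetitive), ("unpredictable", unpredictable)]:
    compressed = zlib.compress(payload, 9)   # 9 is the highest compression level
    sizes[label] = (len(payload), len(compressed))
    print(f"{label:>13}: {len(payload)} bytes -> {len(compressed)} bytes")
```

The repeated message shrinks to a small fraction of its original size because it carries almost no new information after the first copy. The random payload barely compresses at all.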

Signal-to-Noise Ratio (SNR) — The signal-to-noise ratio (SNR) is a measure in communications engineering of a channel’s quality. There are different formulations of the SNR; the below is one of the more intuitive:

$SNR = \dfrac{P_{signal}}{P_{noise}}$

The equation expresses channel quality as the ratio of the signal’s power to that of the noise: the higher the ratio, the easier it is for the receiver to distinguish the signal from the noise.
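As a rough illustration, the ratio can be computed from sampled waveforms by taking power as the mean squared amplitude. Engineers often quote the same ratio in decibels, as ten times its base-10 logarithm. The sample values below are hypothetical.

```python
import math


def average_power(samples):
    """Mean squared amplitude of a sampled waveform."""
    return sum(x * x for x in samples) / len(samples)


def snr(signal, noise):
    """Ratio of signal power to noise power (dimensionless)."""
    return average_power(signal) / average_power(noise)


def snr_db(signal, noise):
    """The same ratio expressed in decibels."""
    return 10 * math.log10(snr(signal, noise))


# Hypothetical samples: a unit-amplitude sine wave and weak alternating noise.
signal = [math.sin(2 * math.pi * k / 8) for k in range(64)]
noise = [0.1 * (-1) ** k for k in range(64)]

print(f"SNR = {snr(signal, noise):.1f}, or {snr_db(signal, noise):.1f} dB")
```

Here the sine wave has an average power of about 0.5 and the noise about 0.01, giving a ratio of about 50, or roughly 17 dB.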

Noise pollutes the signal, distorting it and sometimes changing it altogether. The quality of the data received is then questionable, and a sound receiver system must be able to recover the signal from the noise.

Data Interpretation — Although sometimes faithfully presenting factual descriptions, data cannot be trusted implicitly. This mistrust is due to our limited information processing capabilities, inherent cognitive biases, and the various ways data can be manipulated for deception.

In his lecture Why Leaders Lie: The Truth About Lying in International Politics, John Mearsheimer describes three ways in which data manipulation leads to faulty conclusions:

• Concealment — Partially concealing crucial information to reinforce a specific (and perhaps incorrect) hypothesis.
• Spinning — Arranging data in a particular way to promote the speaker’s view and shine a more positive light on it.
• Lying — Outright fabrication of false data

While lying is not as rare as we think it is, spinning and concealment are, in fact, much more frequent, and Dr Mearsheimer argues that society cannot function properly without these two.

### 3.4 Information, Uncertainty, and Entropy

Problem Description — Let’s take the example of a stochastic sender (modelled by a Markov process). The symbols shared belong to a closed set $\{S_1, S_2, ..., S_n\}$.

The probability of observing a specific symbol is given by $p(S_j) = p_j, \quad j \in [1,n]$.

Given this source, we would like to measure the amount of information it produces, but first, we must define the word information. We could start with the following desirable properties that we would like to associate with this measure if it were to be valuable:

• If a specific symbol arrives 100% of the time (and, therefore, all the others are never observed), the amount of information carried over the channel is zero. In this case, there is no need to share any messages. The receiver is always aware of the sender’s state!
• If all the symbols could be observed with equal probabilities, the amount of information created would be maximal. In this case, the receiver has minimal knowledge (and maximum confusion) of what to expect.
• If the number of symbols the source generates increases, we expect this measure to increase.

Shannon showed that, under requirements of this kind, the only form this measure can take (up to a constant factor that sets the unit) is the one below. With base-2 logarithms, $H$ is measured in bits:

$H = - \sum_{j=1}^{n} p(S_j)\log{p(S_j)}$
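The three desirable properties can be checked numerically against this formula. The sketch below assumes base-2 logarithms, so the result is in bits, and uses hypothetical probability distributions.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping symbols with p = 0."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


certain = [1.0, 0.0, 0.0, 0.0]   # one symbol always arrives
skewed = [0.7, 0.1, 0.1, 0.1]    # one symbol dominates
uniform4 = [0.25] * 4            # four equally likely symbols
uniform8 = [0.125] * 8           # eight equally likely symbols

for name, dist in [("certain", certain), ("skewed", skewed),
                   ("uniform, n=4", uniform4), ("uniform, n=8", uniform8)]:
    print(f"{name:>12}: H = {entropy(dist):.3f} bits")
```

A source that always sends the same symbol has zero entropy, the uniform distribution over four symbols beats the skewed one, and doubling the number of equally likely symbols from four to eight raises the entropy from 2 bits to 3 bits.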

Entropy and Uncertainty — The equation above is similar to that proposed by Boltzmann for entropy in thermodynamics. Entropy measures a system’s disorder; a fully unordered system will have maximum entropy.

In information theory, the interpretation is slightly different; the more entropy a source has, the more uncertainty its messages will contain.

Information Content — Therefore, information can be described by its impact on the receiver’s end, roughly equivalent to how surprised the receiver will be when reading the transmission content.

If the reader is not surprised at all, the information is trivial and non-consequential. If they are stunned, then they have observed a rare event.
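This idea of surprise is commonly formalised as the self-information, or surprisal, of a single symbol. The definition below is a standard one from information theory rather than something stated earlier in this article, and the symbol $I$ is introduced here only for illustration:

$I(S_j) = -\log{p(S_j)}$

With this definition, the entropy $H$ is the average surprisal over all symbols:

$H = \sum_{j=1}^{n} p(S_j) \, I(S_j) = - \sum_{j=1}^{n} p(S_j)\log{p(S_j)}$

For example, a symbol that is certain to arrive, with $p(S_j) = 1$, has $I(S_j) = 0$. With base-2 logarithms, a symbol with probability $1/1024$ carries $\log_2{1024} = 10$ bits of surprisal.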

## 4. Organisational Knowledge

### 4.1 Knowledge

Knowledge can be defined as a set of (mostly) coherent assumptions that a person gleans from past experiences and formal education on a (typically) specific topic.

This knowledge allows its owner to make sense of past and present experiences and predict future behaviour.

Knowledge is best evaluated by the quality and outcomes of the decisions it has influenced.

Takeuchi and Nonaka’s knowledge creation and management model assumes that knowledge is created in an individual’s mind, even below their awareness level.

How can knowledge be extracted, codified, spread, and transformed into an organisational asset if it is hidden deep in a person’s mind? This challenge is the fundamental objective of knowledge management in organisations.

To explain this more, we must understand the difference between tacit and explicit knowledge.

### 4.2 Explicit vs Tacit Knowledge

Tacit knowledge resides in a skilled individual’s subjective insights, expert judgment, and intuition. It is a working mental model developed from past experiences, cannot be easily explained or codified, and often sits below the individual’s awareness level.

Explicit knowledge is systematic, codified, and can be easily communicated to peers and written down in books and computer programs.

Knowledge conversion between tacit and explicit requires articulation efforts.

Skilled chefs, artists, and carpenters pass on their knowledge through apprenticeships, a combination of formal education and practical insights.

### 4.3 Knowledge Creation

Nonaka understood the creation of new knowledge as transforming tacit knowledge in an employee’s mind into explicit knowledge through innovation.

According to Nonaka, this conversion of tacit into explicit occurs through the social interactions among members of a human group. Through imitation, knowledge is transferred from one individual to another.

Tacit knowledge can also be converted into explicit knowledge by creating contexts where individuals must justify their beliefs.

Davenport and Prusak listed four methods to turn information into knowledge.

• Comparison — By comparing the information we have on two situations, we can identify similar and distinct elements and determine if a known method that applies to the first situation can also be used in the second with some variation.
• Consequences — How would our decisions and actions change given the information we possess?
• Connections — How does a new piece of knowledge combine with and relate to what we already know? Does it reinforce our current assumptions, or does it render them inconsistent?
• Conversation — In peer conversations, information is analyzed and distilled, updating our mental models.

Interestingly, in Takeuchi and Nonaka’s work, as well as in Davenport and Prusak’s, knowledge is created first and foremost in the human mind from the information at its disposal.

### 4.4 Knowledge Embedded in Processes

Hiking, scuba diving, surgery, and many other risk-prone activities are heavily constrained by rules and processes to ensure operational success.

The surgeon’s protective clothing, tool checklists, patient preparation, and handwashing rituals were perfected over many years, even generations, and have come to incorporate a massive body of knowledge that cannot be easily explained or communicated.

Every failure in an operating theatre provided a learning experience and an opportunity to improve existing processes and to expand or reshape the frontiers of present knowledge.

You can only codify a fraction of this vast wisdom in books, journals, or magazines, and new surgeons can only be prepared by a combination of formal education and apprenticeship.

Sometimes, explaining why certain traditions and cultures maintain specific assumptions is challenging. Still, their survival over many generations signifies their usefulness and value.

## 5. Final Words

We conclude this article with a sobering thought on the nature of competition in modern times.

Historically, organisations retained their competitive advantage by aggressively protecting their trade secrets and defending their sources of raw materials.

With globalisation, technological advances cannot be easily protected: competitors quickly imitate and even improve on original ideas, which swiftly become mainstream.

The strategic imperative of securing a competitive edge has forced organisations to turn to knowledge assets and intellectual capital as a source of constant innovation and improvement that keeps them at least one step ahead of the competition.

The following quote from Paul Romer vividly illustrates the point.

In a world with physical limits, it is discoveries of big ideas (for example, how to make high-temperature superconductors) together with the discovery of millions of little ideas (better ways to sew a shirt) that make persistent economic growth possible. Ideas are the instructions that let us combine limited physical resources in arrangements that are ever more valuable. — Paul Romer

Combining influential discoveries with operational excellence has replaced cutting-edge technology as the prime source of competitive advantage.