Information Theory and Entropy – Their Relevance to Philosophy
Dr Donald Cameron, BRLSI – 11th February 2004
Entropy
is the name given to a quantitative measure of information and also to a quantity
defining the ability of gases to yield mechanical work in heat engines. The
analogy is valid and useful, but must not be taken outside its limits. The
concepts of information, order, redundancy, noise, selection, Maxwell’s Demon
and the evolution of memes can be of great service to philosophy.
Information is a word
that we use a great deal and we all have a pretty good concept of what we mean
by it. It comes in quantities – we sometimes speak of a lot of information or
very little, although most people have not gone so far as to ask in what units
it might be measured. It has the strange property that you can give it away,
but still keep it for yourself.
It is generally
thought that information has some value. I remember one young graduate seeking to
prove that greater knowledge was not sufficiently recognised in his rate of
pay. He reasoned as follows:
Equation 1: Knowledge is power
Equation 2: Time is money
Equation 3: Power = work/time
Substituting from equations 1 and 2 into equation 3, he obtained
Knowledge = work/money, which can be rearranged as
Money = work/knowledge.
Thus money is directly
proportional to work (as you might expect) but, surprisingly, is inversely
proportional to knowledge.
Now this conclusion,
like much published philosophy, is complete rubbish, because the reasoning
lacks precision, but at least in this case its author was happy to admit that
it was rubbish. In the 1950s C. P. Snow pointed out to members of the arts
faculties that knowledge of the second law of thermodynamics was a prerequisite
to any claim to a complete education. Sadly, much of the erudite-sounding talk
about information theory and entropy that has been heard since is no better
than the reasoning of the young graduate about his pay.
Let us begin with a
short course in information theory – I promise to use only the simplest of
mathematics. Think of information passing in a channel from a transmitter to a
receiver. If a binary digit, a 0 or a 1, is sent, then the receiver, who
previously thought either result equally probable, will now know which one of
the two possibilities it is. If a letter of the alphabet is sent, the receiver
would know which out of 26 possibilities it is. Clearly an alphabetic letter
gives more information than a binary digit. Perhaps we could say that the
measure is 26 for one and 2 for the other.
But now suppose that
we receive a second letter. The number of possible two-letter sequences is
26x26 or 676. This doesn’t seem to be working, because common sense tells us
that reception of two letters should be twice as much information as one. Our
measure should add, not multiply. Logarithms have the property that
log(x)+log(y)=log(xy) – in effect they transform a multiplication operation
into an addition. As far as I can remember, logs provide the only function that
will do that. Using the logarithm of the number of possibilities, we now have a
measure that describes the information received, and it is additive. There must
be few of us left who can remember seriously using log tables to perform
multiplications before the age of the electronic calculator, or using a slide
rule, which is based on the same principle – its graduations are to a log scale
and its distances can be added to give multiplication.
Using this measure,
the information content of a binary digit is log(2) and that of an alphabetic
letter is log(26) and for any other signal it is log(1/p) where p is the
probability at the receiver of receiving that signal before it arrives. It
turns out that this measure of information is very useful and it is used by
communications and software engineers to good effect. This is the quantity that
has been called “entropy” of which more later.
Usually, in
information theory, the logarithms are taken to the base 2 – then, since log2(2)
will equal 1, a binary digit will carry one unit of information. This is called
one “bit”, but using our formula, every other kind of information can be
measured in bits. For example, for an alphabetic letter, log2(26) =
4.7. This tells us that four binary digits would not be enough to code a
letter, but it could be done with five with room to spare. Let’s check this
out: four digits gives 2x2x2x2=16 possibilities – not enough for a 26 letter
alphabet, whereas another digit multiplies by two again giving 32
possibilities, enough for the 26 letters and a few other symbols.
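The arithmetic above can be checked in a few lines. What follows is only a minimal sketch in Python; the function name `bits_for` is an illustrative choice and not part of the talk. It confirms that the logarithmic measure adds when the possibilities multiply, and that five binary digits are needed for one letter.

```python
import math

def bits_for(possibilities: int) -> float:
    """Information, in bits, gained by learning which one of
    `possibilities` equally likely outcomes has occurred."""
    return math.log2(possibilities)

one_letter = bits_for(26)               # about 4.70 bits
two_letters = bits_for(26 * 26)         # 676 possible two-letter sequences
digits_needed = math.ceil(one_letter)   # whole binary digits per letter

print(f"one letter:  {one_letter:.2f} bits")
print(f"two letters: {two_letters:.2f} bits")
print(f"binary digits needed per letter: {digits_needed}")
```

Two letters come out at twice the figure for one letter, which is the additive behaviour that common sense asked for.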
This very basic
introduction to information theory does not give its full scope. In fact every
kind of information can be measured in bits by extending the same basic
principle – continuous wave forms, pictures and whatever, but you will be
relieved that it is beyond the scope of my talk this evening to go through the
mathematics of that in detail. But there is one further bit of number work I
would like to explore before abandoning the mathematical approach.
We have noted that the
quantity of information transferred is dependent on the prior probability of
each symbol at the receiver. If we have a system that sends 0s and 1s, but with
a probability of 0.8 for 0 and 0.2 for 1, the information conveyed by each
symbol is different. Receipt of a 1 gives log2(1/0.2) = 2.322 bits
of information, because 1 is a more unusual event, whereas 0 works out to be
log2(1/0.8) = 0.322 bits, because it is relatively common. This
accords well with common sense. We would feel that we had received a larger
amount of information, if we were told that there was a crocodile on the lawn
than if we learned that there was a cat.
For this system, the
average information transmitted per symbol is
0.8 log2(1/0.8) + 0.2 log2(1/0.2) = 0.722 bits
instead of 1 bit per
symbol when the 0s and 1s are equally probable. This is an important result of
information theory. When the symbols are not equally probable and the recipient
can guess something about the incoming signal, the quantity of information
received is reduced. Maximum information transmission is possible only if the
symbols of the alphabet are equally probable and independent of each other.
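The figures for the unequal source can be reproduced with a short sketch. The function names `surprisal` and `average_information` are illustrative labels, assumed here for convenience; the formulas are the ones given in the text.

```python
import math

def surprisal(p: float) -> float:
    """Information in bits conveyed by a symbol whose prior
    probability at the receiver is p, i.e. log2(1/p)."""
    return math.log2(1.0 / p)

def average_information(probabilities) -> float:
    """Average bits per symbol: each symbol's information
    weighted by how often that symbol occurs."""
    return sum(p * surprisal(p) for p in probabilities if p > 0)

print(round(surprisal(0.2), 3))                    # 2.322
print(round(surprisal(0.8), 3))                    # 0.322
print(round(average_information([0.8, 0.2]), 3))   # 0.722
print(average_information([0.5, 0.5]))             # 1.0
```

Any departure from equal probabilities brings the average below one bit per symbol for a two-symbol alphabet.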
For English text, the
symbols are, by no means, equally probable and independent. There are far more
Es than Zs, words have to be recognisable English words with particular
spellings and arranged according to grammatical rules. Sequences that describe
very improbable events are themselves very improbable. This greatly reduces the
number of letter sequences that are possible. These constraints reduce the
information capacity of written English from 4.7 bits per letter to around 1
bit per letter. In past times, when information transmission was more expensive
than today, telegraph companies would have codes for commonly used phrases like
“happy birthday” or “best wishes for mothers’ day”. They could be transmitted
more cheaply using a short code, because they conveyed very little information.
This inefficiency is
described as redundancy – it becomes necessary to have a longer message than
would be needed with a maximally efficient code. But redundancy is not always a
bad thing:
It mrkes ir powsiblrto
undetstand thr meshage egen in a veru noisy trarsmisdion.
Without redundancy,
our communications would have no defence against noise, but in a noise-free
environment, we can code information so that it can be transmitted more
efficiently. When we buy software, the installation process may involve
converting it from a compressed format. This has been done to save space by
sending the information in a form that has less redundancy, where the
programmer can be confident that the transmission is reliable.
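The way redundancy defends a message against noise can be illustrated with the crudest possible scheme: send every binary digit three times and let the receiver take a majority vote. This repetition code is an assumption chosen for simplicity, not a scheme described in the talk, and the 5% error rate and the function names are likewise illustrative.

```python
import random

def encode(bits, repeats=3):
    """Add redundancy by sending each bit `repeats` times."""
    return [b for b in bits for _ in range(repeats)]

def add_noise(bits, flip_probability, rng):
    """Flip each bit independently with the given probability."""
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]

def decode(received, repeats=3):
    """Recover each bit by majority vote over its group of copies."""
    groups = [received[i:i + repeats] for i in range(0, len(received), repeats)]
    return [1 if 2 * sum(group) > len(group) else 0 for group in groups]

rng = random.Random(2004)  # fixed seed so the run is repeatable
message = [rng.randint(0, 1) for _ in range(10_000)]

# Same noisy channel, first without and then with redundancy.
plain = add_noise(message, 0.05, rng)
coded = decode(add_noise(encode(message), 0.05, rng))

errors_plain = sum(a != b for a, b in zip(message, plain))
errors_coded = sum(a != b for a, b in zip(message, coded))
print("errors without redundancy:", errors_plain)
print("errors with threefold redundancy:", errors_coded)
```

The coded message is three times as long, yet far fewer of its digits arrive wrong. That is the trade described above: a longer message in exchange for protection against noise.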
Noise is the part of
the received data that is not carrying the required information to the
receiver. It can also be measured numerically and communications engineers talk
about the signal/noise ratio. If noise is high, it requires more redundancy in
the transmitted message to overcome it. Of course, noise is relative to the
receiver’s point of view. People at a party talking loudly in an unfamiliar
foreign language might make it more difficult for me to hear the person I am
talking with. Their signal is our noise, just as our signal would be their noise.
So much for
information, but what about entropy? I am proud that I passed my undergraduate
years in the very laboratories in Glasgow University where Lord Kelvin and
others were making their discoveries about the Second Law of Thermodynamics. Of
course, just being there did not guarantee that I absorbed the information –
quantified or not! Yet Lord Kelvin is one of those people, like Churchill, whose
quoted words have survived to become part of our culture. Examples of his
sayings are “the steam engine has given more to science than science has given
to the steam engine” and “when you cannot express it in numbers, your
understanding of a subject is of a meagre and unsatisfactory kind; it may be
the beginning of knowledge, but you have scarcely in your thoughts advanced to
the stage of science”. These offer good advice to scientists, engineers and
especially philosophers even today. He also made some mistakes, such as
declaring both flying machines and evolution impossible. He was one of the few
people to have a good reason to hold the latter view, because his calculations
of the heat escaping from the earth implied that it must have been very hot a
relatively short time ago. The problem was solved only after both Kelvin and
Darwin had died, when it was realised that radioactivity was replenishing the
earth’s internal heat.
But what has
thermodynamics got to do with information? At first sight, we are dealing with
the behaviour of gases, engines and fuels. Carnot in France, Kelvin in Britain
and Clausius in Germany were all primarily concerned with the efficiency of
steam or other piston engines and the behaviour of the gases that drive them.
Imagine a cylinder
containing gas in a small volume under pressure. The piston can be driven out
to do work, but as the crank goes round and the gas is recompressed the same
amount of work must be done on the gas to get it back to its starting point –
rather more, in fact, when friction losses are considered. To make a useful
engine we need to use heat. If heat is applied to the compressed gas, its
pressure will become even higher and the piston can be driven out with greater
force. If the gas is now cooled, the pressure will fall and the return stroke
can be done with less force. We have an engine that will deliver a net work
output over its cycle.
But we have had to
give up some of the heat energy in cooling the gas. This is “waste” heat. It,
no doubt, cost us fuel to create and now it is degraded to a lower temperature
where it cannot be turned into work.
The First Law of Thermodynamics
says that energy can neither be destroyed nor created and heat and work are
equivalent and can be measured in the same units. The Second Law says that
while work can always be fully converted into heat, heat cannot so readily be
converted into work. The task of the nineteenth-century researchers was to
understand and quantify this apparent law of experience. Their result was to
discover that a quantity named “entropy” could increase, but for any
self-contained system, it could never decrease. This, in itself, had
considerable philosophical impact. Here was a proof that there is an arrow to
time. The processes of the universe are irreversible in a much more fundamental
sense than the difficulty of putting the toothpaste back in the tube.
Professor Hammond,
last October, gave us an excellent introduction to entropy and he rightly said
that, while we have little difficulty in visualising quantities of heat and of
work, it is much less easy to have a feel for the quantity called “entropy”. We
can define the change in entropy as the amount of heat transferred divided by
the temperature:
dS = dQ/T
Suppose we take a
quantity of heat Qhot from an infinite hot source and use it to
drive an engine. We need an infinite, or at least very large, source so that
the temperature doesn’t change much as we draw heat out of it, to make our
mathematics easier. The amount of work generated is W and the amount of heat
liberated to an infinite cold sink is Qcold.
The First Law states
that energy is conserved, so the heat from the hot source must equal the sum of
the work done and the waste heat released to the cold sink:
Qhot = W + Qcold
The Second Law states
that the entropy added to the cold sink must be greater than or equal to the
entropy taken away from the hot source:
Qcold/Tcold >= Qhot/Thot
If our engine is
maximally efficient, the entropy does not increase, but just manages to stay
the same and this relationship becomes an equality. The efficiency of our
perfect engine is the ratio of the work output to the heat input:
Efficiency = W/Qhot = (Qhot – Qcold)/Qhot = 1 – Tcold/Thot
This is an important
and interesting result. It means that there is a fundamental limit to
efficiency. No matter how carefully we reduce friction and polish the exhaust
valves, we cannot do better than this. Suppose we could devise a machine to use
the heat from the sun at 5000 Kelvin and use space at 3 Kelvin as the cold
sink. The efficiency would be 0.9994. But if we have a more terrestrial engine
using heat at 500 Kelvin and discharging at atmospheric temperature of 288 K,
the maximum attainable efficiency would be 0.424. Real practical engines suffer
from friction, which turns some of their work into heat and increases entropy,
making the efficiency even lower.
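The two worked figures, and the entropy balance behind them, can be checked with a small sketch. The function name `carnot_efficiency` and the choice of 1000 units of heat are illustrative assumptions; the formula is the one derived above for a perfect engine.

```python
def carnot_efficiency(t_hot: float, t_cold: float) -> float:
    """Maximum efficiency of an engine working between two absolute
    temperatures (both in kelvin): 1 - Tcold/Thot."""
    if t_cold <= 0 or t_hot <= t_cold:
        raise ValueError("need Thot > Tcold > 0, both in kelvin")
    return 1.0 - t_cold / t_hot

print(round(carnot_efficiency(5000.0, 3.0), 4))   # 0.9994
print(round(carnot_efficiency(500.0, 288.0), 3))  # 0.424

# Follow 1000 units of heat through the perfect terrestrial engine.
q_hot = 1000.0
work = carnot_efficiency(500.0, 288.0) * q_hot
q_cold = q_hot - work  # First Law: Qhot = W + Qcold

# Second Law at its limit: entropy given to the cold sink equals
# entropy taken from the hot source.
print(q_hot / 500.0, q_cold / 288.0)
```

For the perfect engine the two entropy figures agree. Any real engine would deliver less work, discharge more heat, and so add more entropy to the sink than it took from the source.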
You will be wondering
what this could possibly have to do with the measures of information that we
talked about earlier. A clue comes if we examine the entropy of a unit mass of
a substance. If we add a small amount of heat dQ, the temperature will increase
by dT so that dQ=CdT, where C is the specific heat. This small increment of
heat will cause an increment of entropy of dS=dQ/T, or dS=(C/T)dT. Assuming
that the specific heat is constant, integration gives us S=C log(T), plus a
constant of integration.
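The integration step can be checked numerically by adding up many small increments dS=(C/T)dT and comparing the total with the logarithmic formula. This is a sketch under stated assumptions: the logarithm is the natural logarithm, only the change in entropy between two temperatures is compared (the absolute value is fixed only up to a constant), and the specific heat of 4.18 kJ per kg per kelvin is the approximate textbook value for water, used purely as an example.

```python
import math

def entropy_change_by_summation(c, t_start, t_end, steps=100_000):
    """Add up dS = (C/T) dT over many small temperature steps,
    evaluating T at the midpoint of each step."""
    dt = (t_end - t_start) / steps
    return sum(c / (t_start + (i + 0.5) * dt) * dt for i in range(steps))

def entropy_change_by_formula(c, t_start, t_end):
    """Difference of S = C log(T) between the two temperatures."""
    return c * math.log(t_end / t_start)

C_WATER = 4.18  # kJ per kg per kelvin, assumed constant (illustrative value)

summed = entropy_change_by_summation(C_WATER, 288.0, 373.0)
exact = entropy_change_by_formula(C_WATER, 288.0, 373.0)
print(f"by summation: {summed:.6f} kJ/(kg K)")
print(f"by formula:   {exact:.6f} kJ/(kg K)")
```

The two figures agree to many decimal places, which is all that the integration claims.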
Ludwig Boltzmann, who
died in 1906, has the expression S = k log W carved on his tombstone. His
contribution was to formulate the second law of thermodynamics in terms of the
probable arrangements of atoms and their energies. In this context, W is the
number of arrangements that would give the same observed state at the
macroscopic scale. In effect W is proportional to a probability and Boltzmann is
saying that the world is tending to move to a more probable state. By deriving
the second law mathematically he has arrived at a formulation similar to that
later found for information transmission. So is the measure of information and
that of entropy the same thing, as some would have us believe?
The answer
is clearly not. Like all analogies, it is useful where it corresponds, but we must
be careful not to deceive ourselves where it does not. For example, I have five
plastic tiles, each bearing a single letter, here on the table. They spell the
message “BRLSI” and I will now sweep them off the table. The message on the
tiles is destroyed as they fall to the floor. They also convert their potential
(height) energy into heat with an increase in entropy, but that is clearly not
the same thing. Their thermodynamic result would have been the same even if I
had mixed them to destroy the message before sweeping them off the table.
One of the
big differences is that information can be replicated. The information carried
by these tiles has not been destroyed because several copies of it still exist
in my mind and yours. This is an essential difference to which we will return
in a few moments.
Of course, the measure
of the quantity of information that we have been discussing is useful, but it
is not a full measure of its value to the receiver. That value would be the
improvement, in terms of the receiver’s own objectives that could be achieved
as a result of decisions made in the light of the new information. To know the
position of 99% of the lions in Africa would be a very much greater quantity of
information than that of just the one lion, concealed near the path where you
are intending to walk. It is easy to understand which parcel of information
might be most useful.
We, as evolved
animals, have innate capacities to process information, so as to spot
correlations and to make decisions that will promote our survival and
reproduction. Often we do this rather well, but quite unconscious of the
mechanisms taking place in our heads. We also process information about
information. Knowledge of the provenance of information is very important to establishing
its credibility. A little analysis may be able to give us more understanding of
what we are doing. There are a number of different concepts here, which require
some effort to disentangle, but it is well worth while to try to clarify them.
Information only
exists when it is carried by physical entities of some kind that give a coded
representation of a source message from another person or ultimately some
aspect of the real world or our own internal world. These entities may be
electrochemical impulses in a nerve cell, sound waves, electrical pulses in a
wire, letters on a page, flag movements or whatever. A code is obviously
necessary for information transmitted between people, but it is also true about
the world that we suppose we directly observe. In fact we do not observe it
directly. The three-dimensional world that we see out there is coming to us on
rays of light that have bounced off its features and formed a two-dimensional,
upside-down image on our retinas; it is then converted into electrochemical
nerve impulses and sent to different areas of the brain that identify straight
lines, movement and other characteristics and then bring them together somehow
to give an understanding of a 3D physical reality. Similar physical events lie
behind our other senses. Direct observation of the real world is both noisy and
full of redundancy. We are good at using the redundancy to overcome the noise.
We are almost
unconscious of the astonishing amount of code conversion that is happening in
normal life. Suppose a friend tells me a telephone number and I email it to
someone. The number is stored in some kind of code in my friend’s long-term
memory. It must be turned into a sequence of nerve impulses controlling my
friend’s mouth shape and vocal cords to make it appear as sound waves in
spoken English. It is then converted into movements in my ears and then coded
as nerve impulses creating a record in my short-term memory, probably in a
different code. To write it down, a complex series of codes is emitted by my
brain to do the necessary hand movements with visual feedback. I then forget
it, but re-install it from my note by looking at it, causing images on my
retina that must be identified as shapes and then recognised as letters and
numbers, then processed as language and stored in my short-term memory. More
code conversion is needed before I type it into the keyboard of my computer
where mechanical strokes are converted to electrical impulses and then into a
binary code, recorded by a magnetic medium and transmitted as pulses in the
telephone line. Lots more decoding and recoding will occur before the receiver
of my email will have used the phone number to make a call.
Our whole feeling of
existence, our consciousness, is the not-yet-fully-understood effect of these
myriads of nerve impulses. We could paraphrase Descartes to say “I process
electrochemical impulses in nerve fibres, therefore I am”. Neuroscience has
only begun to scratch the surface and we can wonder whether it will ever be
possible to understand the whole of the wonderful complexity of the human brain
with no more than a human brain to investigate it. But certainly it is worth
trying – let us not join those who have made the mistake of declaring something
impossible (like Lord Kelvin about flying machines). Although much remains to
be explored, a great deal is already known about how the brain works, and
valuable insights into its mechanism have been gained by observing patients who
have suffered accidents or disease to the brain. It is astonishing that, even
with the amount of evidence available today, there are still some desire-driven
people who hope that the mind is something more than this – something
supernatural. We should not waste our time with them, because a theory that is
not sourced in observed information has no value.
When the information
coming into our senses contains redundancy, it is possible for us to reduce it
to a more compact form. That is what is known as forming a theory. For example,
we can see and record the movement of the planets with their mysterious
wanderings and retrograde movements in the night sky. It seems like a lot of
data, yet it can be compressed. When Kepler observed that, in three dimensions,
their movements are almost, but not quite, ellipses, he achieved considerable
data compression. But when Newton proposed that each body is attracted to
another by a force proportional to the product of their masses and inversely proportional to
the square of the distance between them, and found that their paths fitted this
as perfectly as observation could tell, it was a great step forward. We may not
know what causes this force, but it is still a massive removal of redundancy.
That is what we mean by an elegant and successful theory identifying a natural
law.
A theory has predictive
power. Why is this? Going back to our data stream of letters, let us suppose
that we receive the sequence “BZUQFAIUGBEWLAJ”, can you guess what the next
letter is going to be? Perhaps not, I certainly can’t, but if we receive the
stream “AAAAAAAAAAAAAA” then it is not too hard to predict that the next letter
is probably “A”. And because we had a good idea that it would be “A”, the
confirmation that it is, in fact, “A” gives us very little information. This is
a highly redundant data stream. The discovery of a theory is thus the discovery
of redundancy in observed data and the reduction of its information into a
smaller amount of data. Its predictive power is no more, in principle, than
supposing another “A” is likely after a long stream of “As”. Having discovered
a deductive method of reducing a complex stream of data to something simpler
that seems to occur every time, we have what we call a natural law.
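The difference between the two letter streams can be made concrete by handing them to a general-purpose compression program. This is only a sketch: Python's standard `zlib` module stands in for "finding the redundancy", the stream length of 10,000 letters is an arbitrary choice, and a compressor gives only an upper limit on the information content, not an exact measure of it.

```python
import random
import string
import zlib

def compressed_size(text: str) -> int:
    """Number of bytes after compression at zlib's highest level."""
    return len(zlib.compress(text.encode("ascii"), 9))

rng = random.Random(0)  # fixed seed so the run is repeatable
length = 10_000

repetitive = "A" * length
patternless = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))

print("stream of As:       ", compressed_size(repetitive), "bytes")
print("patternless letters:", compressed_size(patternless), "bytes")
```

The stream of As shrinks to a handful of bytes, because a short rule ("another A") predicts all of it. The patternless stream cannot be pushed much below the 4.7 bits per letter worked out earlier, because there is no rule to find.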
It should be said that
the predictive power of a theory refers not only to the future, but also to any
observations that we have not yet made. Evolutionary and geological theories,
for example, give us information about what has happened in the past, although
it sounds a little odd to speak of predicting the past. It follows also, from
this view of theory-making, that the theory can be no more than the evidence on
which it is built and it is useless to build elements of it that are neither
confirmed nor disproven by the underlying observations. Here we are approaching
a restatement of Occam’s Razor!
Part of the predictive
power of a theory lies in the concept of cause. A friend of mine, an amateur
meteorologist, discovered an important natural law about wind. He noticed,
after many observations, that every time he saw the trees moving their leaves,
it was windy. It was clear to him that the wind was caused by the trees waving
their leaves and pushing the air along. Wrong of course, but it is not clear
why. Only when we gather more data, or perhaps use data we already know, can we
prove that. The direction of causation is often not quite so clear. Suppose we
discover that consumption of a certain vitamin is correlated with symptoms of
clinical depression. Does this mean that the vitamin causes depression, or does
it mean that depressed people tend to consume more vitamins? The direction of
causation is not, however, something that complicates our idea of theory
making. It is just another detail of reducing a complex mass of data to find a
sequence that seems likely to repeat.
It is not just in the
pinnacles of science, but in ordinary life too, we are forming theories. By
spotting the patterns in the incoming data we achieve that predictive power. We
have absorbed information about how people move in general and we observe how a
particular person moves on the pavement towards us. We are usually able to
predict that person’s movements enough to avoid a collision. A very large part
of our mental daily life is the forming and use of these small theories.
Whatever the level of importance, the essence of how our mental apparatus
works is this: predicting the results of the alternative decisions we might
make, evaluating them against a value scale and choosing the decision that
gives the best result.
To use any
information, we have to know the code. A code has much in common with a theory.
The ability to decode the evidence from the “directly” observed real world
requires a theory of what the observations tell us. In principle the
interpretation of streams of impulses in thousands of nerve fibres could be
studied to produce a theory of the three-dimensional world that we all believe
in. This theory or code has probably been largely installed in us by evolution
acting on our ancestors’ genes, although some parts of it may be learned
through a capacity for learning that is, in turn, coded in our genes. Obviously
a capacity for learning requires a complex mechanism before it can begin. We
are mostly unconscious of that mechanism.
Languages are learned
by our brains correlating the spoken words with the decoded images coming from
direct observation. We do not have any particular language genetically
programmed within us, but we certainly have apparatus in our brains
specifically adapted to learn and use language. Experiments with even our close
relatives, the chimpanzees, show that, although they can acquire some
rudimentary language ability, they lack the brain hardware to progress very
far. A second language is usually learned by correlating it with the first
language. When we become more fluent, we no longer code our thought into
English and then recode in the second language; instead we go directly from
thought to word. When we eavesdrop on a conversation in a language that has no
common features with one that we have learned, it sounds like random noise – we
can extract no meaning from it.
To clarify these
ideas, let us try a thought experiment. Let us suppose that we are members of a
team of data-processing experts in the future and we are charged with doing the
job of a human brain. And suppose that we are equipped with computers whose
power is far greater than today’s, but we are not allowed to assume any
information other than the idea of theory making just discussed. All other
information must come from the incoming nerve impulses. These impulses are called
“action potentials” and involve an electro-chemical process that travels along
the “axons”, or main nerve fibres, of the nerve cells. Normally the axons
terminate on the “dendrites” or secondary fibres of other cells or on the cell
bodies themselves. In the real brain, these receiving cells then process the
incoming signal in some way and decide whether to fire an impulse themselves,
sending information to further cells.
For our experiment, we
are going to remove the brain and take its place. We have incoming fibres
giving us signals, but we have no knowledge of any code, or even which sense
organ they come from. Remember, we cannot “see” what is going on outside; we
don’t even know that there is an outside. We have only the nerve impulses. If
this is all we have, what can we do? We can find out whether there are any
correlations within the incoming data itself. The data stream from one fibre
may provide some clue as to what pattern to expect in another fibre at the same
time, and perhaps a clue what to expect at a later time. The brain has outputs
which we can call decisions. By sending certain sequences of impulses down
these outgoing fibres, we find that the incoming data stream alters in a predictable
way, at least in part.
Assuming that we are
data processing experts of extraordinary skill, we may well have formed
theories that allow us to predict sequences of incoming data and to learn how
to alter them with our decision outputs. But we are still a long way from being
a human brain. If the blood-sugar level of our body drops, for example, that
will only cause a change in the pattern of incoming impulses. It is possible
that we may have been clever enough to discover that a complex pattern of output
signals can cause a search for certain circumstances that will result in input
signals that can be compared with a database and found to be in a certain
category. Perhaps this category has been given the quite arbitrary name “food”.
Further output signals will carry out the complex control task of picking it up
and eating it, causing blood sugar levels to rise again.
We might do that once
or twice out of intellectual curiosity, but why continue? After all, there are
many other correlations between output and input to explore. What our team
lacks is any information on what we are trying to achieve. There is no
information in our hands to say what our goals are (and we are not allowed to
smuggle our own goals in). We have been discovering correlations allowing us to
control our incoming signals to some extent, using our decision output signals.
But we have no information to specify that one set of incoming signals is
preferable to another. Where can we get this information? Clearly not from the
incoming signals themselves, because these are what we are trying to compare.
To work as a substitute brain, our team has something missing that it cannot
create. The objective, the goal, in terms of detectable inputs, would have to
have been provided in advance.
Now this is not such a
surprising result. In fact we have just re-discovered Hume’s law that you
cannot derive value information from fact information, you cannot derive an
“ought” from an “is”. Or, to put it another way, fact information is about how
the world is, value information is about how we would like it to be, and these
are two different things. They cannot be derived one from the other.
From this it follows
that, to work as a decision-making device, a brain must have some starting set
of values or preferences programmed into it genetically. It cannot get that
from its nerve-signal inputs and we can see why the widespread belief in the
blank slate or “tabula rasa” was complete nonsense (in fact usually nonsense
motivated by political desire). So we can see that without some genetically
specified values, a brain simply cannot begin to function. This clarification
of Hume’s Law is a fact with great relevance to philosophy. Values cannot be
deduced from facts alone – that is the naturalistic fallacy. Equally, of
course, facts cannot be deduced from values. Those who say, “That would be too
unacceptable, we cannot possibly believe that!” are also wrong.
Although value information transmitted by inheritance must be present in all brains, human and animal, it is likely that a large amount of fact information is too. Although a great deal of learning does take place, particularly in the early years, it would be surprising if some of the theories that we might make were not hard-wired genetically also. What a massive task it would be to crack the code of incoming nerve impulses to produce a theory of a three-dimensional world of solid objects and all the other concepts of the “real world” that we take for granted. Our theory of other people’s minds is almost certainly a built-in mechanism, which can be seen to go wrong in autism. It seems more likely that our evolution will have programmed some fact theories into our brains, but it is absolutely necessary that it has programmed rules of logic and inference together with a basic value function.
For example, although we can consider four-dimensional space, and deal easily with it in mathematical terms, we do not really believe in it. Yet the neural circuitry could probably be constructed to handle it just as easily as it does for three dimensions. But it has not evolved to do so, probably because the world really is three-dimensional.
William Paley, in his Natural Theology of 1802, gave his
classic argument for the existence of God: “In crossing a heath, suppose I pitched my foot against a
stone, and were asked how the stone came to be there, I might possibly answer,
that … it had lain there forever … but suppose I had found a watch upon the
ground, and it should be inquired how the watch happened to be in that place, I
should hardly think of the answer which I had before given, that, for anything
I knew, the watch might have always been there. Yet why should not this answer
serve for the watch as well as for the stone? For this reason, that, when we
come to inspect the watch, we perceive (what we could not discover in the
stone) that its several parts are framed and put together for a purpose. …the
inference we think is inevitable, that the watch must have had a maker – that
there must have existed, at some time, and at some place or other, an artificer
or artificers who formed it for the purpose which we find it actually to
answer; who comprehended its construction, and designed its use.”
Paley did not exactly
succeed in proving God’s existence, but we must agree that he had a powerful
point. The evident non-randomness of the watch makes it something different
from the stone – something that needs explanation just as he claims. Living
things show the same evidence of a property that seems to imply purpose. We need
a precise understanding of what that is.
What we call order,
information, or non-randomness are often used loosely as interchangeable terms
and I must confess to doing it sometimes myself, despite my aim to stick to
what is mathematically definable. Order and information are, in fact, very
different concepts. For example, the sequence “ABABABABABABAB…” shows great
order, but carries little information. If we are told that the next letter
turns out to be “A” and the next “B”, we would not be surprised; our
probability of that event would not have changed much. It is not difficult in
this case to form the theory that the alternation of “A” and “B” is a property
of the sequence and a more succinct descriptor of it. This is the
theory-building process that we have discussed. A data stream has the property
of non-randomness, when there is identifiable redundancy so that theories can
be made. It would never be possible to build a theory or crack a code, if every
bit of data available to us gave a brand new piece of information unrelated to
anything else.
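The difference between order and information in the “ABAB…” example can be put into numbers. The following is only a minimal sketch, not a full treatment: it estimates how surprising each letter is once the previous letter is known, and compares the ordered sequence with a randomly generated one. The function names, the sequence length of 1000 and the random seed are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution described by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conditional_entropy_bits(seq):
    """Average surprise of a letter given the one before it: H(pair) - H(previous)."""
    pairs = Counter(zip(seq, seq[1:]))   # adjacent letter pairs
    previous = Counter(seq[:-1])         # the first letter of each pair
    return entropy_bits(pairs) - entropy_bits(previous)

# Highly ordered sequence: the theory "A and B alternate" describes it completely.
ordered = "AB" * 500

# Sequence with no usable redundancy (assumed seed, for repeatability only).
rng = random.Random(0)
shuffled = "".join(rng.choice("AB") for _ in range(1000))

print(conditional_entropy_bits(ordered))   # close to 0 bits per letter
print(conditional_entropy_bits(shuffled))  # close to 1 bit per letter
```

The ordered sequence gives almost no new information per letter, which is exactly what makes a theory about it possible; the random one gives nearly a full bit per letter and supports no theory at all.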
When we think of the
problem of information and order in living things that Paley has posed, it is
clear that the study of biology gives us a great deal of information. We have
discovered things that our untutored brains could never have dreamt of.
Evolution is an extraordinary machine that has generated vast amounts of
information. Yet the data that comes from biology also contains a great deal of
order in the sense of the word defined above.
In particular, as we
explore many different biological phenomena, from the body chemistry of
dung beetles to human marriage customs, from the migration behaviour of birds
to the action of organelles within the cell that control the transfer of
information from DNA to RNA and to proteins, we observe that they all seem to
be serving a similar apparent purpose. That purpose is to give properties that
William Paley would have called “useful”. But we can now more accurately
describe it as the optimisation of features that will tend to increase the
number of the organism’s descendants. It is no coincidence that Paley’s idea of
usefulness was aligned with biological fitness because, like the rest of
us, Paley was a product of natural selection!
Out of the many, many
possible arrangements of atoms that might exist, those serving this purpose form
a very small subset and it is a quite distinct signature of living things. Biological theory has
distilled it into a succinct message, which is not far from a simple statement
that natural selection exists and has acted. The almost infinite complexity of
the structure and behaviour of living things has been generated by this
information source. Of course we find some features of living things that are
not yet optimised by natural selection, but these things (such as sickle cell
anaemia and addiction to drugs) exist only where natural selection has not had
enough time to act. They seem to be the noise, not the signal.
Creationists are people who regret, as I do myself, the
damage that scientific evidence has done to religion. But instead of bracing
themselves to accept the truth, whatever it may be, they have made denial into
an art form. They ignore the great bulk of scientific evidence, but cling
gratefully to any scrap of science that looks like supporting their desires.
Creationists often claim that evolution contradicts the
second law of thermodynamics, because it brings order out of chaos; it creates
non-randomness. In its direct application to the distribution of heat energy, this
is clearly false. The Earth is not a closed system: the sun has supplied ample
high-grade energy throughout evolutionary history. But, in the analogy between
information and entropy, the idea might make some sense. Just as the entropy of
the universe, or any closed part of it, can only increase, information flowing
in a closed channel can only degrade. If the signal cannot be added to, it will
ultimately decay into noise; the universe will ultimately collapse into chaos
and improbable arrangements of atoms will seldom be found.
But the second law of thermodynamics is not absolute. As
Boltzmann showed, it is a statement of probability, but with such a large
number of atoms in the universe, the statistical average inevitably prevails.
Yet, in the 19th century, James Clerk Maxwell presented a
counterexample. He imagined that two chambers of gas at the same temperature
might be connected by a small aperture. At the aperture sits a tiny demon with
a table-tennis bat. As molecules approach from chamber A, he holds it up
against all of the fast moving molecules so that they bounce back, but allows
the slow ones to pass. Molecules approaching from chamber B are bounced back if
they are slow, but allowed to pass into chamber A if they are fast. In time the
gas in chamber A will be at a higher temperature than that in chamber B without
any external source of energy and the second law will be broken. The demon has
added no energy – only order. We confidently use the second law today because
Maxwell’s demon does not exist. But if some unlikely future nano-technology
could ever produce a machine to work as a Maxwell’s demon, we would have a
solution to the world’s energy problems at a stroke! (The world of course has
no energy problem, because, by the First Law, energy can neither be created nor
destroyed. What it has is an entropy problem.)
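Maxwell’s thought experiment is easy to imitate in a toy simulation. The sketch below is an illustration under stated assumptions, not a physical model: each molecule is represented only by a kinetic energy drawn from an exponential distribution, the demon calls a molecule “fast” when its energy exceeds an assumed threshold of 1.0, and the names `run_demon` and `mean` are invented for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def run_demon(n_per_chamber=2000, steps=20000, threshold=1.0, seed=1):
    """Toy Maxwell's demon: sorts molecules by energy without adding any."""
    rng = random.Random(seed)
    # Both chambers start with the same energy distribution (same temperature).
    chamber_a = [rng.expovariate(1.0) for _ in range(n_per_chamber)]
    chamber_b = [rng.expovariate(1.0) for _ in range(n_per_chamber)]
    before = (mean(chamber_a), mean(chamber_b))
    total_before = sum(chamber_a) + sum(chamber_b)

    for _ in range(steps):
        # A molecule arrives at the aperture from a randomly chosen side.
        from_a = rng.random() < 0.5
        source, target = (chamber_a, chamber_b) if from_a else (chamber_b, chamber_a)
        if not source:
            continue
        i = rng.randrange(len(source))
        energy = source[i]
        is_fast = energy >= threshold
        # From A only slow molecules pass; from B only fast molecules pass.
        passes = (not is_fast) if from_a else is_fast
        if passes:
            source[i] = source[-1]   # remove molecule i by swapping with the last
            source.pop()
            target.append(energy)

    after = (mean(chamber_a), mean(chamber_b))
    total_after = sum(chamber_a) + sum(chamber_b)
    return before, after, total_before, total_after

before, after, total_before, total_after = run_demon()
print("mean energy (A, B) before:", before)
print("mean energy (A, B) after: ", after)
print("total energy before and after:", total_before, total_after)
```

Chamber A ends up with the higher mean energy while the total is unchanged, which is the point of the story: the demon in this sketch contributes sorting, not energy.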
But when we come to the ordered content of living things,
the situation is different. The analogue of Maxwell’s demon is alive and well
and it is called natural selection. This force has been present for as long as
replicators have existed, steadily producing non-randomness by selecting some
and discarding others. Evolution does not reduce thermodynamic entropy, but it
does reduce random disorder because these are not the same thing. We have
passed outside the areas of correspondence in the analogy. This is why the
creationist appeal to the second law is as false in its information-theory
analogy as it is in thermodynamics. The process of natural selection really can
create order out of chaos and can generate an enormous quantity of information.
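The claim that selection produces non-randomness can be illustrated with a very small simulation. This is a hedged sketch only: the “replicators” are strings of 20 bits, the fitness rule (count the 1s) is an arbitrary assumption standing in for whatever the environment rewards, and the population size, mutation rate and function names are all invented for the example.

```python
import math
import random

def column_entropy_bits(population):
    """Mean Shannon entropy per bit position, in bits (1.0 means fully random)."""
    n = len(population)
    length = len(population[0])
    total = 0.0
    for pos in range(length):
        p = sum(genome[pos] for genome in population) / n
        for q in (p, 1 - p):
            if q > 0:
                total -= q * math.log2(q)
    return total / length

def mean_fitness(population):
    return sum(sum(genome) for genome in population) / len(population)

def evolve(pop_size=200, length=20, generations=200, mutation_rate=0.01, seed=2):
    rng = random.Random(seed)
    # Start from chaos: every bit of every genome is a coin toss.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    start = (column_entropy_bits(population), mean_fitness(population))
    for _ in range(generations):
        # Selection: keep the fitter half, discard the rest.
        population.sort(key=sum, reverse=True)
        survivors = population[: pop_size // 2]
        # Replication with random copying errors.
        offspring = [[bit ^ (rng.random() < mutation_rate) for bit in parent]
                     for parent in survivors]
        population = survivors + offspring
    end = (column_entropy_bits(population), mean_fitness(population))
    return start, end

start, end = evolve()
print("entropy per bit, mean fitness at start:", start)
print("entropy per bit, mean fitness at end:  ", end)
```

The mutations are random and nothing outside the loop supplies a design, yet the population moves from near-maximal randomness to a highly ordered state, simply because some copies are kept and others discarded.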
All living things are highly ordered or non-random. They
contain information. Our minds are the function of our nerve cells; our nerve
cells have the same source as our muscle cells, bone cells and blood cells.
They are all the products of evolution. All of these have been formed by many
random events that have then been sifted by natural selection. Natural
selection is the only source of information that defines our beings. We are
made of very ordinary atoms and many of these atoms are not even the same ones
that were in our bodies ten years ago. But the essence of a person is the
information content embodied in the arrangement of the atoms. This information
has come from natural selection combined with the random contingencies of our
individual and ancestral histories.
It is common today to stress that evolution works to no
purpose. This is certainly true in the sense that it works to no purpose
outside itself, but it works to a purpose of its own – it is the blind
watchmaker. It is the only source of directed behaviour, other than accidental
contingency. It is the only source of purpose that has been discovered in the
universe. This is not a conclusion that appeals to human vanity, yet that vanity
too is a product of evolution.
But against this picture we have the very large body of
information with which we are most familiar – that of human culture. Over the
last few centuries this fund of information has been causing much more dramatic
changes than the slow working of natural selection. A new improvement in
technology, or even a new false belief, can spread through the population at a
far faster speed than it could ever do by inheritance. It is not only fact
information, correct or otherwise, that can spread – values can too. We all
have an instinct to adopt some values from others in our communities and a fear
of breaching the accepted taboo. The change in many socially
accepted values over the last hundred years has been extraordinary. In 1900
it was perfectly acceptable to refuse entry to a restaurant or bus because of a
person’s race, yet it was a serious scandal to confess oneself an atheist and
homosexuals were sent to prison. If it were found out that two unmarried people
of opposite sex had even spent a night unaccompanied in the same house, quite
serious condemnation could result.
This very substantial effect that culture has on our values
seems to rule out my earlier assertion that their origin must ultimately be due
to the information-generating property of natural selection. Perhaps we can throw
some light on this by examining a model of information flow in the development
of culture. Clearly people are acting as transmitters and receivers of
information and there are various channels of communication. These include the
spoken word and the written word; even facial expression can pass information.
Nowadays, the transmission of information is greatly changed by television,
Internet and email. Yet it must be beyond dispute that the information cannot
originate in the communication channels. It must come from the transmitters, so
the conclusion happily accepted by many, that our values are an emergent
property of our cultural development, must be wrong. (“Emergent” is, in any case,
a word that we should treat with great suspicion. It is not an explanation, but
rather a pious hope that coherent information can come together out of a random
jumble somehow. This cannot be true – information theory tells us that
information passing in a channel can degrade, but it will not increase in
quantity.)
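The statement that a channel can degrade information but not add to it can be checked for one standard textbook model. The sketch below assumes a binary symmetric channel that flips each bit with probability 0.05 (an arbitrary figure) and a message of fair coin-toss bits, and computes how much information about the original survives after the message has been relayed through several such channels in a row. The function names are invented for the example.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cascade_flip_probability(p, hops):
    """Overall chance that a bit arrives flipped after `hops` noisy relays."""
    return (1 - (1 - 2 * p) ** hops) / 2

def mutual_information_bits(p, hops):
    """Information per bit that the output still carries about a fair-coin input."""
    return 1 - binary_entropy(cascade_flip_probability(p, hops))

for hops in range(6):
    print(hops, "relays:", round(mutual_information_bits(0.05, hops), 4), "bits per bit")
```

Each further relay leaves less of the original message recoverable, and no number of relays brings any of it back.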
But there is a complex effect happening here. The value
information that the transmitters put out is conditional on what they receive.
The expansion in the means of transmission changes the way in which the
instinctive values are expressed. In simple terms, the existence of instant,
long-distance communication makes distant people seem more like members of our
home community. Much of what passes for moral rules is actually the result of a
multi-person, negotiated cooperation. This cooperative structure can grow at a
much faster speed than could inherited information, but it is nevertheless the
product of people who are serving their inherited drives. We support social
rules because it is good for us to do so.
In 1976 Richard Dawkins launched the metaphor of the meme. A
meme is an item of human culture – information that is passed between one brain
and another. It can be of any length from a new word to an entire religion. It
may concern a factual belief or an expression of values and it may be right or
wrong. Information packages with all these qualities inhabit an environment of
human brains. When one person communicates to another, the memes reproduce, some
more successfully than others. When they are forgotten, or the brain carrying
them dies, then one of their copies dies.
The memes resemble living things by surviving and
reproducing to different extents. Those that catch on will replicate more;
others will die out. It is clear that natural selection will operate. Because
the speed of mutation and reproduction of the memes is so much greater than that
of the genes, memetic evolution seems to be much more rapid. In recent times, the
speed of cultural evolution has been so great that some commentators have said
(wrongly) that genetic evolution has reached an end point and that memetic
evolution has taken over. Many accept the idea that our identity must be
considered as a blend of the genes and memes that inhabit us, but that is a
rather sterile model, which prohibits further analysis until the blending
process is better understood.
But having shown that information, including value
information, can be created by natural selection acting on genes, must it not
be true that the same kind of information could be created by the natural
selection of memes? In fact, the analogy between memes and genes is not
complete enough to allow this to happen. Memes may evolve in the same way that
plants or fungi evolve, but values only evolve as part of a replicating information
processor. Although memes are information, they do not process it themselves.
The only information processors on the scene are the human brains that provide
their environment.
Nevertheless memes and genetic people are species that
co-evolve, adapting to each other. Memes will adapt so that they replicate best
in the population of human brains that they find around them and the genes will
adapt to replicate best in the environment of memes that they find themselves
in. Neither will be evolving for the benefit of the other, however (different
species never do), but each will try to exploit the other for its own benefit.
My own metaphor is to regard people as being farmers of
memes. Most of the memes that our genetic selves cultivate are of benefit to us
and we seek them from others, just as a farmer needs to obtain seeds. Their
evolution in the community of our brains is perhaps more akin to artificial
selection than natural selection. Even the mutations of the memes are not
always random, but are sometimes made to a purpose defined by genetic evolution
coded within the brain – a process less like random mutation, which might be
called “memetic engineering”.
But not everything in our meme garden is lovely. There are
memes that do not serve our genetic purpose – the analogues of weeds or even
viruses. Since memetic evolution is so much faster than genetic evolution,
perhaps there is a danger that it will outrun its genetic hosts and take over?
Certainly some infectious memes, like drug taking, seem to work against the
interest of the genes. Maybe we will be able to domesticate some other memes to
help us control the memes, as a sheepdog controls the farmer’s sheep. Perhaps
philosophy, still a semi-wild animal today, could be tamed to do this job for
us?
My talk this evening has touched on a number of subjects,
and I hope I have convinced you of their relevance to philosophy: the
differing concepts of information and order; redundancy and noise; natural
selection as a creator of information; Maxwell’s demon, which might have done
so, but does not exist; the communication network view of human culture; the
idea that tracing information to its source is essential to its verification;
the natural selection of memes; the human project of meme farming and the human
misfortune of meme infection. Much work remains to be done, but since philosophy
is about information, I am sure that these concepts must be useful to its future
progress.
So now, I would hand you back to our chairman, and, with his
permission, I would be happy to answer any questions or comments that you would
like to raise.
Handout sheet:
Information Theory and Entropy – Their Relevance to Philosophy
A quick summary:
1. Information can be
measured by using the logarithm of the probability of the received message; its
quantity is sometimes called “entropy”. “Redundancy” occurs when received
signals are not independent. Redundancy can be good, as it gives a defence
against noise.
2. Entropy in thermodynamics
has the units energy/temperature. The Second Law of Thermodynamics states
that the entropy of the universe, or any closed part of it, can only increase.
It defines an upper limit to the efficiency of heat engines.
3. Boltzmann’s epitaph, S=k log W, commemorates his
demonstration that the second law is equivalent to saying that the world will
move towards a more probable state. That is the reason for the use of the word
entropy in both fields.
4. The relation between information and entropy is an
analogy and like all analogies it must not be stretched too far. They are not
identical. In particular, information can be created and replicated. The value
of information can be very different from its quantity.
5. We have evolved an extraordinary capacity for coding
information and tracing its provenance. The formation of a theory is equivalent
to the identification of redundancy in the information stream available to us.
6. By considering the source of information, we can deduce
that the brain must have a description of basic values and basic rules of logic
installed genetically. There are probably some theories about fact (such as the
world being three-dimensional) that are genetically installed also. Natural
selection is the source of this information. (Compare Paley’s example of the
watch on the ground.) Natural Selection is like a successful Maxwell’s Demon.
7. The Creationist appeal to the Second Law is as incorrect
in its information-theory analogy as it is in its thermodynamic meaning.
8. Neither core values nor basic logic can be an “emergent”
property of culture. Information originates in the transmitters, not the
communication channels. But an interesting complication arises from the idea
that elements of culture called memes can spread and evolve like living things
inhabiting an environment of human brains.
Further Reading:
Peter W Atkins (1984) “The Second Law” Scientific American
Michael Ruse (1986) “Taking Darwin Seriously” Blackwell
Daniel Dennett (1995) “Darwin’s Dangerous Idea” Simon & Schuster
Matt Ridley (1996) “The Origins of Virtue” Viking
Alan Lightman (2000) “Great Ideas in Physics” McGraw-Hill
Donald Cameron (2001) “The Purpose of Life” Woodhill
Matt Ridley (2003) “Nature via Nurture” Fourth Estate