[This article appeared in American Scientist (Nov-Dec 1990),
Volume 78, 550-558. Retyped and posted with permission.]
The Science of Scientific Writing
If the reader is to grasp what the writer means,
the writer must understand what the reader needs
George D. Gopen and Judith A. Swan*
*George D. Gopen
is associate professor of English and Director of Writing Programs at
Duke University. He holds a Ph.D. in English from Harvard University
and a J.D. from Harvard Law School. Judith A. Swan teaches
scientific writing at Princeton University. Her Ph.D., which is in biochemistry,
was earned at the Massachusetts Institute of Technology. Address for
Gopen: 307 Allen Building, Duke University, Durham, NC 27706
Science is often hard to read. Most people assume that its difficulties
are born out of necessity, out of the extreme complexity of scientific
concepts, data and analysis. We argue here that complexity of thought
need not lead to impenetrability of expression; we demonstrate a number
of rhetorical principles that can produce clarity in communication without
oversimplifying scientific issues. The results are substantive, not merely
cosmetic: Improving the quality of writing actually improves the quality
of thought.
The fundamental purpose of scientific discourse is not
the mere presentation of information and thought, but rather its actual
communication. It does not matter how pleased an author might be to
have converted all the right data into sentences and paragraphs; it
matters only whether a large majority of the reading audience accurately
perceives what the author had in mind. Therefore, in order to understand
how best to improve writing, we would do well to understand better how
readers go about reading. Such an understanding has recently become
available through work done in the fields of rhetoric, linguistics and
cognitive psychology. It has helped to produce a methodology based on
the concept of reader expectations.
Writing with the Reader in Mind: Expectation and Context
Readers do not simply read; they interpret. Any piece
of prose, no matter how short, may "mean" in 10 (or more)
different ways to 10 different readers. This methodology of reader expectations
is founded on the recognition that readers make many of their most important
interpretive decisions about the substance of prose based on clues they
receive from its structure.
This interplay between substance and structure can be
demonstrated by something as basic as a simple table. Let us say that
in tracking the temperature of a liquid over a period of time, an investigator
takes measurements every three minutes and records a list of temperatures.
Those data could be presented by a number of written structures. Here
are two possibilities:
t(time)=15', T(temperature)=32°; t=0', T=25°;
t=6', T=29°; t=3', T=27°; t=12', T=32°; t=9',
T=31°
time (min)    temperature (°C)
 0            25
 3            27
 6            29
 9            31
12            32
15            32
Precisely the same information appears in both formats,
yet most readers find the second easier to interpret. It may be that
the very familiarity of the tabular structure makes it easier to use.
But, more significantly, the structure of the second table provides
the reader with an easily perceived context (time) in which the significant
piece of information (temperature) can be interpreted. The contextual
material appears on the left in a pattern that produces an expectation
of regularity; the interesting results appear on the right in a less
obvious pattern, the discovery of which is the point of the table.
If the two sides of this simple table are reversed,
it becomes much harder to read.
temperature (°C)    time (min)
25                   0
27                   3
29                   6
31                   9
32                  12
32                  15
Since we read from left to right, we prefer the context
on the left, where it can more effectively familiarize the reader. We
prefer the new, important information on the right, since its job is
to intrigue the reader.
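The restructuring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `readings` and `format_table` are hypothetical, and the column widths are arbitrary. The sketch takes the six measurements in the scrambled order of the first presentation, sorts them by time, and prints the context (time) on the left and the result (temperature) on the right.

```python
# Readings in the scrambled order of the first presentation:
# (minutes elapsed, temperature in degrees Celsius).
readings = [(15, 32), (0, 25), (6, 29), (3, 27), (12, 32), (9, 31)]


def format_table(rows):
    """Return a two-column table with the context (time) on the left."""
    header = f"{'time (min)':>10}  {'temperature (°C)':>16}"
    lines = [header]
    # Sorting by time gives the left column its regular, expected pattern.
    for minutes, celsius in sorted(rows):
        lines.append(f"{minutes:>10}  {celsius:>16}")
    return "\n".join(lines)


print(format_table(readings))
```

Swapping the two columns in `format_table` would reproduce the reversed table, which holds the same data but puts the context where readers expect the result.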
Information is interpreted more easily and more uniformly
if it is placed where most readers expect to find it. These needs and
expectations of readers affect the interpretation not only of tables
and illustrations but also of prose itself. Readers have relatively
fixed expectations about where in the structure of prose they will encounter
particular items of its substance. If writers can become consciously
aware of these locations, they can better control the degrees of recognition
and emphasis a reader will give to the various pieces of information
being presented. Good writers are intuitively aware of these expectations;
that is why their prose has what we call "shape."
This underlying concept of reader expectation is perhaps
most immediately evident at the level of the largest units of discourse.
(A unit of discourse is defined as anything with a beginning and an
end: a clause, a sentence, a section, an article, etc.) A research article,
for example, is generally divided into recognizable sections, sometimes
labeled Introduction, Experimental Methods, Results and Discussion.
When the sections are confused--when too much experimental detail is
found in the Results section, or when discussion and results intermingle--readers
are often equally confused. In smaller units of discourse the functional
divisions are not so explicitly labeled, but readers have definite expectations
all the same, and they search for certain information in particular
places. If these structural expectations are continually violated, readers
are forced to divert energy from understanding the content of a passage
to unraveling its structure. As the complexity of the context increases
moderately, the possibility of misinterpretation or noninterpretation
increases dramatically.
We present here some results of applying this methodology
to research reports in the scientific literature. We have taken several
passages from research articles (either published or accepted for publication)
and have suggested ways of rewriting them by applying principles derived
from the study of reader expectations. We have not sought to transform
the passages into "plain English" for the use of the general public;
we have neither decreased the jargon nor diluted the science. We have
striven not for simplification but for clarification.
Reader Expectations for the Structure of Prose
Here is our first example of scientific prose, in its
original form:
The smallest of the URF's (URFA6L), a 207-nucleotide
(nt) reading frame overlapping out of phase the NH2-terminal
portion of the adenosinetriphosphatase (ATPase) subunit 6 gene has been
identified as the animal equivalent of the recently discovered yeast
H+-ATPase subunit 8 gene. The functional significance of
the other URF's has been, on the contrary, elusive. Recently, however,
immunoprecipitation experiments with antibodies to purified, rotenone-sensitive
NADH-ubiquinone oxido-reductase [hereafter referred to as respiratory
chain NADH dehydrogenase or complex I] from bovine heart, as well as
enzyme fractionation studies, have indicated that six human URF's (that
is, URF1, URF2, URF3, URF4, URF4L, and URF5, hereafter referred to as
ND1, ND2, ND3, ND4, ND4L, and ND5) encode subunits of complex I. This
is a large complex that also contains many subunits synthesized in the
cytoplasm.*
[*The full paragraph includes one more sentence: "Support
for such functional identification of the URF products has come from
the finding that the purified rotenone-sensitive NADH dehydrogenase
from Neurospora crassa contains several subunits synthesized
within the mitochondria, and from the observation that the stopper mutant
of Neurospora crassa, whose mtDNA lacks two genes homologous
to URF2 and URF3, has no functional complex I." We have omitted this
sentence both because the passage is long enough as is and because it
raises no additional structural issues.]
Ask any ten people why this paragraph is hard to read,
and nine are sure to mention the technical vocabulary; several will
also suggest that it requires specialized background knowledge. Those
problems turn out to be only a small part of the difficulty. Here is
the passage again, with the difficult words temporarily lifted:
The smallest of the URF's, an [A], has been identified
as a [B] subunit 8 gene. The functional significance of the other URF's
has been, on the contrary, elusive. Recently, however, [C] experiments,
as well as [D] studies, have indicated that six human URF's [1-6] encode
subunits of Complex I. This is a large complex that also contains many
subunits synthesized in the cytoplasm.
It may now be easier to survive the journey through
the prose, but the passage is still difficult. Any number of questions
present themselves: What has the first sentence of the passage to do
with the last sentence? Does the third sentence contradict what we have
been told in the second sentence? Is the functional significance of
URF's still "elusive"? Will this passage lead us to further discussion
about URF's, or about Complex I, or both?
Knowing a little about the subject matter does not
clear up all the confusion. The intended audience of this passage would
probably possess at least two items of essential technical information:
first, "URF" stands for "Uninterrupted Reading Frame," which describes
a segment of DNA organized in such a way that it could encode a protein,
although no such protein product has yet been identified; second, both
ATPase and NADH oxido-reductase are enzyme complexes central to energy
metabolism. Although this information may provide some sense of comfort,
it does little to answer the interpretive questions that need answering.
It seems the reader is hindered by more than just the scientific jargon.
To get at the problem, we need to articulate something
about how readers go about reading. We proceed to the first of several
reader expectations.
Subject-Verb Separation
Look again at the first sentence of the passage cited
above. It is relatively long, 42 words; but that turns out not to be
the main cause of its burdensome complexity. Long sentences need not
be difficult to read; they are only difficult to write. We have seen
sentences of over 100 words that flow easily and persuasively toward
their clearly demarcated destination. Those well-wrought serpents all
had something in common: Their structure presented information to readers
in the order the readers needed and expected it.
The first sentence of our example passage does just
the opposite: it burdens and obstructs the reader, because of an all-too-common
structural defect. Note that the grammatical subject ("the smallest")
is separated from its verb ("has been identified") by 23 words, more
than half the sentence. Readers expect a grammatical subject to be followed
immediately by the verb. Anything of length that intervenes between
subject and verb is read as an interruption, and therefore as something
of lesser importance.
The reader's expectation stems from a pressing need
for syntactic resolution, fulfilled only by the arrival of the verb.
Without the verb, we do not know what the subject is doing, or what
the sentence is all about. As a result, the reader focuses attention
on the arrival of the verb and resists recognizing anything in the interrupting
material as being of primary importance. The longer the interruption
lasts, the more likely it becomes that the "interruptive" material actually
contains important information; but its structural location will continue
to brand it as merely interruptive. Unfortunately, the reader will not
discover its true value until too late--until the sentence has ended
without having produced anything of much value outside of that subject-verb
interruption.
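The size of a subject-verb separation can be counted mechanically once the subject and verb have been identified by hand. The following Python sketch is an illustration only: the helper names `find_phrase` and `words_between` are hypothetical, and a "word" is assumed to be anything separated by whitespace, so that "(URFA6L)," and "207-nucleotide" each count as one word.

```python
def find_phrase(tokens, phrase, start=0):
    """Return the index where the token sequence `phrase` first occurs."""
    n = len(phrase)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i
    raise ValueError(f"phrase not found: {' '.join(phrase)}")


def words_between(sentence, subject, verb):
    """Count whitespace-separated words between a subject and its verb."""
    tokens = sentence.split()
    subject_tokens = subject.split()
    subject_start = find_phrase(tokens, subject_tokens)
    subject_end = subject_start + len(subject_tokens)
    # Search for the verb only after the subject has ended.
    verb_start = find_phrase(tokens, verb.split(), subject_end)
    return verb_start - subject_end


FIRST_SENTENCE = (
    "The smallest of the URF's (URFA6L), a 207-nucleotide (nt) reading "
    "frame overlapping out of phase the NH2-terminal portion of the "
    "adenosinetriphosphatase (ATPase) subunit 6 gene has been identified "
    "as the animal equivalent of the recently discovered yeast H+-ATPase "
    "subunit 8 gene."
)

gap = words_between(FIRST_SENTENCE, "The smallest", "has been identified")
print(gap)
```

Under this counting convention the sketch reports 23 intervening words for the first sentence of the example, the figure cited above. The function does not find subjects and verbs on its own; that judgment stays with the writer.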
In this first sentence of the paragraph, the relative
importance of the intervening material is difficult to evaluate. The
material might conceivably be quite significant, in which case the writer
should have positioned it to reveal that importance. Here is one way
to incorporate it into the sentence structure:
The smallest of the URF's is URFA6L, a 207-nucleotide
(nt) reading frame overlapping out of phase the NH2-terminal
portion of the adenosinetriphosphatase (ATPase) subunit 6 gene; it has
been identified as the animal equivalent of the recently discovered
yeast H+-ATPase subunit 8 gene.
On the other hand, the intervening material might be
a mere aside that diverts attention from more important ideas; in that
case the writer should have deleted it, allowing the prose to drive
more directly toward its significant point:
The smallest of the URF's (URFA6L) has been identified
as the animal equivalent of the recently discovered yeast H+-ATPase
subunit 8 gene.
Only the author could tell us which of these revisions
more accurately reflects his intentions.
These revisions lead us to a second set of reader expectations.
Each unit of discourse, no matter what the size, is expected to serve
a single function, to make a single point. In the case of a sentence,
the point is expected to appear in a specific place reserved for emphasis.
The Stress Position
It is a linguistic commonplace that readers naturally
emphasize the material that arrives at the end of a sentence. We refer
to that location as a "stress position." If a writer is consciously
aware of this tendency, she can arrange for the emphatic information
to appear at the moment the reader is naturally exerting the greatest
reading emphasis. As a result, the chances greatly increase that reader
and writer will perceive the same material as being worthy of primary
emphasis. The very structure of the sentence thus helps persuade the
reader of the relative values of the sentence's contents.
The inclination to direct more energy to that which
arrives last in a sentence seems to correspond to the way we work at
tasks through time. We tend to take something like a "mental breath"
as we begin to read each new sentence, thereby summoning the tension
with which we pay attention to the unfolding of the syntax. As we recognize
that the sentence is drawing toward its conclusion, we begin to exhale
that mental breath. The exhalation produces a sense of emphasis. Moreover,
we delight in being rewarded at the end of a labor with something that
makes the ongoing effort worthwhile. Beginning with the exciting material
and ending with a lack of luster often leaves us disappointed and destroys
our sense of momentum. We do not start with the strawberry shortcake
and work our way up to the broccoli.
When the writer puts the emphatic material of a sentence
in any place other than the stress position, one of two things can happen;
both are bad. First, the reader might find the stress position occupied
by material that clearly is not worthy of emphasis. In this case, the
reader must discern, without any additional structural clue, what else
in the sentence may be the most likely candidate for emphasis. There
are no secondary structural indications to fall back upon. In sentences
that are long, dense or sophisticated, chances soar that the reader
will not interpret the prose precisely as the writer intended. The second
possibility is even worse: The reader may find the stress position occupied
by something that does appear capable of receiving emphasis, even though
the writer did not intend to give it any stress. In that case, the reader
is highly likely to emphasize this imposter material, and the writer
will have lost an important opportunity to influence the reader's interpretive
process.
The stress position can change in size from sentence
to sentence. Sometimes it consists of a single word; sometimes it extends
to several lines. The definitive factor is this: The stress position
coincides with the moment of syntactic closure. A reader has reached
the beginning of the stress position when she knows there is nothing
left in the clause or sentence but the material presently being read.
Thus a whole list, numbered and indented, can occupy the stress position
of a sentence if it has been clearly announced as being all that remains
of that sentence. Each member of that list, in turn, may have its own
internal stress position, since each member may produce its own syntactic
closure.
Within a sentence, secondary stress positions can be
formed by the appearance of a properly used colon or semicolon; by grammatical
convention, the material preceding these punctuation marks must be able
to stand by itself as a complete sentence. Thus, sentences can be extended
effortlessly to dozens of words, as long as there is a medial syntactic
closure for every piece of new, stress-worthy information along the
way. One of our revisions of the initial sentence can serve as an example:
The smallest of the URF's is URFA6L, a 207-nucleotide
(nt) reading frame overlapping out of phase the NH2-terminal
portion of the adenosinetriphosphatase (ATPase) subunit 6 gene; it has
been identified as the animal equivalent of the recently discovered
yeast H+-ATPase subunit 8 gene.
By using a semicolon, we created a second stress position
to accommodate a second piece of information that seemed to require
emphasis.
We now have three rhetorical principles based on reader
expectations: First, grammatical subjects should be followed as soon
as possible by their verbs; second, every unit of discourse, no matter
the size, should serve a single function or make a single point; and,
third, information intended to be emphasized should appear at points
of syntactic closure. Using these principles, we can begin to unravel
the problems of our example prose.
Note the subject-verb separation in the 62-word third
sentence of the original passage:
Recently, however, immunoprecipitation experiments
with antibodies to purified, rotenone-sensitive NADH-ubiquinone oxido-reductase
[hereafter referred to as respiratory chain NADH dehydrogenase or complex
I] from bovine heart, as well as enzyme fractionation studies, have
indicated that six human URF's (that is, URF1, URF2, URF3, URF4, URF4L,
and URF5, hereafter referred to as ND1, ND2, ND3, ND4, ND4L and ND5)
encode subunits of complex I.
After encountering the subject ("experiments"), the
reader must wade through 27 words (including three hyphenated compound
words, a parenthetical interruption and an "as well as" phrase) before
alighting on the highly uninformative and disappointingly anticlimactic
verb ("have indicated"). Without a moment to recover, the reader is
handed a "that" clause in which the new subject ("six human URF's")
is separated from its verb ("encode") by yet another 20 words.
If we applied the three principles we have developed
to the rest of the sentences of the example, we could generate a great
many revised versions of each. These revisions might differ significantly
from one another in the way their structures indicate to the reader
the various weights and balances to be given to the information. Had
the author placed all stress-worthy material in stress positions, we
as a reading community would have been far more likely to interpret
these sentences uniformly.
We couch this discussion in terms of "likelihood" because
we believe that meaning is not inherent in discourse by itself; "meaning"
requires the combined participation of text and reader. All sentences
are infinitely interpretable, given an infinite number of interpreters.
As communities of readers, however, we tend to work out tacit agreements
as to what kinds of meaning are most likely to be extracted from certain
articulations. We cannot succeed in making even a single sentence mean
one and only one thing; we can only increase the odds that a large majority
of readers will tend to interpret our discourse according to our intentions.
Such success will follow from authors becoming more consciously aware
of the various reader expectations presented here.
Here is one set of revisionary decisions we made for
the example:
The smallest of the URF's, URFA6L, has been identified
as the animal equivalent of the recently discovered yeast H+-ATPase
subunit 8 gene; but the functional significance of other URF's has been
more elusive. Recently, however, several human URF's have been shown
to encode subunits of rotenone-sensitive NADH-ubiquinone oxido-reductase.
This is a large complex that also contains many subunits synthesized
in the cytoplasm; it will be referred to hereafter as respiratory chain
NADH dehydrogenase or complex I. Six subunits of Complex I were shown
by enzyme fractionation studies and immunoprecipitation experiments
to be encoded by six human URF's (URF1, URF2, URF3, URF4, URF4L, and
URF5); these URF's will be referred to subsequently as ND1, ND2, ND3,
ND4, ND4L and ND5.
Sheer length was neither the problem nor the solution.
The revised version is not noticeably shorter than the original; nevertheless,
it is significantly easier to interpret. We have indeed deleted certain
words, but not on the basis of wordiness or excess length. (See especially
the last sentence of our revision.)
When is a sentence too long? The creators of readability
formulas would have us believe there exists some fixed number of words
(the favorite is 29) past which a sentence is too hard to read. We disagree.
We have seen 10-word sentences that are virtually impenetrable and,
as we mentioned above, 100-word sentences that flow effortlessly to
their points of resolution. In place of the word-limit concept, we offer
the following definition: A sentence is too long when it has more viable
candidates for stress positions than there are stress positions available.
Without the stress position's locational clue that its material is intended
to be emphasized, readers are left too much to their own devices in
deciding just what else in a sentence might be considered important.
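The definition above can be restated as a simple comparison. The Python sketch below is a rough illustration, not a readability formula: the function names are hypothetical, every colon or semicolon is assumed to be properly used (so that each creates a point of syntactic closure), and the number of stress-worthy items is assumed to be supplied by the writer, since no program can judge what deserves emphasis.

```python
import re


def stress_positions(sentence):
    """Count closures: the end of the sentence plus each colon or semicolon."""
    # Assumption: each colon or semicolon is properly used, i.e. the
    # material before it could stand alone as a complete sentence.
    medial = len(re.findall(r"[;:]", sentence))
    return medial + 1


def is_too_long(sentence, stress_candidates):
    """Apply the definition: more candidates for stress than positions."""
    return stress_candidates > stress_positions(sentence)


REVISED = (
    "The smallest of the URF's is URFA6L, a 207-nucleotide (nt) reading "
    "frame overlapping out of phase the NH2-terminal portion of the "
    "adenosinetriphosphatase (ATPase) subunit 6 gene; it has been "
    "identified as the animal equivalent of the recently discovered "
    "yeast H+-ATPase subunit 8 gene."
)

print(stress_positions(REVISED))
print(is_too_long(REVISED, stress_candidates=2))
```

The revised sentence offers two stress positions, one before the semicolon and one at the end, so it is not too long for two items of emphasis but would be for three. A colon inside a time or ratio (such as "10:30") would be miscounted by this sketch.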
In revising the example passage, we made certain decisions
about what to omit and what to emphasize. We put subjects and verbs
together to lessen the reader's syntactic burdens; we put the material
we believed worthy of emphasis in stress positions; and we discarded
material for which we could not discern significant connections. In
doing so, we have produced a clearer passage--but not one that necessarily
reflects the author's intentions; it reflects only our interpretation
of the author's intentions. The more problematic the structure, the
less likely it becomes that a grand majority of readers will perceive
the discourse in exactly the way the author intended.
It is probable that many of our readers--and perhaps
even the authors--will disagree with some of our choices. If so, that
disagreement underscores our point: The original failed to communicate
its ideas and their connections clearly. If we happened to have interpreted
the passage as you did, then we can make a different point: No one should
have to work as hard as we did to unearth the content of a single passage
of this length.
The Topic Position
To summarize the principles connected with the stress
position, we have the proverbial wisdom, "Save the best for last." To
summarize the principles connected with the other end of the sentence,
which we will call the topic position, we have its proverbial contradiction,
"First things first." In the stress position the reader needs and expects
closure and fulfillment; in the topic position the reader needs and
expects perspective and context. With so much of reading comprehension
affected by what shows up in the topic position, it behooves a writer
to control what appears at the beginning of sentences with great care.
The information that begins a sentence establishes
for the reader a perspective for viewing the sentence as a unit: Readers
expect a unit of discourse to be a story about whoever shows up first.
"Bees disperse pollen" and "Pollen is dispersed by bees" are two different
but equally respectable sentences about the same facts. The first tells
us something about bees; the second tells us something about pollen.
The passivity of the second sentence does not by itself impair its quality;
in fact, "Pollen is dispersed by bees" is the superior sentence if it
appears in a paragraph that intends to tell us a continuing story about
pollen. Pollen's story at that moment is a passive one.
Readers also expect the material occupying the topic
position to provide them with linkage (looking backward) and context
(looking forward). The information in the topic position prepares the
reader for upcoming material by connecting it backward to the previous
discussion. Although linkage and context can derive from several sources,
they stem primarily from material that the reader has already encountered
within this particular piece of discourse. We refer to this familiar,
previously introduced material as "old information." Conversely, material
making its first appearance in a discourse is "new information." When
new information is important enough to receive emphasis, it functions
best in the stress position.
When old information consistently arrives in the topic
position, it helps readers to construct the logical flow of the argument:
It focuses attention on one particular strand of the discussion, both
harkening backward and leaning forward. In contrast, if the topic position
is constantly occupied by material that fails to establish linkage and
context, readers will have difficulty perceiving both the connection
to the previous sentence and the projected role of the new sentence
in the development of the paragraph as a whole.
Here is a second example of scientific prose that we
shall attempt to improve in subsequent discussion:
Large earthquakes along a given fault segment do
not occur at random intervals because it takes time to accumulate the
strain energy for the rupture. The rates at which tectonic plates move
and accumulate strain at their boundaries are approximately uniform.
Therefore, in first approximation, one may expect that large ruptures
of the same fault segment will occur at approximately constant time
intervals. If subsequent main shocks have different amounts of slip
across the fault, then the recurrence time may vary, and the basic idea
of periodic mainshocks must be modified. For great plate boundary ruptures
the length and slip often vary by a factor of 2. Along the southern
segment of the San Andreas fault the recurrence interval is 145 years
with variations of several decades. The smaller the standard deviation
of the average recurrence interval, the more specific could be the long
term prediction of a future mainshock.
This is the kind of passage that in subtle ways can
make readers feel bad about themselves. The individual sentences give
the impression of being intelligently fashioned: They are not especially
long or convoluted; their vocabulary is appropriately professional but
not beyond the ken of educated general readers; and they are free of
grammatical and dictional errors. On first reading, however, many of
us arrive at the paragraph's end without a clear sense of where we have
been or where we are going. When that happens, we tend to berate ourselves
for not having paid close enough attention. In reality, the fault lies
not with us, but with the author.
We can distill the problem by looking closely at the
information in each sentence's topic position:
Large earthquakes
The rates
Therefore...one
subsequent mainshocks
great plate boundary ruptures
the southern segment of the San Andreas fault
the smaller the standard deviation...
Much of this information is making its first appearance
in this paragraph--in precisely the spot where the reader looks for
old, familiar information. As a result, the focus of the story constantly
shifts. Given just the material in the topic positions, no two readers
would be likely to construct exactly the same story for the paragraph
as a whole.
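A crude version of this inspection can be automated. The Python sketch below is a heuristic illustration only: the stopword list is a tiny placeholder, the "topic position" is approximated as the first three content words of a sentence, "old information" is approximated as any word already seen earlier in the passage, and the three example sentences are invented for the demonstration. None of these simplifications comes from the methodology itself.

```python
import re

# A tiny, illustrative stopword list; a real check would need a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "are", "in", "to", "by"}


def content_words(text):
    """Lowercase the text and return its words, minus stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def old_information_in_openings(sentences, opening_length=3):
    """For each sentence, list opening content words already seen earlier."""
    seen = set()
    report = []
    for sentence in sentences:
        words = content_words(sentence)
        opening = words[:opening_length]
        report.append([w for w in opening if w in seen])
        # Everything in this sentence is old information for the next one.
        seen.update(words)
    return report


example = [
    "Bees disperse pollen.",
    "Pollen is carried from flower to flower.",
    "Wind also moves it.",
]
print(old_information_in_openings(example))
```

An empty list for a sentence after the first suggests that its opening introduces only new material. Here the second sentence opens with the familiar "pollen", while the third opens with nothing the reader has met before. Exact word matching misses pronouns and synonyms, so the output is a prompt for rereading rather than a verdict.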
If we try to piece together the relationship of each
sentence to its neighbors, we notice that certain bits of old information
keep reappearing. We hear a good deal about the recurrence time between
earthquakes: The first sentence introduces the concept of nonrandom
intervals between earthquakes; the second sentence tells us that recurrence
rates due to the movement of tectonic plates are more or less uniform;
the third sentence adds that the recurrence rates of major earthquakes
should also be somewhat predictable; the fourth sentence adds that recurrence
rates vary with some conditions; the fifth sentence adds information
about one particular variation; the sixth sentence adds a recurrence-rate
example from California; and the last sentence tells us something about
how recurrence rates can be described statistically. This refrain of
"recurrence intervals" constitutes the major string of old information
in the paragraph. Unfortunately, it rarely appears at the beginning
of sentences, where it would help us maintain our focus on its continuing
story.
In reading, as in most experiences, we appreciate the
opportunity to become familiar with a new environment before having
to function in it. Writing that continually begins sentences with new
information and ends with old information forbids both the sense of
comfort and orientation at the start and the sense of fulfilling arrival
at the end. It misleads the reader as to whose story is being told;
it burdens the reader with new information that must be carried further
into the sentence before it can be connected to the discussion; and
it creates ambiguity as to which material the writer intended the reader
to emphasize. All of these distractions require that readers expend
a disproportionate amount of energy to unravel the structure of the
prose, leaving less energy available for perceiving content.
We can begin to revise the example by ensuring the
following for each sentence:
- The backward-linking old information appears in the
topic position.
- The person, thing or concept whose story it is appears
in the topic position.
- The new, emphasis-worthy information appears in the
stress position.
Once again, if our decisions concerning the relative
values of specific information differ from yours, we can all blame the
author, who failed to make his intentions apparent. Here first is a
list of what we perceived to be the new, emphatic material in each sentence:
time to accumulate strain energy along a fault
approximately uniform
large ruptures of the same fault
different amounts of slip
vary by a factor of 2
variations of several decades
predictions of future mainshock
Now, based on these assumptions about what deserves
stress, here is our proposed revision:
Large earthquakes along a given fault segment do
not occur at random intervals because it takes time to accumulate the
strain energy for the rupture. The rates at which tectonic plates move
and accumulate strain at their boundaries are roughly uniform. Therefore,
nearly constant time intervals (at first approximation) would be expected
between large ruptures of the same fault segment. [However?], the recurrence
time may vary; the basic idea of periodic mainshocks may need to be
modified if subsequent mainshocks have different amounts of slip across
the fault. [Indeed?], the length and slip of great plate boundary ruptures
often vary by a factor of 2. [For example?], the recurrence interval
along the southern segment of the San Andreas fault is 145 years with
variations of several decades. The smaller the standard deviation of
the average recurrence interval, the more specific could be the long
term prediction of a future mainshock.
Many problems that had existed in the original have
now surfaced for the first time. Is the reason earthquakes do not occur
at random intervals stated in the first sentence or in the second? Are
the suggested choices of "however," "indeed," and "for example" the
right ones to express the connections at those points? (All these connections
were left unarticulated in the original paragraph.) If "for example"
is an inaccurate transitional phrase, then exactly how does the San
Andreas fault example connect to ruptures that "vary by a factor of
2"? Is the author arguing that recurrence rates must vary because fault
movements often vary? Or is the author preparing us for a discussion
of how in spite of such variance we might still be able to predict earthquakes?
This last question remains unanswered because the final sentence leaves
behind earthquakes that recur at variable intervals and switches instead
to earthquakes that recur regularly. Given that this is the first paragraph
of the article, which type of earthquake will the article most likely
proceed to discuss? In sum, we are now aware of how much the paragraph
had not communicated to us on first reading. We can see that most of
our difficulty was owing not to any deficiency in our reading skills
but rather to the author's lack of comprehension of our structural needs
as readers.
In our experience, the misplacement of old and new
information turns out to be the No. 1 problem in American professional
writing today. The source of the problem is not hard to discover: Most
writers produce prose linearly (from left to right) and through time.
As they begin to formulate a sentence, often their primary anxiety is
to capture the important new thought before it escapes. Quite naturally
they rush to record that new information on paper, after which they
can produce at their leisure contextualizing material that links back
to the previous discourse. Writers who do this consistently are attending
more to their own need for unburdening themselves of their information
than to the reader's need for receiving the material. The methodology
of reader expectations articulates the reader's needs explicitly, thereby
making writers consciously aware of structural problems and ways to
solve them.
A note of clarification: Many people hearing this structural
advice tend to oversimplify it to the following rule: "Put the old information
in the topic position and the new information in the stress position."
No such rule is possible. Since by definition all information is either
old or new, the space between the topic position and the stress position
must also be filled with old and new information. Therefore the principle
(not rule) should be stated as follows: "Put in the topic position the
old information that links backward; put in the stress position the
new information you want the reader to emphasize."
Perceiving Logical Gaps
When old information does not appear at all in a sentence,
whether in the topic position or elsewhere, readers are left to construct
the logical linkage by themselves. Often this happens when the connections
are so clear in the writer's mind that they seem unnecessary to state;
at those moments, writers underestimate the difficulties and ambiguities
inherent in the reading process. Our third example attempts to demonstrate
how paying attention to the placement of old and new information can
reveal where a writer has neglected to articulate essential connections.
The enthalpy of hydrogen bond formation between the
nucleoside bases 2'deoxyguanosine (dG) and 2'deoxycytidine (dC) has
been determined by direct measurement. dG and dC were derivatized at
the 5' and 3' hydroxyls with triisopropylsilyl groups to obtain solubility
of the nucleosides in non-aqueous solvents and to prevent the ribose
hydroxyls from forming hydrogen bonds. From isoperibolic titration measurements,
the enthalpy of dC:dG base pair formation is -6.65±0.32 kcal/mol.
Although part of the difficulty of reading this passage
may stem from its abundance of specialized technical terms, a great
deal more of the difficulty can be attributed to its structural problems.
These problems are now familiar: We are not sure at all times whose
story is being told; in the first sentence the subject and verb are
widely separated; the second sentence has only one stress position but
two or three pieces of information that are probably worthy of emphasis--"solubility
...solvents," "prevent... from forming hydrogen bonds" and perhaps "triisopropylsilyl
groups." These perceptions suggest the following revision tactics:
- Invert the first sentence, so that (a) the subject-verb-complement
connection is unbroken, and (b) "dG" and "dC" are introduced in the
stress position as new and interesting information. (Note that inverting
the sentence requires stating who made the measurement; since the
authors performed the first direct measurement, recognizing their
agency in the topic position may well be appropriate.)
- Since "dG" and "dC" become the old information in
the second sentence, keep them up front in the topic position.
- Since "triisopropylsilyl groups" is new and important
information here, create for it a stress position.
- "Triisopropylsilyl groups" then becomes the old information
of the clause in which its effects are described; place it in the
topic position of this clause.
- Alert the reader to expect the arrival of two distinct
effects by using the flag word "both." "Both" notifies the reader
that two pieces of new information will arrive in a single stress
position.
Here is a partial revision based on these decisions:
We have directly measured the enthalpy of hydrogen
bond formation between the nucleoside bases 2'deoxyguanosine (dG) and
2'deoxycytidine (dC). dG and dC were derivatized at the 5' and 3' hydroxyls
with triisopropylsilyl groups; these groups serve both to solubilize
the nucleosides in non-aqueous solvents and to prevent the ribose hydroxyls
from forming hydrogen bonds. From isoperibolic titration measurements,
the enthalpy of dC:dG base pair formation is -6.65±0.32 kcal/mol.
The outlines of the experiment are now becoming visible,
but there is still a major logical gap. After reading the second sentence,
we expect to hear more about the two effects that were important enough
to merit placement in its stress position. Our expectations are frustrated,
however, when those effects are not mentioned in the next sentence:
"From isoperibolic titration measurements, the enthalpy of dC:dG base
pair formation is -6.65±0.32 kcal/mol." The authors have neglected to
explain the relationship between the derivatization they performed (in
the second sentence) and the measurements they made (in the third sentence).
Ironically, that is the point they most wished to make here.
At this juncture, particularly astute readers who are
chemists might draw upon their specialized knowledge, silently supplying
the missing connection. Other readers are left in the dark. Here is
one version of what we think the authors meant to say, with two additional
sentences supplied from a knowledge of nucleic acid chemistry:
We have directly measured the enthalpy of hydrogen
bond formation between the nucleoside bases 2'deoxyguanosine (dG) and
2'deoxycytidine (dC). dG and dC were derivatized at the 5' and 3' hydroxyls
with triisopropylsilyl groups; these groups serve both to solubilize
the nucleosides in non-aqueous solvents and to prevent the ribose hydroxyls
from forming hydrogen bonds. Consequently, when the derivatized nucleosides
are dissolved in non-aqueous solvents, hydrogen bonds form almost exclusively
between the bases. Since the interbase hydrogen bonds are the only bonds
to form upon mixing, their enthalpy of formation can be determined directly
by measuring the enthalpy of mixing. From our isoperibolic titration
measurements, the enthalpy of dG:dC base pair formation is -6.65±0.32
kcal/mol.
Each sentence now proceeds logically from its predecessor.
We never have to wander too far into a sentence without being told where
we are and what former strands of discourse are being continued. And
the "measurements" of the last sentence has now become old information,
reaching back to the "determined directly by measuring" of the preceding sentence.
(It also fulfills the promise of the "we have directly measured" with
which the paragraph began.) By following our knowledge of reader expectations,
we have been able to spot discontinuities, to suggest strategies for
bridging gaps, and to rearrange the structure of the prose, thereby
increasing the accessibility of the scientific content.
Locating the Action
Our final example adds another major reader expectation
to the list.
Transcription of the 5S RNA genes in the egg
extract is TFIIIA-dependent. This is surprising, because the concentration
of TFIIIA is the same as in the oocyte nuclear extract. The other transcription
factors and RNA polymerase III are presumed to be in excess over available
TFIIIA, because tRNA genes are transcribed in the egg extract. The addition
of egg extract to the oocyte nuclear extract has two effects on transcription
efficiency. First, there is a general inhibition of transcription that
can be alleviated in part by supplementation with high concentrations
of RNA polymerase III. Second, egg extract destabilizes transcription
complexes formed with oocyte but not somatic 5S RNA genes.
The barriers to comprehension in this passage are so
many that it may appear difficult to know where to start revising. Fortunately,
it does not matter where we start, since attending to any one structural
problem eventually leads us to all the others.
We can spot one source of difficulty by looking at
the topic positions of the sentences: We cannot tell whose story the
passage is. The story's focus (that is, the occupant of the topic position)
changes in every sentence. If we search for repeated old information
in hope of settling on a good candidate for several of the topic positions,
we find all too much of it: egg extract, TFIIIA, oocyte extract, RNA
polymerase III, 5S RNA, and transcription. All of these reappear
at various points, but none announces itself clearly as our primary
focus. It appears that the passage is trying to tell several stories
simultaneously, allowing none to dominate.
We are unable to decide among these stories because
the author has not told us what to do with all this information. We
know who the players are, but we are ignorant of the actions they are
presumed to perform. This violates yet another important reader expectation:
Readers expect the action of a sentence to be articulated by the verb.
Here is a list of the verbs in the example paragraph:
- is
- is...is
- are presumed to be
- are transcribed
- has
- is...can be alleviated
- destabilizes
The list gives us too few clues as to what actions
actually take place in the passage. If the actions are not to be found
in the verbs, then we as readers have no secondary structural clues
for where to locate them. Each of us has to make a personal interpretive
guess; the writer no longer controls the reader's interpretive act.
Worse still, in this passage the important actions
never appear. Based on our best understanding of this material, the
verbs that connect these players are "limit" and "inhibit." If we express
those actions as verbs and place the most frequently occurring information--"egg
extract" and "TFIIIA"--in the topic position whenever possible,* we
can generate the following revision:
In the egg extract, the availability of TFIIIA limits
transcription of the 5S RNA genes. This is surprising because
the same concentration of TFIIIA does not limit transcription in the
oocyte nuclear extract. In the egg extract, transcription is not limited
by RNA polymerase or other factors because transcription of tRNA genes
indicates that these factors are in excess over available TFIIIA. When
added to the nuclear extract, the egg extract affected the efficiency
of transcription in two ways. First, it inhibited transcription generally;
this inhibition could be alleviated in part by supplementing the mixture
with high concentrations of RNA polymerase III. Second, the egg extract
destabilized transcription complexes formed by oocyte but not by somatic
5S genes.
[*We have chosen these two pieces of old information
as the controlling contexts for the passage. That choice was neither
arbitrary nor born of logical necessity; it was simply an act of interpretation.
All readers make exactly that kind of choice in the reading of every
sentence. The fewer the structural clues to interpretation given by
the author, the more variable the resulting interpretations will tend
to be.]
As a story about "egg extract," this passage still
leaves something to be desired. But at least now we can recognize that
the author has not explained the connection between "limit" and "inhibit."
This unarticulated connection seems to us to contain both of her hypotheses:
First, that the limitation on transcription is caused by an inhibitor
of TFIIIA present in the egg extract; and, second, that the action of
that inhibitor can be detected by adding the egg extract to the oocyte
extract and examining the effects on transcription. As critical scientific
readers, we would like to concentrate our energy on whether the experiments
prove the hypotheses. We cannot begin to do so if we are left in doubt
as to what those hypotheses might be--and if we are using most of our
energy to discern the structure of the prose rather than its substance.
Writing and the Scientific Process
We began this article by arguing that complex thoughts
expressed in impenetrable prose can be rendered accessible and clear
without minimizing any of their complexity. Our examples of scientific
writing have ranged from the merely cloudy to the virtually opaque;
yet all of them could be made significantly more comprehensible by observing
the following structural principles:
- Follow a grammatical subject as soon as possible
with its verb.
- Place in the stress position the "new information"
you want the reader to emphasize.
- Place the person or thing whose "story" a sentence
is telling at the beginning of the sentence, in the topic position.
- Place appropriate "old information" (material already
stated in the discourse) in the topic position for linkage backward
and contextualization forward.
- Articulate the action of every clause or sentence
in its verb.
- In general, provide context for your reader before
asking that reader to consider anything new.
- In general, try to ensure that the relative emphases
of the substance coincide with the relative expectations for emphasis
raised by the structure.
None of these reader-expectation principles should
be considered "rules." Slavish adherence to them will succeed no better
than has slavish adherence to avoiding split infinitives or to using
the active voice instead of the passive. There can be no fixed algorithm
for good writing, for two reasons. First, too many reader expectations
are functioning at any given moment for structural decisions to remain
clear and easily activated. Second, any reader expectation can be violated
to good effect. Our best stylists turn out to be our most skillful violators;
but in order to carry this off, they must fulfill expectations most
of the time, causing the violations to be perceived as exceptional moments,
worthy of note.
A writer's personal style is the sum of all the structural
choices that person tends to make when facing the challenges of creating
discourse. Writers who fail to put new information in the stress position
of many sentences in one document are likely to repeat that unhelpful
structural pattern in all other documents. But for the very reason that
writers tend to be consistent in making such choices, they can learn
to improve their writing style; they can permanently reverse those habitual
structural decisions that mislead or burden readers.
We have argued that the substance of thought and the
expression of thought are so inextricably intertwined that changes in
either will affect the quality of the other. Note that only the first
of our examples (the paragraph about URF's) could be revised on the
basis of the methodology to reveal a nearly finished passage. In all
the other examples, revision revealed existing conceptual gaps and other
problems that had been submerged in the originals by dysfunctional structures.
Filling the gaps required the addition of extra material. In revising
each of these examples, we arrived at a point where we could proceed
no further without either supplying connections between ideas or eliminating
some existing material altogether. (Writers who use reader-expectation
principles on their own prose will not have to conjecture or infer;
they know what the prose is intended to convey.) Having begun by analyzing
the structure of the prose, we were led eventually to reinvestigate
the substance of the science.
The substance of science comprises more than the discovery
and recording of data; it extends crucially to include the act of interpretation.
It may seem obvious that a scientific document is incomplete without
the interpretation of the writer; it may not be so obvious that the
document cannot "exist" without the interpretation of each reader. In
other words, writers cannot "merely" record data, even if they try.
In any recording or articulation, no matter how haphazard or confused,
each word resides in one or more distinct structural locations. The
resulting structure, even more than the meanings of individual words,
significantly influences the reader during the act of interpretation.
The question then becomes whether the structure created by the writer
(intentionally or not) helps or hinders the reader in the process of
interpreting the scientific writing.
The writing principles we have suggested here make
conscious for the writer some of the interpretive clues readers derive
from structures. Armed with this awareness, the writer can achieve far
greater control (although never complete control) of the reader's interpretive
process. As a concomitant function, the principles simultaneously offer
the writer a fresh re-entry to the thought process that produced the
science. In real and important ways, the structure of the prose becomes
the structure of the scientific argument. Improving either one will
improve the other.
The methodology described in this article originated in the linguistic
work of Joseph M. Williams of the University of Chicago, Gregory G.
Colomb of the Georgia Institute of Technology and George D. Gopen. Some
of the materials presented here were discussed and developed in faculty
writing workshops held at the Duke University Medical School.
Bibliography
Williams, Joseph M. 1988. Style: Ten Lessons in Clarity
and Grace. Scott, Foresman, & Co.
Colomb, Gregory G., and Joseph M. Williams. 1985. Perceiving structure
in professional prose: a multiply determined experience. In Writing
in Non-Academic Settings, eds. Lee Odell and Dixie Goswami. Guilford
Press, pp. 87-128.
Gopen, George D. 1987. Let the buyer in ordinary course of business
beware: suggestions for revising the language of the Uniform Commercial
Code. University of Chicago Law Review 54:1178-1214.
Gopen, George D. 1990. The Common Sense of Writing: Teaching Writing
from the Reader's Perspective. To be published.