Assessing Student Understanding and Learning in Constructivist Study Environments
Presented at the 1994 National Convention of
the Association for Educational Communications and Technology
John B. Black, Robert McClintock, Clifford Hill
For the last
couple of years, Teachers College, Columbia University and the Dalton School
(an independent school in New York City) have collaborated on the Dalton Technology
Project. This project aims to use networked multimedia workstations to produce
an environment that supports student studying in groups using authentic materials
and contexts. This approach to education contrasts sharply with the usual approach,
in which students work individually to passively receive knowledge from
teachers and textbooks using artificial problems. The project shares many features
with the developing constructivist approaches to instructional design (e.g.,
Jonassen, 1991; Bednar, Cunningham, Duffy and Perry, 1991; Collins, Brown and
Newman, 1990; Cognition and Technology Group at Vanderbilt, 1990; Spiro, Feltovich,
Jacobson and Coulson, 1991), but it differs from them in emphasizing design
for study as opposed to design for instruction. Thus, we strive to create
"a place for study in a world of instruction" (McClintock, 1971).
Seven Principles of
In addition to developing the particular study systems for different subject
areas in the Dalton Technology Project, we have been trying to specify what
the underlying design principles are for such an approach. In doing this we
draw inspiration both from Cognitive Science (e.g., Brown, Collins and Duguid,
1989) and from hermeneutic interpretation theory (e.g., Palmer, 1969). From
this effort, we have come up with the following seven study system design principles:
1. Text: Present students with particular cultural objects
(events, writings, images, artifacts, scores, observations, experiments, etc.),
the origin and meaning of which will confront them as obscure, a challenge to
2. Context: Provide students with open-ended access to
contextual materials that may help to clarify and interpret the cultural objects
presented to them and provide pathways leading from the particular object to
the comprehensive assemblage of pertinent materials. On the one hand, the
context must be immediate, and on the other hand it should include
3. Engagement: Situate the presentation of the text and
context--both the challenging cultural objects and their contextualizing resources
--in such a way that students will grasp strong ownership of the on-going
effort to interpret the material.
4. Cooperation: Have students collaborate in their quest
for interpretative understanding, learning to empathize with the interpretative
actions of their peers.
5. Inclusivity: Use cognitive apprenticeship to show students
how to enlarge the scope and power of the contextual materials they bring to
bear on interpreting the text, moving the interpretation toward that ideal condition
in which all significant contextualizing materials have been taken into account.
6. Abstraction: Encourage students to bring significant
contexts to bear upon multiple, different cultural objects to prepare them to
transfer their interpretative skills to novel problems.
7. Diversity: Encourage students to situate complex cultural
objects in many different significant contexts to prepare them to develop the
cognitive flexibility of understanding things from many points of view.
An example program will serve to illustrate these principles;
we will then discuss how to assess student understanding and learning in these
kinds of study environments. In the Archaeotype program, students study
ancient Greek and Roman history by using observations of simulated archaeological
digs to construct interpretations of the history of these sites, while drawing
upon a wide variety of background information. The Archaeotype program
(implemented in Supercard on Macintosh computers), which is the earliest and
most fully-developed of the Dalton Technology Project programs, presents the
students with a graphic simulation of an archaeological site, then the students
study the history of the site through simulated digging up of artifacts (the
text), making various measurements of the artifacts in a simulated laboratory,
and relating the objects to what is already known using a wide variety of reference
materials (the context). The students work cooperatively in groups,
while the teacher models how to deal with such a site then fades their involvement
while coaching and supporting the students in their own study efforts (inclusivity).
The students develop ownership of their work by developing their own interpretations
of the history of the site and mustering various kinds of evidence for their
conclusions (engagement). By arguing with the other students and studying
related interpretations in the historical literature, they get a sense of other
perspectives (diversity). By going through the process a number of times
bringing each contextual background to bear on a number of different artifacts,
the students learn and understand the general principles behind what they are
Assessing Student Understanding and Learning
So, what might students get from an educational experience like
Archaeotype that they wouldn't get from a regular class, and what might
they get from a regular class that they wouldn't get in Archaeotype? In
a regular class on Greek and Roman history, the students would probably learn
more facts about history (because they are devoting all their time to learning
such facts) than the Archaeotype students would learn, but the Archaeotype
students would probably remember the facts they do learn longer and have
a greater understanding of them and of historical reasoning. Thus, if given an objective
test of memory for Greek and Roman history facts at the end of the course,
a standard class would probably do better than an Archaeotype class,
but a year or two later the Archaeotype class would probably do better.
More importantly, if we examined essays arguing for some historical conclusion,
then we would expect the Archaeotype students to be much more sophisticated
than the regular students (in fact, the reports from current Archaeotype
students seem quite sophisticated in terms of language, argument structure,
citations, etc.) -- and thus demonstrate a much deeper understanding of historical
facts and reasoning. We are in the midst of conducting such an investigation
of content learning, but do not have the results to report yet.
However, more than these particulars of the topic area for a class, an Archaeotype-type
educational experience should teach students to examine any situation, make
relevant observations and measurements, organize these materials, search out
related bodies of knowledge, organize all this information and use it to draw compelling conclusions and
make useful recommendations. Thus, the strongest test of student learning and
understanding from Archaeotype would be to compare their ability to investigate
and make conclusions and recommendations in an entirely different and unrelated
situation to the ability of students who have not had an Archaeotype experience
to do the same. That is what we did in the study reported here.
In the study we conducted, the students were given a booklet describing four
psychology experiments examining how people remember lists of words. The students
had to examine the basic observations, report on the results of the studies,
find the patterns, devise explanations and argue for those explanations. They
were also given some background readings in the psychology of memory. The Dalton
students who had been through the Archaeotype program were compared to
students from the Grace Church School (who also had some data-analysis experience
from going through The Voyage of the Mimi program from Scholastic Publishers).
The experimental group was 20 sixth-grade students who had participated
in the Archaeotype program at the Dalton School, an independent school
located on the east side of Manhattan. The control group was 20 sixth-grade
students who attended the Grace Church School, an independent school also
located on the east side of Manhattan.
Students in the
two groups were given a ten-page document (the assignment booklet) divided into
two parts. The first part described the results of four memory studies as follows:
(1) in study 1 subjects
listened to 20 words spoken at the rate of one word per second and then immediately
recalled them
(2) in study 2 subjects listened to the same words spoken at the rate of one
word every three seconds and then immediately recalled them
(3) in study 3 subjects listened to the same words spoken at the rate of one
word per second but recalled them only after performing an unrelated 30-second
task
(4) in study 4 subjects listened to a different 20 words (many of which were
semantically related) spoken at the rate of one word per second and then immediately
recalled them
The second part of the document provided background readings on technical concepts
such as short-term memory and long-term memory. Students were asked to use these
readings to interpret the results of the four studies and to present their interpretations,
along with practical recommendations for improving memory, in a written report
Administering the Materials and Collecting Student Reports
The study was conducted in two 2-hour sessions (for a total of 4 hours) spread
over two adjacent days. On the first day, the experimenter passed out the assignment
booklets, the students paired up, the experimenter read the instructions on
the first page of the assignment booklet, then the experimenter ran a demonstration
of the kinds of memory studies described in the assignment booklets. In the demonstration
the experimenter read a list of 20 words then the students wrote down their
recall of them and the experimenter conducted a short discussion of what the
results were. This demonstration was done so that the students could see what
the studies described in the assignment booklets were like. After the demonstration,
the students proceeded to work on the assignment in groups of two. While doing
the assignment the students were free to use any of the resources in the Dalton
and Grace Church School buildings (computers, libraries, etc.), including asking
the experimenter questions for clarification and information (the same experimenter
conducted all sessions). At the end of the 2-hour period on the second day,
the students handed in their reports and all the work they had done in folders.
The experimenter then led a half-hour discussion of the study.
Analysis of Student Reports
We devised a rubric for evaluating three dimensions of the student reports:
pattern recognition, argumentation, and data representation. Given the emphasis
on data interpretation in the Archaeotype program, we accorded the most
weight to the dimension of argumentation, as indicated by the following distribution:
(1) pattern recognition (20 points)
(2) argumentation (30 points)
(3) data representation (10 points)
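The weighting above can be written down as a small lookup table. The following is only an illustrative sketch: the point values come from the list above, while the names `RUBRIC_MAX_POINTS`, `total_possible`, and `weight_shares` are hypothetical and not part of the original study materials.

```python
# Maximum points per rubric dimension, as listed in the text.
# The dictionary and function names are illustrative assumptions.
RUBRIC_MAX_POINTS = {
    "pattern_recognition": 20,
    "argumentation": 30,
    "data_representation": 10,
}


def total_possible(rubric):
    """Sum the maximum points over all rubric dimensions."""
    return sum(rubric.values())


def weight_shares(rubric):
    """Return each dimension's share of the total as a fraction."""
    total = total_possible(rubric)
    return {name: points / total for name, points in rubric.items()}
```

Under this sketch, argumentation accounts for half of the possible points, which reflects the emphasis described above.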
In principle, students could receive a total of 60 points, though
we should point out that the rubric was designed to reflect what might be
described as expert responses to the task. This emphasis on high standards
is in keeping with the larger movement in educational reform that is often
referred to as authentic assessment.
Pattern Recognition. Students received
1-2 points for describing each of the following intra-study patterns:
(1) in study 1 the pattern of last words/first words/middle words
(with middle words highly attenuated)
(2) in study 2 the pattern of last words/first words/middle words (with
middle words more developed)
(3) in study 3 the pattern of first words/middle words/last words (with
last words highly attenuated)
(4) in study 4 the pattern of last words/words grouped in semantic categories
(with last words relatively attenuated)
In addition, students received 1-2 points for describing each
of the following cross-study patterns that relate to number of words recalled:
(5) more words were recalled in study 2 than in study 1
(6) fewer words were recalled in study 3 than in studies 1 and 2
(7) more words were recalled in study 4 than in studies 1, 2, and 3
In effect, the number of words recalled in the studies can be
ranked in the following order:
study 4 > study 2 > study 1 > study 3
Apart from these major patterns, students received 1-6 points
for noticing other significant patterns (i.e., 1-2 points each for up to three patterns):
for example, in studies 1 and 2 when middle words were recalled, they often
formed associative pairs (e.g., cup/water); in study 4 the most salient
semantic categories were those involving fruit and animals as opposed to those
involving furniture and transportation (i.e., words in these categories were
recalled not only more frequently but earlier in the sequence); and within
the various categories, certain words that function as prototypes tended
to be recalled first: for example, coat for the category of clothing and chair
for the category of furniture.
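The arithmetic of the Pattern Recognition dimension can be checked with a short sketch: seven major patterns at up to 2 points each, plus up to three additional patterns at up to 2 points each, gives the 20-point maximum. The function name and two conventions are assumptions made for illustration only: a pattern that was not described is entered as 0, and when a report notes more than three additional patterns, the three highest-scoring ones are credited.

```python
MAJOR_PATTERNS = 7        # four intra-study plus three cross-study patterns
MAX_OTHER_PATTERNS = 3    # at most three additional patterns earn credit


def pattern_recognition_score(major_points, other_points):
    """Total a report's Pattern Recognition points.

    major_points: one entry per major pattern; 0 if not described
                  (assumed convention), otherwise 1 or 2.
    other_points: 1 or 2 for each additional pattern noticed.
    """
    if len(major_points) != MAJOR_PATTERNS:
        raise ValueError("expected one entry per major pattern")
    if any(p not in (0, 1, 2) for p in major_points):
        raise ValueError("major pattern points must be 0, 1, or 2")
    if any(p not in (1, 2) for p in other_points):
        raise ValueError("additional pattern points must be 1 or 2")
    # Assumed convention: credit only the three best additional patterns.
    credited = sorted(other_points, reverse=True)[:MAX_OTHER_PATTERNS]
    return sum(major_points) + sum(credited)
```

A report that fully describes every major pattern and three additional ones reaches the 20-point ceiling for this dimension.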
Explanation and Argumentation. Students were expected
to draw on the background readings to develop arguments supporting hypotheses
about the patterns they observed in the four studies. As a consequence, arguments
that drew appropriately on the background readings were awarded 1-4 points
each, whereas arguments that did not draw on the background readings were
awarded 1-2 points each. Here are local arguments that could be used in interpreting
major patterns in the four studies:
(1) in study 1 short-term memory explains the fact that the
last words are the first recalled
(2) in study 2 increase in time - and thus deeper processing in long-term
memory - explains the fact that more words can be recalled (especially, the
middle words that can be meaningfully associated)
(3) in study 3 the intervening 30-second task is used to explain not only
the fact that the last words are no longer recalled first (i.e., short-term memory
is no longer operating) but also the fact that fewer total words are recalled (i.e., long-term
memory is diminished as well)
(4) in study 4 the presence of semantically related words is used to explain
not only the fact that more words are recalled but also the sequence in which they
are recalled (i.e., semantically related words tended to be grouped).
In addition to local argumentation, students were given credit
for global argumentation (e.g., these four studies suggest that meaningful
associations among individual words is the most powerful factor in word recall).
They were given 1-2 points if such argumentation was presented without the
background readings, 1-4 points if it was presented with the background readings.
As to the final recommendations in the report, students were given
1-4 points for grounding them in the data (e.g., ample time should be provided
so that meaningful associations can be formed between the items to be remembered)
and 1-4 points for grounding them in the background readings (e.g., meaningful
associations should be developed so that material can be transferred from short-term
memory to long-term memory).
Students were also given 1-2 points whenever they displayed legitimate forms
of alternative explanation for the same phenomena (for example, in study four
the fact that cat tended to occur early among the recalled words could have
been explained by the fact that it was among the last words presented (i.e.,
short-term memory) and/or the fact that it serves as a prototype of the 'animal'
category (i.e., members of such a category, as mentioned, tend to occur before
members of 'furniture' or 'transportation' categories)).
Students were given credit if they used numerical and/or graphic methods
to represent major patterns in the four studies. With respect to numerical methods,
they received 1-2 points if they calculated the means for significant patterns
(1) the total number of words recalled in each study
(2) the number of first words, middle words, and last words recalled
in studies 1-3
(3) the number of words recalled in the semantic categories as well as the number
of last words recalled in study 4.
Students received an additional 1-2 points if they used these means to establish
significant proportions such as
(1) the relative weighting of first words, middle words, and last
words that were recalled in studies 1-3
(2) the relative weighting of last words and associated words (i.e.,
those in the semantic categories) that were recalled in study 4.
As to graphic methods of representation, students were given 1-6 points for
appropriate use of such methods. These methods include bar graphs that represent
the proportions of different kinds of words recalled in the four studies. With
respect to studies 1-3, the line graph of proportion recalled plotted against
serial position (usually called "the serial position curve") could have been
used to represent the major patterns constituted by first words/middle words/last
words. Alternatively, they could have used a flow chart to represent the
input/output relations for short-term and long-term memory in these studies.
With respect to study 4, they could have used tree structures to represent
membership in the major semantic categories.
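As a rough illustration of the numerical work behind a serial position curve, the sketch below computes the proportion of subjects who recalled the word at each list position, then averages those proportions over first, middle, and last words. Everything here is hypothetical: the function names, the `edge` cut-off that decides how many positions count as "first" or "last" (the source does not define these segments), and the tiny data set in the usage note. It also assumes each presented word appears only once in the list.

```python
def serial_position_curve(presented, recalls):
    """Proportion of subjects recalling the word at each serial position.

    presented: the words in the order they were read aloud (unique words).
    recalls:   one collection of recalled words per subject.
    """
    if not recalls:
        raise ValueError("need at least one subject's recall")
    curve = []
    for word in presented:
        hits = sum(1 for recalled in recalls if word in recalled)
        curve.append(hits / len(recalls))
    return curve


def segment_means(curve, edge):
    """Mean proportion recalled for first, middle, and last words.

    edge is an assumed cut-off: the first `edge` and last `edge`
    positions are treated as first and last words respectively.
    """
    if edge < 1 or len(curve) <= 2 * edge:
        raise ValueError("list too short for the chosen edge")
    segments = {
        "first": curve[:edge],
        "middle": curve[edge:-edge],
        "last": curve[-edge:],
    }
    return {name: sum(seg) / len(seg) for name, seg in segments.items()}
```

For example, with a made-up six-word list `["a", "b", "c", "d", "e", "f"]` and four subjects who recalled `{"a", "f"}`, `{"f", "e"}`, `{"a", "f", "c"}`, and `{"f"}`, calling `segment_means(curve, edge=2)` on the resulting curve yields the highest mean for the last words, the kind of pattern a line graph or bar graph would make visible.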
We present the results in Table 1. The numbers in this table are the means
for the Archaeotype group and the Control group. The total possible score
overall was 60 points, although this represents all that could conceivably be
found, not what any pre-college student could attain -- only a specialist in
the psychology of memory would have a chance of getting all these points. Thus,
the important aspect of these numbers is not their absolute value, but how the
Archaeotype and Control groups compare. This comparison is striking:
in total (the first column in Table 1), the Archaeotype group scored
33% higher than the Control group (25.2 vs 19.2 -- out of a possible 60), and
this difference was statistically significant, t(38)=2.22, p<.02.
To do this statistical analysis and the others reported later, we assigned
each student the score of the report created by the group (here, each group
is a pair) that they were in, then calculated a t test to see how large the difference
between the means of the Archaeotype student scores and the Control student
scores was compared to the variance of these scores within the Archaeotype
group and within the Control group.
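The procedure just described can be sketched as follows, assuming the t test was the standard two-sample test with pooled variance (the source does not name the exact variant). With 20 students per group this gives 20 + 20 - 2 = 38 degrees of freedom, matching the t(38) values reported. The function names are hypothetical, and no scores from the study are reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def student_scores(pair_scores):
    """Give both members of each pair the score of their joint report."""
    return [score for score in pair_scores for _ in range(2)]


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the within-group sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two group means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df
```

Passing ten report scores per school through `student_scores` before calling `pooled_t` reproduces the student-level analysis: each group contributes 20 scores even though only 10 reports were written.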
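Table 1 - Quantitative Analysis of Reports Written by Students
in the Archaeotype Group and the Control Group (mean scores)

                   Total   Pattern       Explanation and   Data
                           Recognition   Argumentation     Representation
Possible points    60      20            30                10
Archaeotype        25.2    10.6          13.8              0.8
Control            19.2    9.6           8.0               3.6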
As described earlier, this overall total score breaks down into subscores for
recognizing the patterns in the observations (Pattern Recognition), explaining
the patterns and arguing for those explanations (Explanation and Argumentation),
and converting the observations into forms that could provide insight (Data
Representation). This breakdown shows that the overall Archaeotype superiority
was almost totally caused by a 73% higher performance for the Archaeotype students
in the important Explanation and Argumentation area (13.8 vs 8.0 -- out of a
possible 30 points). Statistically also, this is a highly significant difference,
t(38)=3.34, p<.001. There was also a slight difference in favor of the Archaeotype
students in the Pattern Recognition scores (10.6 vs 9.6 -- out of a possible
20), but that difference was not even close to being statistically significant
so we have to discount it, t(38)=0.76, p>.2.
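The percentages quoted for these two subscores follow from simple arithmetic on the reported means, as this short sketch shows. The means and point maxima are taken from the text; the dictionary and function names are illustrative assumptions.

```python
# (Archaeotype mean, Control mean, possible points), as reported in the text.
SUBSCORES = {
    "Pattern Recognition": (10.6, 9.6, 20),
    "Explanation and Argumentation": (13.8, 8.0, 30),
}


def percent_of_possible(mean_score, possible):
    """Express a mean score as a percentage of the points available."""
    return 100 * mean_score / possible


def percent_higher(score_a, score_b):
    """How much higher score_a is than score_b, as a percentage of score_b."""
    return 100 * (score_a - score_b) / score_b


for name, (archaeotype, control, possible) in SUBSCORES.items():
    print(
        f"{name}: Archaeotype {percent_of_possible(archaeotype, possible):.0f}% "
        f"of possible, Control {percent_of_possible(control, possible):.0f}% "
        f"of possible, advantage {percent_higher(archaeotype, control):.1f}%"
    )
```

For Explanation and Argumentation the advantage works out to about 72.5%, which the text reports as 73%; for Pattern Recognition it is roughly 10%.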
The Data Representation scores held two surprises for us. The first surprise
is that they were so low (16% and 8% of the possible, compared to 27%-53% of
the possible in the other areas): neither the Archaeotype students nor the Control
students used means, proportions, graphics, or diagrams in their discussions
-- they merely talked about the observations. The second surprise is that the Control students scored
better than the Archaeotype students (3.6 vs 0.8 -- out of a possible 10) to
a significant degree, t(38)=1.95, p<.05. However, the Control advantage was
totally due to these students putting the observations into a database program
on the computers (part of Microsoft Works, which they were accustomed to using)
and calculating means. For example, one pair of students in the control group
displayed the database shown in Appendix C, Figure 5. This use of databases
was a potentially valuable move, but the control students did not exploit this
analysis for Pattern Recognition and Explanation-Argumentation. The Archaeotype
students did not show comparable use of database or spreadsheet programs and
thus scored lower on Data Representation. Taken together, these results show
that students need both experience using computer programs for manipulating
data and practice using them meaningfully as part of their
work in analyzing authentic tasks.
The results showed an impressive ability on the part of the Archaeotype students
to create explanations of observations and argue for the validity of those explanations
using a mixture of their own terms and ideas, and the technical terminology
and concepts provided by background readings in a research literature. They
also did well in recognizing patterns in the observations, but not significantly
better than the control group we compared them to. In fact, the similar performance
of the Dalton School Archaeotype students and the Grace Church School
Control students on the Pattern Recognition portion of the assignment provides
assurance that the two groups were comparable, which makes the much higher performance
of the Archaeotype students on Explanation-Argumentation all the more
impressive. However, we need to also recognize that the basic patterns in the
observations the students were analyzing were fairly easy to see -- particularly,
after the demonstration and discussion conducted by the experimenter in the
beginning of the sessions. It may be that if the patterns being searched for
had been less apparent then there would have been more of a difference in Pattern
Recognition between the Archaeotype students and the Control students.
In fact, a study we have done comparing performance on another program with
a similar design (Galileo, which teaches science to high school students through
astronomy) found pattern-recognition differences when the patterns were much
harder to see.
The Archaeotype students actually did worse than the Control students
in Data Representation, although both groups scored rather low in this area.
It is disappointing that the Archaeotype students did not use even such
rudimentary ways of representing data as counts, means and proportions. At least
some students in the Control group managed to do some counting and means through
entering the observations into a computer database program they were accustomed
to using. Ideally, the students would even have used visualization techniques
like graphs and diagrams to reveal patterns in the observations and to argue
for their explanations. Archaeotype would seem a natural context within
which to introduce the powerful idea of representing information in different
forms to gain insight.
Bednar, A.K., Cunningham, D., Duffy, T.M., and Perry, J.D. (1991). Theory into
practice: How do we link? In G.J. Anglin (Ed.), Instructional technology:
Past, present and future. Englewood, CO: Libraries Unlimited, Inc.
Brown, J.S., Collins, A. and Duguid, P. (1989). Situated cognition and the culture
of learning. Educational Researcher, 18, 32-42.
Cognition and Technology Group at Vanderbilt (1990). Anchored
instruction and its relation to situated cognition. Educational Researcher, 20,
Collins, A., Brown, J.S., and Newman, S.E. (1990). Cognitive apprenticeship.
In L.B. Resnick (Ed.), Knowing, learning and instruction. Hillsdale, NJ: Erlbaum.
Jonassen, D.H. (1991). Objectivism versus constructivism: Do we need a new philosophical
paradigm? Educational Technology Research and Development, 39, 5-14.
McClintock, R. (1971). Toward a place for study in a world of instruction.
Teachers College Record, 72, 405-416.
Palmer, R.E. (1969). Hermeneutics. Evanston, IL: Northwestern University Press.
Spiro, R.J., Feltovich, P.J., Jacobson, M.J. and Coulson, R.L. (1991). Cognitive
flexibility, constructivism, and hypertext. Educational Technology, 21, 24-33.