
Student Performance on the BDSI for Basic Data Structures

Published: 25 October 2021

Abstract

A Concept Inventory (CI) is an assessment to measure student conceptual understanding of a particular topic. This article presents the results of a CI for basic data structures (BDSI) that has been previously shown to have strong evidence for validity. The goal of this work is to help researchers or instructors who administer the BDSI in their own courses to better understand their results. In support of this goal, we discuss our findings for each question of the CI using data gathered from 1,963 students across seven institutions.

1 Introduction

Concept Inventories (CIs) are designed to measure student conceptual knowledge for a particular topic. They are valued by education researchers and instructors, as they provide a common benchmark for student knowledge, and they support instructors looking to diagnose areas of strength (or weakness) in their particular courses. The development process for a CI ensures that various measures of validity are used to ensure that the questions on the CI are meaningful to instructors, that they are properly interpreted by students, and that they measure student understanding [1].
In computer science, there have been few instruments created for use in such a manner (for example, the Second CS1 instrument [23] and the Digital Logic Concept Inventory [14]). Our previous work complemented the field by adding the Basic Data Structures Inventory (BDSI), a CI that can be used in conjunction with a CS2 course (or the corresponding course that teaches basic data structures) [27].
Our previous work first reported on the establishment of learning goals [26] and common difficulties with basic data structures [40]. We then reported on our extensive development and validation process, describing our claims of validity and the evidence we gathered to support those claims [27].
What remains unanswered for instructors and researchers using the instrument is how to interpret their findings. If a large share of students selected a particular distractor for a given question, how often do students at similar schools select this distractor, and what might have led students to select it? In this work, we seek to provide this context to users of the instrument by reporting on the frequency of student responses to each question and by using data from our student interviews to help interpret the source of particular distractors. The data used to provide this context includes 99 interviews at seven institutions as well as large-scale pilots of the instrument with 1,963 students. For each question, we provide the full content of the question, a box plot that characterizes the frequency of student answer selections, and our interpretation of what the results imply about student understanding of the underlying concepts. By providing our analysis of these results, we hope that CS instructors, CS education researchers, or anyone else interested in utilizing the BDSI will feel more confident in adopting it and be better able to understand its results.

2 Background

A CI is a multiple-choice assessment designed to measure student understanding of the core concepts of a course. The goal of a CI is not to summatively assess a particular student, but rather to create a standardized instrument that can be used to compare across instructors, courses, and pedagogies. Adams and Wieman [1] list several requirements for a useful CI:
It must be easy to administer and grade without any training in the context of a normal course schedule.
The instrument must test “expert thinking” of obvious value to teachers.
It must be demonstrated that it measures what it claims (evidence of validity).

2.1 CIs in STEM and Computer Science

CIs arose from observations by physics educators in the mid-1980s that conventional instruction in Newtonian physics was not working. It appeared that everyday beliefs about motion and force were incompatible with Newtonian concepts; moreover, conventional instruction produced little change in these beliefs [15]. This led to the design of the Force Concept Inventory (FCI) [15] to measure these student misconceptions. The FCI spurred the adoption of Peer Instruction and other active learning techniques within physics education [4]. A later study [12] administered the FCI to 6,542 students in 62 different courses and found that traditional courses had lower learning gains than those using active learning. The impact of the FCI on the physics community led to the design of other CIs both in physics [3, 21] and in numerous other scientific areas including chemistry [19], genetics [32], statistics [34], calculus [8], geoscience [20], oceanography [2], nursing [29], astronomy [30, 39], and biology [7].
Within computer science, there have been fewer CIs developed [36]. Tew and Guzdial developed the FCS1, an assessment for CS1 that uses a pseudocode to measure students’ basic programming knowledge [37]. Due to the unavailability of the FCS1, the Second CS1 Assessment (SCS1), an isomorphic version of the FCS1, was developed and made available to instructors [23]. CIs have also been developed for digital logic [14] and computational thinking for elementary-level [24] and middle-grades [28] students. Preliminary work towards CIs is available for a number of other computer science topics, including algorithms [6, 9], cybersecurity [31], recursion [13], advanced data structures [18], architecture [25], and operating systems [38]. For additional evaluation instruments in CS, see Reference [5] and for a broad survey of CI availability in computer science, see Reference [36].

2.2 BDSI Prior Work

This article represents the culmination of a multi-year project to develop, validate, and understand the BDSI. Along the way, we published several other papers. To provide context, we briefly summarize that prior work below. Note that while we summarize some of this prior work in other sections (e.g., BDSI development in Section 3 and evidence of statistical validation in Section 4), the prior work contains many more details. We encourage interested readers to refer to these papers for additional information about the BDSI:
The first publication [26] describes the development of the BDSI’s learning goals. It recounts how, with the help of a panel of CS2 experts, we determined a set of topics and question coverage goals for the BDSI.
The next paper [40] identifies common student misunderstandings in basic data structures. It represents a precursor to the present article with a substantially smaller student sample size. It reports only a subset of the concepts on the BDSI and analyzes earlier, less-refined versions of the questions.
The third publication [27] details our work toward validating the BDSI, and the present article analyzes the dataset presented in that validation work. The paper evaluates several validity claims for the BDSI, including the claims that the questions are meaningful to instructors, that students properly interpret the questions, and that the CI is internally consistent. It analyzes student think-aloud interviews and a large dataset of several BDSI runs using the methods of Classical Test Theory and Item Response Theory. Ultimately, we believe that the results illustrate strong evidence of statistical validation for the BDSI.
Finally, the most recent paper [35] characterizes the full development process for the BDSI. It chronicles the question development and refinement process with the intent of providing a CI development roadmap. We hope that the lessons it offers will lower the barrier to entry for future CI development in computer science.
The present article complements this prior work by analyzing a large set of student responses that we collected during the BDSI development process. Rather than development processes [35] or statistical validation [27], it primarily focuses on understanding the student learning implications of student selections on the CI. We expect that this article will be especially helpful in interpreting results for CS2 instructors or CS researchers who adopt the BDSI.

3 BDSI Development and Administration

Our findings are based on data collected during and after the development process of the BDSI. As such, we detail the completed development process and the sources of these data here. The BDSI development team consists of faculty with teaching backgrounds from both large research-intensive universities and small liberal arts colleges. We met frequently during the development of the BDSI, authored questions, conducted (and analyzed) interviews with students, and analyzed results from pilot runs of the instrument. Throughout the CI development process, we consulted with an expert panel: CS2 instructors from a broad set of North American institutions including research-intensive universities, public regional universities, liberal-arts colleges, and community colleges. The members of the expert panel provided feedback periodically during the development process and were extremely helpful in contributing their institutional context to the work. While the full details of our four-year CI development process are documented in prior studies [27, 35], the process can be briefly summarized as:
(1)
Establish topics that are important to teachers.
(2)
Develop a set of open-ended questions on these topics.
(3)
Interview students and offer open-ended practice exams to discover misconceptions, then convert questions into a forced-choice or select-all format.
(4)
Conduct validation interviews and question trials at a variety of institutions.
(5)
Statistically validate the CI.
We refer to step 3 as the interview phase, and it encompasses one-on-one think-aloud interviews with students on broad questions aimed to elicit thinking about basic data structures. Based on the interview results, we developed an initial set of CI candidate questions. As our goal was to gain a nuanced understanding of student conceptual knowledge of data structures, we opted to include select-all questions to capture cases where students frequently selected both a correct answer and one or more distractor answers, despite potential complications for scoring.
For step 4, the refinement phase, we iteratively refined the candidate questions via both individual student interviews and trial runs of the instrument. We organized the trials as “final exam review sessions” to be held in conjunction with the end of the term for courses that teach basic data structures. These sessions were led by course instructors, TAs, or other faculty and involved students individually taking a one-hour “practice exam” that was the current draft of the BDSI. During this phase, we asked students to explain their answers for early versions of multiple-choice questions, and these responses proved integral for finding patterns of erroneous student reasoning. Once students finished, the exam booklets were collected by the staff administering the exam, and the staff then went over the answers to the BDSI as well as other course topics not addressed by the CI. Students who attended the final exam review session were allowed to opt in or out of participating per our approved Human Subject Board protocol (if they opted out, then they could complete the practice exam, but we did not use their data).
Finally, in step 5, the validation phase, we interviewed students to ensure that they were interpreting our questions correctly. We then administered the finalized CI via similar “final exam review sessions” to collect large-scale response data. The quantitative results presented in Section 4, in which we present how frequently students select each answer choice, come from these validation runs. The combination of these validation interviews and a statistical analysis of the finalized CI response data demonstrates strong evidence for BDSI validity [27].
For multiple-choice and select-all style questions, we did not randomize the order of the answer choices. The BDSI has a variety of question types (e.g., data structure analysis and code fill-in-the-blank), and though some types of questions may have benefited from randomization, others would not [10].
Steps 3–5 are particularly important for the question analysis we provide in Section 4, as each phase uniquely contributed to our understanding of common student misconceptions. Throughout that analysis, we describe whether our insights into patterns of student thinking came from an interview or an open-ended practice exam.

3.1 Learning Goals

The final version of the BDSI contains 13 questions, which cover a variety of data structures topics, including lists, trees, stacks, and sets, with a heavy emphasis on lists and trees. While all of the questions are multiple-choice, they are presented in two formats—some ask that students select only one option, whereas others ask students to select all the options that apply. Each question has a primary learning goal according to the goals described in Reference [26]; the relevant table is reproduced in Table 1.
Table 1. BDSI Learning Goals (Adopted from Reference [26])

Goal #   Learning Goal
1        Analyze runtime efficiency of algorithms related to data structure design.
2        Select appropriate abstract data types for use in a given application.
3        Compare data structure tradeoffs to select the appropriate implementation for an abstract data type.
4        Design and modify data structures capable of insertion, deletion, search, and related operations.
5        Trace through and predict the behavior of algorithms (including code) designed to implement data structure operations.
6        Identify and remedy flaws in a data structure implementation that may cause its behavior to differ from the intended design.

3.2 Assumed Student Background

We assume students have already taken a course in basic data structures when they are given the instrument. In developing the questions for the BDSI, we faced a challenge due to variations in course content, particularly surrounding the choice of programming language. To make the BDSI accessible to as many courses as possible, we aimed to develop questions that are agnostic to the choice of programming language. Thus, we augmented the pseudocode language originally used in the FCS1 [37] and SCS1 [23] with minor modifications to include objects. When presenting the BDSI to students, we provide a short reference packet that encompasses:
Question answering instructions that highlight the differences between select-one and select-all question styles.
Brief definitions for the interfaces and data structures used in the BDSI. We provide these definitions not to introduce students to these concepts (e.g., for a pre-test), but to help account for students who may have learned different terminology depending on their programming language (e.g., array lists in Java, lists in Python, and vectors in C++). For reference, we reproduce the BDSI’s definitions, instructions, and pseudocode reference in Appendix A.
A concise summary of the BDSI’s pseudocode language with several examples.
Regardless of the language in which students learned the material (Java, Python, or C++), students rarely identified programming language differences, the pseudocode language, or data structure terminology as sources of confusion in the course of our validation work. While our pseudocode was tested with students using a variety of imperative coding languages, we do not address basic data structures in functional paradigms such as Scheme, whose teaching and learning may be different.

4 Results and Analysis

In this section, we present each question, illustrate the selection rate of each answer choice, and offer our analysis of the results. The analysis draws from observations of student reasoning in the development phases described in Section 3. When possible, we offer our conjecture as to how faculty might work to remedy the identified misconception for students. To aid instructors in comparing their findings to those of others at comparable institutions, we provide student response rates per answer choice. These per-choice results add nuance to interpreting the results, particularly for select-all questions, where patterns may be masked by examining only overall question correctness.
In the interest of space, we only comment on results that we believe are notable, and our analysis ignores infrequently-chosen answers if they are not indicative of an interesting result. We note that even the most infrequently chosen answers were observed in student responses during earlier CI development phases (e.g., open-ended interviews).
Table 2 describes the institutions from which we collected data. Due to the variance in sample sizes between large R1 institutions and small liberal-arts colleges, our question analysis presents box plots and the median correctness rate across each independent run of the instrument, regardless of the number of students who participated. Reporting the median by independent run prevents the larger datasets from drowning out the smaller ones and characterizes differences in performance between institutions, which is one of the goals of a concept inventory. Table 3 summarizes the questions, their learning goals, and their correctness rates. The remainder of this section presents and analyzes each question in detail. We analyze the dataset that we collected as part of the BDSI validation process. For validation details (including CTT and IRT results), see Reference [27].
Table 2. Summary of the Institutions That Participated in Data Collection

                  I-A   I-B   I-C     I-D      I-E   I-F         I-G
School Type       CC    PUI   LAI     RIU      LAI   RIU         RIU
School Size       M     L     S       L        S     L           L
# Participants    16    17    12/57   14/116   38    2/408/840   110/164/169
Some institutions participated multiple times, as reflected by the number of participants separated by slashes. The type of each institution is listed as one of community college (CC), primarily undergraduate institution (PUI), liberal arts institution (LAI) or research intensive university (RIU). We define small as less than 5,000 students, medium as 5,000 to 10,000 students, and large as greater than 10,000 students.
Table 3. Summary of BDSI’s Questions, Their Learning Goals, the Data Structures They Cover, and Their Median and Interquartile Correctness Rates

      Learning Goal   Structures   25th   Median   75th   Question Description
q1    4               LL           49%    68%      78%    Correctly add an element to the tail of a linked list.
q2    1               LL           9%     14%      30%    Compare list performance with and without a tail pointer.
q3    1               LL           73%    76%      81%    Without looking at the code, experimentally determine how an ADT was implemented.
q4    1               LL           75%    79%      86%    Analyze performance of implementing an interface with a linked list.
q5    5               BST          24%    33%      44%    Identify BST insertion orders that lead to balanced trees.
q6    5               LL           51%    61%      74%    Explain the behavior of a function that uses a linked list.
q7    4               BT           32%    34%      46%    Write a function to compute a property of a binary tree.
q8    3               Many         55%    64%      76%    Compare the performance of satisfying an abstract interface with several structures.
q9    5               BST          10%    21%      33%    Analyze performance and effect on tree properties of adding elements to a BST.
q10   2               LL, Stack    47%    54%      60%    Evaluate the performance of implementing “undo” functionality with a linked list.
q11   6               BST          68%    71%      74%    Determine appropriate cases for testing the correctness of a BST implementation.
q12   4               BT           42%    48%      50%    Compute the width of a binary tree.
q13   2               List, Set    78%    81%      89%    Compare solving a problem with a list interface versus a set interface.

4.1 Question 1: Linked List Append

Question 1 examines students’ ability to author code to complete a linked list method when the singly linked list is modified to include a tail pointer. Students are expected to use the tail to insert a new element and then update the tail. This question addresses learning goal 4, “Design and modify data structures capable of insertion, deletion, search, and related operations.” During interviews, the pseudocode language was not a barrier to students, although they sometimes looked to the question preamble for an example of how to create a new node. The correct response is option D. Figure 1(a) provides the frequency of student responses for each answer choice.
Fig. 1. Box plot results for Q1 and Q2. Correct answer choices are marked with green rectangles on each chart. Q2 asked students to mark all correct answer choices, although there was a single correct choice.

4.1.1 Correct Choice.

Option D correctly adds a new element to the end of the list by using the tail reference, then updates the tail. The majority of students (median of 68%) answer this question correctly.
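To make the discussion concrete, the following minimal Python sketch captures the logic of the correct choice; the class and method names are ours, not the BDSI’s pseudocode, and the BDSI question itself is not reproduced here.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedListWithTail:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        # Mirror the correct choice: use the tail to attach the new node,
        # then move the tail so it again points at the last node.
        node = Node(value)
        if self.tail is None:        # empty list: the new node is both head and tail
            self.head = node
        else:
            self.tail.next = node    # attach after the current last node
        self.tail = node             # maintain the invariant: tail is always the last node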

4.1.2 Distractor E: Using a Loop (Ignoring the Tail Pointer).

Option E is the most popular incorrect response with a median response of 13%. Although the loop is correct for appending the new element, it fails to update the tail reference and is hence incorrect. This response was observed during an open-ended practice exam in which students could provide any answer. Option E surprisingly ignores the tail reference, which is provided in the question and appears in other option choices. Unfortunately, this approach was not observed during interviews, so our ability to interpret students’ thinking is limited. However, we suspect it is caused by students having memorized (or become familiar with) the looping approach to traversing a linked list and recognizing that this looping approach is often what is needed when working with linked lists. Potential remedies might include emphasizing data structure algorithm development beyond examples from class and providing more problems that ask students to author LinkedList methods that do not require looping through the list.

4.1.3 Distractor B: Correct Append but Fails to Update Tail.

Students who selected option B correctly create a new node and add it to the end of the list, but they miss the update of the tail. This was observed during interviews and during open-ended practice exams. In general, students who made this mistake either forgot that they needed to update the tail themselves or believed that they were not responsible for updating the tail. We suspect instruction that emphasizes data structure invariants (e.g., the tail reference always points to the last element in the list) and the importance of every method maintaining those invariants might help with mitigating this error.

4.1.4 Distractor C: Correct Update of the Tail but Fails to Append.

Students who selected option C correctly create a new node and set the tail to point to that new node. However, this option fails to update the original tail.next so the list does not include the new element and the tail now refers to an element not in the list. This response was observed during interviews and during open-ended practice exams. We believe the source of this mistake is similar to Distractor B in that they again fail to maintain all invariants of the list.

4.1.5 Distractor A: Misunderstanding of Objects.

Option A is the least common response. The code correctly recognizes the need to update tail.next and the tail reference, but it does not create a MyListNode containing the element. In a strongly typed language, this code would not compile. Again, this response was observed both during interviews and during open-ended practice exams. During free-response interviews, we suspect this might have occurred due to misunderstanding the pseudocode or through carelessness. Upon adding multiple-choice options that all correctly create a new node, we did not see this response in our closed-form interviews. As such, we believe students selecting this option misunderstand types and objects. Future work could see if this is more common among students who take the course in one programming language over another. Potential remedies would be reviewing differences in object types and/or more emphasis on Object-oriented Programming techniques.

4.2 Question 2: Advantages of a Tail Pointer

Question 2 asks students to determine the runtime implications of adding a tail pointer to the singly linked list from the previous question. Students are expected to see that the operation to add an element at the end, as coded in the previous question, would be fastest and that no other operations would be faster. This question addresses learning goal 1, “Analyze runtime efficiency of algorithms related to data structure design.” The correct response is option D. Figure 1(b) provides the frequency of student responses for each answer choice. During interviews, the incorrect response B was often initially selected by students, but as they talked through their answer as part of the think-aloud process, they would catch themselves and recognize that although they could access the last element using the tail, they would be unable to point the tail to the second-to-last element without iterating through the entire list. At scale, however, a large number of students appear to make this mistake, causing this question to be consistently the hardest question on the BDSI.

4.2.1 Correct Choice.

Option D correctly identifies that adding a new element to the end of the list is faster using the tail reference, as there is no longer a need to iterate through the list. The majority of students (median of 94%) correctly select this option. However, as this is a “select all that apply” question, only a median of 14% correctly select this as the only answer.

4.2.2 Distractor B: Removal of the Last Element.

As mentioned above, Distractor B is selected by a majority of students (median of 70%) and is incorrect. We believe that students selecting this option are not carefully thinking through the detailed steps of how to perform the removal including how to: (1) update the second-to-last node to point to nil and (2) update the tail to point to the second-to-last node (or nil if the list is empty after the removal). We suspect that teaching students to create short pseudocode of solutions before reasoning about their runtime might help reduce this error.
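As a concrete illustration of why option B is incorrect, the following method (continuing the Python LinkedListWithTail sketch from Question 1; it is not the BDSI’s code) shows that removing the last element of a singly linked list still requires a full traversal even when a tail reference is available.

    def remove_last(self):
        # A tail reference lets us *read* the last node in O(1), but removing it
        # requires the second-to-last node, which we can only reach from the head.
        if self.head is None:
            return
        if self.head is self.tail:             # single-element list
            self.head = self.tail = None
            return
        current = self.head
        while current.next is not self.tail:   # O(N) walk to the second-to-last node
            current = current.next
        current.next = None                    # step (1): detach the old last node
        self.tail = current                    # step (2): tail moves back one node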

4.2.3 Distractor C: Faster Addition of Elements to the Start.

Adding an element to the beginning of the list is already fast for a singly linked list. This mistake did not appear in interviews but did appear during the open-ended run. Only a small group of students (median of 7%) select this option, but those who do may have considerable confusion about when linked lists without a tail pointer are efficient.

4.3 Question 3: Runtime of LinkedList Operations

Question 3 asks students to design an experiment to determine if a list implementation is singly or doubly linked. To correctly answer the question, they must understand how the runtime of operations on the two list implementations will differ. This question addresses learning goal 1, “Analyze runtime efficiency of algorithms related to data structure design.” The correct response is option B. The distribution of student answers for this question is shown in Figure 2(a).
Fig. 2. Box plot results for Q3 and Q4. Correct answer choices are marked with green rectangles on each chart.

4.3.1 Correct Choice.

The correct answer is choice B. This option requires students to understand that adding a node to the end of a singly linked list will take longer than adding to the start of the list, since to add at the end one must traverse the entire list, while adding at the start only requires following a single pointer and then updating several pointers. In a doubly linked list with a tail pointer, neither of these operations will require traversing the list, so they will both take approximately the same amount of time. The median student correctness for this question was 75%.
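One way to run such an experiment is sketched below in Python. The operation names add_at_start and add_at_end and the timing threshold are illustrative assumptions, not part of the BDSI; the idea is simply that the two operations take comparable time on a doubly linked list with a tail pointer but differ sharply on a singly linked list with only a head pointer.

import time

def classify_list(lst, n=20000):
    # Time n insertions at the start versus n insertions at the end.
    start = time.perf_counter()
    for i in range(n):
        lst.add_at_start(i)
    time_at_start = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n):
        lst.add_at_end(i)
    time_at_end = time.perf_counter() - start

    # Head-only singly linked list: add_at_end traverses the list, so it is far slower.
    # Doubly linked list with head and tail: both operations are O(1) and comparable.
    if time_at_end > 10 * time_at_start:
        return "likely singly linked (head pointer only)"
    return "likely doubly linked with a tail pointer"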

4.3.2 Distractor C: addAtEnd vs. removeAtEnd.

This distractor compares the timing of two linked list operations, just like the correct answer, but the operations it compares are adding a node at the end of the list and removing a node from the end of the list. These two operations will both involve traversing the entire list in the singly linked list and avoid traversing it in the doubly linked list. Thus, both operations will be O(n) in the singly linked list and O(1) in the doubly linked list. This means that comparing the times of these two operations on a single implementation will not reveal any differences between them. Students choosing this answer may think that they should test these two operations because they have a different time complexity in each implementation. This was the most popular distractor answer, with a median response rate of 11%.

4.3.3 Distractor E: Source Code Required.

This distractor claims that it is impossible to tell which list implementation is being used without looking at the source code. Its median response rate is 5%, making it the second-most-popular distractor. During interviews on the free-response version of this question, we frequently had students answer that they would look at the source code to determine the implementation, despite our attempts to make clear that this was a situation in which you would not have access to the source code. However, when we converted the question to multiple choice, validation interviews made it clear that there was no way to include looking at the source code as a potential answer without the students interpreting it as meaning they had access to the source code. Instructors should be aware of the student misconception that source code will always be available.

4.4 Question 4: LinkedList Runtime

This question tests students’ knowledge of how to implement data storage in a LinkedList and the runtime of related LinkedList operations. It evaluates learning goal 1, “Analyze runtime efficiency of algorithms related to data structure design.” Students are asked to store key-value pairs in the LinkedList, where keys are integers, including negative integers. Pairs will be inserted once and accessed many times. The question asks for the best possible time complexity for a get method that takes in a key and returns a value. The correct response is option C. The breakdown of student responses for this question is shown in Figure 2(b).
This question was originally part of a longer question, in which we also asked students to implement the same data store, but using an ArrayList. We expected the correct answer to be log N—students would insert pairs in a sorted order and then perform binary search on the list. However, we found that both very high- and very-low-performing students were selecting the distractor answer: constant time. In interviews, we found that high-performing students devised a solution in which they found the most negative integer key and then biased all of the keys by that amount to insert them into a potentially very large array, allowing them to be found in O(1) time. Low-performing students simply did not take into account the fact that they could not insert based on integer value if the integers were negative. To avoid this problem with the question, we used only the part of the question that focused on LinkedLists in the final CI.

4.4.1 Correct Choice.

The correct answer is option C, time complexity of N. Since this is a LinkedList, the method will need to traverse the LinkedList to find the key, an operation that has a time complexity of N, the length of the LinkedList. A median of 79% of students correctly chose this answer.
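A minimal sketch of such a get method, assuming nodes that carry key, value, and next attributes (the names are ours, not the BDSI’s):

def get(head, key):
    # With no index structure, the only option is a linear scan of the nodes,
    # so a lookup may visit all N nodes: O(N) time.
    current = head
    while current is not None:
        if current.key == key:
            return current.value
        current = current.next
    return None          # key not present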

4.4.2 Distractor D: Log N.

This was the second-most commonly chosen answer, with a median value of 7% of students selecting it. In this answer, students believe they can find a value in a LinkedList in time log N. This answer may indicate that students think they can perform a binary search on the LinkedList.

4.5 Question 5: Binary Search Trees

Question 5, which is based on work by Karpierz and Wolfman [18], probes students’ ability to reason about the shape of a binary search tree (BST). In particular, students are expected to choose an insertion order for a range of numbers such that it ultimately produces a desirable BST structure: a fully balanced tree. This question addresses learning goal 5, “Trace through and predict the behavior of algorithms (including code) designed to implement data structure operations.” Options B and D are both correct answers to this question. Figure 3(a) illustrates how frequently students selected each answer choice.
Fig. 3. Box plot results for Q5 and Q6. Correct answer choices are marked with green rectangles on each chart. Q5 asked students to mark all correct answer choices.

4.5.1 Correct Choices.

The sequence in Option B, with a median selection rate of 80%, starts at the root and then immediately begins building both of the root’s subtrees. Fewer students (median of 53%) correctly selected D, which initially spends more time constructing the root’s right subtree. This resulted in a median correctness of only 33% on the problem, making it one of the harder problems on the BDSI. We suspect that option B more closely matches the way in which balanced BSTs are typically taught in the classroom, and students more easily recognized the pattern. We recommend that instructors demonstrate that multiple insertion orders might ultimately produce the same BST.
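The principle behind both correct orders can be expressed as a short sketch: insert the median of the remaining range first, then recursively do the same for each half. The sketch below (a Python illustration, not BDSI code) produces one such order; interleaving the left- and right-half subsequences differently, as option D does, yields the same balanced tree.

def balanced_insertion_order(sorted_values):
    # Emit the median first, then (recursively) the medians of each half.
    if not sorted_values:
        return []
    mid = len(sorted_values) // 2
    return ([sorted_values[mid]]
            + balanced_insertion_order(sorted_values[:mid])
            + balanced_insertion_order(sorted_values[mid + 1:]))

# Example: one insertion order that builds a fully balanced BST from 1..7.
print(balanced_insertion_order(list(range(1, 8))))   # [4, 2, 1, 3, 6, 5, 7]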

4.5.2 Distractor C: Starting in the Middle, Alternating Outwards.

Option C was easily the most popular distractor choice (median 35%). In selecting C, students applied the correct logic to identify the root node, but then abandoned that logic for subsequent nodes under the root. That is, they identified that 512 should be the root of the tree, which produced two equally sized subtrees below the root, but then generated a long chain of nodes for each of the root’s subtrees. After inserting 512, the sequence alternates inserting values that are one larger or one smaller than the previously inserted value.
Students selected this distractor in interviews and during the open-ended practice exams. They seem to believe that only the root needs to contain an equal number of children on the left and right for the tree to be considered balanced. Concerningly, they do not consider applying the same criterion recursively, despite many properties of BSTs often being defined recursively.

4.5.3 Distractor F: Randomness.

A median of 7% of the responses asserted that inserting values randomly into a BST will always produce a balanced tree. This response suggests that some students may equate “random order” with “good order,” despite a random order not being guaranteed to produce a balanced tree. From interviews and the open-ended practice exams, it seemed as though students had learned that the average case for BSTs is O(log n) and that random insertion can often provide this average case. Some students may also have been confused about the definition of a “fully balanced tree.”

4.6 Question 6: Linked List Traversal

Question 6 tests students’ ability to trace a linked list traversal and reason about its behavior. The function to analyze, mystery, uses a while loop to search for a provided key without running off the end of the linked list (by testing that current != nil). This question addresses learning goal 5, “Trace through and predict the behavior of algorithms (including code) designed to implement data structure operations.” The correct response is option A. Figure 3(b) provides the frequency of student responses for each answer choice.
An important insight is that there are two temporary variables that step through the linked list during the traversal, with the variable temp always one step (or link) behind the variable current. This means that when the function returns temp, it returns the node before the node containing the desired key (or nil if the desired key is in the head of the list).
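The BDSI presents mystery in its own pseudocode, which is not reproduced here; the following Python rendering of the behavior described above (with nodes assumed to carry key and next attributes) shows how a trailing temp variable produces that behavior. The not-found case in this sketch is an assumption about the original code.

def mystery(head, key):
    temp = None
    current = head
    while current is not None and current.key != key:
        temp = current               # temp always trails one node behind current
        current = current.next
    # Key in the head node: the loop never runs, so nil (None) is returned.
    # Key elsewhere: temp is the node *before* the node containing the key.
    # Key absent: current walks off the end; in this sketch temp is the last node.
    return temp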
We recognize that, for this question (and others), student errors could be caused by misunderstandings related to general programming (e.g., the challenges of loops [16, 33]). In these subsections, we speak to the data structures errors we were able to discern from interviews in the CI refinement and validation phases.

4.6.1 Correct Choice.

Option A correctly describes the function’s behavior in each of three important cases: when the key is found in the first node of the linked list, when the key is found elsewhere in the linked list, and when the key is not found. The remaining distractors each describe an incorrect behavior for one or more of these cases. A median of 61% of students correctly chose this answer.

4.6.2 Distractors B and C: Misunderstanding What Is Returned When the Key Is Found.

Distractors B and C both resemble the behavior of returning the current variable when the key is found, ignoring the role of the trailing temp. This results in believing that the function returns the node containing the key, not the one before it. Distractors B and C differ in what students believe happens if the key is not found, which depends on whether they correctly trace the ending condition of the WHILE loop. A median of 14% of students chose distractor B, and 12% chose distractor C.

4.6.3 Distractor D: Misunderstanding What Is Returned in Special Cases.

Distractor D reflects a correct understanding of the main role of the temp variable and how it causes the function to return the node before the one containing key, but incorrectly analyzes what happens in the edge case when the key is not found. This distractor was chosen by a median of 10% of students.

4.7 Question 7: Recursion on a Binary Tree

Question 7 asks students to reason about recursive operations on a binary search tree. This corresponds to learning goal 4, “Design and modify data structures capable of insertion, deletion, search, and related operations.” Students are expected to both know to use recursion and be able to correctly implement recursive functions. The question asks them to correctly select the function that returns the sum of the values of all of the leaf nodes of a binary tree. They are given the class definition for the nodes of the binary tree and told that their function will be called on the root node of the tree. Although this is a “select all” style question, it has only one correct answer: Option B.

4.7.1 Correct Choice.

In this answer, the function first checks if the node is nil, and if so, returns zero. If both children of the node are nil, then the tree is a leaf node, and so the value of the node is returned. Otherwise, the function returns the sum of the recursive call of the function on both children of the node. The majority of students (median correctness of 77%) recognized that this answer was correct; however, most of them failed to recognize it was the only correct answer, with a median of 44% correctly choosing only this answer.
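A minimal Python sketch of the approach described by the correct option, assuming nodes with value, left, and right attributes (the BDSI’s own pseudocode and class names are not reproduced here):

def leaf_sum(node):
    if node is None:
        return 0                          # empty subtree contributes nothing
    if node.left is None and node.right is None:
        return node.value                 # leaf: contribute its value
    # Internal node: exclude its own value and sum the two recursive calls.
    return leaf_sum(node.left) + leaf_sum(node.right)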

4.7.2 Distractor A: One-sided Recursion.

In this distractor, the function first returns the node’s value if the node is a leaf. Next, it checks if a left child exists, and if so, returns the value of the current node plus a recursive call to the left. Last, the function checks if a right child exists, and if so returns the current node’s value plus the recursive call to the right. This distractor is wrong in two ways: It both includes the values of non-leaf nodes in the sum, and it recurses only to the left side if a node has two children. This was our least popular distractor answer, with a median of only 13% of students choosing it. This answer was more popular in the open-ended practice exam phase than when the question was converted to multiple choice. It may be that while only recursing to one side is an easy mistake for students to make while writing code, it is easy for them to recognize the mistake when they can also see correct code.

4.7.3 Distractor C: No Recursion.

This distractor answer uses a while loop to iterate over nodes in the binary tree, summing the values of the leaf nodes. However, it always travels down to the left child of a node (unless the node does not have a left child), leaving most of the tree unexplored. This was our second-most-popular distractor answer, with a median of 18% of students choosing it. To address this common error, instructors may wish to stress the recursive nature of data structures like BSTs.

4.7.4 Distractor D: Not Returning the Recursive Call.

This distractor defines a new variable and sets it to 0 at the beginning of the function. If the node it is called on is a leaf, then it adds the value of the node to this accumulator variable, and it returns the variable at the end of the function. It also recursively calls itself on the left and right children of the node, but does not return the results of these recursive calls or add them to the variable. As a result, it will only return the value of the first leaf node it finds, not the sum of all the leaf nodes. This was the most commonly chosen distractor answer. A median of 40% of students chose this answer. Similar student answers have been documented in previous studies of recursion, and the misconception of variables being treated as global in recursion has been described in the literature as the “looping” mental model of recursion; see References [11, 17, 22]. The popularity of this answer indicates that instructors should explicitly address when students must return the result of a recursive call of a function and how to properly pass variables when using recursion.

4.8 Question 8: Random-access List Implementation

Question 8 tests students’ ability to choose an appropriate data structure, given a high-level description of the data to be stored and the planned access pattern. It corresponds to learning goal 3, “Compare data structure tradeoffs to select the appropriate implementation for an abstract data type.” The correct response is option A, and Figure 4(b) provides the frequency of student responses for each answer choice.
Fig. 4. Box plot results for Q7 and Q8. Correct answer choices are marked with green rectangles on each chart. Q7 asked students to mark all correct answer choices, although there was a single correct choice.
Our interviews revealed that this question captures student misconceptions that essentially derive from having memorized atomized facts about performance of arrays and linked lists without the conceptual understanding of how to apply those facts to engineering design choices. It was particularly important to the members of our expert panel that students be able to make such engineering design choices that go beyond simply memorizing particular data structure facts.
Some of the atomized facts that students verbalized during interviews for this question included the following (these are all true but not all applicable to this situation, and their misapplication leads students to the various distractor choices):
Inserting into the front or “middle” of an ArrayList can be time-consuming, since all elements following the inserted one need to be moved over to create space (students may correctly understand this to be a reason that we may sometimes prefer a linked-list to an array).
Doubly linked lists with both a head and tail pointer allow both forward and reverse traversal of the list.
Binary search is an efficient search mechanism.
To use binary search, data must be in sorted order.
Two key insights for this question are that a linked list’s insert/remove flexibility is not advantageous for an access pattern that does not include insertion/removal and that binary search is not advantageous when we are given the index location of the element to access. Thus, the only relevant focus of analysis is the access time when the element’s index is given.

4.8.1 Correct Choice.

Option A correctly identifies the ArrayList as the correct data structure for storing N strings and allowing access to them according to their position (index). This was chosen by a median of 64% of students.

4.8.2 Distractor B: Unwarranted Assumptions about Access Pattern and Sorting.

Distractor B proposes using binary search to access elements of the list, but this errs in two ways. First, there is no need to perform binary search if the index is given, as it is in this question. Second, the binary search algorithm only works on sorted data, and this answer choice does not specify that the data would be sorted. This distractor was chosen by a median of 10% of students.

4.8.3 Distractors C and E: Incorrect Analysis of DoublyLinkedList Traversal Cost.

Distractors C and E propose using a linked data structure, despite the fact that, as mentioned above, list editing is not required for this application.
These distractors also incorrectly suggest that a list, whether unsorted (Distractor C) or sorted (Distractor E), can support constant-time access to arbitrary elements. In fact, an element at a given index must be accessed by an O(N) traversal from either the head or the tail. In our interviews, some students thought that this traversal of the doubly linked list would be better than linear time, because it could proceed simultaneously forwards from the head and backwards from the tail. While both directions of traversal are possible, the access still has linear time cost. The median percentage of students who chose these options was 0% for C and 3% for E.

4.8.4 Distractors D and F: Incorrect Application of Binary Search to a Linked Structure.

Distractors D and F propose using a linked data structure, despite the fact that, as mentioned above, list editing is not required for this application.
These distractors also propose using binary search. However, an O(log N) binary search is impossible on DoublyLinkedList structures, since binary search requires random access. Sorting is also required for binary search, making choice D incorrect in an additional way. The median percentage of students who chose these answers was 0% for D and 17% for F.

4.9 Question 9: Binary Search Tree Properties

Question 9 asks students to predict the properties of a data structure (BST) after a series of operations (insertions) is performed on it. It requires students to identify which properties of the tree change and to consider how long the insertions will take (both in good and worst-case scenarios). It corresponds to learning goal 5, “Trace through and predict the behavior of algorithms (including code) designed to implement data structure operations.” The correct answer combination is B, C, and D. Figure 5(a) illustrates how frequently students selected each answer choice.
Fig. 5. Box plot results for Q9 and Q10. Correct answer choices are marked with green rectangles on each chart. Both Q9 and Q10 asked students to mark all correct answer choices, but only Q9 had multiple correct answer choices.

4.9.1 Correct Choices.

The correct selections are B, C, and D. Option B is true because, with the median at the root, the tree initially contains an equal number of nodes in the left and right subtrees. All new insertions are larger than any existing value in the tree, thus they must all appear in the right subtree, making it larger than the left. A median of 79% of students chose option B. Options C and D depend on the order in which the insertions are made, which is unspecified by the question. If the new values are added in an order that balances the right subtree, then the final insertion will take log N time (C). However, if the values are added in the worst-case order (increasing numerical order), then the right subtree devolves into a list, and the final insertion requires time proportional to N. A median of 61% of students chose C, and 46% of students chose D. Only 21% of students correctly chose all three answers.
Many students selected B and C without D, indicating they did not think the time to insert could be O(N). This may be an example of the “default balanced” misconception identified by Karpierz and Wolfman [18], in which students assume that BSTs will automatically have a balanced shape. The large number of students who failed to select this answer indicates that instructors should emphasize the worst-case results of insertion into a BST.
Also, many students selected B and D but failed to select option C, indicating they did not think the time to insert could be proportional to log N. Based on open-ended practice exams in which we asked students to explain their selections, students who did not select C often indicated that they had interpreted the question to mean that every inserted value was larger than every previously inserted value, rather than just being larger than the original values in the tree. Despite clarifying the text of the question many times, we believe this may still primarily be an issue of reading comprehension or students not reading the question carefully enough.
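The following Python sketch illustrates why both C and D can be true; the specific values and the comparison-counting insert are our own illustration under the assumption of a plain, unbalanced BST.

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    # Standard unbalanced BST insertion; returns the number of comparisons used.
    comparisons = 0
    node = root
    while True:
        comparisons += 1
        if value < node.value:
            if node.left is None:
                node.left = BSTNode(value)
                return comparisons
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(value)
                return comparisons
            node = node.right

# Insert the same seven larger-than-root values in two different orders.
root = BSTNode(50)
for v in [60, 55, 65, 52, 57, 62, 70]:      # order that keeps the right subtree balanced
    balanced_cost = insert(root, v)

root = BSTNode(50)
for v in [52, 55, 57, 60, 62, 65, 70]:      # increasing order: right subtree becomes a chain
    chain_cost = insert(root, v)

print(balanced_cost, chain_cost)            # final insertion: 3 vs. 7 comparisons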

4.9.2 Distractor A: Root Remains the Same.

A median of 9% of students chose A, indicating that they thought the root would continue to hold the median value in the tree, even after new nodes were added. In our open-ended practice exams, more than half of the students who selected A justified their answer by indicating that the root of a BST remains the same. Selecting this option indicates that students have internalized the idea that once in a BST, a node will stay in the same position unless nodes are removed (correct); however, they seem to have memorized this idea to the extent that it did not occur to them that while the root node’s value will not change, properties related to it (like the median) can change.

4.10 Question 10: Linked Lists for Stack Functionality

Question 10 tests students’ ability to correctly apply standard data structures when implementing prescribed program functionality, which is closely aligned with learning goal 2, “Select appropriate abstract data types for use in a given application.” The question asks students to build “undo” functionality (which behaves as a LIFO stack) using a singly linked list, taking performance into account. The correct answer for this problem is option D. Figure 5(b) illustrates how frequently students selected each answer choice.

4.10.1 Correct Choice.

While options C and D both meet the correctness requirements (LIFO functionality), only option D offers the best performance (O(1)). A median of 84.2% of students correctly chose option D, making it by far the most popular answer.
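A minimal sketch of the approach option D describes: treat the head of the singly linked list as the top of the stack, so both recording and undoing an action are O(1). The Node class from the Question 1 sketch is reused; names are illustrative.

class UndoStack:
    # LIFO "undo" on a singly linked list: push and pop at the head are O(1).
    def __init__(self):
        self.head = None

    def push(self, action):
        node = Node(action)          # Node as sketched for Question 1
        node.next = self.head        # new node points at the previous top
        self.head = node

    def pop(self):
        if self.head is None:
            return None              # nothing to undo
        action = self.head.value
        self.head = self.head.next   # most recent action is first; no traversal needed
        return action

By contrast, any scheme that must traverse the list for each recorded or undone action (as the O(N) cost of option C implies) scales with the number of recorded actions.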

4.10.2 Distractor C: O(N) Performance.

Option C is the second-most frequently selected choice (median of 29%), and while it correctly solves the problem, its O(N) performance is substantially worse than option D. Most of the students who selected C selected both C and D, which suggests that they understand the correctness requirements and data structure behavior without accounting for performance. In response to these results, we recommend that instructors explicitly highlight concrete examples in which different “correct” applications of data structures exhibit disparate performance results.

4.10.3 Distractors A and B: Incorrect.

Neither option A nor B correctly solves the stated problem, and students select them at approximately the same rate (a median of 9.8% and 9.5%, respectively). During open-ended practice exams, there was no clear pattern in students’ explanations for selecting these choices.

4.11 Question 11: Testing BST Search

This question is designed to measure students’ ability to properly test their code to identify possible mistakes in the implementation. The learning goal associated with this question is #6, “Identify and remedy flaws in a data structure implementation that may cause its behavior to differ from the intended design.” Based on our interviews with students, testing does not appear to always be taught extensively, so one possible way to improve student performance on this question would be to teach students how to design good tests, perhaps by discussing code coverage. The correct answer for this question is E. Figure 6(a) provides the frequency of student responses for each answer.
Fig. 6. Box plot results for Q11 and Q12. Correct answer choices are marked with green rectangles on each chart.

4.11.1 Correct Choice.

Option E searches for an element at each level of the tree, both in right and left subtrees, and for an item not in the tree. The median correct response rate is 71%.
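The following Python sketch shows the kind of coverage option E describes, driving a standard BST search against a small hand-built tree (reusing the BSTNode and insert sketch from Question 9). The tree values, the bst_search helper, and the test structure are our own illustration, not the BDSI’s.

def bst_search(node, key):
    # Code under test: standard iterative BST lookup.
    while node is not None:
        if key == node.value:
            return True
        node = node.left if key < node.value else node.right
    return False

root = BSTNode(20)
for v in [10, 30, 5, 15, 40]:
    insert(root, v)

test_cases = [
    (20, True),                          # the root
    (10, True), (30, True),              # internal nodes in both subtrees
    (5, True), (15, True), (40, True),   # leaves in both the left and right subtrees
    (99, False),                         # absent key: exercises the unsuccessful search
]
for key, expected in test_cases:
    assert bst_search(root, key) == expected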

4.11.2 Distractor B: Only Inspect Leaf Nodes.

The most popular incorrect answer (12% median response rate) is to search for each leaf node. Although this ensures the search is capable of going down either the right or left branches of the tree, it does not consider searches for the root, for levels other than the leaves, or for an element not in the tree (an unsuccessful search).

4.11.3 Distractor D: Inspect All Except Leaf Nodes.

The second-most-popular incorrect answer (6% median response rate) searches for the root node, down the left and right subtrees, and for an element not present. However, it does not search for any elements in leaf nodes.

4.12 Question 12: Binary Tree Traversal

Question 12 asks students to recursively reason about information stored in a binary tree by computing a width function. The question corresponds to learning goal 4, “Design and modify data structures capable of insertion, deletion, search, and related operations,” and the correct response is option C. Figure 6(b) provides the frequency of student responses for each answer choice. The original, open-ended, version of this question asked students to write code for the entire function, but we found that doing so took students a considerable amount of time. As such, we reduced the question to only require code for the core recursive component.
A challenge in designing the recursive width function is that the maximal length path could either include the current node, or not, and the handling of these two cases looks very different. The case where the maximal path is located somewhere among the current node’s left or right descendants (and does not include the current node) is handled in the provided code lines numbered 1 and 2, respectively.

4.12.1 Correct Choice.

Option C correctly accounts for the case where the maximal length path does include the current node, by gathering the information from left and right children, and remembering to add 1 to account for the current node itself. A median of 46% of students correctly chose C.
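A full Python version of the idea is sketched below for reference; the BDSI question supplies most of the function and asks students only for the part corresponding to the path through the current node, and this sketch assumes that width counts the nodes on the longest node-to-node path (hence the “+ 1” for the current node).

def height(node):
    # Number of nodes on the longest downward path from this node to a leaf.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def width(node):
    # Longest node-to-node path anywhere in the tree, measured in nodes.
    if node is None:
        return 0
    return max(
        width(node.left),                            # best path entirely in the left subtree
        width(node.right),                           # best path entirely in the right subtree
        height(node.left) + height(node.right) + 1,  # best path passing through the current node
    )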

4.12.2 Distractors B and D: Incorrect Gathering of Left and Right Path Lengths.

Distractors B and D attempt to gather the path lengths from the left and right children by recursively calling the width function. This is incorrect, because it will count paths that start and end in the subtrees, not extending up to the current node. Distractor D was the second-most commonly chosen distractor (median of 13%), because it shares the same structure as the correct answer: gathering information from left and right and adding 1 to account for the current node. A median of 7% of students chose distractor B. We suspect that students answering B or D believe that recursion is necessary to solve this question but struggle to do so properly.

4.12.3 Distractor E: Incorrectly Conflates Completed Paths and Continuing Paths.

Like Distractors B and D, Distractor E attempts to gather the path lengths from the left and right children by recursively calling the width function. The code correctly adds 1 to account for the current node, but incorrectly mixes the maximal paths of the subtrees with the heights of the subtrees, essentially creating new branches off of subtree paths from the current node. Students may be drawn to this option because they understand that implementing the width function requires finding the max of some options, while neglecting to observe that the problem asks them only to list the three options. Distractor E was the most commonly chosen distractor with a median of 21%.

4.13 Question 13: Abstract Interfaces

Question 13 asks students to compare two abstract interfaces and select the most appropriate one for solving a particular problem (recording e-mail addresses). It corresponds to learning goal 2, “Select appropriate abstract data types for use in a given application.” The BDSI assumes that students are already familiar with an abstract List interface (at the level described in the BDSI data structure definitions in Appendix A), whereas it does not assume that students have previously seen the Set interface. The correct answer is option C. Figure 7 illustrates how frequently students selected each answer choice.
Fig. 7. Box plot results for Q13. The correct answer choice is marked with a green rectangle on the chart.

4.13.1 Correct Choice.

The correct answer is option C—if the goal is to recognize unique addresses, then storing duplicates does not help, since a duplicate will cause the same address to be counted multiple times. Overall, students tend to do very well on this question, which has a median correctness rate of 80%.
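A minimal sketch of the design the correct option captures, assuming the task is counting unique addresses from some iterable of received emails (the function and variable names are ours):

def count_unique_addresses(received_addresses):
    # A set stores each address at most once, so duplicates are discarded
    # automatically and each unique address is counted exactly once.
    unique = set()
    for address in received_addresses:
        unique.add(address)
    return len(unique)

With a list, every duplicate would be stored, and avoiding double-counting would require an extra membership check (or a later sort and scan).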

4.13.2 Distractor B.

Option B is the most commonly selected distractor, with a median rate of 7%. In interviews, students were drawn to this response because a list most closely matched the form of the data as it was provided (“a file...of all emails you’ve ever received in the order you received them”). Preservation of the original data was prioritized over efficiency for the task at hand.

4.13.3 Distractor E.

Selecting option E (median 4.9%) suggests that a student is having trouble understanding a new abstract interface and comparing it against a known one. We recommend that instructors walk through examples in which they evaluate two or more abstract interfaces for fitness in solving a problem, though we expect that many courses already do so.

5 How to Use the BDSI

We now address how faculty can adopt the BDSI as an instrument to compare aggregate student knowledge of basic data structures across institutions, instructors, courses, and pedagogies.
The BDSI is available to faculty by joining the BDSI group online.1 The instrument is a PDF in two parts: the introductory material (including the pseudocode guide) and the test itself. It should be printed for students to complete on paper. The BDSI contains some diagrams; an accessible, though not validated, version with text descriptions in place of the diagrams is also available for visually impaired students to complete on a device with screen-reader software. Faculty should plan on giving students about one hour to complete the CI. It is important for faculty to give the entire CI, and not a subset of its questions, as we have only demonstrated validity for the CI as a whole; results are not comparable for individual questions.
We recommend using the BDSI only as a post-test, as students will be unfamiliar with the terminology before being taught basic data structures. Because CIs are not intended for summative assessment of individual students, the BDSI should not be included on, or used in place of, a comprehensive final exam, nor assigned as another graded assignment. Additionally, because CI questions cannot be altered, we caution against using the CI in any way that would incentivize students either to look for copies of the questions ahead of time or to share questions with each other or publicly (e.g., by using it as a graded exam). However, students may find it useful as a supervised “dress rehearsal” practice final exam, which faculty can use as a mutually beneficial opportunity to obtain student results.

6 Conclusion

In this work, we presented the results of administering the BDSI, a Concept Inventory for basic data structures, to 1,963 students across seven institutions. We discussed both the results and their implications for instructors. We hope that this article will serve as a resource for instructors who administer the BDSI in their own courses, as well as inform instructors about common student misconceptions in basic data structures.
The value of a CI is not in the questions themselves, but in its capacity to drive pedagogical change. As such, along with reporting on our results, we have made the CI available to the community. Educational researchers may now use this assessment of student knowledge in their research, and instructors can use it to gauge strengths and weaknesses in their own curricula. Ultimately, we hope that this CI will be taken up and deployed by the CS education research community to further enhance student learning.

Acknowledgments

The authors gratefully acknowledge the contributions of the following collaborators: Meghan Allen, Owen Astrachan, John Bell, Darci Burdge, Stephanie Chasteen, Rajendra Chattergoon, John Donaldson, Maureen Doyle, John Glick, Paul Hilfinger, Josh Hug, Marina Langlois, Kevin Lin, Lisa Meeden, Zach Palmer, Mahika Phutane, Joe Politz, Kate Rydberg, Samar Sabie, Kate Sanders, Jacqueline Smith, Marty Stepp, Paramsothy Thananjeyan, Steve Wolfman, Anna Zaitsev, and Christine Zhou.


