# Introducing tractability and the Knapsack problem in a classroom

In 2016, I was teaching Tractability to a Year 13 class in Christchurch (New Zealand). I decided to focus on the Knapsack Problem and some of the algorithms we can use to solve it. I developed an Unplugged activity for the students, and it was a massive success in that the students enjoyed it and gained a firm understanding of the Knapsack Problem, which was reflected in the reports they wrote for the assessment. I shared the activity with teachers at CS4HS in Christchurch at the end of the year; however, I never formally released it. Now, I have decided to because I think it'll be valuable for teachers wanting a new topic to teach.

The activity is best for high school students. They should already have some familiarity with algorithms and costs (i.e. sorting algorithms and searching algorithms). Some of the themes might not be suitable for primary school students.

Originally I used cards with images drawn by me, however some of the images were indecipherable (which the students had a good laugh about) so have all been replaced by ones found on Public Domain Vectors. Big thanks to this website for providing free images that can be used without attribution, or royalties. It was surprisingly difficult to find straightforward pictures!

Exciting news: this activity will also be going on CS Unplugged. I've written the instructions PDF, and am waiting for it to be reviewed and put up. I'll update this as soon as it's up!

## Doing the activity

The main activity (the 4 card sets) takes around 40 - 50 minutes. Put students in groups of 3 for it.

### Learning objectives

- Solve simple knapsack problems with a group.
- Investigate and explore why the knapsack problem is a challenging problem in computer science.
- Compose a greedy algorithm for the knapsack problem, using pseudocode.

The 3rd learning objective is optional, and will depend on how much time is available.

### Card set 1

Give each group the first set of cards.

Each card represents an **item** in a (fictional) RPG (Role-Play Game). Each item has a **weight in KG** and a **value in gold**. The character controlled by the player has found these items in the terrifying fire-breathing dragon's lair and wants to take them back to the shop at the surface to sell for gold. However, there's a big problem—they can only carry up to **6 KG's** at a time, otherwise, should they encounter the dragon, they'll never have a chance of outrunning it. Unfortunately, making multiple trips to the shop is not an option, as the dragon will be sure to ensure there won't be any return trips anytime soon.

Which items should be taken, and which should be left behind? Remember, we want to make as much gold as we can, and we cannot carry any more than 6 KG. There is only one of each item, and anything left behind cannot be claimed later.

Give the groups time to think about it and come up with the solution.

Most groups quickly see the best option. When asked how they knew which items to choose, they should explain that some of the items were worth more gold per KG of weight than others. We can calculate the gold to weight ratio by dividing gold by weight — the bigger the number, the better the item. For example, the **statue** weights 6 KG's, and it is worth 2800 gold. 2800 ÷ 6 = 466.67, which means the statue is worth 466.67 gold per KG. The other items are worth 833.33 gold/KG for the sword, 333.33 gold/KG for the money bag, 800 gold/KG for the mace, and 500 gold/KG for the boots. The 3 best value items, therefore, are the sword, money bag, and the boots. Taking these 3 items won't go over the weight limit, and therefore this is the solution.

It's also okay to divide the other way around (although less intuitive), by divide the weight by the gold amount. In this case, smaller numbers mean better items.

### Card set 2

Now that the groups have solved the first problem, and they think they know a good strategy to solve it, give them the second set of cards.

This time, the character is in a different dungeon, and can carry up to **30 KG's**.

As long as they are confident with math (particularly division and ratios), groups shouldn't have too much trouble solving this one if they are patient and work through it. The best way is to lay out the cards on a piece of paper (or give the students photocopies of the cards that they can write on) and then calculate and write down the ratio next to each card. Then, sort them in order and then see whether or not the next card can be included without going over the weight limit.

Explain that this process is an **algorithm**. If there's time, get students to write some pseudocode to express the algorithm. It should go something like this (the level of formality could vary, although is unimportant. The main idea is to be able to express the process as an algorithm).

### Card set 3

For (most) students who are unfamiliar with this type of puzzle, the algorithm above might seem infallible. In math and computer science, though, an algorithm can seem perfect but than fail unexpectedly. For this reason, we must always be able to prove (explain convincingly using logical reasoning) that an algorithm is correct before we conclude that it is. In this case, it turns out that it breaks sometimes.

Instead of telling the students this, though, leave them to figure it out by giving them this next set of cards. This time, the theme is a cargo plane that has room for another **5000 KG** and wants to maximize the value of what it is carrying.

Either students will quickly realize their algorithm is no longer working correctly, they'll give an incorrect solution that's over the weight limit (probably including all but the 4th package), or they'll get the correct solution (the 3 biggest items), but not realize that it didn't actually follow the algorithm. If they give an incorrect solution, point out their error. If they don't follow the algorithm, remind them to follow it step-by-step.

In the order shown in the picture above, the value rations are $4/KG, $7/KG, $8/KG, $1/KG, $5/KG.

So our algorithm starts by taking the 3rd item, and then the 2nd item, as these two are the best value. However, then it makes a "mistake" by taking the 5th item, leaving not enough room for the 1st item (the best). There is room for the 4th item though. It would have been better to have not taken the 5th item, instead of taking the larger 1st item even though its ratio isn't as good.

We have now discovered that what seemed like a robust algorithm, is not. So, what do we do now? It seems that sometimes it's better to leave something behind to be able to take something better later (insert cheesy life analogy here, I'll leave that up to teachers!) It's tough to know when exactly we should be leaving something behind though.

Students might suggest just taking the items worth the most, ignoring the weight. However, point out that this wouldn't have worked with the previous 2 sets of cards, wherein both cases, the most valuable item was left behind.

Something else they might suggest is just trying every combination. With just 5 cards, and a little human intuition mixed in, this is straightforward. However, will it work when we have more cards?

### Card set 4

For the final part of the activity, give them the largest set of cards. This time, the theme is investments. Each of the 20 investments (identified by a letter, I need to come up with some witty names, if anybody has any ideas, please post or email me!) has an amount that would need to be put forward to invest in it, and an expected pay off. With **$2000** to invest, the challenge now is to decide which investments to choose.

This one is challenging, although the students in the class were so engaged in the activity that they spent half an hour discussing strategies and moving cards around. Towards the end of that time, one group very impressively came up with the correct answer. Given that no strategy works in every case (well, mostly true, more on this further down), it was a lot of logic, reasoning, and teamwork that got them there.

I'm not going to give the solution though (email me if you really want it), something interesting about this problem is that even determining whether or not what you think is the solution, actually *is* the solution is a challenge in itself!

## Some follow up discussions and teaching points

We call this problem **the Knapsack Problem**. Precisely, the Knapsack Problem is where we have the following 3 things:

- Some items, each of which has a
**burden**, such as a weight, and a**value**, such as a profit. - A
**constraint**in the form of "the total added**burdens**must not exceed..." - A need to
**maximize the total added values**while staying below the**constraint**.

In the RPG theme, the **burden** of an item was its weight, the **value** was how much gold it was worth, and the constraint was to keep the total weight of all items at or below what the character could carry.

Likewise, in the cargo theme, the **burden** of an item was also its weight, the **value** was how much profit could be made by carrying that item, and the constraint was to keep the total weight at or below what the plane could carry.

The investment theme was a bit different. This time, the **burden** was the initial payment required for the investment, the **value** was the expected payoff of the investment, and the **constraint** was the amount available to invest (i.e. pay initial payments with).

* Important note*: In computer science literature and most resources, the

**burden**is referred to as the

**cost**. For this blog post, however, I've chosen to refer to it as the

**burden**for clarity. The big problem with the word

**cost**is that people associate it with money and therefore can become easily confused. Even worse, computer scientists

*also*use the word

**cost**when talking about how long an algorithm takes to run. In a classroom setting, I'd recommend making sure students know it's referred to formally as the "

**cost**", but using

**burden**at least early in the course to help prevent misconceptions from the language.

### How did we know the profits from investments in advance?

Short answer: we don't. Nobody knows for sure how much an investment will bring in. If it were possible to know, we'd be in this weird paradox where everybody could optimally invest, and thus nobody would make any money.

In practice, we can make our best guesses based on the information available, and previous experience (this is what financial experts do!). Needing to come up with weights and burdens is an excellent example of the knapsack problem in the real world, sometimes we won't even know the exact value and burden of each item, so instead have to guess them.

### But surely we can find the best solution for any set of cards, right?

We could try every possible combination of cards. With 5 cards, this is easy. There are "only" 32 possible ways (that is a lot for just 5 cards when you think about it!)

- 1 way using all 5 cards.
- 5 ways using 4 cards.
- 10 ways of using 3 cards.
- 10 ways of using 2 cards.
- 5 ways of using 1 card.
- 1 way of using no cards.

Then we need to look at each of those 32 ways and see if it's below the constraint. If it is, we then look at how much value it is worth. If it's worth more than any we have seen so far, we keep track of it while we check through the rest.

So, that means we can solve it, right? Well, not quite.

With 6 cards, there are 64 ways, and with 7 there's 128. How many are there for 8 cards?

Calculate how much it would be for 100 cards. What about 1000? What if every atom in the universe was a computer and each could check a trillion combinations every second? How long would it take?

The answer? Longer than the lifetime of the universe. Crazy huh? It's clear that there is no way this algorithm will ever be useful on even a medium-sized number of items.

### Why does the number of combinations double every time?

As long as you are familiar with binary numbers, here's an easy way to understand.

Put the cards in a line. Now, consider every combination of cards, by turning over cards that aren't a part of the current combination. A turned over card could represent a "0", and a faceup card could represent a "1". It's binary numbers, with the same number of bits as cards. We know that every time we add a bit, we can make twice as many numbers with the bits. It's the same thing here.

### So now what?

Many algorithms will get * close* to the correct answer. The one that worked for the first two sets of cards is known as a

**greedy algorithm**. A greedy algorithm is an algorithm that makes the best decision right now. In this case, that means taking the best value card. It does not plan ahead to see if it would have been better to make a worse decision now for a bigger payoff later.

For the most part, the greedy algorithm will be "close enough" in the real world. Take the investments, for example. It doesn't matter too much if we don't find the "best" possible solution, as long as we find a good one, we'll probably be okay.

There are also other algorithms that give *close* solutions. Most are more difficult to understand, although one interesting one I've heard a teacher talking about is to pick a few combinations, entirely at random. Then, score those combinations and go with the best one. Statistically, if you pick a big enough sample (e.g. 100 combinations for 20 cards), then you'll "probably" have a reasonable solution amongst them.

It turns out that for *some* examples of the knapsack problem there is an algorithm that we can use. If the burdens are all whole numbers, and the constraint isn't too high, then there is something called a dynamic programming algorithm. I'm not going to go into that here, but it's something competent students might like to investigate.

However, for the most part, the knapsack problem is challenging.

### What's this about not being able to know if a suspected solution IS the solution?

There are some problems (puzzles) in computer science where it's challenging (takes a long time) to find the real solution, but once we have it, it is very, very easy to verify that it is indeed the solution. The implication of this is that if somebody claims they have the solution, it's straightforward to know whether or not they do. An excellent example of this is the factoring of a big number that is the product of 2 prime numbers.

This number here is the product of 2 prime numbers. It is all but impossible for you to figure out what those 2 prime numbers are though. You can't do much better than trying all pairs of prime numbers, and there's a **lot** of them! However, if you email me the solution, I can easily verify that you are correct by multiplying the 2 primes you give me together. Therefore, the problem is challenging to solve, but easy to verify the solution of. It is also straightforward to verify that this number is not a prime number itself.

6880434319064745378373 681849502678999216857994 996192009234084060974709 4864427766667114531654 953539241079541113087282 679155473462935461155441 6188195568121168224105 166202582925095929091849 982646329492886224917013 1711644188172712523146 18684490106760539

With the knapsack problem though, there's no simple way to check your answer. There are 2 different ways we can phrase the knapsack problem:

- Can we get at least X value (where X is any number you want) without going over the constraint?
- What is the best value we can get without going over the constraint?

We call the first a **decision problem**, and the second an **optimization problem**. Both of these are difficult in that there is no great (doesn't take lifetimes of the universe) algorithm to solve them in every case. Determining whether or not a solution to the 2nd question is correct is equivalent to needing to answer the first question. The only way we could verify the solution would be to check all other values between the one you have given and the total value sum of ALL the items.

### Tractable vs intractable

The Knapsack Problem is an example of an **intractable problem** in computer science. **Intractable** means that the only algorithms (that we know of) that get the best solution in every case are too slow to be useful.

**Algorithms**can be split into 2 groups**tractable**and**intractable**.- Most algorithms the students have previously seen: linear search, binary search, insertion sort, selection sort, quicksort, and merge sort are
**tractable**. They are fast enough to be useful in practice. There's no need to use selection sort or insertion sort,*but*if we didn't know anything faster, they'd be acceptable. - Algorithms such as trying every combination for the knapsack problem, permutation sort, or trying all prime numbers to find the 2 that were multiplied together, are
**intractable**. While they are okay when the problem size is small (not many items to sort, not many items to choose between, or the number to find the factors of is small), they quickly become unusable. By the time you get to 100 or 1000 items, or 100's of digits in a number to be factored, they become unusable as no matter how many computers, and how fast they are (limited by the speed of light), they would take billions upon billions of years to run. Aside from the fact that the universe only has a few billion years left, the person running the algorithm will be gone long before the results are ready. **Problems**can also be split into**tractable**and**intractable**. A problem is tractable if it has*at least one*tractable algorithm that can be used to get the optimal (or correct) solution in every case. A problem is intractable if all of the algorithms are intractable.- Searching and sorting are tractable. The knapsack problem is not.

### Real-world examples of the Knapsack problem

Computer scientists and mathematicians are very interested in the knapsack problem because it is used for computer security. The prime number factoring problem is the most widely used if somebody did find an algorithm (unlikely) for it that is tractable, then the world would be in chaos as internet security would be easily breakable.

A second reason is logistics. While the knapsack problem is used for some security applications, it's more likely to be seen (or variants of it) in logistics problems. A company in Christchurch called Telogis makes its fortune by finding better and better solutions for logistical problems. By packing more efficiently, taking more efficient routes, or making wiser investments, companies can make a fortune.

Some logistics examples of the knapsack problem include:

- Loading the most valuable cargo onto a plane. Each item has a weight (burden), a value, and the plane has a limited loading weight (constraint).
- Choosing companies to invest in. Each company has a cost per share (burden), and you can calculate or estimate an anticipated pay off for each share (value). The total amount you have available to invest is limited (constraint).
- Choosing which exam questions to answer on a test. Each question has a number of points (value), you can try to estimate how long you think each question will take (burden), and there is a limited amount of time the test goes for (constraint).

Some closely related problems include:

**Bin packing problem**: putting items of different shapes and sizes into the back of a truck so that as little space as possible is wasted.**Stock cutting problem**: deciding the best way to cut shapes out of a material such as fabric or metal sheets, to minimize the wastage.**Guillotine problem**: deciding the best way to cut up sheets of metal so that wastage is minimized, and they can be cut using a guillotine. Most teachers will also have had to solve this problem with paper cutouts.

All of these other problems are also intractable, i.e. the only algorithms guaranteed to find the best solution take an exponential amount of time. While it is believed that most intractable problems are intractable (that is, no tractable algorithm for them will *ever* be found on regular computers, quantum is a bit different), nobody has been able to **prove** it. Proving it is the so-called P vs NP problem.

## Making the activity work well

I have done this activity a couple of times, with a class of high school students and with a group of teachers at a conference. It went great both times, so here are my tips. I'm sure that many teachers could make it even better, though!

### Students should know a little about algorithm costs already

If they don't remember (or didn't even study) sorting and searching algorithms, it's best to spend a lesson or 2 revising it. The key ideas that students need to have grasped are:

- That a
**problem**in computer science (and math) is something to be solved. - That many problems are solved with
**algorithms**. - Some algorithms are faster than others.
- That the speed of an algorithm is measured by looking at how it slows down as the size of the problem (the number of items to be processed) is increased.

For example, searching is a **problem** that can be solved with several different **algorithms**, the 2 most well-known being **linear search** and **binary search**. When the number of items we need to search increases, linear search takes longer compared to **binary search**. In the case of sorting, the problem is **sorting**, and there are even more well-known algorithms for it. Some of the faster ones include **quicksort** and **merge sort**, and some of the slower ones include **selection sort** and **insertion sort**.

It's also worth introducing **permutation sort** (also known as "bogosort", and lots of other exciting names you'll find on Wikipedia). It generates random orderings of the items to be sorted and then checks whether or not they're sorted. If they're not, it generates another random ordering. On average, we'll need to check half of the orderings (a bit worse if we could generate the same incorrect ordering more than once, but that's not important to understanding).

- With 1 item, there's only
**1**way of ordering it. The algorithm works first try, but who cares about sorting 1 item? - With 2 items, there are
**2**ways. Again, not so bad. - With 3 items, there are
**6**. That was a fast increase! - With 4 items, there are
**24**ways. - With 5 items, there are
**120**ways. - With 6 items, there are
**720**ways.

This algorithm makes **selection sort** and **insertion sort** seem fast!

For sorting, we would never use permutation sort; many other algorithms do the job. However, for other problems, such as the knapsack problem introduced here, there are *no* alternatives.

The class I was working with needed revision in this area. I got them to read the material in the Computer Science Field Guide, in particular watching the video and trying the searching and sorting widgets. I then got them to run some sorting algorithm programs, collect data, and graph it. This was enough to remind them of the key ideas, and took about 1 hour of class time for those who finished the activity as homework, and 2 hours for those who didn't.

### The themes matter, for student engagement and understanding the bigger picture

Almost every high school student has played at least one dungeon/ fantasy (or related) RPG. Collecting various weapons, armour and miscellaneous items that serve no purpose but to trade in for gold is common in these games. Most of these games also have an inventory weight limit. The task of having to choose which items to take and which to leave behind, based on weight and payoff, is one that players of these games will encounter. So, it makes sense to use this theme as a starting point for the first 2 parts of the activity. Students like being able to think about video games while in class!

However, we have to be careful to not mislead the students into believing that the knapsack problem is only about RPG's. Many digital technology teachers in New Zealand who have taught level 3 computer science know all too well about the students who think that the travelling salesman problem is only relevant to somebody going out to check lots of craypots on the water. We most certainly don't want a repeat of this lack of generalization!

So, to address that issue, the last 2 parts use different themes. Part 3 uses a very simplified version of a logistics challenge but still using weight as the constraint, and then part 4 moves out of logistics and weight and into finance.

The result of this was that students thought about and investigated other possible situations the problem came up in and wrote about those in their reports.

In saying that, while most teachers (including all of the ones at the conference) have thought the RPG theme is okay, a few have had concerns about the appropriateness in schools. Specifically, that we wouldn't allow students to bring these items to school, so perhaps we shouldn't be discussing them in the classroom. My feeling on it (and I'm not a teacher, so I accept my opinion has limited weight here) is that I'm dubious that it's an issue. The very reason I chose this theme is that 90%+ of students are familiar with it. Dungeon-themed/ fantasy RPG's are one of the most popular genres of video games ever. Moreover, it's also important to emphasize that this activity is aimed at senior high school students (or academically capable and mature juniors).

I have thought carefully about each item included in the RPG theme though, ensuring that all weapons look fictional, and in particular, I did not include any guns (other than the bow and arrow) as I don't feel that could ever be appropriate, even if they looked fake.

I am keen on any feedback though about this issue, and about if any items are still a problem, and if so, how they could be changed without losing the overall theme.

### For classes where real-world examples are less important

I realized afterwards that while the investment example was great for my class, it might not be so great where the kids are less self-motivated, younger, or have a limited understanding of finances. As an alternative to the investment card set, I made an additional RPG challenge.

I don't recommend doing both sets of cards for part 4, as it will take a long time, and I suspect the repetition will probably be of limited value to student understanding. It might be useful to use one in class and the other as an individual homework activity though, to check that all students understood.

### Do all parts of the activity, one by one, in the correct order

Each part of the activity has a clear goal and is designed to build on the previous part.

- Solve the problem and describe a strategy that can be used to solve it.
- Formulate the strategy as an algorithm and apply it to a bigger example.
- Discover that the algorithm actually
*doesn't*work on every case. - Experiment with a large example and attempt to come up with algorithms and a solution.

I tried to skip part 2 for some groups at the teacher conference to save time. This did not work well, as some teachers did not understand the algorithm they were supposed to be following in part 3. They instead used human intuition to spot a solution trivially and did not understand how this part showed that coming up with a good algorithm to solve the problem could be challenging. By the time they got to part 4, they were so confused at what the point was that they didn't engage in the problem-solving.

### Leave enough time for the activity

This activity alone will take around 40 - 50 minutes. Students will learn a lot from it, so it is worth allocating the time needed. Rushing students through the problem-solving tasks, or worse, skipping some parts, is likely to lead to critical gaps in understanding.

### Make it a group work activity

I put the students in groups because the activity is challenging, and I thought students might struggle alone. By putting them in groups, this allowed them to discuss what they were doing, and consider different options. Indeed, this is what happened, and there was some fascinating discussion in the room. The students seemed to enjoy this approach too, and it meant that they remained engaged even when unsure how to solve the problem.

In the real world, challenging problems are generally solved by groups of people. Most of the problems that can feasibly be solved by a lone individual have been solved. The problems left are very advanced and require lots of different ways of thinking to tackle them. This is the world high school students are going into, and so being able to problem solve as part of a group is essential. This activity is a perfect opportunity to practice.

### Don't help students more than needed

While it's important to ensure each group ultimately got the understanding needed from parts 1, 2, and 3, it's also important to let them discover the ideas and not just tell them.

### Make the last set of cards a competition

Disclaimer: I have not tested this final idea, but I'm sure it'd be highly effective!

This only applies to the final card set; either investment theme or the big RPG challenge.

If students need an incentive to stay focused and to work quickly and effectively, make it a competition. Offer a prize to the group that has the best solution at the end. This could be done by giving each group an extra envelope, and when the time is up, telling them to put the cards that are part of their solution inside it. Then, go through each solution checking if it’s within the weight limit, and if it is, then add up the value. Whoever has the best value, and was within the weight limit wins. This will also emphasize the idea that sometimes we need to go with the best solution we can find, even if there actually was a better solution.

## Follow up learning

After working through the follow-up material and discussions that were after the activity, students should be able to:

- Explain what the knapsack problem is, using an example in their explanation.
- Define key terms such as tractable algorithm, tractable problem, intractable algorithm, intractable problem, greedy algorithm, and brute force algorithm.
- List examples for each of the key terms.
- Develop their own examples of knapsack (or related) problems.
- Explain why problems such as the knapsack problem are of great interest, with a focus on logistics and cryptography.

Academic interest, puzzles, and making them learn about it in a classroom don't count as reasons they are of great importance!

### Going even further, and ideas for writing advanced reports

Students who are keen to learn more should then do their own reading and learning and do some of the following:

- Demonstrate, using an example and some straightforward math, why intractable algorithms are so slow.
- Argue that no matter how fast computers become (ignoring quantum for now), they'll never be fast enough to solve the knapsack problem with a large number of items. This could be done by using very "generous" estimates, such as saying every atom in the universe is a computer, running at the speed of light.
- Implementing the brute force algorithm, the greedy algorithm, and a probability-based algorithm in a programming language for the knapsack (or similar) problem and comparing them.
- Research and report on a case study of a difficult problem in logistics and how it is addressed.
- Discuss the challenging security implications of a fast algorithm being found for the knapsack or a related problem.

Relevant areas worth investigating for very capable (math scholarship level ability) students include:

- Dynamic programming algorithms, such as the one that can be used in some cases of the knapsack problem.
- The P vs NP problem.
- The impact of quantum computing on intractable problems.

These final suggestions are more intended for steering academically capable students in the right direction. They are very advanced topics (the dynamic programming one is perhaps the easier of the 3, and is very relevant for those interested in entering programming contests in university), and only a small handful of students will go near them.

A student in the class I taught actually looked into the dynamic programming algorithm and was able to understand it. He got a well-deserved excellence grade for his externally marked report.