CSE 2312 - Assignments

Not Graded Practice for Assignment 2

This is a set of practice questions. Feel free to work on these questions, and to ask questions if you face any difficulties in coming up with answers. While collaboration with others on the graded homework is strictly prohibited, you are free to work on these practice questions together with other people.

Since they are practice questions, you do not have to, and should not, submit answers to these questions on Blackboard. These questions will not be considered in any way towards your course grade. At the same time, based on the instructor's experience, individuals spending substantial and systematic effort in answering these practice questions by themselves tend to significantly improve their overall class performance.

Practice Question 1

Assume that we have the same assembly language as defined in Task 1 of the graded assignment. Consider the following program:

Line 1: load address2 R2
Line 2: load address1 R1
Line 3: if R1 6
Line 4: addi 20 R1 R3
Line 5: goto 7
Line 6: addi 10 R1 R3
Line 7: addi 5 R2 R4
Line 8: store R4 address10
Line 9: addi 30 R2 R5
Line 10: store R5 address11
Line 11: add R3 R2 R8
Line 12: store R8 address12

Assume that, when the program starts executing, address1 contains integer 0. Show how the instructions are executed through the pipeline step by step, until the program finishes running. Use a table of the same format as described in Task 1 of the graded assignment.

Practice Question 2

For the program in practice question 1, rearrange the order of instructions, so that:

The outcome of the program (i.e., the contents of address10, address11, and address12 at the end of execution) will always be identical to what it would be without reordering the instructions, for any possible initial contents of address1 and address2.
The pipeline is utilized as efficiently as possible.

Multiple solutions are possible that utilize the pipeline more efficiently than the original instruction order does. The better your solution utilizes the pipeline, the more preferable it is.

Practice Question 3

For your solution from practice question 2, show (using a table formatted the same way as in Task 1) how this solution is executed through the pipeline, assuming (as in Task 1) that when the program starts executing, address1 contains integer 0.

Practice Question 4

(This is Problem 2 from Chapter 2 of the textbook).

What is the purpose of step 2 in the list of Sec. 2.1.2? What would happen if this step were omitted?

Practice Question 5

(This is Problem 5 from Chapter 2 of the textbook).

To compete with the newly invented printing press, a medieval monastery decided to mass-produce handwritten paperback books by assembling a vast number of scribes in a huge hall. The head monk would then call out the first word of the book to be produced and all the scribes would copy it down. Then the head monk would call out the second word and all the scribes would copy it down. This process was repeated until the entire book had been read aloud and copied. Which of the parallel processor systems discussed in Sec. 2.1.6 does this system resemble most closely?

Practice Question 6

Consider the dual five-stage pipeline shown in Figure 2-5. For processing an if-else statement, two architectural choices are available:

One approach is to execute both the if branch and the else branch in parallel (one branch on each pipeline), and then cancel the results of the invalid branch.
Another approach is branch prediction, where the CPU guesses whether the if branch or the else branch will get executed, and processes the predicted branch using both pipelines. When the prediction is wrong, we cancel the results and start from scratch.

Discuss the pros and cons of each approach.

Practice Question 7

A 2-million-pixel grayscale image (commonly referred to as a black-and-white image) is represented as an array of 2 million 8-bit integers (with values between 0 and 255). Suppose that we are performing an image brightening operation that adds a constant value C to each pixel (if the result is greater than 255, then 255 is used instead of the actual result).

Why is such an image operation a good fit for a SIMD processor or a vector processor? Explain, separately for the SIMD processor case and the vector processor case, what aspects of this image operation these processors utilize to obtain increased performance compared to a regular CPU.
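
As a point of reference, the brightening operation described above can be written as a plain scalar loop. This is only an illustrative sketch in Python: the function name brighten and the sample pixel values are made up for this example, and C is assumed to be non-negative.

```python
def brighten(pixels, c):
    """Add constant c to every pixel, clamping each result at 255.

    Hypothetical helper for illustration only.
    Assumes every pixel is in 0..255 and that c >= 0.
    """
    # The same add-and-clamp step is applied to every pixel in turn.
    return [min(p + c, 255) for p in pixels]


sample = [0, 100, 250, 255]   # made-up pixel values
print(brighten(sample, 10))   # [10, 110, 255, 255]
```

A regular CPU would run this step 2 million times, once per pixel, for the image in the question.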

Practice Question 8

You have observations from 2000 meteorological stations, located all over the world. Each observation is an array of 100 32-bit integers, describing the highest temperature each day for 100 consecutive days.

Suppose that all this data is already loaded in memory. You have a 256-CPU machine, and the machine is executing a program (explicitly optimized to use all these processors) to compute, for each of the 2000 arrays, the average of the values in that array. What do you expect to be the performance bottleneck? How much faster would you expect the program to run, compared to running on a single CPU on the same machine (assuming no other program runs on the machine at the same time)? While you are not expected to compute a specific number for how much faster the parallel execution runs, you should be able to provide one or more examples of plausible numbers, and justify why these numbers are plausible.
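
To make the size of the workload concrete, the single-CPU version of this computation can be sketched as follows. This is a rough sketch and not part of the assignment: the names station_averages, NUM_STATIONS, DAYS, and BYTES_PER_VALUE are hypothetical, and the temperature values are synthetic placeholders rather than real observations.

```python
NUM_STATIONS = 2000   # from the problem statement
DAYS = 100            # from the problem statement
BYTES_PER_VALUE = 4   # a 32-bit integer occupies 4 bytes


def station_averages(observations):
    """Return the average of each station's array, one station at a time."""
    return [sum(values) / len(values) for values in observations]


# Synthetic placeholder data: station s reports the value s % 40 every day.
observations = [[s % 40] * DAYS for s in range(NUM_STATIONS)]
averages = station_averages(observations)

total_values = NUM_STATIONS * DAYS
total_bytes = total_values * BYTES_PER_VALUE
print(len(averages), total_values, total_bytes)  # 2000 200000 800000
```

The printed totals (200,000 values, 800,000 bytes) follow directly from the numbers in the question and may be a useful starting point when reasoning about where the time goes.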

Practice Question 9

Write number 2342 in ternary, base-5, and base-6 representation.

Practice Question 10

Write pseudocode (or real code) for a function changeBase(int N, int b) that converts a number N to base-b representation. You can assume that 2 <= b <= 9. For example:

changeBase(1452, 2) returns 10110101100, which is the binary (base-2) representation of 1452.
changeBase(1452, 3) returns 1222210, which is the ternary (base-3) representation of 1452.
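
One possible way to check the output of your own changeBase is to convert the digits back to an ordinary integer and compare. The sketch below does this in Python with the built-in int(text, base); the helper name matches_base is hypothetical and is not part of the assignment.

```python
def matches_base(n, b, digits):
    """Return True if `digits`, read as a base-b numeral, equals n.

    Hypothetical checking helper; `digits` may be an int or a string.
    """
    # int(text, base) is Python's built-in parser for base-b digit strings.
    return int(str(digits), b) == n


print(matches_base(1452, 2, 10110101100))  # True
print(matches_base(1452, 3, "1222210"))    # True
```

The same check can be applied to your answers for Practice Question 9.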
