This is a set of practice questions. Feel free to work on these questions, and to ask questions if you face any difficulties in coming up with answers. While collaboration with others on the graded homework is strictly prohibited, you are free to work on these practice questions together with other people.
Since they are practice questions, you do not have to, and should not, submit answers to these questions on Blackboard. These questions will not be considered in any way towards your course grade. At the same time, based on the instructor's experience, individuals spending substantial and systematic effort in answering these practice questions by themselves tend to significantly improve their overall class performance.
1: load address2 R2
2: load address1 R1
3: if R1 6
4: addi 20 R1 R3
5: goto 7
6: addi 10 R1 R3
7: addi 5 R2 R4
8: store R4 address10
9: addi 30 R2 R5
10: store R5 address11
11: add R3 R2 R8
12: store R8 address12
Suppose that address1 contains integer 0. Show how the instructions are executed through the pipeline step by step, until the program finishes running. Use a table in the same format as described in task 1 of the graded assignment.
… (the contents of address10, address11, and address12 at the end of execution) will always be identical to what it would be without reordering the instructions, for any possible initial contents of address1 and address2.
Suppose that address1 contains integer 0.
What is the purpose of step 2 in the list of Sec. 2.1.2? What would happen if this step were omitted?
To compete with the newly invented printing press, a medieval monastery decided to mass-produce handwritten paperback books by assembling a vast number of scribes in a huge hall. The head monk would then call out the first word of the book to be produced and all the scribes would copy it down. Then the head monk would call out the second word and all the scribes would copy it down. This process was repeated until the entire book had been read aloud and copied. Which of the parallel processor systems discussed in Sec. 2.1.6 does this system resemble most closely?
Consider the dual five-stage pipeline shown in Figure 2-5. For processing an if-else statement, two architectural choices are available:
A 2-million-pixel grayscale image (commonly called a black-and-white image) is represented as an array of 2 million unsigned 8-bit integers (with values between 0 and 255). Suppose that we are performing an image brightening operation that adds a constant value C to each pixel (if the result is greater than 255, 255 is used instead of the actual result).
Why is such an image operation a good fit for a SIMD processor or a vector processor? Explain, separately for the SIMD processor case and the vector processor case, what aspects of this image operation these processors utilize to obtain increased performance compared to a regular CPU.
Suppose that all this data is already loaded in memory. You have a 256-CPU machine, and the machine is executing a program (explicitly optimized to use all these processors) to compute, for each of the 2000 arrays, the average of the values in that array. What do you expect to be the performance bottleneck? How much faster would you expect the program to run, compared to running on a single CPU on the same machine (assuming no other program runs on the machine at the same time)? While you are not expected to compute a specific number for how much faster the parallel execution runs, you should be able to provide one or more examples of plausible numbers, and justify why these numbers are plausible.
Write a function changeBase(int N, int b) that converts a number N to its base-b representation. You can assume that 2 <= b <= 9. For example:
changeBase(1452, 2) returns 10110101100, which is the binary (base-2) representation of 1452.
changeBase(1452, 3) returns 1222210, which is the ternary (base-3) representation of 1452.