The assignment should be submitted via Canvas. Submit a file called assignment2.zip, containing the following files:
- answers.pdf, for your answers to the written tasks, and for the output that the programming task asks you to include. Only PDF files will be accepted. All text should be typed, and if any figures are present they should be computer-generated. Scans of handwritten answers will NOT be accepted.
- All files containing your Python code for the programming tasks. Feel free to implement your solutions in multiple files, and to reuse code from those files in multiple tasks, as long as the executable files follow the specified naming conventions.
These naming conventions are mandatory; not following them can incur a penalty of up to 20 points.
Your name and UTA ID number should appear on the top line of both documents.
Task 1 (20 points, programming)
File perceptron_inference_base.py contains incomplete code that implements the inference module for a perceptron. When completed, the code creates a perceptron with specified weights and computes the output of that perceptron for a given input vector.
To complete that code, you must create a file called perceptron_inference_solution.py, where you implement the following Python function:
- (a, z) = perceptron_inference(b, w, activation, input_vector)
The function arguments provide the following information:
- b: a real number (float) specifying the bias weight for the perceptron.
- w: a column vector specifying the weights w of the perceptron. It is a 2D numpy array with a single column.
- activation: a string that specifies the activation function. The value is either "step" or "sigmoid".
- input_vector: a column vector specifying the input to the perceptron. It is a 2D numpy array with a single column.
The function returns a tuple of two values:
- a: a real number (float), equal to the result of step 1 (dot product of weights and input, plus bias).
- z: a real number (float), equal to the final output of the perceptron, obtained by applying the activation function to the result of step 1.
You can refer to slide 8 from the introductory neural network slides for the full definitions of a and z.
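For concreteness, here is a minimal sketch of one possible way to implement those two steps. It is only an illustration: the step-function convention (output 1 when a >= 0) matches the note in Task 5, and you should follow the exact definitions in the lecture slides.

import numpy as np

def perceptron_inference(b, w, activation, input_vector):
    # Step 1: dot product of the weight vector and the input vector, plus the bias weight.
    a = float(b) + float(np.dot(w[:, 0], input_vector[:, 0]))
    # Step 2: apply the activation function.
    if activation == "step":
        z = 1.0 if a >= 0 else 0.0     # step convention: 1 when a >= 0 (see the note in Task 5)
    else:                              # "sigmoid"
        z = float(1.0 / (1.0 + np.exp(-a)))
    return (a, z)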
The code provided in perceptron_inference_base.py defines some test cases by setting the function arguments to appropriate values. Note that, in perceptron_inference_base.py, bias and weight values are read from a file, and the values for the input vector are read from another file. For the weights, you can use these example files: weights1.txt and weights2.txt. Note that the weights in weights1.txt specify the AND perceptron as discussed in the lecture slides. You are free to create more files in the same format, to test your code with more cases.
Examples of files specifying input vectors are the following:
To test your code with different test cases, you can edit the top of file perceptron_inference_base.py to modify the values of variables weights_file, input_file, activation_string. You are also free, and encouraged, to create your own files for weights and input vectors.
Output
You do not need to write any code to produce program output. The existing code already handles that. When completed, the code in perceptron_inference_base.py prints the values of:
- a, which denotes the result of the first step (dot product of weights and inputs, plus bias weight).
- z, which denotes the result of the second step (activation function applied to the result of the first step).
These are some examples of output from my solution:
Task 2 (20 points, programming)
File nn_inference_base.py contains incomplete code that implements the inference module for a simple neural network. When completed, the code creates a layered feed-forward neural network with a specified number of layers, number of units in each layer, and weights. Your code will then compute the output of that neural network for a specified input vector. All layers in your neural network are fully connected.
To complete that code, you must create a file called nn_inference_solution.py, where you implement the following Python function:
- (a_values, z_values) = nn_inference(layers, units, biases, weights, activation, input_vector)
The function arguments provide the following information:
- layers, units, biases, weights: these arguments specify the number of layers, number of units per layer, and the weights b and w of each unit in each layer. The format of these arguments is documented in the nn_load function defined in file nn_load.py. The nn_load function takes care of reading this information from a file, and storing that information appropriately in the layers, units, biases, weights values that it returns.
- activation: a string that specifies the activation function. The value is either "step" or "sigmoid".
- input_vector: a column vector specifying the input to the neural network. It is a 2D numpy array with a single column.
The function returns tuple (a_values, z_values), defined as follows:
- a_values is a list. a_values[i] is the vector (numpy array) of outputs of step 1 (dot products plus biases) of all units in the i-th layer.
a_values[0] should be None, since in our notation there is no layer 0.
a_values[1] should be None, because layer 1 is the input layer, and no dot products are computed there.
- z_values is a list. z_values[i] is the vector (numpy array) of the outputs of all units in the i-th layer.
z_values[0] should be None, since in our notation there is no layer 0.
z_values[1] should be equal to the input vector, since layer 1 is the input layer.
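Purely as an illustration of the bookkeeping described above, here is a minimal sketch of a forward pass. It assumes that biases[l] is a column vector and weights[l] is a matrix whose rows hold the weights of the units in layer l, for l = 2, ..., layers; the actual format of these values is whatever nn_load documents, so adjust the indexing accordingly.

import numpy as np

def nn_inference(layers, units, biases, weights, activation, input_vector):
    # Assumption (check nn_load.py): biases[l] is a column vector and weights[l]
    # is a matrix whose rows are the weights of the units in layer l.
    # units is not used directly here, since the shapes of weights[l] already encode it.
    a_values = [None, None]            # no layer 0; layer 1 computes no dot products
    z_values = [None, input_vector]    # layer 1 simply passes the input through
    for l in range(2, layers + 1):
        a = biases[l] + np.dot(weights[l], z_values[l - 1])   # step 1 for all units in layer l
        if activation == "step":
            z = np.where(a >= 0, 1.0, 0.0)
        else:                                                  # "sigmoid"
            z = 1.0 / (1.0 + np.exp(-a))
        a_values.append(a)
        z_values.append(z)
    return (a_values, z_values)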
As mentioned above, the code that is provided (files nn_inference_base.py and nn_load.py) loads values for parameters layers, units, biases, weights from a text file. Examples of such files are nn_xor.txt and nn2.txt. Note that the weights in nn_xor.txt specify the XOR network as discussed in the lecture slides.
Although it is not necessary, you are encouraged to read the format specifications that these text files have to follow, so that you can create your own files to test with. To minimize the amount of time you spend writing code that deals with these files, you are provided with function nn_load in file nn_load.py. Please refer to the documentation for that function to understand its return values. The code provided in nn_inference_base.py uses the nn_load function to create a test case for your program. You are encouraged to create and use your own test cases.
Input vectors are also read from a file. The files storing input vectors contain one number per line, as in Task 1. Examples of such files are
To test your code with different test cases, you can edit file nn_inference_base.py to modify the values of variables nn_file, input_file, activation_string.
Output
You do not need to write any code to produce program output. The existing code already handles that. When completed, the code in nn_inference_base.py prints the a values and z values of all units in all layers. These values are printed layer by layer, starting from the input layer and finishing with the output layer. All values are rounded to four decimal digits.
These are some examples of output from my solution:
- When nn_file = "nn_xor.txt", input_file = "input1_00.txt", activation_string = "step":
Layer 1, no alpha values (input layer).
Layer 1, z values: [ 0.0000 0.0000 ]
Layer 2, a values: [ -0.5000 -1.5000 ]
Layer 2, z values: [ 0.0000 0.0000 ]
Layer 3, a values: [ -0.5000 ]
Layer 3, z values: [ 0.0000 ]
- When nn_file = "nn_xor.txt", input_file = "input1_01.txt", activation_string = "step":
Layer 1, no alpha values (input layer).
Layer 1, z values: [ 0.0000 1.0000 ]
Layer 2, a values: [ 0.5000 -0.5000 ]
Layer 2, z values: [ 1.0000 0.0000 ]
Layer 3, a values: [ 0.5000 ]
Layer 3, z values: [ 1.0000 ]
- When nn_file = "nn_xor.txt", input_file = "input1_01.txt", activation_string = "sigmoid":
Layer 1, no alpha values (input layer).
Layer 1, z values: [ 0.0000 1.0000 ]
Layer 2, a values: [ 0.5000 -0.5000 ]
Layer 2, z values: [ 0.6225 0.3775 ]
Layer 3, a values: [ -0.2551 ]
Layer 3, z values: [ 0.4366 ]
- When nn_file = "nn2.txt", input_file = "input2a.txt", activation_string = "step":
Layer 1, no alpha values (input layer).
Layer 1, z values: [ -0.3000 0.6000 0.7000 0.4000 ]
Layer 2, a values: [ -0.0620 -0.2270 ]
Layer 2, z values: [ 0.0000 0.0000 ]
Layer 3, a values: [ -0.2500 -0.2000 0.1000 ]
Layer 3, z values: [ 0.0000 0.0000 1.0000 ]
- When nn_file = "nn2.txt", input_file = "input2b.txt", activation_string = "sigmoid":
Layer 1, no alpha values (input layer).
Layer 1, z values: [ 0.3000 0.1500 -0.1000 0.2000 ]
Layer 2, a values: [ 0.2940 -0.2845 ]
Layer 2, z values: [ 0.5730 0.4294 ]
Layer 3, a values: [ -0.1998 0.0177 0.0198 ]
Layer 3, z values: [ 0.4502 0.5044 0.5050 ]
Task 3 (10 points).
Consider the function foo(x,y) = sin(cos(x)+sin(2y)).
- Part a (4 points): what is the partial derivative of foo with respect to x? Express your solution as a function of x and y, and show how you derive it.
- Part b (4 points): what is the partial derivative of foo with respect to y? Express your solution as a function of x and y, and show how you derive it.
- Part c (2 points): What is the gradient of foo? Express the gradient as a function of x and y that outputs a 2-dimensional vector.
The chain rule can be useful in computing partial derivatives here. The slides on training perceptrons and training neural networks show many examples of using the chain rule.
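As a small worked example of the chain rule (using a different function than foo): if g(x, y) = sin(x*y + y), then the partial derivative of g with respect to x is cos(x*y + y) * y, and the partial derivative of g with respect to y is cos(x*y + y) * (x + 1). In each case we differentiate the outer sin at the inner expression, and multiply by the partial derivative of the inner expression with respect to the variable of interest.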
Task 4 (20 points, programming).
File gradient_descent_base.py contains incomplete code that implements gradient descent as shown in the lecture slides on optimization. When completed, the code applies gradient descent to two functions:
- f1(x, y) = x^2 + 2y^2 - 600x - 800y + xy + 50. This function was used in the slides on optimization as an example for gradient descent.
- foo(x,y) = sin(cos(x)+sin(2y)). This is the function used in Task 3 above.
To complete that code, you must create a file called gradient_descent_solution.py, where you implement the following Python functions:
- (dfdx, dfdy) = foo_gradient(x, y)
This function simply implements in code your solution to Task 3. The input arguments x and y are two real numbers (floats). The output is a tuple of two floats, representing the gradient vector of foo (partial derivative with respect to x, and partial derivative with respect to y) for the two input values.
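One easy way to sanity-check your foo_gradient is to compare it with numerical finite differences. The helper below sketches the idea; check_foo_gradient is just a hypothetical name for your own testing, not part of the required interface.

import math
from gradient_descent_solution import foo_gradient  # your Task 4 solution file

def check_foo_gradient(x, y, h=1e-6):
    # foo as defined in Task 3.
    def foo(x, y):
        return math.sin(math.cos(x) + math.sin(2 * y))
    # Central finite-difference approximations of the two partial derivatives.
    dfdx_num = (foo(x + h, y) - foo(x - h, y)) / (2 * h)
    dfdy_num = (foo(x, y + h) - foo(x, y - h)) / (2 * h)
    dfdx, dfdy = foo_gradient(x, y)
    # Both differences should be very close to zero if foo_gradient is correct.
    print(dfdx - dfdx_num, dfdy - dfdy_num)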
- (x_min, y_min, history) = gradient_descent(function, gradient, x1, y1, eta, epsilon)
This function implements the pseudocode on slide 58 of the slides on optimization. Note that here, in addition to function, we also need to pass in the gradient of function as an argument, since otherwise Python would not know how to compute that gradient.
The function returns (x_min, y_min, history). Values x_min, y_min specify the local minimum that was found. Return value history is a list specifying all the points, starting with (x1, y1), that were visited during the gradient descent process. To be specific, history[0] = (x1, y1), each element of history is a tuple of two numbers, and the last element of history is (x_min, y_min). This history is just useful for visualization and debugging. The code provided in gradient_descent_base.py defines and uses function print_history, to print out that history.
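The exact update rule, stopping criterion, and any adjustment of eta during the run must come from the slide 58 pseudocode. Purely to illustrate the required interface and the history bookkeeping, here is a minimal sketch of plain fixed-eta gradient descent with an assumed stopping rule (the function value changes by less than epsilon); it is not necessarily the algorithm on slide 58.

def gradient_descent(function, gradient, x1, y1, eta, epsilon):
    # Minimal illustration only; follow slide 58 for the actual algorithm.
    x, y = x1, y1
    history = [(x, y)]                       # history[0] = (x1, y1)
    while True:
        dfdx, dfdy = gradient(x, y)
        new_x, new_y = x - eta * dfdx, y - eta * dfdy
        history.append((new_x, new_y))
        # Assumed stopping rule: the function value barely changes.
        if abs(function(new_x, new_y) - function(x, y)) < epsilon:
            return (new_x, new_y, history)
        x, y = new_x, new_y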
Output
The code provided in gradient_descent_base.py, when completed, will print out the history of gradient descent for two test cases. The first test case is:
(x_min, y_min, history) = gradient_descent(f1, f1_gradient, 300, 150, 1, 0.001)
For that test case, the following output should be printed:
t = 1, xt = 300.0000, yt = 150.0000, function(xt, yt) = -119950.0000
t = 2, xt = 225.0000, yt = 100.0000, function(xt, yt) = -121825.0000
t = 3, xt = 237.5000, yt = 143.7500, function(xt, yt) = -125575.0000
t = 4, xt = 232.8125, yt = 140.6250, function(xt, yt) = -125645.8008
t = 5, xt = 231.2500, yt = 141.7969, function(xt, yt) = -125657.7026
t = 6, xt = 230.1758, yt = 142.1875, function(xt, yt) = -125661.8893
t = 7, xt = 229.5410, yt = 142.4561, function(xt, yt) = -125663.4128
t = 8, xt = 229.1565, yt = 142.6147, function(xt, yt) = -125663.9677
t = 9, xt = 228.9246, yt = 142.7109, function(xt, yt) = -125664.1699
t = 10, xt = 228.7846, yt = 142.7689, function(xt, yt) = -125664.2435
t = 11, xt = 228.7001, yt = 142.8039, function(xt, yt) = -125664.2703
t = 12, xt = 228.6491, yt = 142.8250, function(xt, yt) = -125664.2801
t = 13, xt = 228.6183, yt = 142.8377, function(xt, yt) = -125664.2837
t = 14, xt = 228.5997, yt = 142.8454, function(xt, yt) = -125664.2850
t = 15, xt = 228.5885, yt = 142.8501, function(xt, yt) = -125664.2854
t = 16, xt = 228.5817, yt = 142.8529, function(xt, yt) = -125664.2856
t = 17, xt = 228.5776, yt = 142.8546, function(xt, yt) = -125664.2857
t = 18, xt = 228.5752, yt = 142.8556, function(xt, yt) = -125664.2857
t = 19, xt = 228.5737, yt = 142.8562, function(xt, yt) = -125664.2857
t = 20, xt = 228.5728, yt = 142.8566, function(xt, yt) = -125664.2857
t = 21, xt = 228.5723, yt = 142.8568, function(xt, yt) = -125664.2857
t = 22, xt = 228.5719, yt = 142.8569, function(xt, yt) = -125664.2857
We will not be providing the correct output for other test cases (at least not before the submission deadline).
Task 5 (10 points).
Note: In this question you should assume that the activation function of a perceptron is the step function. More specifically, this function:
- outputs 0 if the weighted sum of inputs is LESS THAN 0 (not less than or equal).
- outputs 1 if the weighted sum of inputs is greater than or equal to 0.
Design a perceptron that takes three Boolean inputs (i.e., inputs that are equal to 0 for false, and 1 for true), and outputs: 1 if at least two of the three inputs are true, 0 otherwise. You should NOT worry about what your perceptron does when the input values are not 0 or 1.
Your answer can be a drawing, as in the slides, or it can simply be text. Either way, your answer should clearly specify the values for the bias weight and the three regular weights.
Task 6 (10 points).
Note: In this question you should assume that the activation function of a perceptron is the step function, same as in Task 5.
Design a neural network that:
- takes two inputs, A and B, that can be any real numbers.
- outputs 1 if 2A + 3B = 4.
- outputs 0 otherwise.
Your answer can be a drawing, as in the slides, or it can simply be text. Either way, your answer should clearly specify:
- How the network is structured, what units it contains, how those units are organized in layers.
- All edges connecting outputs of one layer to inputs in the next layer.
- The values of all weights (including bias weights) of all perceptrons. It should be clear which weight value corresponds to which edge in the network.
Task 7 (10 points).
Note: In this question you should assume that the activation function of a perceptron is the step function, same as for Tasks 5 and 6.
Is it possible to design a neural network (which could be a single perceptron or a larger network) that satisfies these specs?
- Takes a single input called X, which can be any real number.
- If X < 3, the network outputs 0.
- If 3 < X < 7, the network outputs 1.
- If X > 7, the network outputs 0.
We don't care what the network outputs when X = 3 or X = 7.
If your answer is no, explain why not. If your answer is yes, your solution should fully describe the network, either with a drawing or in text, as in Task 6: the units, the layers, the connections between layers, and the values of all weights (including bias weights).