CSE 2312 - Assignments - Assignment 3

The assignment will be graded out of 100 points. Submit a ZIPPED file (no other formats will be accepted) called assignment3.zip, containing the following three documents:

A document entitled answers.xxx (where you replace xxx with whatever extension is appropriate, depending on the file format you use), that contains answers to the non-programming tasks (i.e., answers to tasks 3-7). Acceptable file formats are plain text, Word document, OpenOffice document, and PDF. Put your name and UTA ID on the first line of the document.
Files task1.xxx and task2.xxx (where xxx depends on your choice of programming language) containing your solutions to the programming tasks (i.e., solutions to tasks 1 and 2). Make sure that your solutions run on omega.

Submit your assignment3.zip file to Blackboard before the deadline. You will be able to revise your answers until the deadline with no penalty.

IMPORTANT: By submitting your answers, you are certifying that you have followed the UTA standards of academic integrity, and that these answers have been exclusively your own work. All students enrolled in this course are expected to adhere to the UT Arlington Honor Code:

I pledge, on my honor, to uphold UT Arlington's tradition of academic integrity, a tradition that values hard work and honest effort in the pursuit of academic excellence. I promise that I will submit only work that I personally create or contribute to group collaborations, and I will appropriately reference any work from other sources. I will follow the highest standards of integrity and uphold the spirit of the Honor Code.

Task 1 - Programming (20 pts.)

We define the following data type, using C-style notation:

struct record
{
   int age;
   char name[12];
   int department;
};

Each int takes four bytes (32 bits) of memory. Each char takes 1 byte (8 bits) of memory. Therefore, if an object of type record is stored in memory, or on file, starting at address base, it is stored in 20 bytes as follows:

from base to (base+3): age
from (base+4) to (base+15): name
from (base+16) to (base+19): department
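If you work in C, it can be worth confirming that your compiler really lays out the struct this way, with no padding between the fields. The following is a minimal sketch, assuming a C11-capable compiler (for _Static_assert) and a 4-byte int. It is not part of the provided task1.c.

```c
#include <stddef.h>   /* offsetof */

/* The record type from the task statement. */
struct record
{
   int age;
   char name[12];
   int department;
};

/* Compile-time checks: compilation fails if the layout on this platform
   differs from the 20-byte layout described above. */
_Static_assert(sizeof(int) == 4, "int is not 4 bytes on this platform");
_Static_assert(offsetof(struct record, age) == 0, "age is not at base");
_Static_assert(offsetof(struct record, name) == 4, "name is not at base+4");
_Static_assert(offsetof(struct record, department) == 16, "department is not at base+16");
_Static_assert(sizeof(struct record) == 20, "record is not 20 bytes");
```

If any of these checks fails, reading or writing whole structs in one call would not match the file format, and each field would have to be read and written separately.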

Write a program (in C, C++, Java, or Python) that can translate files storing such records between big-endian and little-endian. If you are using C, C++, or Java, your program should take three command line arguments: number, input_file, output_file. Similarly, if you are using Python, your function should take in three arguments: number, input_file, output_file. Your program should behave as follows:

If number is 0, then the input file should follow the big-endian format, and the output file should follow the little-endian format. If number is 1, then the input file should follow the little-endian format, and the output file should follow the big-endian format.
Your program should read records from input_file, in the input format specified by number, and save those records to output_file, in the output format specified by number.

You should assume that the input file only contains data of type record, and every file location that is a multiple of 20 is the start of a new record.
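Converting a record between the two formats comes down to reversing the four bytes of each int field (age and department). The name field is an array of single bytes, so its byte order does not change. The helper below is a minimal sketch of the byte reversal. The name swap_bytes32 is hypothetical and does not come from task1.c, and the sketch assumes the field value is handled as an unsigned 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical helper (not part of task1.c): reverse the byte order of a
   32-bit value. Applying it to a big-endian integer gives the little-endian
   representation, and vice versa. */
uint32_t swap_bytes32(uint32_t value)
{
   return ((value & 0x000000FFu) << 24) |   /* lowest byte becomes highest */
          ((value & 0x0000FF00u) << 8)  |
          ((value & 0x00FF0000u) >> 8)  |
          ((value & 0xFF000000u) >> 24);    /* highest byte becomes lowest */
}
```

For example, the age 56 is 0x00000038. Its four bytes are 38 00 00 00 in little-endian order and 00 00 00 38 in big-endian order. Applying the helper twice returns the original value.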

To help you get started, we provide the following resources:

File task1.c contains most of the code that you need for the solution. Feel free to use this code, and add the code that is needed to convert this code into a complete solution for this task. If you use this file, you just need to implement this function, which has been left undefined:
- void convert_and_save(struct record item, FILE * output_handle, int number);
In the opinion of the instructor, using the code in task1.c is probably the easiest way to solve this task (my solution is 12 additional lines of code added to task1.c). Still, if you so choose, you are free to write your own code from scratch in C, C++, Java, or Python.
File test1_little.bin is an 80-byte test file that contains four records stored in little-endian format:
- Record 1: age = 56, name = "john smith", department = 6.
- Record 2: age = 46, name = "mary jones", department = 12.
- Record 3: age = 36, name = "tim davis", department = 5.
- Record 4: age = 26, name = "pam clark", department = 10.
File test2_big.bin is an 80-byte test file that contains the same records as test1_little.bin, but in big-endian format.

While we have provided test1_little.bin and test2_big.bin as two test files, we will test your solutions with different test files, so your solution should be able to handle any file following the format we have specified (every file position that is a multiple of 20 is the start of a record).

Submit your solution to this task as a file called (depending on the programming language you use) task1.c, task1.cpp, task1.java, or task1.py. Your solution needs to run on omega.uta.edu, otherwise you will receive at most half credit.

Task 2 - Programming (40 pts.)

Write a program (in C, C++, Java, or Python) that can do parity-bit encoding and decoding of 7-bit words. We will interpret each 7-bit word as an ASCII code. The parity bit should be placed at the END of each codeword.

To make it easy to read and modify input and output files using a text editor, our input and output will be TEXT files, not binary files. As an example, consider 7-bit word 1010100.

Binary number 1010100 is number 84 in decimal representation.
84 is the ASCII code of character 'T'.
In the text file where we store uncoded data, we store this binary number as string '1010100'.
The parity-bit codeword for 1010100 is 10101001 (the parity bit is placed at the end).
In the text file where we store coded data, we store this binary number as string '10101001'.
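The example above is consistent with even parity: 1010100 contains three 1s, and the appended 1 brings the total to four. The sketch below computes the codeword for one 7-character word under that assumption. The function name append_parity_bit is hypothetical and is not one of the functions declared in task2.c.

```c
#include <string.h>

/* Hypothetical helper: given a string of exactly 7 '0'/'1' characters,
   write the 8-character codeword (plus a terminating '\0') into codeword,
   which must have room for 9 chars.
   Assumes even parity, as in the 1010100 -> 10101001 example.
   Returns 0 on success, -1 if word is malformed. */
int append_parity_bit(const char *word, char *codeword)
{
   int ones = 0;
   int i;

   if (strlen(word) != 7) return -1;

   for (i = 0; i < 7; i++)
   {
      if (word[i] == '1') ones++;
      else if (word[i] != '0') return -1;
      codeword[i] = word[i];          /* data bits are copied unchanged */
   }

   /* The parity bit makes the total number of 1s even. */
   codeword[7] = (ones % 2 == 0) ? '0' : '1';
   codeword[8] = '\0';
   return 0;
}
```

Calling append_parity_bit("1010100", buffer) fills buffer with "10101001", matching the codeword given above.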

If you are using C, C++, or Java, your program should take three command line arguments: number, input_file, output_file. Similarly, if you are using Python, your function should take in three arguments: number, input_file, output_file. Your program should behave as follows:

If number is 0, then the input file contains characters that are '0' or '1'. Each chunk of 7 such characters represents an ASCII code in binary. The output file should contain the parity-bit codeword corresponding to each such chunk. The parity-bit codeword is an 8-character chunk of characters that are '0' or '1'.
If number is 1, then the input file contains characters that are '0' or '1'. Each chunk of 8 such characters represents the parity-bit codeword of a 7-bit ASCII code in binary. The output file should contain the original 7-bit ASCII code for each such chunk. This ASCII code should be saved as a string of 7 characters that are '0' or '1'.

You should assume that the input file contains ONLY characters '0' and '1', nothing else (no spaces, new lines, etc.). You should enforce that your output file follows the same convention, and also contains ONLY characters '0' and '1'.

To help you get started, we provide the following resources:

File task2.c contains most of the code that you need for the solution. Feel free to use this code, and add the code that is needed to convert this code into a complete solution for this task. If you use this file, you just need to implement these two functions, which have been left undefined:
- int convert_to_original_word(char * input_buffer, char * output_buffer);
In the opinion of the instructor, using the code in task2.c is probably the easiest way to solve this task (my solution is about 25-30 additional lines of code added to task2.c). Still, if you so choose, you are free to write your own code from scratch in C, C++, Java, or Python.
File in1.txt is a test file that contains (in uncoded form) the binary pattern representing text "The kangaroo is an animal that lives in Australia.".
File parity1.txt is the coded version of file in1.txt.
File parity2.txt is a corrupted version of parity1.txt that includes 6 errors, at codewords 0, 8, 16, 24, 32, 48. Your solution should be able to detect these errors.
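Detecting an error in a codeword amounts to counting its 1s. Assuming even parity (as the 1010100 example suggests), a codeword with an odd number of 1s has been corrupted. The sketch below shows one way to check a single codeword and recover its 7 data bits. The name strip_parity_bit and its return convention are hypothetical. It is not the convert_to_original_word function from task2.c, whose exact contract is defined in that file.

```c
/* Hypothetical helper: copy the 7 data bits of an 8-character codeword into
   word (which must have room for 8 chars, including the terminating '\0')
   and report whether the parity check passes.
   Assumes even parity and that codeword holds at least 8 '0'/'1' characters.
   Returns 1 if the codeword has an even number of 1s, 0 if an error is
   detected. */
int strip_parity_bit(const char *codeword, char *word)
{
   int ones = 0;
   int i;

   for (i = 0; i < 8; i++)
   {
      if (codeword[i] == '1') ones++;
      if (i < 7) word[i] = codeword[i];   /* keep only the data bits */
   }
   word[7] = '\0';

   return (ones % 2 == 0) ? 1 : 0;
}
```

A single parity bit detects any odd number of flipped bits in a codeword. An even number of flipped bits leaves the parity unchanged, so it goes undetected.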

While we have provided these test files, we will test your solutions with different test files, so your solution should be able to handle any file following the format we have specified (containing only '0' and '1' characters).

Submit your solution to this task as a file called (depending on the programming language you use) task2.c, task2.cpp, task2.java, or task2.py. Your solution needs to run on omega.uta.edu, otherwise you will receive at most half credit.

Task 3 (5 pts.)

(This is Problem 10 from Chapter 2 of the textbook).

Genetic information in all living things is coded as DNA molecules. A DNA molecule is a linear sequence of the four basic nucleotides: A, C, G, and T. The human genome contains approximately 3 * 10⁹ nucleotides in the form of about 30,000 genes. What is the total information capacity (in bits) of the human genome? What is the maximum information capacity (in bits) of the average gene?

Task 4 (5 pts.)

(This is Problem 11 from Chapter 2 of the textbook).

A certain computer can be equipped with 1,073,741,824 bytes of memory. Why would a manufacturer choose such a peculiar number, instead of an easy-to-remember number like 1,000,000,000?

Task 5 (10 pts.)

(This is Problem 12 from Chapter 2 of the textbook).

Devise a 7-bit even-parity Hamming code for the digits 0 to 9.

Task 6 (10 pts.)

(This is Problem 13 from Chapter 2 of the textbook).

Devise a code for the digits 0 to 9 whose Hamming distance is 2.

Task 7 (10 pts.)

(This is Problem 14 from Chapter 2 of the textbook).

In a Hamming code, some bits are "wasted" in the sense that they are used for checking and not information. What is the percentage of wasted bits for messages whose total length (data + check bits) is 2ⁿ - 1? Evaluate this expression numerically for values of n from 3 to 10.
