Math Problem Statement
The data file includes text taken from three books of the Bible (Joshua, Jonah and Philippians) using the ESV translation. While these are all great books, our only interest for this project is how often each letter is used.
1) In the Word file containing the Biblical text, use the "Find" feature to identify how many times each letter occurs (i.e. the letter's frequency). Create an Excel spreadsheet to display the number of occurrences of each letter in the English alphabet. (10 points)
2) In the Excel spreadsheet, sum your frequencies to compute the total number of letters in the 3 books (this is the sample size n).
   a) In your spreadsheet, use the formula $\hat{p} = X/n$ to compute the sample proportion of each letter's appearances relative to the total number of letters (i.e. the relative frequency of each letter). Use the Excel sorting function to sort the letters in order of their frequencies. (6 points)
   b) Use the simple Confidence Interval (CI) formula $\left(\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}\right)$ to find a 95% CI on the proportion of how often each letter is used in English text in general. Enter the lower limit in the first Excel column using the formula $\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ and the upper limit in the next column using the formula $\hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}$. (8 points)
3) Identify those letters whose CIs do not overlap with the CIs of any of the other letters. (For example, the CI (0.042, 0.052) overlaps with (0.050, 0.060) because the upper limit of the first CI is greater than the lower limit of the second CI.) List the letters with the non-overlapping CIs and specify how many such letters there are. (6 points)
4) The previous analysis could be useful if our goal was to decipher an encrypted message, where each letter is scrambled (for example, each "a" might become a "g", while each "b" might become an "o" and so forth).
   a) Assume that the letter "z" in the encrypted message has a relative frequency of 0.06 (it accounts for 6% of the total number of letters). Which letters' Confidence Intervals (from question 2) contain 0.06, making them the most likely candidates to be the letter which was encrypted as "z"? (4 points)
   b) Further assume that "y" in the encrypted message has a relative frequency of 0.04 (4%). Which letters' CIs contain 0.04? (4 points)
   c) If "x" in the encrypted message has a relative frequency of 0.025 (2.5%), which letters' CIs contain 0.025? (4 points)
5) a) How many possible ways are there to assign the actual letters of the alphabet to the encrypted letters in a message? (Hint: "A" could be assigned to any one of the 26 letters, including itself. Once "A" has been assigned, "B" can be assigned to any letter except the letter that corresponds to "A".) (4 points)
   b) As your answer to part (a) makes clear, there are a super-high number of possible ways all the letters could be assigned. Knowing something about each letter's relative frequency dramatically reduces the number of likely combinations. For example, if there were only 3 possible options for half of the encrypted letters (i.e. 13 letters) in the message and only 2 possible options for the remaining 13 letters, then how many possible ways would there be to assign real letters to the letters in the encrypted message? (4 points)
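The counting and interval steps in questions 1 and 2 can also be sketched outside Excel. The assignment itself asks for Word and Excel, so the Python below is only an illustrative cross-check, not the required method. The function names letter_counts and proportion_table, and the short sample string, are assumptions made for this sketch; the real input would be the full text of the three books.

```python
import math
import string
from collections import Counter

Z_95 = 1.96  # critical value given in the problem statement


def letter_counts(text):
    """Count each letter a-z, ignoring case, digits, spaces and punctuation."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Include letters that never occur so the table always has 26 rows.
    return {letter: counts.get(letter, 0) for letter in string.ascii_lowercase}


def proportion_table(counts):
    """Return rows (letter, X, p_hat, lower, upper), most frequent letter first."""
    n = sum(counts.values())  # sample size: total number of letters
    if n == 0:
        raise ValueError("text contains no letters")
    rows = []
    for letter, x in counts.items():
        p_hat = x / n
        margin = Z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
        rows.append((letter, x, p_hat, p_hat - margin, p_hat + margin))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Placeholder text only; substitute the contents of the data file.
    sample = "The quick brown fox jumps over the lazy dog"
    for letter, x, p_hat, lower, upper in proportion_table(letter_counts(sample))[:5]:
        print(f"{letter}: X={x} p_hat={p_hat:.4f} CI=({lower:.4f}, {upper:.4f})")
```

With a sample as small as the placeholder string the intervals are very wide; they only become narrow enough to separate letters when n is in the tens of thousands, as it would be for the three books.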
Solution
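Questions 3 and 4 both come down to simple interval checks, which could be sketched as follows. This is a hedged illustration rather than a worked answer: the first two intervals are the example pair from the problem statement, the third interval is made up, and the labels "A", "B" and "C" are placeholders, not results for real letters. The sketch also assumes that intervals touching at an endpoint count as overlapping.

```python
def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_letters(intervals):
    """Letters whose CI overlaps none of the other letters' CIs (question 3)."""
    return [
        letter
        for letter, ci in intervals.items()
        if not any(
            overlaps(ci, other_ci)
            for other, other_ci in intervals.items()
            if other != letter
        )
    ]


def candidates(intervals, observed):
    """Letters whose CI contains an observed relative frequency (question 4)."""
    return [letter for letter, (lo, hi) in intervals.items() if lo <= observed <= hi]


# Illustrative data only: "A" and "B" use the example intervals from the
# problem statement; "C" is an invented interval for demonstration.
example = {"A": (0.042, 0.052), "B": (0.050, 0.060), "C": (0.070, 0.080)}

print(non_overlapping_letters(example))  # ['C']
print(candidates(example, 0.051))        # ['A', 'B']
```

Applied to the 26 real intervals from question 2, the first function would list the letters asked for in question 3, and the second would be called with 0.06, 0.04 and 0.025 for question 4.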
Math Problem Analysis
Mathematical Concepts
Probability
Proportions
Confidence Intervals
Combinatorics
Statistical Analysis
Formulas
Sample proportion formula: p = X/n
Confidence interval formula: p ± 1.96 * √(p(1-p)/n)
Factorial formula: n! for permutations
Reduced permutation formula: 3^13 * 2^13 for restricted assignments
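The two counting formulas above can be evaluated directly. A minimal Python check, using only the standard library (the variable names are chosen for this sketch):

```python
import math

# Question 5a: 26 choices for "A", 25 for "B", and so on, giving 26!.
unrestricted = math.factorial(26)

# Question 5b: 13 letters with 3 options each and 13 letters with 2 options each.
restricted = 3**13 * 2**13

print(unrestricted)  # 403291461126605635584000000
print(restricted)    # 13060694016
```

So the frequency information in the hypothetical of part (b) shrinks the search space from roughly 4.03 × 10^26 assignments to about 1.3 × 10^10. That figure treats each letter's options as independent, as the question's wording does.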
Theorems
Central Limit Theorem (underlying basis for confidence intervals)
Basic principles of permutations in combinatorics
Suitable Grade Level
Grades 10-12 or Introductory College