Math Problem Statement

The data file includes text taken from three books of the Bible (Joshua, Jonah and Philippians) using the ESV translation. While these are all great books, our only interest for this project is how often each letter is used.

  1. In the Word file containing the Biblical text, use the โ€œFindโ€ feature to identify how many times each letter occurs (i.e. the letterโ€™s frequency). Create an Excel spreadsheet to display the number of occurrences of each letter in the English alphabet. (10 points)
  2. In the Excel spreadsheet, sum your frequencies to compute the total number of letters in the 3 books (this is sample size n). a) In your spreadsheet, use the formula ๐‘ = ๐‘‹ ๐‘› to compute the sample proportion of each letterโ€™s appearances relative to total number of letters (i.e. the relative frequency of each letter). Use the Excel sorting function to sort the letters in order of their frequencies. (6 points) b) Use the simple Confidence Interval (CI) formula ๐‘ โ€• 1.96 ๐‘๐‘ž ๐‘› ,๐‘ + 1.96 ๐‘๐‘ž ๐‘› to find a 95% CI on the proportion of how often each letter is used in English text in general. Enter the lower limit in the first Excel column using the formula ๐‘ โ€•1.96 ๐‘๐‘ž ๐‘› and the upper limit in the next column using the formula ๐‘ +1.96 ๐‘๐‘ž ๐‘› . (8 points)
  3. Identify those letters whose Cls do not overlap with any the CIs of any of the other letters. (For example the CI (0.042, 0.052) overlaps with (0.050, 0.060) because the upper limit of the

MATH 211 2 first CI is greater than the lower limit of the second CI.) List the letters with the non-overlapping Cis and specify how many such letters there are. (6 points) 4) The previous analysis could be useful if our goal was to decipher an encrypted message, where each letter is scrambled (for example, each โ€œaโ€ might become a โ€œgโ€, while each โ€œbโ€ might become an โ€œoโ€ and so forth). a) Assume that the letter โ€œzโ€ in encrypted message has a relative frequency of 0.06 (it accounts for 6% of the total number of letters). Which letterโ€™s Confidence Intervals (from question 2) contain 0.06 and thus are the most likely candidates to be the letter which was encrypted as โ€œzโ€? (4 points) b) Further assume that โ€œyโ€ in the encrypted message has a relative frequency of 0.04 (4%). Which letterโ€™s CIs contain 0.04? (4 points) c) If โ€œxโ€ in the encrypted message has a relative frequency of 0.025 (2.5%), which letterโ€™s CIs contain 0.025? (4 points) 5) a) How many possible ways are there to assign the actual letters of the alphabet to the encrypted letters in a message? (Hint: โ€œAโ€ could be assigned to any one of the 26 letters, including itself. Once โ€œAโ€ has been assigned, โ€œBโ€ can be assigned to any letter except the letter that corresponds to โ€œAโ€). (4 points) b) As your answer to part (a) makes clear, there are a super-high number of possible ways all the letters could be assigned. Knowing something about each letterโ€™s relative frequency dramatically reduces the number of likely combinations. For example, if there were only 3 possible options for half of the encrypted letters (i.e. 13 letters) in the message and only 2 possible options for the remaining 13 letters, then how many possible ways would there be to assign real letters to the letter in the encrypted message? (4 points)

Solution

To complete the project as outlined in the instructions, follow these steps:

Step 1: Count letter frequencies and create an Excel sheet

  1. Identify letter frequencies using the "Find" feature in Word:

    • Open the Word file containing the Biblical text.
    • Use Ctrl + F or the "Find" tool to search for each letter of the English alphabet (Aโ€“Z, case-insensitive).
    • Record the frequency of each letter in a table.
  2. Enter data into Excel:

    • In Column A, list the letters (Aโ€“Z).
    • In Column B, input the frequency (number of occurrences) for each letter.

Step 2: Calculate total letters and sample proportion

  1. Compute the total number of letters (n):

    • Sum the frequencies of all letters in Column B using =SUM(B1:B26) in Excel.
  2. Calculate the sample proportion for each letter (๐‘ = ๐‘‹/๐‘›):

    • In Column C, compute the proportion for each letter using the formula =B1/$B$27 (assuming the total frequency is in B27).
  3. Sort letters by frequency:

    • Highlight Columns Aโ€“C and use the Excel "Sort" feature to sort by Column B in descending order.

Step 3: Confidence Interval (CI) calculation

  1. Compute lower and upper bounds for the 95% CI:

    • Let ๐‘ž = 1 โˆ’ ๐‘. For each letter, calculate:
      • Lower limit: =C1 - 1.96 * SQRT((C1*(1-C1))/$B$27)
      • Upper limit: =C1 + 1.96 * SQRT((C1*(1-C1))/$B$27)
  2. Enter these values in Columns D (Lower CI) and E (Upper CI).


Step 4: Non-overlapping Confidence Intervals

  1. Identify letters with non-overlapping CIs:

    • Compare the confidence intervals of each letter to the others.
    • If a letter's CI does not overlap with any other letter's CI, note it.
  2. List these letters in a separate column or worksheet.


Step 5: Encrypted message analysis

  1. Identify letters whose CIs contain given frequencies:
    • For each frequency (e.g., 0.06 for โ€œzโ€), check which lettersโ€™ CIs (Columns D and E) contain the given frequency.
    • Repeat this for 0.04 (โ€œyโ€) and 0.025 (โ€œxโ€).

Step 6: Permutation calculations

  1. Number of possible assignments:

    • Compute the total number of permutations for 26 letters:
      • Formula: 26! (26 factorial). Use Excelโ€™s =FACT(26).
  2. Reduced combinations:

    • If 13 letters have 3 options and the other 13 have 2:
      • Formula: 313โ‹…2133^{13} \cdot 2^{13}.
      • Compute this in Excel as =3^13 * 2^13.

Deliverables

  • Excel file:
    • Columns for letters, frequencies, sample proportions, CIs (lower and upper bounds), sorted data, and notes on non-overlapping CIs.
  • Summary document:
    • Answers to the specific questions (e.g., which lettersโ€™ CIs contain given frequencies).

Would you like me to assist in calculating specific steps or create the initial Excel sheet structure for you?


5 Questions for Expansion

  1. How would the results change with a different translation of the Bible (e.g., NIV vs. ESV)?
  2. What is the mathematical significance of non-overlapping CIs in text analysis?
  3. Could this method be applied to other languages with different alphabets?
  4. How do letter frequencies vary between different genres of text (e.g., poetry vs. prose)?
  5. Can this method identify patterns in encrypted texts with additional complexity (e.g., digraphs)?

Tip: For faster analysis in Excel, use conditional formatting to highlight overlapping confidence intervals automatically.

Ask a new question for Free

By Image

Drop file here or Click Here to upload

Math Problem Analysis

Mathematical Concepts

Probability
Proportions
Confidence Intervals
Combinatorics
Statistical Analysis

Formulas

Sample proportion formula: p = X/n
Confidence interval formula: p ยฑ 1.96 * โˆš(p(1-p)/n)
Factorial formula: n! for permutations
Reduced permutation formula: 3^13 * 2^13 for restricted assignments

Theorems

Central Limit Theorem (underlying basis for confidence intervals)
Basic principles of permutations in combinatorics

Suitable Grade Level

Grades 10-12 or Introductory College