Based on the following information collected from emails, what is the probability that a document is spam if it contains the word "llwws"?

Spam Not spam
Word "aabbdd" 0.61
Word "llwws" 0.6

1 answer

To determine the probability that a document is spam given that it contains the word "llwws," we can use Bayes' Theorem. However, we need some additional information.

Let's define:

  • \( P(\text{Spam}) \): The prior probability of a document being spam.
  • \( P(\text{Not Spam}) \): The prior probability of a document not being spam.
  • \( P(\text{llwws | Spam}) \): The probability of the word "llwws" appearing given that the document is spam.
  • \( P(\text{llwws | Not Spam}) \): The probability of the word "llwws" appearing given that the document is not spam.
  • \( P(\text{llwws}) \): The total probability of the word "llwws" appearing in a document.

From your data:

  • You provided \( P(\text{llwws | Spam}) = 0.6 \).
  • However, we do not have \( P(\text{Spam}) \) or \( P(\text{Not Spam}) \), nor \( P(\text{llwws | Not Spam}) \).

Assuming you also have the proportion of emails that are spam and not spam, let's denote:

  • \( P(\text{Spam}) = p \)
  • \( P(\text{Not Spam}) = 1 - p \)
  • \( P(\text{llwws | Not Spam}) \): This is not provided, but if you have this information, we can proceed.

Whatever value \( P(\text{llwws | Not Spam}) \) takes, the general approach is the same. If it is significantly lower than \( P(\text{llwws | Spam}) \), the word will count as evidence for spam.

Using Bayes' theorem:

\[ P(\text{Spam | llwws}) = \frac{P(\text{llwws | Spam}) \cdot P(\text{Spam})}{P(\text{llwws})} \]

For \( P(\text{llwws}) \):

\[ P(\text{llwws}) = P(\text{llwws | Spam}) \cdot P(\text{Spam}) + P(\text{llwws | Not Spam}) \cdot P(\text{Not Spam}) \]
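Substituting this expansion into Bayes' theorem makes the dependence on the missing quantities explicit. Here \( p = P(\text{Spam}) \) as above, \( q \) is shorthand introduced only for this illustration for the unknown \( P(\text{llwws | Not Spam}) \), and \( 0.6 \) is read as \( P(\text{llwws | Spam}) \):

\[ P(\text{Spam | llwws}) = \frac{0.6 \, p}{0.6 \, p + q \, (1 - p)} \]

The result is fully determined once \( p \) and \( q \) are known, and it increases as \( q \) decreases.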

If we had values for both \( P(\text{Spam}) \) (equivalently \( P(\text{Not Spam}) \)) and \( P(\text{llwws | Not Spam}) \), we could compute an answer.

Without these values, we cannot calculate an exact numeric answer. To find the probability that a document is spam given it has the word "llwws," you'll need to provide those additional probabilities.

Once you have:

  1. The prior probabilities \( P(\text{Spam}) \) and \( P(\text{Not Spam}) \)
  2. The probability \( P(\text{llwws | Not Spam}) \)

you can substitute those values into the formulas above to solve for \( P(\text{Spam | llwws}) \).
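As a rough illustration of that substitution, here is a minimal Python sketch of the calculation. The function name `posterior_spam` and its parameter names are made up for this example. The values 0.5 for \( P(\text{Spam}) \) and 0.2 for \( P(\text{llwws | Not Spam}) \) are purely hypothetical placeholders, not taken from the question. Only 0.6 comes from the data provided.

```python
def posterior_spam(p_word_given_spam, p_spam, p_word_given_not_spam):
    """Return P(Spam | word) using Bayes' theorem.

    All three arguments are probabilities in [0, 1].
    """
    for name, value in (
        ("p_word_given_spam", p_word_given_spam),
        ("p_spam", p_spam),
        ("p_word_given_not_spam", p_word_given_not_spam),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    p_not_spam = 1.0 - p_spam

    # Law of total probability: P(word) summed over spam and not spam.
    evidence = p_word_given_spam * p_spam + p_word_given_not_spam * p_not_spam
    if evidence == 0.0:
        raise ValueError("P(word) is zero, so the posterior is undefined")

    return p_word_given_spam * p_spam / evidence


if __name__ == "__main__":
    # 0.6 is from the question; 0.5 and 0.2 are hypothetical placeholders.
    example = posterior_spam(0.6, p_spam=0.5, p_word_given_not_spam=0.2)
    print(f"P(Spam | llwws) = {example:.2f}")
```

With real values in place of the two placeholders, the same call would give the probability the question asks for.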