# Probability and Language Modeling

### Will Styler - LIGN 6

---

### We've got corpora now!

- Large collections of text
- Now we can start to build *language models*
- ... but there's always uncertainty

---

### Today's Plan

- Probability
- Surprisal and information
- Conditional Probability
- Using probability for language modeling
- Predictive Text and Swype
- Probability and Corpus Size

---

## What is probability?

---

## Probability

- The degree of certainty that the value of a variable (or the correct answer to a question) is one thing and not another
- Humans tend to think about this as 'certain', 'impossible', 'likely', 'unlikely'
- We express this as a proportion (e.g. 0.46, or '46% of the time')
- 'p(event)' means 'the probability of the event occurring'

---

### Sample Probabilities

- `p(Heads)` from a fair coin: 0.5
- `p(Heads)` from a weighted coin: Not 0.5!
- `p(6)` on a six-sided die: 0.16666
- `p(6)` on a twenty-sided die: 0.05
- `p(You winning Powerball)`: ~0
- `p(Will wearing gray pants)`: ~1

---

### Probabilities can be calculated from observation

- `p(heads)` in a coin of uncertain fairness?
- `p(somebody's wearing a red shirt in class)`?
- `p('yeet')` in a corpus?
- `p('rolex')` in a corpus?

---

## Surprisal and Information

---

### Some things in life are surprising

- What would be a surprise?
- What would be completely unsurprising?
- 'Surprise' comes when something we judged to be improbable happens
- When we're surprised, we usually *gain information about the world*

---

### Defining Surprisal and information

- Something which is completely certain happening... (P = 1)
  - ... is completely unsurprising (surprisal = 0)
  - ... is completely uninformative (information = 0)
- Something which happens, despite seeming impossible... (P = 0)
  - ... is infinitely surprising (surprisal = ∞)
  - ... is infinitely informative (information = ∞)
- Everything else is in the middle (0 < P < 1)
- (A tiny code sketch of this follows)

---
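### Surprisal, concretely

A minimal Python sketch of the definitions above, using the standard information-theoretic measure (surprisal in bits is -log2 of the probability); the tiny 'corpus' here is invented for illustration:

```python
import math
from collections import Counter

# A toy 'corpus' -- in practice, this would be millions of words
corpus = "how are you doing today i hope you are doing well".split()
counts = Counter(corpus)

def prob(word):
    """Estimate p(word) as its relative frequency in the corpus."""
    return counts[word] / len(corpus)

def surprisal(p):
    """Surprisal in bits: -log2(p). P = 1 gives 0 bits; P = 0 is infinite."""
    return float("inf") if p == 0 else -math.log2(p)

print(surprisal(prob("you")))      # frequent word: low surprisal
print(surprisal(prob("penguin")))  # unseen word: infinite surprisal
```

---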
### What's the surprisal of...

- 'the' occurring in an English document
- The sun rising tomorrow
- Will wearing non-gray pants
- Will cancelling the final project and giving everybody A's
- 'mel-frequency cepstral coefficient' occurring in a document
- 'mel-frequency cepstral coefficient' occurring in a TikTok
- Winning Powerball

---

### Now we can quantify how likely a given event is

- How surprising it is
- ... and how informative it is!

---

### We can also estimate our uncertainty

- "How likely am I to be surprised here?"
- The sun will rise tomorrow
- Will will wear gray pants Wednesday
- 10 heads in a row while flipping a weighted coin
- 10 heads in a row while flipping a fair coin
- The coin landing on edge after being flipped
  - (Apparently [1 in 6000 tosses for a nickel!](https://ui.adsabs.harvard.edu/abs/1993PhRvE..48.2547M/abstract))
- The next 20-sided die roll being 17

---

## Conditional Probability

---

## Conditional Probability

'What is the probability of this event, given that this other event occurred?'

- `p(event|other event)` means 'the probability of an event occurring, given that the other event occurred'

---

### Probabilities are often conditional on other events

- What's `p(pun)`? What about `p(pun|Will)`?
- What's `p(fire|smoke)`? What about `p(smoke|fire)`?
  - This is not (always) symmetrical
- What's `p(Will calls in sick)`? What's `p(Will calls in sick|he did last class)`?
- What's `p(heads)` on a fair coin? What's `p(heads|prior heads)`?
  - Probabilities are not always conditional!

---

### Differences in conditional probabilities are information!

- Does changing the conditioning event affect the observed probability?
- If so, there's an informative relationship: one event's probability **depends** on the other's!
- Two events have "mutual information" if there's some relationship
- Language modeling is about finding **informative relationships** between linguistic elements!

---

### Differences in conditional probability let us model language!

- `p('you'|'how are')` vs. `p('dogs'|'how are')`
- `p(adjective|'I am')` vs. `p(noun|'I am')`
- `p(good review | "sucks")` vs. `p(bad review | "sucks")`
- (These can be estimated by counting, as sketched on the next slide)

---
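### Estimating conditional probabilities by counting

A minimal sketch, assuming we estimate `p(word|previous word)` from bigram counts; the three toy sentences are invented for illustration:

```python
from collections import Counter

# Toy corpus -- invented; real models count over huge corpora
sentences = ["how are you doing today",
             "how are you feeling",
             "how are dogs even real"]

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    words = s.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def cond_prob(word, prev):
    """Estimate p(word|prev) as count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(cond_prob("you", "are"))   # p('you'|'are')  = 2/3
print(cond_prob("dogs", "are"))  # p('dogs'|'are') = 1/3
```

---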
## Using Probability for language modeling

---

### Knowing the probability of any individual word is helpful!

- What tasks could be done knowing just the probability of a given word?

---

### Knowing the mutual information of linguistic elements helps to solve problems!

- When we do machine learning, we learn a large set of dependent probabilities among linguistic elements!
- We're trying to **predict** one variable by **observing** others!

---

### Automatic Speech Recognition

- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Text-to-speech

- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Spelling correction

- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Document classification

- "Is this a tweet? Technical report? News article? Sports article? Product Review?"
- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Sentiment Analysis

- "Does the person talking about this product (e.g.) like it?"
- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Studying the racialization of language

- What kinds of probability modeling could help us?
- What would we predict? What could we observe?

---

### Studying the racialization of language (cont.)

- `p('the'|article about a white athlete)` vs. `p('the'|article about a black athlete)`?
  - We would expect this to be similar
- `p('wife'|article about a white athlete)` vs. `p('wife'|article about a black athlete)`?
  - This is NOT similar, weirdly!
- It must be that the use of some words is predictable by athlete race!
- 'Athlete race is informative for word use!'

---

### The most common use of word probability in our lives...

- Predictive text!

---

## Predictive Text and Language Probability

---

### Every word in language is informative about the next

- When you read enough sentences, you get a sense of which words follow each other
- Imagine somebody's writing an email...

---

- Hi
- How
- are
- you
- doing
- today?
- I
- hope
- you
- are
- penguin
- well.

---

### Phrasal information decreases surprisal

- Never
- gonna
- give

---

### Predictive text just formalizes this

- "Given the last N words, what is the most likely word?"
- "It's... on... the... syllabus."
- Some variants will modify guesses based on previous words
  - "You're cute" vs. "Your cat"
- (A greedy version is sketched on the next slide)

---
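### A greedy bigram predictor

A minimal sketch of that idea with N = 1: count which word most often follows each word, then keep accepting the top guess. The training text is invented; rambling chains like the ones on the next two slides come from exactly this kind of loop:

```python
from collections import Counter, defaultdict

# Toy training text -- invented; real predictors learn from your typing
text = ("it is on the syllabus . it is a nice day . "
        "it is on the way . the syllabus is nice .").split()

# For each word, count what follows it
following = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    following[prev][nxt] += 1

def predict(prev):
    """Return the word most often seen after `prev`."""
    return following[prev].most_common(1)[0][0]

# Greedily accept the top suggestion, over and over
word, chain = "it", ["it"]
for _ in range(5):
    word = predict(word)
    chain.append(word)
print(" ".join(chain))  # "it is on the syllabus ."
```

---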
> Testing the first time and then it is fine but it’s a nice little app but it does have to a lot cheaper than it but it’s fun and it makes it very interesting and it is a great idea for a great time with good people to play for a bit and a great way home fun fun and good way cheaper cheaper and cheaper than a free version for a free app but

---

> Hi I hope you’re doing alright girl you are so nice to you have fun I hope you’re having fun I’m sorry I’m not gonna was a nice night I just wanna

---

### Swype/Swipe-to-type

- 'Swype' was an app developed by Nuance Communications
- Move your finger across the keyboard to type
- Choose the most probable combination of characters given the curve
- Later stolen by EVERYBODY

---

- Cat

---
- Hello

---
- Science

---
- Linguist

---
- Deontology

---

### Swype requires a language model!

- Swypes are not precise enough to recover all characters in the proper order
- You must rule out non-word combinations
- ... and guess the most likely candidate!

---

### Probability models are helpful for NLP!

- They allow us to use the probability of words
- They allow us to model the mutual information of a word and its neighbors
- They allow us to do real language understanding work
- They allow us to make good guesses about language!

---

## How much data do you need?

---

### How much data do you need to find...

- `p(Will wears gray pants)`
- `p(Will makes a pun in class)`
- `p(Will has to speed up at the end of class)`
- `p(The projector doesn't work)`
- `p(Will misses class)`
- `p(Will wins Powerball)`

---

### Probability estimates get better with more data!

- ... and you need more data to correctly estimate rare events
- So, this is why we want huge corpora!
- (A quick simulation follows)

---
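### Watching estimates improve with more data

A quick simulation sketch: estimating the probability of a rare event (true p = 0.01, a made-up value) from samples of increasing size:

```python
import random

random.seed(6)   # make the demo repeatable
TRUE_P = 0.01    # a rare event, like Will missing class

for n in [100, 1_000, 10_000, 100_000]:
    hits = sum(random.random() < TRUE_P for _ in range(n))
    print(f"n = {n:>6}: estimate = {hits / n:.4f}")

# Small samples may estimate 0.0 or overshoot wildly;
# large samples settle near 0.01 -- rare events need big corpora
```

---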
### Wrapping up

- Probability, Information, and Surprisal are important concepts
- Conditional probabilities help us model the world
- Language is very effectively modeled with conditional probabilities
- Predictive text and Swype are examples of this
- Bigger corpora give better probabilities!

---

Thank you!
... and thank you to Eric Meinhardt, on whose talk this is partially based