Did you hear about the guy who scammed a publisher by promising to write a dictionary of a language that doesn't exist? He was a lexi-con man! --- # The Morpheme-based Lexicon ### Dr. Will Styler - LIGN 120 --- ## Previously, on LIGN 120
--- ### How are words being accessed?
--- ### How are words being built? - "Let's see, I need to talk about gluing this thing, again, in the past. That's 'glue', plus 're-', and '-ed'." - "Let's see, I need to talk about gluing this thing, again, in the past. Looks like 'reglued' is the word I need!" --- ### Today, we'll look at one of the major approaches to the Lexicon - We'll think about what it means for word-building - What it means for theory - ... and what it means for storage - Then Monday, we'll go the other way! --- ### Today's Plan - Generative Linguistics - The Morpheme-based Lexicon - Problems with the Morpheme-based Lexicon --- # Generative Linguistics --- ## Generative Grammar --- ### We've talked a lot about generative morphology this quarter - "Your analysis should create all of the forms in the data, without predicting anything that isn't there" - "Productivity should detail all the possible forms, while preventing ones which aren't produced" - **These are generative approaches** --- ### Generative Grammar has a straightforward goal - Describe the structure of a language in a way that outlines rules for generating "all and only" the grammatical sentences of the language - "Let's create rules which generate everything that's attested in language" - "Let's build those rules such that they *do not* generate things that aren't attested" - The perfect generative grammar for Morphology can create every grammatical word, and forbid every ungrammatical one ---
--- ### For Generative Grammar, good *description* is good *creation* - Grammar is taking known elements from an inventory and combining them by rule, then passing them on to the next level. - Known sub-elements are combined with rules, and then passed upwards... --- ### This is true at every level
--- ### This leads to a view of language use as *assembly* - Take these pieces (phonemes, morphemes, lexemes) and put them together using rules - Formalize the patterns as abstract rules to generate (only) the correct forms - Phonological rules, Morphological Rules, Syntactic Rules - *Storage is kept to a minimum!* --- ### Welcome to the Sandwich Shop
--- This perspective has big consequences for the storage of words! --- # The Morpheme-based Lexicon --- ### We've been making some key assumptions about storage - From 111: "We store phonemes, and then rules generate allophonic detail" - "You wouldn't want to put "cats" in the dictionary, when we could just add the -s online!" - "We don't need to create a whole new word, we can just combine "fuse" with "-ion", easy!" - **Computation is cheap, storage is expensive!!** --- ### Generative Grammar comes from a time when storage was expensive and computation was cheap
--- ### As an aside, [OMG](https://www.engadget.com/2019/02/25/1TB-microSD-cards-western-digital-micron/)
--- ### In generative approaches, storage is a last resort - "Let's store just raw phonemes, and generate details with rules" - "Let's store bare lexemes, and generate inflected forms with rules" - **"We're not storing that in the lexicon unless we absolutely have to!"** --- ### Things that must be stored - Phonological Irregularity - e.g. English ablaut, 'exceptions' - Morphological Idiosyncrasy - e.g. Fossilized forms (oxen), Class information, Mice and Deer - Semantic Idiosyncrasy - e.g. Alienable vs. inalienable possession rules, animacy, and more - **We store only that which cannot be predicted by rules!** --- ### It makes for compact and graceful analysis - ... when it works well! --- ### It also feels more economical
--- ### This has big consequences for the lexicon
--- ### Phonemes are real and are the medium of storage - We're not storing acoustic signals - We're not storing post-phonological surface forms - We're storing words as a series of **phonemes** ---
--- ### We're storing as little information as we can - Monomorphemic words - e.g. "cat", "suit", "ice", "Germany" - Idiosyncratic Forms - e.g. "oxen", "mice", "was", "best", "backpack" - Affixes and Productive Morphological Processes - e.g. -s, -tion, -ify, re-, un- --- ### Forms are stored with meanings --- ## '-ed'
--- ## 'cat'
--- ## '-s'
--- ### ... and rules tell us how to combine them
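To make the idea concrete, here's a toy sketch in Python (my own illustration, not any particular theory's formalism): the lexicon stores morphemes as form–meaning pairs, and a concatenation rule builds complex words, so forms like "cats" and "reglued" never need entries of their own.

```python
# Toy morpheme-based lexicon: only stems and affixes are stored.
STEMS = {"cat": "CAT", "walk": "WALK", "glue": "GLUE"}
SUFFIXES = {"-s": "PL", "-ed": "PAST"}
PREFIXES = {"re-": "AGAIN"}

def attach(form, meaning, affix):
    """Combine a (form, meaning) pair with a stored affix by concatenation."""
    if affix in PREFIXES:
        return affix[:-1] + form, PREFIXES[affix] + "." + meaning
    return form + affix[1:], meaning + "." + SUFFIXES[affix]

print(attach("cat", STEMS["cat"], "-s"))     # ('cats', 'CAT.PL')
print(attach("walk", STEMS["walk"], "-ed"))  # ('walked', 'WALK.PAST')

# Affixation can chain, so 'reglued' isn't stored either:
form, meaning = attach("glue", STEMS["glue"], "re-")
form, meaning = attach(form, meaning, "-ed")
print(form, meaning)  # 'reglueed' AGAIN.GLUE.PAST -- phonology then
                      # applies to the output to fix the surface form
```

Note that the rule's raw output ('reglueed') isn't a legal surface form: as the next slides say, phonology runs on the output afterward.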
--- ### So, in summary... --- ### In a morpheme-based lexicon... - Words are stored as sequences of phonemes - Stems and affixes are stored as unanalyzable chunks with meanings - Storage is viewed as precious, and an effort is made to avoid doing it - We combine these based on rules to generate the non-stored words - Then we do phonology on the output to create legal surface forms - All of the above is done according to abstract rules generating all and only the possible forms --- # Problems with a Morpheme-based Lexicon --- (There are many morpheme-based theories, some of which may have different assumptions. We're talking generally!) --- ### This works *very* well if you have transparent affixation - "cat" 'cat' is stored, -s 'PL' is stored, so combine them for 'cat.PL' - "walk" 'walk' is stored, -ed 'PAST' is stored, so combine them for 'walk.PAST' - ... but we know it isn't always like that --- ## Efficiency Losses --- ### Efficiency depends on regularity
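With fully regular morphology, the savings are easy to count. A back-of-the-envelope comparison (the numbers here are invented for illustration):

```python
# Hypothetical figures: 10,000 stems, 5 fully regular affixes.
stems, affixes = 10_000, 5

# Whole-word storage: every inflected form, plus the bare stem.
stored_whole_words = stems * (affixes + 1)

# Morpheme-based storage: each piece is stored exactly once.
stored_morphemes = stems + affixes

print(stored_whole_words)  # 60000
print(stored_morphemes)    # 10005
```

The gap only holds while the rules do all the work; every irregular form eats back into those savings.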
--- ### ... but irregularity is regular
--- ### As regularity is reduced, the efficiency is reduced - Suppletive allomorphy must be stored - 'Better' cannot be derived from 'Good' + '-er' - 'went' must be stored independently of 'go' and '-ed' - The more mice, oxen, deer, and women you have, the more you're storing - ... but this isn't a big issue, just more storage! --- ## Difficulty Predicting Derived Meanings --- ### Morpheme-based approaches thrive on predictability - X + Y should transparently mean "X+Y" - Bottler, Painter, Bridger, Plasterer, Lecturer, Influencer - We can understand these words by knowing the meaning of X + the meaning of -er - **These don't need to be stored!** - (Note: we're here erasing the inflection/derivation distinction!) --- ### Sometimes, we don't have both components - Fletcher, Cobbler, Haberdasher, Butcher, Bursar - We don't have the meaning of X, so they're functionally monomorphemic - **These must be stored separately!** --- ### Sometimes, the meaning is different from the combination - Hooker, Grinder, Hustler, Professor, Player, Mailer - Here, there's meaning which is *not predictable* based on the known components - **We must separately store the meanings of these words** --- ... but the biggest issue is ... --- ## Non-concatenative morphology --- ### Review: Common non-concatenative ways of adding meaning to words - Zero Expressions - e.g. I sing-ø - Conversion - e.g. re'peat (V) -> 'repeat (N), or ticket (N,V) - Stem Modification - e.g. changing tones, vowels, consonants, orderings - Reduplication --- ### How are we storing zero morphemes? - Do we have a whole bunch of meanings stored under '-ø'? --- ### How do we handle multiple exponence? - Do we write optional rules? - Do we store morphemes with multiple forms? --- ### How are we storing this?
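One common answer, sketched in toy Python (my own illustration, not a claim about any specific theory): irregulars like 'went' are stored whole, and lookup happens *before* the regular rule fires.

```python
# Stored exceptions: suppletive and irregular past-tense forms.
IRREGULAR_PAST = {"go": "went", "be": "was", "sit": "sat"}

def past(verb):
    """Stored form wins; otherwise fall back to the regular '-ed' rule."""
    if verb in IRREGULAR_PAST:   # storage: the last resort, but unavoidable here
        return IRREGULAR_PAST[verb]
    return verb + "ed"           # computation: the default, 'cheap' path

print(past("walk"))  # walked
print(past("go"))    # went
```

Every entry added to that exception table is storage the rules were supposed to save us, which is exactly the efficiency loss described above.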
--- ### What about this?
--- ### How about English ablaut? - sit -> sat/seated, sing -> sang/sung, sting -> stung/stung, yeet -> yote/yaughten - [List of English Irregular Verbs](https://en.wikipedia.org/wiki/List_of_English_irregular_verbs) --- ### There are approaches to handling these phenomena! - We could store *the pattern itself* as a lexical entry - Morphemes are stored alongside stem modifications (etc.) - Morphological processes are stored as entries, and combined with stems - ... but this quickly erases the 'elegance' of the approach --- ### Morpheme-based approaches grow less compelling with increased irregularity - Exceptions have to get stored somehow - You end up with a bunch of edge cases --- ### Generative linguistics is like picking up broken glass in a carpet - Disposing of the big chunks is easy and effective - You're gonna spend a long time chasing down those last slivers - ... and you're still gonna need some tweezers from time to time --- ## So, that's a morpheme-based approach --- ## But there is another way
--- ### Wrapping Up - Generative approaches to grammar create all and only the attested forms - Morpheme-based lexicons are great for efficiently handling regular patterns - They struggle with irregularity and hard-to-predict meanings --- ### For Next Time - Read the Bybee paper - We'll talk about whole-word storage as an alternative ---
Thank you!