LIGN 6 - Computers and Language

Will Styler - Spring 2022

Course Information

As technology advances, we’re relying more and more on computers and ‘virtual assistants’ to understand and interact with us using human language. In this course, we’ll talk about the methods we use to help computers process, understand, analyze, and speak our language, and we’ll talk about some of the linguistic realities that make human language so hard for computers to work with. No programming or linguistic background is needed.

Teaching Team

Instructor - Dr. Will Styler

Hagyeong Shin - Graduate Instructional Assistant

Ben Lang - Graduate Instructional Assistant

Yuri Bukhradze - Undergraduate Instructional Assistant

Course Resources

Course Materials

Course Textbook: You will not be required to purchase a textbook for this class. You will find any additional readings below on Canvas.

You will need access to a general purpose computer and to a ‘Virtual Assistant’ like Alexa, Google Assistant, Siri, or Cortana. Please see “Laptops and Computer Access” for more details.

Course Schedule

Weeks are listed by the Monday they start on. Although all due-dates are fixed (barring extensive notice), given everything, expect things to change some. Please check this page regularly. Click an individual talk title to see that day’s slides (broken links indicate that slides are not yet posted). You can also add _handout before .html in the slides link to see an autogenerated version formatted for printing and note-taking.

All assignments are due on Sundays following the class session at 11:59pm (so Week 1’s ‘Due Sunday’ assignment is due on the Sunday before Week 2, at 11:59pm).
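As a quick illustration of the due-date rule above, here is a minimal sketch that turns a week's listed Monday into its Sunday 11:59pm deadline. The function name `due_datetime` is hypothetical, and the year 2022 is taken from the "Spring 2022" header; time zones are ignored.

```python
from datetime import date, datetime, time, timedelta


def due_datetime(week_monday: date) -> datetime:
    """Return the Sunday 11:59pm deadline for the week starting on week_monday."""
    # Weeks are listed by the Monday they start on, so reject anything else.
    if week_monday.weekday() != 0:
        raise ValueError("week_monday must be a Monday")
    # The Sunday that follows the class sessions is six days after that Monday.
    sunday = week_monday + timedelta(days=6)
    return datetime.combine(sunday, time(23, 59))


# Week 1 starts on Monday, March 28, 2022.
print(due_datetime(date(2022, 3, 28)))  # 2022-04-03 23:59:00
```

So Week 1's assignment falls due on Sunday, April 3, the day before Week 2 begins.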

Week 1 (March 28) - Computers, Language, and Machine Learning

Week 2 (April 4) - How do humans speak and sound?

Week 3 (Apr. 11) - How do computers understand speech?

Week 4 (Apr. 18) - How do computers produce speech?

Week 5 (Apr. 25) - Corpora and Probability in Language

Week 6 (May 2) - Words in Natural Language Processing

Week 7 (May 9) - Sentence Structures and Syntax

Week 8 (May 16) - Structures and Meaning

Week 9 (May 23) - Meaning and Ethics

Week 10 (May 30) - Systems in the World

Finals Week - Final Project Submission

Student Resources for Support, Learning, and Interaction

Please see my complete listing of student resources for information on student support (e.g. counseling, crisis centers, resource centers), resources for learning (libraries, writing help, and more), resources for engaging with faculty (e.g. Coffee with a Prof, Letters of recommendation), and technical resources.

Effort Matters!

In all of my classes, teaching is a collaborative process, and my goal is straightforward: I want to help students who put in strong effort to get a great grade.

So, you will succeed in this class if you…

We will bend over backwards to help students who are sincerely doing their best to succeed in this class, and will always do our best to help students who are trying to recover, even after a rough start. And of course, we’re also always happy to help when documentable circumstances beyond your control come up which prevent you from making your full effort.

That said, if you’re “blowing off” the class, skipping class, starting assignments at the last minute, cutting corners, grade begging, grubbing, or lawyering, or turning in low-effort and low-integrity work, it’s disrespectful to me and the rest of the instructional team. As such, if you make these decisions, you’ll find us much less eager to help, accommodate, recommend, or make policy exceptions for you.

So, we’re on the same team, and we want to help you succeed, but the key is demonstrating strong effort, and we’re going to put as much effort into helping you succeed as you do.

Assessing Learning

Your final grade is based on the below formula:

Item % of Final Grade
Weekly Activities 50%
Weekly Quizzes 20%
Final Project 25%
Final Project Proposal 5%

Put differently, your grade = (0.50 * [Average % score of activities]) + (0.20 * [Percentage of Quizzes]) + (0.25 * [Final Project Grade]) + (0.05 * [Final Project Proposal Grade])
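The weighting above can be sketched as a short calculation. This is only an illustration: the names `WEIGHTS` and `final_grade` are hypothetical, and it assumes each component score is a percentage from 0 to 100 and that each weight in the table is applied as a fraction of 100%.

```python
from typing import Mapping

# Weights from the table above, written as fractions of the final grade.
WEIGHTS = {
    "weekly_activities": 0.50,
    "weekly_quizzes": 0.20,
    "final_project": 0.25,
    "final_project_proposal": 0.05,
}


def final_grade(scores: Mapping[str, float]) -> float:
    """Combine component scores (each 0-100) into a final percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS)


# Hypothetical student: 90% on activities, 80% on quizzes,
# 85% on the final project, 100% on the proposal.
example = {
    "weekly_activities": 90,
    "weekly_quizzes": 80,
    "final_project": 85,
    "final_project_proposal": 100,
}
print(round(final_grade(example), 2))  # 87.25
```

Canvas remains the authoritative source for your grade; this sketch is only meant to show how the four components combine.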

The grading scale used for this course is the UCSD standard scale, where A+ is 97% or more, A is 96.99% to 93%, A- is 92.99 to 90%, B+ is 89.99 to 87%, and so forth. Plus and Minus grades are not assigned below “C”, and no grade changes or plus-minus adjustments will be considered for A grades. Up-to-date grade information will be provided automatically in Canvas.

Weekly activities

There will be an activity most every week, which you’ll ‘turn in’ on the Canvas discussion boards. These activities are meant to test your understanding of the concepts and materials given to you, to let you experiment with these tools, and to help you see the relevance of the material to your daily life.

Weekly Quizzes

Every Sunday at 11:59PM, you’ll have a short quiz due on Gradescope. These are open book and open note, and are simply designed to make sure that you’re staying caught up and understanding the course material. They’ll be graded out of five points.

Extra Credit

There are no extra credit opportunities in this class, because the course is designed to favor effort. But note that if a student is ‘on the line’ between letter grades at the end of the quarter, regular, effective, on-topic and kind participation will be considered.

Before asking for ‘additional extra credit’ or other course exceptions, please see my page on student requests to get a sense of what kinds of requests are welcome, and which are unlikely to be received well.

Attendance and Modality

Although you are highly recommended to attend course sessions when you’re feeling healthy, most sessions will be Podcasted, and you’re not required to set foot in our classroom. That said, if you choose not to attend, you’re much more ‘on your own’:

Final Project

This class will feature a final project. The goal of this project is to engage with both the tools and the difficulty of natural language in computing, so a successful project will demonstrate either strong engagement with the interaction of the many systems towards a human-centric goal, or, in an implementation-based final, a practical understanding of one or more of the subtopics we’ve discussed. Topics are discussed below.

You’ll need to write up a proposal for your final project, due in Week 6, and you’ll need to write up the project itself. Your proposal is very important, and as such, is worth a portion of your final course grade. The precise desires for the proposal are discussed in the final project rubric.

You are encouraged to work in groups of up to six people from this class, although the assignment can be completed alone if you prefer. In group settings, provided everybody agrees that all members have participated equally, all members will receive the same grade. Please see the “group work” section of the syllabus for more details. The final project will be graded roughly according to this rubric.

I’ll take late projects, but to be fair to the people who worked to turn them in on time, and in recognition that I have limited grading time, they’ll be penalized at 30% per day. So, a 90%-quality paper will get a 60% if turned in one day late, and a 30% if turned in two days late. Beyond three days late, projects will not be accepted.
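The late-penalty arithmetic above can be sketched as follows. The function name `late_project_score` is hypothetical, and the sketch assumes lateness is counted in whole days and that the penalty is subtracted as percentage points, as in the 90% to 60% to 30% example.

```python
from typing import Optional

PENALTY_PER_DAY = 30.0   # percentage points lost per day late
MAX_DAYS_LATE = 3        # beyond this, projects are not accepted


def late_project_score(quality_score: float, days_late: int) -> Optional[float]:
    """Return the penalized score, or None if the project is too late to accept."""
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    if days_late > MAX_DAYS_LATE:
        return None  # not accepted at all
    # Subtract 30 points per day, never dropping below zero.
    return max(0.0, quality_score - PENALTY_PER_DAY * days_late)


print(late_project_score(90, 1))  # 60.0
print(late_project_score(90, 2))  # 30.0
print(late_project_score(90, 4))  # None
```

Under these assumptions, a 90%-quality paper turned in three days late would score 0%, so late submission is worth it only for a day or two.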

You’ll also be grading your own final project, using the rubric, and submitting your grade as a part of the coversheet.

The final project will take one of three forms:

Project Option 1: Design a Natural Language System

In this type of project, you’ll choose a task, whether in daily life, or in a specialized domain that you have some knowledge of (e.g. in a hospital or military setting), and design, in theory, a natural language processing pipeline which will help to accomplish it. You can think of it as designing a service or product which could use natural language to make somebody’s life easier (or more entertaining, that’s fine too). Your system must involve speech, text, some exchange of information with the computer, knowledge representation (or lookup), and some interactional component (e.g. it’s not enough to design a phone system that tells you the time when you call it), and should engage with a number of the topics in the course material.

For this project, I’ve created a document describing the Virtual Assistant Interaction process, which details a ‘complete’ interaction, and touches on some of the major issues at play here. This will form a schematic for your write-up, and help guide your paper.

The main requirement is that it be something that existing tools don’t already do. Sample topics would be something like…

… but I’d love to hear your own, creative ideas. Seriously, you’ll have more fun and learn more if you design an interesting system for an interesting situation with interesting problems, or work in something related to your hobbies, daily life, or future career. I’m absolutely willing to entertain weird ideas, so let me know if there’s a wacky idea that’s making this project sound more fun.

In addition to discussing each step in the interaction process and providing sample queries and responses, you’ll be expected to engage with the ethical questions in the virtual assistant interaction process guide above.

You do not need to spend any time on practical or business concerns like funding, server costs or efficiency, annotator costs, licensing, or monetization. You are being graded on your clear understanding and careful consideration of the linguistic elements of the task involved, so imagine that you are designing this system for the good of the users, and will be granted any resources, annotators, programmers, and funding necessary.

For more details, check the rubric.

Additionally, don’t forget to discuss, at the start of the paper, the relative contributions of each group member.

Project Option 2: Implement an NLP Task on your machine

For students with greater familiarity with computing and natural language and a desire to ‘get their hands dirty’, you’ll be asked to choose a specific free natural language processing tool and implement it on novel data, then evaluate the output. For instance, you might study how to conduct syntactic parsing, implement one of the existing toolkits, and feed it a small corpus of data that you’ve chosen. You’ll need to install and implement this on your own machine or on one of ours, rather than using somebody else’s existing hosted service. You will then conduct a detailed error analysis, describing where it went well, how it went poorly, and some functional areas which could be improved, and then will write this up.

To write this project up, you’ll need to describe:

  1. Your task

  2. What toolkit you’re using (e.g. versions, languages)

  3. How you installed it

  4. The corpus or data you used

  5. How you trained the model (or, what the pre-trained model was trained on)

  6. The code used to run the tool (Comment your code, and provide the source)

  7. What the tool consistently “got right”

  8. Where the tool consistently failed.

  9. Specific ideas on how the tool could be improved

  10. How these advantages and disadvantages would affect implementation in a larger project.
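As one possible starting point for items 7 and 8 above, here is a minimal sketch of an error tally that compares a tool's output labels against labels you have checked by hand. It is not tied to any particular toolkit: the function name `tally_errors` and the toy part-of-speech labels are hypothetical, and a real error analysis would also look at the individual examples behind each count.

```python
from collections import Counter
from typing import Sequence, Tuple


def tally_errors(gold: Sequence[str],
                 predicted: Sequence[str]) -> Tuple[float, Counter]:
    """Return overall accuracy and a count of each (gold, predicted) mismatch."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    if not gold:
        raise ValueError("no labels to compare")
    # Count only the positions where the tool disagreed with the hand labels.
    confusions = Counter(
        (g, p) for g, p in zip(gold, predicted) if g != p
    )
    correct = len(gold) - sum(confusions.values())
    return correct / len(gold), confusions


# Toy data: hand-checked labels vs. a hypothetical tagger's output.
gold_labels = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
tool_labels = ["NOUN", "NOUN", "NOUN", "ADJ", "VERB"]

accuracy, confusions = tally_errors(gold_labels, tool_labels)
print(accuracy)  # 0.6
for (gold, pred), count in confusions.most_common():
    print(f"gold={gold} predicted={pred}: {count}")
```

Sorting mismatches by frequency is one simple way to separate what the tool consistently gets wrong from one-off mistakes.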

When choosing a corpus, specialized domains are fine (and desirable), and can make for interesting failure modes in a freely available tool (“What does this parser, trained on the Wall Street Journal, make of all of these Tumblr posts?”)

Again, for more details, check the rubric.

Project Option 3: Something Else

Although I provide two options above, so long as you come to discuss it with me before the proposal is due, I’m open to helping you develop an idea of your own. These can be research papers, projects, websites, videos, or more. This is a great way to combine work you’re doing on your own, or in another class, with this course.

Final Project Proposal

You’ll need to write up a proposal for your final project, due in Week 6, and you’ll need to write up the project itself. Your proposal is very important, and as such, is worth a portion of your final course grade. The precise desires for the proposal are discussed in the final project rubric.

If you’re doing Option 2 or 3, you are required to meet with Will to discuss your project briefly before submitting a proposal, to make sure you’re not heading down a rabbit hole!

You’re not completely beholden to your proposal (e.g. if you want to change your testing data, or focus on a different set of queries within your domain, that’s totally fine), but this is to prove to me that you have a topic.

You’ll submit this online. If you’re working in a group, have one group member submit it. I’ll give them the comments, which they can share with you.

Project Resources

Here are some resources that should be helpful for your project work:

Remote Instruction Preparedness

In the event of a mandatory pivot to remote learning, whether generally or due to (e.g.) my getting COVID, you will receive a formal notification codifying any changes which need to be made, and this syllabus will be updated. Changes made may include:

Although I reserve the right to make additional changes as the circumstances merit, I do not anticipate that the overall course schedule, assignment/exam due dates, nor final grade calculation will change.

Course Policies

Masking, Illness and participation

The use of a campus-policy-approved face mask covering your mouth and nose is required in this class, per current university guidance. You will not be allowed to remain in the classroom if you are not wearing a mask or are wearing it improperly. This is not ideal for any of us, but it’s the rules of the game, and it’s a small price to help keep each other healthy. Per regulations, no exceptions will be made unless I receive a letter directly from OSD exempting you. Please make an effort to talk more loudly, given the difficulties of masked speech.

If you or somebody in your life feels sick, shows signs of illness, or is diagnosed with COVID or another communicable illness, DO NOT attend class or in-person office hours until you have been tested and cleared. Attendance is not taken, so when in doubt, stay home and catch the Podcast.

I will not ‘respect you for powering through’; I will be disappointed that you’ve endangered the class. So please, don’t hesitate to skip a session (although you will still be responsible for understanding and reviewing the work on your own).

Group Work

I’m happy to have you work in groups for the final project for this class. However, you will always need to disclose who you worked with and explain the contributions of each person in the group to the assignment.

For the final project, you may divide sections among yourselves, with one person working on Section A, the next on B, so long as you transparently discuss it. But remember, only one grade will be assigned, and when a paper is turned in with your name on it, you are responsible for all of the content, and any problems, in the paper.

Finally, you will be asked to evaluate your group members’ contributions to the final project. If one person emerges as having contributed little, or “drops out” of the project, forcing others to shoulder their load, their grade will reflect that.

If specific problems arise in your group which cannot be resolved by mature and conflict-deescalating discussion among group members, you’re welcome to reach out to the instructor, but this should be your last resort.

Laptops and Computer Access

During this course, you will need to have access to a general purpose computer with speakers and a microphone. Your life will be easiest if you’re using an Apple or a mainstream distribution of Linux (e.g. Fedora or Ubuntu or Arch btw), but Windows computers are workable as well (although they may require jumping through some hoops or logging into a remote server; instructions will be provided).

Additionally, for some assignments, you’ll want to have access to one of the main ‘Digital Assistant’ services, either Apple’s “Siri”, Google’s “Assistant”, Amazon’s “Alexa”, or Microsoft’s “Cortana”. You need not purchase additional hardware for this purpose, as most students will have some way of working with one of these services, whether on their computer, phone, or through a web interface. Please feel free to reach out to us to find the best way to do so, if you’re struggling.

Note that for many assignments, ‘walled garden’ sorts of devices like an iPad, Android Tablet, Smartphone, or Chromebook will make your life more difficult, as we’ll need to run specialized (but free) software and work at a lower level in the operating system. I understand that this can be frustrating, and I’m happy to talk with you about alternative software, labs, and remote logins, but these ‘App Store’ devices are fundamentally limited, and in a more computational degree program, you’ll need to be able to work in or log into a full operating system.

For students who don’t have easy access to a full-featured desktop or laptop computer, I’m happy to work with you to find a no-cost solution. Please reach out to me ASAP; it’s important to me that we can find an approach that works. We will work with you to ensure that availability of technology is not a barrier to successfully completing this class!

Software Policies

To maximize the accessibility of this class to students of all computing and economic backgrounds, this course will not require you to use any non-free (as in ‘gratis’) software on your local machine (excepting Zoom for any remote sessions). You will not be asked nor required to pay for any licenses, and with the exception of web applications like Canvas and the relevant virtual assistants, all software recommended will be free and available across platforms. Although you’re welcome to use whatever you’d like to prepare your work, please submit all project files and writeups as plaintext/markdown, HTML, or, perhaps ideally, PDF.

For your final project, do not submit or implement code that requires the use of any non-free-as-in-gratis software (e.g. Mathematica, MATLAB, SAS/SPSS, etc.) to run (with the exception being the use of the voice assistants we’re discussing in class). I also encourage the use of open-source libraries and toolkits. This is both to allow me to run the code on any machine during grading, and to encourage the use of sustainable tools. Additionally, please submit all Python code in Python 3.x, as Python 2 has been deprecated, and include a description of dependencies.
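One way to gather the description of dependencies mentioned above is to record the Python version and the installed version of each package your code imports. The sketch below is only a suggestion, not a required format: the function name `describe_dependencies` is hypothetical, the package name in the example is a placeholder, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or later).

```python
import sys
from importlib import metadata
from typing import Dict, Iterable


def describe_dependencies(package_names: Iterable[str]) -> Dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in package_names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Record the interpreter version alongside the package versions.
python_version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {python_version}")

# Replace this placeholder list with the packages your project actually uses.
for package, version in describe_dependencies(["example-package-name"]).items():
    print(f"{package}: {version}")
```

Pasting this output into your writeup, or including a plain list of package names and versions, makes it easier to rerun your code on another machine.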

Asking Questions and Office Hours

You are highly encouraged to come in to office hours to ask content questions, ask for clarifications about assignments, to ask for more information on a subject that interests you, or to get help on homeworks. Helping you learn this material is quite literally our job, so having students in office hours is no inconvenience.

Do not email us course content or homework questions! If you have a question about course material, post it on Canvas, such that everybody can benefit from the answers (because chances are, they’re struggling in the same places). Administrative questions (or questions you’d like to discuss in private) should still be sent to the instructor via email.

Re-grading policy

If you feel that a grade has been assigned in error, you should submit a regrade request via Gradescope, or, for other grades, send an e-mail to the Instructor (wstyler@ucsd.edu), cc’ing your TAs.

This means that you’ll want to look over every assignment as soon as it’s given back, so that any possible errors can be addressed, and so that you’ll learn from any mistakes.

Academic Integrity

Although you’re welcome to collaborate and form study groups to discuss questions and help each other out with understanding the material, you need to be up-front about the nature of the collaboration, and if somebody did not offer any assistance on the assignment, they should not take credit. Please, don’t be a cheater, for your sake and ours, and refer to the UCSD policy below for more information.

Respectful Discussion Policy

Examining language and languages inevitably leads to discussions of gender, race, sexual orientation, religion, politics, nationality, etc. Opinions are welcome, but all students must be mindful and respectful of others in the class. Speak with others using respectful and kind language, just as you’d like them to do with you, and focus your discussion on the ideas, rather than individuals. Finally, remember that as we discuss and evaluate our conversations, the focus will be on the impact on an individual or group, not the intention or motivation of the actor.

Special accommodations Policy

All requests for special accommodations must be brought to the instructor in the first two weeks of class, ideally sooner. This includes things like religious holidays, university-sponsored events, athletic schedules, conflicts with exam dates, and disability services notes. Because running a big course is quite complex, if I don’t find out about it in the first two weeks, I may not be able to help.

Other Course Policies

Acknowledgements

We respectfully acknowledge that we live, learn, and work on the land of the Kumeyaay/Kumiai nation. Whose land are you on?

UCSD Academic Policies

Accessibility

Students requesting accommodations for this course due to a disability must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD) which is located in University Center 202 behind Center Hall. Students are required to present their AFA letters to Faculty (please make arrangements to contact me privately) and to the OSD Liaison in the department in advance so that accommodations may be arranged.

Contact the OSD for further information - osd@ucsd.edu | 858.534.4382

Academic Integrity

Each student in this course is expected to abide by the UC San Diego Policy on Integrity of Scholarship and to excel with integrity. Any work submitted by a student in this course for academic credit will be the student’s own work.

Academic dishonesty (actions like cheating, plagiarism, aid of academic dishonesty, fabrication, lying, blackmail, bribery, and threatening behavior) will generally result in poor recall and learning of the material, and isn’t acceptable at UCSD. In cases of academic dishonesty, possible in-class academic sanctions can include anything from a zero on the assignment/test/project in question, to a blanket lowering of your final grade by X%, to an assigned and non-negotiable grade of “F” in the course. These sanctions are assigned at the sole discretion of the instructor, and as every case is unique, additional sanctions not listed above may apply. But again, remember that doing the assignments honestly is a part of the learning process, and failure to do so will hurt you more than anybody else.

Classroom Behavior Policy

UCSD Student Conduct Code

UCSD Principles of Community

Religious Accommodation

It is the policy of the university to make reasonable efforts to accommodate students having bona fide religious conflicts with scheduled examinations by providing alternative times or methods to take such examinations. If a student anticipates that a scheduled examination will occur at a time at which his or her religious beliefs prohibit participation in the examination, the student must submit to the instructor a statement describing the nature of the religious conflict and specifying the days and times of conflict.

For final examinations, the statement must be submitted no later than the end of the second week of instruction of the quarter. For all other examinations, the statement must be submitted to the instructor as soon as possible after a particular examination date is scheduled.

If a conflict with the student’s religious beliefs does exist, the instructor will attempt to provide an alternative, equitable examination that does not create undue hardship for the instructor or for the other students in the class.

Discrimination and Harassment

The University of California, in accordance with applicable federal and state laws and university policies, does not discriminate on the basis of race, color, national origin, religion, sex, gender, gender identity, gender expression, pregnancy (including pregnancy, childbirth, and medical conditions related to pregnancy or childbirth), physical or mental disability, medical condition, genetic information, ancestry, marital status, age, sexual orientation, citizenship, or service in the uniformed services (including membership, application for membership, performance of service, application for service, or obligation for service in the uniformed services). The university also prohibits harassment based on these protected categories, including sexual harassment, as well as sexual assault, domestic violence, dating violence, and stalking. The nondiscrimination policy covers admission, access, and treatment in university programs and activities.

If students have questions about student-related nondiscrimination policies or concerns about possible discrimination or harassment, they should contact the Office for the Prevention of Harassment & Discrimination (OPHD) at (858) 534-8298, ophd@ucsd.edu, or reportbias.ucsd.edu.

Campus policies provide for a prompt and effective response to student complaints. This response may include alternative resolution procedures or formal investigation. Students will be informed about complaint resolution options.

A student who chooses not to report may still contact CARE at the Sexual Assault Resource Center for more information, emotional support, individual and group counseling, and/or assistance with obtaining a medical exam. For off-campus support services, a student may contact the Center for Community Solutions. Other confidential resources on campus include Counseling and Psychological Services, Office of the Ombuds, and Student Health Services.

CARE at the Sexual Assault Resource Center - 858.534.5793 or sarc@ucsd.edu

Counseling and Psychological Services (CAPS) - 858.534.3755