Using RegEx for your Answer Keys

The answer key for your questions, expectations, hints, and misconceptions can be expressed as Regular Expressions (RegEx). RegEx is a method for creating sequences of characters that form search patterns, which are used for pattern matching. In ASAT, RegEx is used to improve feedback by accounting more accurately for the possible student responses to your questions. **It is important to keep in mind that student responses will also be compared to answer keys by their semantic similarity (LSA cosine similarity).** Including RegEx is a way of improving on the cosine similarity measures. Cosine similarity does not always perform well when the answer key contains only a small number of words.

Consider the following situation. Your answer key is "Yes", and you are using cosine similarity to compare the student input to the answer key. In certain corpora, "yes" and "no" may be very closely related. If we are solely relying on cosine similarity measures, our system may give the feedback set for a "yes" answer whenever the student says "no". RegEx helps resolve this issue.
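The situation above can be illustrated with a small Python sketch. The three-dimensional vectors below are made up purely for illustration and are not real LSA output; the function name cosine_similarity and the dictionary TOY_VECTORS are likewise hypothetical. The sketch only shows how two words with opposite meanings can still score as highly similar when their vectors point in nearly the same direction.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors (an assumption for illustration, not taken from any corpus).
# "yes" and "no" are given similar vectors because they tend to appear in
# similar contexts; "triangle" is given a very different one.
TOY_VECTORS = {
    "yes": [0.9, 0.8, 0.1],
    "no": [0.8, 0.9, 0.2],
    "triangle": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    for word in ("no", "triangle"):
        score = cosine_similarity(TOY_VECTORS["yes"], TOY_VECTORS[word])
        print(f"similarity('yes', '{word}') = {score:.2f}")
```

With these toy numbers, "yes" and "no" score close to 1 while "yes" and "triangle" score low, which is the kind of false match that an answer key based only on cosine similarity could produce.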

How does RegEx resolve this issue?

Using RegEx, you can tell the AutoTutor system exactly which words to look for. There is certainly more than one way of saying "no" in natural language. You want your system to know that "no" and "nope" are essentially the same thing, and that "don't agree" and "agree" are opposites, despite their cosine similarity.

Let's take a look at the following example. 

\b[Nn]o\b|\b[Nn]ope\b|\b[Nn]ah\b|[Ww]rong|[Bb]ad|[Ii]ncorrect|[Ii]sn't correct|[Ii]snt correct|[Nn]ot correct|[Dd]isagree|[Dd]on't agree|[Dd]ont agree|[Nn]ot agree

This is a RegEx string that can be used to account for the multiple ways a negative response can be articulated. 

  • "\b ... \b" is how you set "word boundaries". For example, \b[Nn]o\b matches "no" as a whole word but not the "no" inside "know".
  • Square brackets define a set of characters, any one of which may match. Writing [Nn] indicates that capitalized "No" and lowercase "no" should be considered equivalent.
  • The | character (also referred to as "pipe" or "vertical bar") in \b[Nn]o\b|\b[Nn]ope\b denotes a choice operator. This is used to tell the system that both "No" and "Nope" should be considered equivalent.

Taken together, the RegEx above says that "No", "no", "Nope", "nope", "Nah", "nah", "Wrong", "wrong", etc. should all be considered to represent the same thing. You can find more details and explanations about RegEx syntax here
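
One way to try out a pattern like this before putting it in an answer key is to test it against a few sample responses. The sketch below uses Python's re module, which is an assumption made for illustration: the RegEx engine that ASAT itself uses may differ in some details. The helper name is_negative is hypothetical.

```python
import re

# The negative-response pattern from the example, written with \b on both
# sides of the first alternative.
NEGATIVE_PATTERN = re.compile(
    r"\b[Nn]o\b|\b[Nn]ope\b|\b[Nn]ah\b|[Ww]rong|[Bb]ad|[Ii]ncorrect|"
    r"[Ii]sn't correct|[Ii]snt correct|[Nn]ot correct|[Dd]isagree|"
    r"[Dd]on't agree|[Dd]ont agree|[Nn]ot agree"
)


def is_negative(response: str) -> bool:
    """Return True if the pattern matches anywhere in the response."""
    return NEGATIVE_PATTERN.search(response) is not None


if __name__ == "__main__":
    samples = [
        "No",
        "nope, that is wrong",
        "I don't agree",
        "I agree",
        "I know the answer",  # contains "no", but not as a whole word
        "NO",                 # all caps is not covered by [Nn]o
    ]
    for text in samples:
        print(f"{text!r}: {is_negative(text)}")
```

Two things are worth noticing when testing this way. An all-caps "NO" is not matched, because [Nn] only covers the first letter. Alternatives written without \b, such as [Bb]ad, also match inside longer words like "badge", so adding word boundaries to them may be worth considering.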
