Sha Li

-- Once you've got a task to do, it's better to do it than live with the fear of it.
23 Oct 2018

Grounded Language Learning by Mechanical Turker Descent

Paper Title: Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Natural Language Grounding

The task of natural language grounding is to find a mapping from natural language to actionable terms.

Turkers are asked to provide training pairs $(x,y)$ where $x$ is a natural language command and $y$ is an action sequence in the game world of “Mastering the Dungeon”. The set of possible $y$ is rather restricted, and turkers are encouraged to use a flexible vocabulary when writing the command $x$.

An example is:

“Steal the crown from the troll and put it on” $\rightarrow$ take silver crown from troll, wear silver crown

Mechanical Turker Descent

This paper proposes a new way of collecting training data from turkers: they compete against each other in the short run and collaborate to train a model in the long run.

The training consists of several rounds. In each round, turkers think of new training examples for their own models (“training their own dragons”) and compete against each other for a monetary reward. Each model is evaluated on the common test set and the training sets generated by other turkers. Then at the end of each round, all of the training examples are collected and merged into the common training set and common test set.
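Here is a minimal sketch of one such round. Everything in it is an assumption made for illustration: the toy word-overlap "model", the function names, scoring by plain accuracy, and the 80/20 split when merging are not taken from the paper.

```python
import random

def train(examples):
    """Toy stand-in for a grounding model: just store the (command, actions) pairs."""
    return list(examples)

def predict(model, command):
    """Return the actions of the stored command that shares the most words."""
    words = set(command.lower().split())
    best = max(model, key=lambda ex: len(words & set(ex[0].lower().split())), default=None)
    return best[1] if best else None

def accuracy(model, examples):
    if not examples:
        return 0.0
    return sum(predict(model, x) == y for x, y in examples) / len(examples)

def run_round(submissions, common_train, common_test, test_fraction=0.2, seed=0):
    """Score each turker's model, pick a winner, then merge everyone's data."""
    scores = {}
    for turker, own in submissions.items():
        model = train(own)  # assumption: trained on the turker's own examples only
        others = [ex for t, exs in submissions.items() if t != turker for ex in exs]
        # Evaluate on the common test set plus the other turkers' examples.
        scores[turker] = accuracy(model, common_test + others)
    winner = max(scores, key=scores.get)
    # Long-run collaboration: all examples flow into the shared sets.
    pooled = [ex for exs in submissions.values() for ex in exs]
    random.Random(seed).shuffle(pooled)
    n_test = int(len(pooled) * test_fraction)
    return scores, winner, common_train + pooled[n_test:], common_test + pooled[:n_test]

submissions = {
    "turker_a": [("take the crown", ("take crown",)), ("wear the crown", ("wear crown",))],
    "turker_b": [("grab the crown", ("take crown",))],
}
scores, winner, new_train, new_test = run_round(submissions, [], [])
```

The winner of the round gets the reward, while the merged sets are what the next round starts from.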

This mechanism incentivizes turkers to provide examples that are neither too easy nor too hard: examples at either extreme will not improve a turker's score relative to the others. In a sense, the turkers are creating a curriculum of harder and harder examples for the model.

GraphWorld Representation

The world state is represented by a graph where each concept, object, location and actor is a node and the edges represent relations between them. “Relation” here is used in a quite general sense, covering positional relations and properties.


Each action has a set of prerequisites and applies a transformation on the graph.
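A small sketch of this idea, using the crown example from earlier. The triple encoding, the relation names (`in`, `carries`, `wears`) and the exact prerequisites are hypothetical choices for illustration, not the paper's actual schema.

```python
def locations(graph, node):
    """All places a node is 'in' according to the graph."""
    return {o for (s, r, o) in graph if s == node and r == "in"}

def take(graph, actor, obj, holder):
    """Prerequisites: actor and holder share a location, and holder carries obj."""
    same_place = bool(locations(graph, actor) & locations(graph, holder))
    if not same_place or (holder, "carries", obj) not in graph:
        raise ValueError(f"cannot take {obj} from {holder}")
    # Transformation: move the 'carries' edge from the holder to the actor.
    return (graph - {(holder, "carries", obj)}) | {(actor, "carries", obj)}

def wear(graph, actor, obj):
    """Prerequisite: actor carries obj."""
    if (actor, "carries", obj) not in graph:
        raise ValueError(f"{actor} is not carrying {obj}")
    # Transformation: the 'carries' edge becomes a 'wears' edge.
    return (graph - {(actor, "carries", obj)}) | {(actor, "wears", obj)}

# The world state is a set of (node, relation, node) edges.
world = frozenset({
    ("dragon", "in", "cave"),
    ("troll", "in", "cave"),
    ("troll", "carries", "silver crown"),
})
world = take(world, "dragon", "silver crown", "troll")
world = wear(world, "dragon", "silver crown")
```

Each action returns a new graph instead of mutating the old one, which makes it easy to check prerequisites against any intermediate state.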

Language Grounding Model

Since the language grounding task is essentially finding a mapping between two sequences, a natural baseline is the seq2seq model with attention. On the encoder side, our input is natural language, which fits the seq2seq model well, but on the decoder side, our output is a structured action sequence. This brings restrictions to the decoding process and also inductive biases that should be built into the model design.

  1. Compositionality of actions. Every action is associated with arguments and thus can be represented as $(type, arg1, arg2)$. In its vector representation, the action type embedding is concatenated with the two argument embeddings: $a = [Emb(type); Emb(arg1); Emb(arg2)]$. This compositional representation is data efficient since different actions share common action types or arguments.
  2. Selection preference. An attention vector is computed for each candidate left (or right) argument, using the argument as the query vector and the encoder hidden states as the keys and values.
  3. Environmental clues. A context vector is added to the input of the GRU by concatenating the count of the action in the previously decoded sequence and the current location: $env_{a,j} = [count_{a,j}; location_j]$.
  4. Action prerequisites. Hard constraints are imposed during decoding so that only valid actions are taken.
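
Items 1 and 4 of the list above can be sketched in a few lines. This is only an illustration under my own assumptions: the tiny embedding table is made up, and zeroing out invalid actions inside a softmax is one common way to impose hard constraints, not necessarily how the paper implements them.

```python
import math

def embed_action(emb, action):
    """a = [Emb(type); Emb(arg1); Emb(arg2)] as one flat list."""
    action_type, arg1, arg2 = action
    return emb[action_type] + emb[arg1] + emb[arg2]

def masked_softmax(scores, valid):
    """Softmax over candidate actions where invalid ones get probability 0."""
    kept = [s for s, ok in zip(scores, valid) if ok]
    if not kept:
        raise ValueError("no valid action available")
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, valid)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-dimensional embeddings shared across all actions.
emb = {"take": [1.0, 0.0], "crown": [0.5, 0.5], "troll": [0.0, 1.0]}
a = embed_action(emb, ("take", "crown", "troll"))

# The second candidate has the highest score but fails its prerequisites.
probs = masked_softmax([2.0, 5.0, 2.0], [True, False, True])
```

Because `crown` and `troll` reuse the same vectors in every action they appear in, an example that teaches the model about one action also informs the others.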

In the AC-seq2seq model, a separate hidden state $s_{a,j}$ is maintained for each candidate action at each time step (the standard seq2seq model only has $s_j$). This is like having $|A|$ GRUs working simultaneously on the decoder side, with all GRUs sharing parameters. Hidden states are updated by $s_{a,j} = GRU([a; attn_{a,j}; env_{a,j}], s_{a,j-1})$.
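
To make the update concrete, here is what it expands to if we assume a standard GRU cell. The gate equations and the symbols $u_{a,j}$, $W$, $U$ and $b$ are assumptions, not taken from the paper, and conventions for the update gate $z$ vary between references.

$$
\begin{aligned}
u_{a,j} &= [a; attn_{a,j}; env_{a,j}] \\
z_{a,j} &= \sigma(W_z u_{a,j} + U_z s_{a,j-1} + b_z) \\
r_{a,j} &= \sigma(W_r u_{a,j} + U_r s_{a,j-1} + b_r) \\
\tilde{s}_{a,j} &= \tanh(W_h u_{a,j} + U_h (r_{a,j} \odot s_{a,j-1}) + b_h) \\
s_{a,j} &= (1 - z_{a,j}) \odot s_{a,j-1} + z_{a,j} \odot \tilde{s}_{a,j}
\end{aligned}
$$

None of the weights carry an index $a$: that is the parameter sharing. Only the input $u_{a,j}$ and the hidden state $s_{a,j}$ differ per candidate action.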


Ablations for Mechanical Turker Descent include replacing the time constraint with a fixed number of training examples, removing feedback from the model, and removing the competition altogether.


Model-wise, AC-seq2seq consistently outperforms the baseline seq2seq model.


Many companies and researchers rely on crowdworkers to provide labeled data for training complex machine learning models. Instead of developing methods to rate the quality of crowdsourced labels and remove noisy ones, we can incentivize crowdworkers to provide higher quality labels through gamification. (The authors frame their data collection task as a text adventure game where each worker gets to train their own dragon.) Although the paper works on the language grounding task, the gamified data collection idea can be applied to a wide spectrum of tasks.

This is a part of the DMG reading group series. Stay tuned for updates:)

Til next time,
Zoey at 00:00