Tracking user attention on the Web
Students: Ilya Paskhover, Itay Gal
Supervisors: Oleg Rokhlenko, Nadav Golbandi
2
Definitions
AMT - Amazon Mechanical Turk is a crowdsourcing Internet marketplace that makes it possible to coordinate the use of human intelligence to perform computational tasks.
HIT - Human Intelligence Task.
Requester - a user who publishes HITs and pays for their completion.
Worker - a user who completes HITs and gets paid for them.
3
Goals
Build a framework that enables a researcher to run experiments to find the best possible slots for advertising on a web page:
Getting acquainted with Amazon Mechanical Turk.
Defining the HIT structure.
Generating a formatted HIT.
Sending the HIT using the AMT API.
Receiving and displaying HIT results.
Designing a GUI for the framework.
4
Workflow
[Workflow diagram: data is loaded from the XML files (templates.xml, images.xml, articles.xml, coolDowns.xml, hits.xml); a new HIT is created and sent to Amazon MTurk; results are received back and shown in the HITs list and the result display.]
5
Used technologies: AMT SDK, Java, XML, HTML, JavaScript, Swing
6
Methodology: HIT Structure
Every HIT consists of 4 screens, shown to the worker one after another.
1) Instructions: a screen that explains to the worker in AMT what he has to do.
7
Methodology: HIT Structure (cont.)
2) Article: an article the worker has to read. The article and a random set of images are arranged in a chosen template.
Template - a table of 10 rows and 10 columns divided into boxes. A box is a rectangular block of cells whose size can range from 1x1 up to 10x10 cells (the whole table), and each box can be filled with either a paragraph or an image.
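The template rules can be modelled with a small data structure. This is a minimal sketch. It assumes a box is identified by its top-left cell and its size in cells, and that two boxes may not overlap. The class names `Box` and `Template` are hypothetical and are not taken from the framework's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a box: a rectangle of cells inside the 10x10 template table. */
final class Box {
    final int row, col, height, width; // top-left cell (0-based) and size in cells
    final String content;              // a paragraph or an image reference

    Box(int row, int col, int height, int width, String content) {
        // Sizes from 1x1 up to 10x10, and the box must lie fully inside the table.
        if (height < 1 || width < 1 || row < 0 || col < 0
                || row + height > Template.SIZE || col + width > Template.SIZE) {
            throw new IllegalArgumentException("box must fit inside the 10x10 table");
        }
        this.row = row; this.col = col; this.height = height; this.width = width;
        this.content = content;
    }
}

/** Hypothetical template: tracks which box covers each cell (assumes no overlaps). */
final class Template {
    static final int SIZE = 10;                        // 10 rows x 10 columns
    private final Box[][] owner = new Box[SIZE][SIZE]; // box covering each cell, or null
    private final List<Box> boxes = new ArrayList<>();

    /** Places a box, rejecting it if any of its cells is already covered. */
    void add(Box box) {
        for (int r = box.row; r < box.row + box.height; r++)
            for (int c = box.col; c < box.col + box.width; c++)
                if (owner[r][c] != null)
                    throw new IllegalArgumentException("cell (" + r + "," + c + ") is already used");
        for (int r = box.row; r < box.row + box.height; r++)
            for (int c = box.col; c < box.col + box.width; c++)
                owner[r][c] = box;
        boxes.add(box);
    }

    List<Box> boxes() { return boxes; }
}
```

Overlaps are rejected when a box is added. As a result, every cell belongs to at most one box, which keeps rendering the table to HTML simple.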
The requester can attach a question to any paragraph or image; the question is then added to the questionnaire automatically.
Article - a set of paragraphs, each of which can carry a question.
Images - the database contains sets of images, each set representing a different category, such as “animals” or “fruits”. For every HIT, a random set of images is selected so that only one image is taken from each set.
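Images can be selected so that no two come from the same set. One way is to shuffle the category names and draw one random image from each of the first few categories. This sketch makes two assumptions: the number of images per HIT (`count`) is a parameter, for example the number of image boxes in the chosen template, and `ImagePicker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ImagePicker {
    /**
     * Picks `count` images for one HIT, at most one per category.
     * `categories` maps a category name (e.g. "animals") to its non-empty list of images.
     */
    static List<String> pick(Map<String, List<String>> categories, int count, Random rnd) {
        List<String> names = new ArrayList<>(categories.keySet());
        if (count > names.size()) {
            throw new IllegalArgumentException("not enough categories for " + count + " images");
        }
        Collections.shuffle(names, rnd); // random choice of which categories are used
        List<String> chosen = new ArrayList<>();
        for (String name : names.subList(0, count)) {
            List<String> images = categories.get(name);
            chosen.add(images.get(rnd.nextInt(images.size()))); // one image from this set
        }
        return chosen;
    }
}
```

The random source is passed in as a `Random`. Seeding it makes a given HIT's image selection reproducible, which can help when analysing experiment results later.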
8
Methodology (cont.): HIT Structure (cont.)
3) Cool down: a screen with an unrelated task that can take anywhere from a few seconds to a few minutes. Its purpose is to create a short gap between reading the article and answering the questions.
9
Methodology (cont.): HIT Structure (cont.)
4) Questionnaire: the set of questions the worker has to answer. The set is built while the article is generated: whenever a box contains a question, that question is added to the questionnaire, and the questions are presented in the order in which they were added.
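The questionnaire can be built while the article is generated. Each box is appended to the article, and any question it carries is appended to the questionnaire at the same moment, so the questions keep the order of the boxes. The class `ArticleBuilder`, its `Item` type and the `<div class="box">` markup are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class ArticleBuilder {
    /** Hypothetical box content plus an optional question (null when the box has none). */
    static final class Item {
        final String content;
        final String question;
        Item(String content, String question) { this.content = content; this.question = question; }
    }

    private final StringBuilder html = new StringBuilder();
    private final List<String> questionnaire = new ArrayList<>();

    /** Appends a box to the article and, if it carries a question, to the questionnaire. */
    void addBox(Item item) {
        html.append("<div class=\"box\">").append(item.content).append("</div>\n");
        if (item.question != null) {
            questionnaire.add(item.question); // same order in which the boxes were added
        }
    }

    String articleHtml() { return html.toString(); }
    List<String> questionnaire() { return questionnaire; }
}
```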
10
Methodology (cont.): AMT API
Using the AMT SDK: we chose the Java Amazon SDK to interact with AMT. The SDK provides basic functionality for working with AMT, and choosing Java also allowed us to use Swing as the GUI framework.
The HIT is formatted in HTML, which gives much more flexibility in laying out our article and images.
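One way AMT accepts HTML content for a HIT is the HTMLQuestion data structure, in which the page is embedded in a CDATA section. The slides do not say which question format the framework used. The helper below, including the name `HtmlQuestion.wrap` and the frame height, is therefore a hypothetical sketch and not the project's code.

```java
final class HtmlQuestion {
    // Namespace of AMT's HTMLQuestion schema (2011-11-11 version).
    static final String NS =
        "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd";

    /** Wraps a complete HTML page in an HTMLQuestion document (hypothetical helper). */
    static String wrap(String html, int frameHeight) {
        // A CDATA section cannot contain "]]>", so split any occurrence across two sections.
        String safe = html.replace("]]>", "]]]]><![CDATA[>");
        return "<HTMLQuestion xmlns=\"" + NS + "\">\n"
             + "  <HTMLContent><![CDATA[\n" + safe + "\n]]></HTMLContent>\n"
             + "  <FrameHeight>" + frameHeight + "</FrameHeight>\n"
             + "</HTMLQuestion>";
    }
}
```

The page goes into CDATA, so the article's HTML does not need to be XML-escaped. Only the `]]>` sequence needs special handling.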
11
Methodology (cont.): AMT API (cont.)
AMT only supports single-page HITs. We therefore use JavaScript to manipulate the HTML so that a single page behaves as a multi-page form.
A worker should not be able to see the article and the images again once he has reached the questionnaire. Using cookies, we track the worker’s state and block access to the article once he has moved on to the questionnaire page.
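In the HIT itself, the forward-only behaviour runs in JavaScript with cookies. The state logic is modelled here in Java to make it explicit. The cookie stores only the furthest screen the worker has reached, and earlier screens are never shown again. `Screen`, `ScreenState` and the cookie value format are hypothetical names chosen for this sketch.

```java
/** The four HIT screens, in the order they are shown to the worker. */
enum Screen { INSTRUCTIONS, ARTICLE, COOL_DOWN, QUESTIONNAIRE }

/** Hypothetical model of the per-worker state kept in a cookie. */
final class ScreenState {
    private Screen furthest = Screen.INSTRUCTIONS; // furthest screen reached so far

    /** Advances to the next screen; stays put on the last one. Never moves backwards. */
    Screen next() {
        Screen[] all = Screen.values();
        if (furthest.ordinal() < all.length - 1) furthest = all[furthest.ordinal() + 1];
        return furthest;
    }

    /** Only the current screen may be shown, so the article is hidden after the cool down. */
    boolean canShow(Screen s) { return s == furthest; }

    /** Value written to the cookie. */
    String toCookie() { return furthest.name(); }

    /** Restores the state on page reload; a missing cookie means a fresh start. */
    static ScreenState fromCookie(String value) {
        ScreenState st = new ScreenState();
        if (value != null) st.furthest = Screen.valueOf(value);
        return st;
    }
}
```

The state is restored from the cookie on every load. Reloading the page or pressing "back" therefore cannot bring the article back once the questionnaire has been reached.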
Every time a requester opens a HIT's details, its results are downloaded from AMT and updated in our storage.
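Results are downloaded again every time a requester opens a HIT's details, so the update has to be idempotent. This minimal sketch assumes each downloaded result carries an assignment ID that uniquely identifies one worker's submission. Results are stored in a map keyed by that ID, so re-downloading updates existing entries instead of duplicating them. `AssignmentResult` and `ResultStore` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One worker's submitted answers for a HIT (hypothetical shape). */
final class AssignmentResult {
    final String assignmentId;
    final String workerId;
    final Map<String, String> answers; // question id -> answer text

    AssignmentResult(String assignmentId, String workerId, Map<String, String> answers) {
        this.assignmentId = assignmentId;
        this.workerId = workerId;
        this.answers = answers;
    }
}

final class ResultStore {
    // Keyed by assignment ID so repeated downloads update entries instead of duplicating them;
    // LinkedHashMap keeps the order in which assignments were first seen.
    private final Map<String, AssignmentResult> byAssignment = new LinkedHashMap<>();

    /** Merges a fresh download and returns how many assignments were new. */
    int merge(List<AssignmentResult> downloaded) {
        int added = 0;
        for (AssignmentResult r : downloaded) {
            if (byAssignment.put(r.assignmentId, r) == null) added++;
        }
        return added;
    }

    int size() { return byAssignment.size(); }
}
```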
12
Methodology (cont.): GUI
The GUI contains 5 screens that allow the requester to define the HIT structure and details and to view the available HITs, including the results of each HIT.
The GUI is built using Swing.
Data storage
Template definitions, available images, articles, cool downs and HITs (including all internal data) are saved in well-formatted, easily readable XML files.
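Loading an XML file such as articles.xml needs nothing beyond the DOM parser that ships with the JDK. The slides do not show the files' schema, so the element and attribute names below (`article`, `paragraph` and a `question` attribute) are assumptions. The sketch parses from a string to stay self-contained, and `ArticleLoader` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ArticleLoader {
    /** A paragraph and its optional question (null when absent). */
    static final class Paragraph {
        final String text;
        final String question;
        Paragraph(String text, String question) { this.text = text; this.question = question; }
    }

    /** Reads every <paragraph> of the first <article> element (assumed schema). */
    static List<Paragraph> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element article = (Element) doc.getElementsByTagName("article").item(0);
        if (article == null) throw new IllegalArgumentException("no <article> element found");
        NodeList nodes = article.getElementsByTagName("paragraph");
        List<Paragraph> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            // getAttribute returns "" when missing, so check presence explicitly.
            String question = p.hasAttribute("question") ? p.getAttribute("question") : null;
            result.add(new Paragraph(p.getTextContent().trim(), question));
        }
        return result;
    }
}
```

To read the actual file instead of a string, `DocumentBuilder.parse(new java.io.File("articles.xml"))` can replace the `InputSource` call.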
13
Completed goals
Framework requirements definition.
Data structure design.
Creating HITs.
Interacting with the AMT API.
Loading and storing data using XML files.
Building a GUI.
Displaying basic processing of the results.
Documentation.
Ant installer.
14
Conclusions
The AMT API for Java is very limited when it comes to creating designed HIT templates; therefore, we decided to implement parts of the HIT in HTML.
AMT does not allow creating HITs with more than one page. Since we needed multi-screen HITs that prevent a worker from going back to a previous screen, we had to use JavaScript to manipulate the page.
15
Conclusions (cont.)
Frequent meetings proved crucial for understanding the project requirements and adjusting the implementation accordingly.
XML is easy to use and a very convenient way to store structured data. Since it is a common standard, it is very well documented.
Java Swing (developed in Eclipse) is a simple, friendly and well-documented framework for designing a GUI.