
Final Project Report

Group 24: 徐尚宁, 况羿, 曹建真, 吴昊

June 24, 2018

Contents

1 Introduction

2 Overview

3 Website Design
3.1 The Index Page
3.1.1 Region 1: Navigation Bar
3.1.2 The Video Background
3.1.3 Region 2: Search Bar
3.2 The Conferences Page
3.3 Paper Search Results
3.3.1 Region 1: A Uniform interface
3.3.2 Region 2: The Year Picker
3.3.3 Region 3: Search Results
3.3.4 Region 4: Prediction
3.3.5 Region 5: Pagination
3.4 Individual Paper Page
3.4.1 Region 1: Jumbotron
3.4.2 Region 2: Paper Recommendation
3.5 Author Search Results
3.6 Individual Author Page
3.6.1 Region 1: The Jumbotron
3.6.2 Region 2: Visualisation
3.7 Individual Conference Page

4 Elasticsearch
4.1 Why Elasticsearch
4.2 Installing Elasticsearch
4.3 The Role of Elasticsearch
4.4 Importing Data from Database
4.4.1 The Configuration File for Logstash
4.4.2 Running Logstash
4.5 Querying Elasticsearch
4.6 Aggregations

5 Backend
5.1 Frame
5.2 Routes
5.3 Pagination
5.4 Connect Elasticsearch

6 Paper Recommendation
6.1 In The Single Paper Page
6.2 In The Single Author Page

7 Student Hierarchical Tree
7.1 mytree.html
7.2 mytree.php

8 Machine Learning Improvement
8.1 Teacher–Student Relationship
8.2 Improvement

9 Conclusion

1 Introduction

In this project report, we present a complete solution to the problem of academic search. The core of academic search is information aggregation and relationship discovery, both of which are addressed in our project. Alongside this key functionality, a web user interface is designed with clear information presentation and ease of use in mind.

2 Overview

Here is an overview of the website structure. Every webpage we developed is listed here, identified by its route.

/index The index page;

/author Depending on the query string, either the search results of an author search or the individual page of an author;

/paper Depending on the query string, either the search results of a paper search or the individual page of a paper;

/conferences The Hall of Conferences page. If the query string id is specified, this is the individual page of that conference.

The frontend has been built on the Bootstrap framework from the beginning. Codeigniter is the backend framework of choice. Additionally, Elasticsearch is brought in to enhance the search functionality, and d3.js supports the visualisation of information.

Tasks are distributed as follows:

曹建真 Tree of student hierarchy;

况羿 Codeigniter backend, paper recommendation;

吴昊 Machine learning improvement;

徐尚宁 Elasticsearch, Bootstrap frontend.

3 Website Design

The term “website design” encompasses much more than the visual elements of a website. It concerns the experience of users as they explore the website. Many design choices were made throughout the development of the website. The Bootstrap framework is used in the UI design to provide a consistent experience.

3.1 The Index Page

Figure 1: The index page

3.1.1 Region 1: Navigation Bar

Region 1 in Figure 1 is the navigation bar. The navigation bar is designed to be simple, so it has only two hyperlinks, Home and Conferences. These are the only two pages that users can jump to through the navigation bar. The navigation bar is fluid, in the sense that it displays properly on a smaller screen such as a mobile phone, a feature provided by the Bootstrap framework. Figure 2 illustrates this feature.


Figure 2: The index page, adapted for small screen

3.1.2 The Video Background

The index page employs a technique that is now common among startup websites. A full-width video is used as the background of the index page to convey a positive message to our users. However, the video background is a burden for mobile users, and it is planned that the video background be replaced with a static image when the user visits our website on a mobile phone.

In order to make the video display full-width and act like a background, the CSS in Listing 1 is applied to the video element.

#background {
  position: fixed;
  top: 50%;
  left: 50%;
  min-width: 100%;
  min-height: 100%;
  width: auto;
  height: auto;
  z-index: -100;
  -webkit-transform: translateX(-50%) translateY(-50%);
  transform: translateX(-50%) translateY(-50%);
  background-size: cover;
}

Listing 1: CSS to style the video background

Listing 1 does several things. It sets the position CSS property of the video to fixed so that no space is allocated to the video element in the page flow. Then z-index is set to -100 so that the video does not cover other elements. min-width and min-height are set to 100% while width and height stay auto, so the video is scaled until it covers the view port without changing its proportions, and top, left and the translate transform keep it centred. background-size is also set to cover, although this property affects background images rather than the video itself. In this way, the width of the video is at least the width of the view port, and the height is derived from the proportion of the video.

3.1.3 Region 2: Search Bar

Region 2 contains the search bar. The search bar is carefully designed to reduce complexity. The reduction in complexity actually comes from its enhanced search ability. Search for conferences is removed in favour of a dedicated conference webpage that includes all conferences. The functionality of searching for authors, papers and conferences is, in some ways, combined and unified into the search for papers. Only an SQL-based search for authors is retained, and it is expected to be used less often.

A dropdown menu is provided to toggle between the search modes. Note that switching the search mode changes not only where the form is submitted to, but also the content of autocompletion. If searching for authors is selected, the input box will try to autocomplete the user input with author names. This is done in the file navbar.js. When a dropdown menu item is clicked, the action attribute of the form is changed accordingly, and the source of autocompletion is switched.


Figure 3: Autocomplete in action

3.2 The Conferences Page

It has been mentioned above that the search for conferences is removed altogether because of the small number of conferences. It is replaced with a newly designed Hall of Conferences page, located under the URL /conferences. Conferences are organised in cards. Each card comes with the name of the conference and a brief description. Clicking on the name of a conference guides users to a dedicated page for that conference.

Figure 4: The conferences page

This webpage is a typical use of the Bootstrap grid layout system. Content is stored in a container and organised in a series of rows and columns. Listing 2 gives an outline of the grid layout.


<div class="container">
  <div class="row my-4">
    <div class="col-4">
      ...
    </div>
    <div class="col-4">
      ...
    </div>
    <div class="col-4">
      ...
    </div>
  </div>
  <div class="row my-4">
    <div class="col-4">
      ...
    </div>
    <div class="col-4">
      ...
    </div>
    <div class="col-4">
      ...
    </div>
  </div>
  ...
</div>

Listing 2: Grid layout for the conferences page


3.3 Paper Search Results

The page of paper search results (Figure 5) is the centre of navigation of an academic search engine. It is divided into several regions, and the design concern of each region is stated below.

Figure 5: Paper search results

3.3.1 Region 1: A Uniform interface

This page inherits two elements from the index page: the navigation bar and the search bar. Easy access to the search functionality is crucial for a search engine. The search bar functions exactly like the one on the index page.

3.3.2 Region 2: The Year Picker

A minimalistic date picker is provided to present an intuitive interface for users to narrow the range of search results, and it mostly eliminates the need for frontend input validation. Listing 3 shows the only two options that need to be set: they restrict the input mode to years and bind the picker to the input elements.


// minViewMode: 2 restricts the picker to the year view
$('.date-own').datepicker({
  minViewMode: 2,
  format: 'yyyy'
});

<label for="from-year" class="mx-1">Select Year Range: From</label>
<input id="from-year" class="date-own form-control" style="width: 64px;" type="text" value="2008" placeholder="2018" name="from">
<label for="to-year" class="mx-2">To</label>
<input id="to-year" class="date-own form-control" style="width: 64px;" type="text" value="2018" placeholder="2018" name="to">

Listing 3: Javascript required for the date picker and HTML for the input field
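The date picker restricts what users can type, but the from and to values still reach the server as plain strings. The following is a minimal sketch of how such values could be normalised on the server side. It is written in Python for illustration only (the backend of the project is Codeigniter), and the function name parse_year_range and the bounds earliest and latest are assumptions, not part of the project.

```python
from datetime import date


def parse_year_range(params, earliest=1900, latest=None):
    """Return a (from_year, to_year) tuple from raw form parameters.

    `params` is a dict of strings such as {"from": "2008", "to": "2018"}.
    The bounds `earliest` and `latest` are illustrative assumptions.
    """
    if latest is None:
        latest = date.today().year

    def to_year(raw, default):
        # Fall back to the default when the value is missing or not a number.
        try:
            year = int(str(raw).strip())
        except (TypeError, ValueError):
            return default
        # Clamp out-of-range years into [earliest, latest].
        return max(earliest, min(latest, year))

    start = to_year(params.get("from"), earliest)
    end = to_year(params.get("to"), latest)
    # Swap the bounds if they were entered in reverse order.
    if start > end:
        start, end = end, start
    return start, end
```

Calling parse_year_range({"from": "2018", "to": "2008"}, latest=2018) yields the range in ascending order, so a reversed selection still produces a usable filter.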

3.3.3 Region 3: Search Results

Each search result, that is, a paper, is presented in a Bootstrap-style card. A paper's publication year and conference are shown as tag-like elements. When hovered over, a tag changes its colour and appears clickable. All the authors of a paper are shown on the card. The paper title, the conference and the authors' names are all hyperlinked to their individual pages. The code for the card is available in Listing 4.

<div class="card paper">
  <div class="card-body">
    <h5 class="card-title">
      <a href="/paper?id=2038812321">efficient mining of emerging patterns discovering trends and differences</a>
    </h5>
    <a href="#" class="tag">1999</a>
    <a href="#" class="tag">SIGKDD</a>
    <h6 class="card-subtitle mb-2 text-muted">
      By <a href="#" class="muted-link">xinbing wang</a>,
      <a href="#" class="muted-link">xinbing wang</a>,
      <a href="#" class="muted-link">xinbing wang</a>,
      <a href="#" class="muted-link">xinbing wang</a> and <a href="#">more</a>
    </h6>
  </div>
</div>

Listing 4: HTML for the card that presents information about papers

3.3.4 Region 4: Prediction

There is one natural problem with consolidating the search for papers, authors and conferences into one search: users will spend more time navigating the search results if they intend to search for authors or conferences. That is why a side panel is designed to give a narrower and more accurate prediction of what users want. For each potential author, the affiliation is displayed along with the name so that authors with the same name can be distinguished. Listing 5 gives the HTML for the side panel with dummy data.

<div class="card text-white bg-warning mb-3" style="max-width: 40rem;">
  <div class="card-header"><h4>Are you looking for</h4></div>
  <div class="card-body">
    <h5 class="card-title">Author</h5>
    <ul class="card-text">
      <li><a href="#" class="muted-link text-white">xinbing</a> from Shanghai Jiaotong University</li>
      <li><a href="#" class="muted-link text-white">xinbing</a> from Shanghai Jiaotong University</li>
      <li><a href="#" class="muted-link text-white">xinbing</a> from Shanghai Jiaotong University</li>
      <li><a href="#" class="muted-link text-white">xinbing</a> from Shanghai Jiaotong University</li>
    </ul>
    <h5 class="card-title">Conference</h5>
    <ul class="card-text">
      <li><a href="#" class="muted-link text-white">WWW</a></li>
    </ul>
  </div>
</div>

Listing 5: HTML for the side panel, with dummy data


3.3.5 Region 5: Pagination

Figure 6: Pagination buttons

Like many other elements, pagination buttons are styled with Bootstrap. The addition of a few classes creates large, clear and delightful-to-use pagination buttons.


<nav class="row justify-content-center" aria-label="Page navigation">
  <ul class="pagination">
    <li class="page-item">
      <a class="page-link" href="#" aria-label="Previous">
        <span aria-hidden="true">&laquo;</span>
        <span class="sr-only">Previous</span>
      </a>
    </li>
    <li class="page-item"><a class="page-link" href="#">1</a></li>
    <li class="page-item"><a class="page-link" href="#">2</a></li>
    <li class="page-item"><a class="page-link" href="#">3</a></li>
    <li class="page-item">
      <a class="page-link" href="#" aria-label="Next">
        <span aria-hidden="true">&raquo;</span>
        <span class="sr-only">Next</span>
      </a>
    </li>
  </ul>
</nav>

Listing 6: HTML for pagination

Bootstrap is considerate in its design of the pagination buttons, and its documentation goes on to note the accessibility issues with conventional buttons. The pagination buttons are enclosed in a nav element to indicate their function, which is especially helpful for the screen readers used by blind people. The aria-label attribute provides screen readers with a short description of the purpose of the nav element.

Note that the pagination buttons are actually an unordered list of hyperlinks, instead of stacks of <div>s. This organisation of the pagination buttons not only enhances the readability of the code, but also aids screen readers, because unnecessary information (from the viewpoint of a blind person) is removed and the screen reader can parse the HTML more easily.


3.4 Individual Paper Page

Figure 7: Individual paper page

3.4.1 Region 1: Jumbotron

Jumbotron is a lightweight, flexible component that can optionally extend to the entire viewport to showcase key marketing messages on a website. On the page of every paper, a jumbotron is used to convey key details about the paper, that is,

1. Paper title

2. Conference

3. Publish year

4. List of authors

5. Number of citations

The paper title extends to the full width of the webpage. The jumbotron is surrounded by the navigation bar and the search bar, a staple of the website.

3.4.2 Region 2: Paper Recommendation

Just below the search bar is the area for paper recommendation. The paper recommendation interface scrolls horizontally, a design that is currently popular because of the vast number of mobile devices: horizontal scrolling lets a row hold more cards than fit the width of the screen. Scrolling is made possible by the flexbox layout in CSS, as shown in Listing 7.


.scrolling-wrapper-flexbox {
  display: flex;
  flex-wrap: nowrap;
  overflow-x: auto;
}

.scrollable-card {
  flex: 0 0 auto;
  width: 400px;
}

Listing 7: Scrollable cards that use flexbox

scrolling-wrapper-flexbox is the class of the container in which all the scrollable cards reside. It turns off wrapping and sets overflow-x to auto, so content that overflows horizontally can be scrolled.

Each card in the container belongs to the class scrollable-card. The class fixes the width of each card; otherwise each card would resize itself to fit the width of the view port rather than overflow.

The flex property of each card is set to 0 0 auto so that cards neither grow nor shrink. Combined with Listing 8, Listing 7 renders scrollable cards.


<div class="row my-3 scrolling-wrapper-flexbox">
  <div class="card mx-2 scrollable-card">
    <div class="card-header text-light" style="background-color: #bf3e11">Featured</div>
    <div class="card-body">
      <h5 class="card-title">
        <a href="http://127.0.0.1/paper?id=2038812321">efficient mining of emerging patterns discovering trends and differences</a>
      </h5>
      <a href="#" class="tag">1999</a>
      <a href="#" class="tag">SIGKDD</a>
    </div>
  </div>
  <div class="card mx-2 scrollable-card">
    <div class="card-header text-light" style="background-color: #bf3e11">Featured</div>
    <div class="card-body">
      <h5 class="card-title">
        <a href="http://127.0.0.1/paper?id=2038812321">efficient mining of emerging patterns discovering trends and differences</a>
      </h5>
      <a href="#" class="tag">1999</a>
      <a href="#" class="tag">SIGKDD</a>
    </div>
  </div>
  ...
</div>

Listing 8: HTML for scrollable cards


3.5 Author Search Results

Figure 8: The author search result page, with clickable row

Compared with the previous pages, the author search result page in Figure 8 takes on a much simpler design. As most functions are moved to the unified search, the author search aims to do only one thing: provide accurate search results for users who know what they are looking for. The author search is based on a database query. Author information is presented in a table, where each row is clickable and leads to the individual author page.


3.6 Individual Author Page

Figure 9: Individual author page

3.6.1 Region 1: The Jumbotron

This webpage again utilises the jumbotron component to present key information. The author's name and affiliations are featured prominently. Each affiliation is styled as a bright yellow button, and the buttons are stacked together.

3.6.2 Region 2: Visualisation

Figure 10: Force–Directed Graph


The force–directed graph produces a clear visualisation of the relationships between collaborators. As the force–directed graph was the homework of the previous lab, only a brief outline is given here:

1. Predict relationships between authors that have collaborated at least once and save the predictions in a table;

2. At the request of a client browser, fetch all relationships between the requested author and their collaborators from the table;

3. Construct a JSON file as the response. In the JSON, authors are divided into 4 groups: no relationship, student of the requested author, advisor of the requested author, and prediction error;

4. Generate a force–directed graph in the browser from the fetched JSON.
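Step 3 of the outline can be sketched as follows. This is an illustrative Python version, not the project's code: the function name build_graph, the numeric group codes and the nodes/links layout (a common input shape for d3.js force-directed graphs) are assumptions.

```python
import json

# Group numbers are an assumption; the report only names the four categories.
GROUPS = {"none": 0, "student": 1, "advisor": 2, "error": 3}


def build_graph(author, relations):
    """Build a nodes/links structure for a force-directed graph.

    `relations` is a list of (collaborator_name, label) pairs, where label
    is one of the keys of GROUPS.
    """
    # The requested author is the centre node; "centre" is a made-up marker.
    nodes = [{"id": author, "group": "centre"}]
    links = []
    for name, label in relations:
        nodes.append({"id": name, "group": GROUPS[label]})
        links.append({"source": author, "target": name})
    return {"nodes": nodes, "links": links}


# Placeholder names, not real data.
graph = build_graph("a", [("b", "student"), ("c", "advisor"), ("d", "none")])
payload = json.dumps(graph)  # the JSON string sent back to the browser
```

The browser can then colour each node by its group, so that students, advisors and unrelated collaborators are told apart at a glance.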

To organise the layout better, a button group is introduced (in Figure 11) to switch between the student tree, the force–directed graph and the list of papers by the author.


Figure 11: Switch buttons


3.7 Individual Conference Page

Figure 12: Individual conference page

Each conference page shows the name of the conference and a short description. The total number of papers published at the conference is given in a smaller font.

4 Elasticsearch

4.1 Why Elasticsearch

The need for Elasticsearch comes from the inconsistency and low quality of the results of database search. Inconsistency here means that the same query, executed twice, can return different results. Returning rows in no guaranteed order helps the database with access speed, but users will at the least be confused by the inconsistency. Moreover, database search is in fact a limited form of wildcard matching, with no support for, say, fuzzy searching.
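The difference between wildcard matching and fuzzy searching can be illustrated with a small sketch. Fuzzy matching in Elasticsearch is based on edit distance; the Python code below is only a toy model of that idea, not how Elasticsearch is implemented, and the function names are assumptions made for the illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def wildcard_match(keyword, title):
    """Mimics a LIKE '%keyword%' search: plain substring containment."""
    return keyword in title


def fuzzy_match(keyword, title, max_edits=1):
    """True if any word of the title is within max_edits of the keyword."""
    return any(edit_distance(keyword, word) <= max_edits
               for word in title.split())


title = "boosting versus covering"
```

With the misspelt keyword "covring", wildcard_match finds nothing in the title, while fuzzy_match still succeeds because "covring" is one edit away from "covering".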

4.2 Installing Elasticsearch

Installing Elasticsearch on a Linux computer is actually very easy, as most distributions have packaged Elasticsearch. Just fetch the package from the repository and install it with the built-in package manager, like any other software. Kibana, a data-visualisation dashboard, is also installed to provide an easy frontend for executing queries. To run Elasticsearch and Kibana, start their systemd service units:

$ sudo systemctl start elasticsearch
$ sudo systemctl start kibana

Listing 9: Starting Elasticsearch and Kibana


The built-in dashboard provided by Kibana is accessible at http://127.0.0.1:5601 by default; port 9200 is the HTTP API of Elasticsearch itself.

4.3 The Role of Elasticsearch

The presence of Elasticsearch is sometimes at odds with the database. Some tasks, like searching for an author, can be accomplished both by Elasticsearch and by an SQL query, with the only difference being whether the results are returned as JSON or as PHP arrays. However, indices in Elasticsearch cannot represent complex relationships the way a relational database can. An index in Elasticsearch acts like a single table in a database that cannot link to other tables through foreign keys. Indices have fields, just as tables have columns. Imagine modelling the relationship between papers, authors and affiliations in just one table named paper. paper would have a column named author that stores multiple authors in one column, and some authors may have more than one affiliation. Replacing a relational database with Elasticsearch is only an invitation to disaster.

These disadvantages of Elasticsearch lead to the natural conclusion that Elasticsearch should only handle search, and MariaDB should be used whenever specific data needs to be retrieved.

4.4 Importing Data from Database

Unlike Solr, where developers have DIH (Data Import Handler) and can customise their schema file to map data in a database to indices, Elasticsearch has no standard way to import data. One group even reports having written a Python script to import data through the Python API of Elasticsearch. The method adopted here is to use Logstash, another product by the developer of Elasticsearch and Kibana. Although Logstash was developed for the analysis of log files, here it serves only as a bridge between the database and Elasticsearch.

4.4.1 The Configuration File for Logstash

Data are drawn from the database and imported to Logstash by the jdbc input plugin of Logstash. The configuration file of the jdbc input plugin uses a syntax that resembles Ruby. Listing 10 is the configuration file used in the project.


input {
  jdbc {
    jdbc_connection_string => "jdbc:mariadb://localhost:3306/academic"
    jdbc_validate_connection => true
    jdbc_driver_library => "/usr/share/java/mariadb-jdbc/mariadb-java-client.jar"
    jdbc_driver_class => "org.mariadb.jdbc.Driver"
    jdbc_user => "academic"
    jdbc_password => ""
    statement => "SELECT paper.id AS id, conference.name AS conference,
      title, publish_year, conference_id,
      GROUP_CONCAT(DISTINCT author.name SEPARATOR ',') AS authors,
      GROUP_CONCAT(DISTINCT author.id SEPARATOR ',') AS authors_id
      FROM paper_author_affiliation JOIN paper ON paper_id=paper.id
      JOIN conference ON conference_id=conference.id
      JOIN author ON author_id=author.id GROUP BY paper.id"
  }
}

filter {
  mutate {
    split => { "authors" => "," }
    split => { "authors_id" => "," }
  }
}

output {
  elasticsearch {
    hosts => "192.168.1.108:9200"
    index => "papers"
    document_id => "%{id}"
  }
}

Listing 10: A configuration file for the jdbc input plugin to import information about papers

In Listing 10, the first block defines an input source. Many parameters for accessing the database, such as the database URL, user name and password, are configured in this block. jdbc_driver_library defines the path to the Java database connector. The most important part is the statement field, which defines the SQL query that fetches all the data. The query is executed and each column in the result corresponds to a field in Elasticsearch.

The limit of only one SQL statement per input in the configuration file is too strict compared to the schema file in Solr. To use multiple SQL statements with Logstash, more than one input source has to be configured. In Solr, multiple queries can be bound to a field, and one query can use the results of another.

One obstacle was met in constructing the query. Since Elasticsearch in our project puts its major focus on paper search, it would be convenient to provide a list of authors for each paper. Each author's name should be linked to their individual page, which requires that their IDs are known. With an author's ID, a hyperlink to his or her page can be generated. Then how is it possible to squeeze many authors into one column? The answer is GROUP_CONCAT and the filter option.

As the name suggests, when rows are grouped in a MySQL query, GROUP_CONCAT concatenates the non-NULL values in a column into one string. DISTINCT tells the database to remove duplicates. SEPARATOR defines the separator to use between two values. All the authors of a paper and their IDs are first read into Logstash as comma-separated strings and later processed by filter in Listing 10.

Transformations that are specified in filter are applied to the data. The split action in mutate makes Logstash split the strings in the fields authors and authors_id at commas.
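The round trip from joined rows to a list-valued document can be mimicked in a few lines. The sketch below is a Python imitation of what GROUP_CONCAT and the split action do to the data, not code that runs in the database or in Logstash; the function names are assumptions, the author values are taken from Listing 13, and the duplicate row is made up to show the effect of DISTINCT.

```python
def group_concat(rows, separator=","):
    """Mimic GROUP BY paper.id with two GROUP_CONCAT(DISTINCT ...) columns.

    Each row is a (paper_id, author_id, author_name) tuple.
    """
    grouped = {}
    for paper_id, author_id, author_name in rows:
        names, ids = grouped.setdefault(paper_id, ([], []))
        # DISTINCT: skip values that were already collected for this paper.
        if author_name not in names:
            names.append(author_name)
        if str(author_id) not in ids:
            ids.append(str(author_id))
    return {
        paper_id: {"authors": separator.join(names),
                   "authors_id": separator.join(ids)}
        for paper_id, (names, ids) in grouped.items()
    }


def split_fields(document, fields=("authors", "authors_id"), separator=","):
    """Mimic the mutate/split filter: turn the joined strings into lists."""
    return {key: value.split(separator) if key in fields else value
            for key, value in document.items()}


rows = [
    (2124166527, 2112608526, "kohei hatano"),
    (2124166527, 2215741427, "manfred k warmuth"),
    (2124166527, 2215741427, "manfred k warmuth"),  # made-up duplicate row
]
joined = group_concat(rows)[2124166527]
document = split_fields(joined)
```

One caveat of this design is that a name containing a comma would be split in two, so a separator that cannot occur in the data is the safer choice.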

The output block reiterates the point that Logstash is only a bridge between the database and Elasticsearch. output configures the Elasticsearch output plugin of Logstash. Its fields are the host on which Elasticsearch is running, the name of the index to be created, and document_id, which is set to the value of the column id in the query. document_id is used internally by Elasticsearch to uniquely identify an entry in the index, called a document in the Elasticsearch reference. If document_id is not supplied, Elasticsearch will generate an ID itself, which is redundant given that every paper already has one.

4.4.2 Running Logstash

The shell command in Listing 11 starts Logstash:

sudo /usr/share/logstash/bin/logstash \
  --path.settings /etc/logstash \
  --pipeline.unsafe_shutdown --path.config papers.rb

Listing 11: Start Logstash in terminal

Logstash is designed to be almost impossible to kill, at least not before all received events have been pushed to the outputs. The command line option --pipeline.unsafe_shutdown allows users to kill the Logstash process, which is convenient in case the database import runs for too long. The data lost when Logstash is terminated is negligible because it is easy to clear the indices and start over. Once Logstash is running, it automatically starts the import.

It is easy to verify that the data has been imported by executing a query in the Dev Tools of Kibana, as shown in Figure 13. The number of hits shown in the right panel matches the number of papers in the database.


Figure 13: Verify a successful import

4.5 Querying Elasticsearch

Elasticsearch provides RESTful APIs for querying. A query can be sent by any client that supports JSON as the request body. In particular, all queries in our project are sent either through the Kibana interface or by curl.

Listing 12 shows the query that is used in the website.

{
  "query": {
    "multi_match": {
      "query": "covering"
    }
  },
  "aggs": {
    "years": {
      "histogram": {
        "field": "publish_year",
        "interval": 1,
        "min_doc_count": 1
      }
    }
  }
}

Listing 12: The query used in the website

Instead of the basic match query, which matches the keyword against the values in one field, multi_match without the fields parameter tries to match the keyword against every field. Listing 13 is the output of the query in Listing 12 sans the aggregation part.

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 33,
    "max_score": 10.95135,
    "hits": [
      {
        "_index": "papers",
        "_type": "doc",
        "_id": "2124166527",
        "_score": 10.95135,
        "_source": {
          "authors_id": [
            "2112608526",
            "2215741427"
          ],
          "@timestamp": "2018-06-04T12:12:37.277Z",
          "authors": [
            "kohei hatano",
            "manfred k warmuth"
          ],
          "conference_id": 1127325140,
          "publish_year": 2003,
          "@version": "1",
          "conference": "NIPS",
          "id": 2124166527,
          "title": "boosting versus covering"
        }
      },
      ...
    ]
  }
}

Listing 13: Output of Listing 12

4.6 Aggregations

The second part of Listing 12, the part after query, specifies the aggregation. Aggregations are used because they can give users insight into results, whether or not these are search results. Figure 14 is an example of a visualisation of an aggregation result by Google. Each bucket in the histogram is a year, and the y-axis is the number of times that Navneet Dalal, an author of the paper Histograms of oriented gradients for human detection, is cited.

Figure 14: An example use of an aggregation result, by Google

The query aggregates over the publish_year field with an interval of 1 and a min_doc_count of 1. By default the response fills gaps in the histogram with empty buckets. With min_doc_count set to 1, Elasticsearch returns only buckets that contain at least one document. Listing 14 is the aggregation result of the query in Listing 12.

{
  "aggregations": {
    "years": {
      "buckets": [
        {
          "key": 1977,
          "doc_count": 1
        },
        {
          "key": 1979,
          "doc_count": 1
        },
        {
          "key": 1990,
          "doc_count": 2
        },
        {
          "key": 1991,
          "doc_count": 3
        },
        {
          "key": 1995,
          "doc_count": 1
        },
        {
          "key": 2001,
          "doc_count": 3
        }
      ]
    }
  }
}

Listing 14: Aggregation result of Listing 12
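The role of min_doc_count can be illustrated with a simplified model of a histogram aggregation. The sketch below is not Elasticsearch code; the function histogram and its parameters are assumptions that only mimic the bucket-filling behaviour described above.

```javascript
// Simplified model of a histogram aggregation over numeric values.
// Buckets between the smallest and largest key are generated, and a bucket
// is kept only if it holds at least minDocCount documents.
function histogram(values, interval, minDocCount) {
  const counts = new Map();
  for (const v of values) {
    const key = Math.floor(v / interval) * interval;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  if (counts.size === 0) return [];
  const keys = [...counts.keys()];
  const buckets = [];
  for (let key = Math.min(...keys); key <= Math.max(...keys); key += interval) {
    const docCount = counts.get(key) || 0;
    if (docCount >= minDocCount) buckets.push({ key, doc_count: docCount });
  }
  return buckets;
}

const years = [1977, 1979, 1990, 1990];
console.log(histogram(years, 1, 0).length); // includes the empty years in between
console.log(histogram(years, 1, 1));        // only years that have documents
```

With a minimum of 0 the gaps between 1977 and 1990 appear as empty buckets; with a minimum of 1 only the three years that contain papers remain.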

5 Backend

5.1 Frame

In our previous work, we wrote everything in one PHP file: it read the request variables, connected to the database, selected the information we wanted, laid out the UI, and rendered the page. This is unnecessary, and the same code ended up being written again and again, which is clearly inefficient. Moreover, doing all these different jobs in a single PHP file makes little sense.

Now we have CodeIgniter, which separates models, controllers, and views. It gives us a chance to do the work more efficiently. Generally, a traditional PHP file can be divided into two parts: one connects to the database and selects the information we want, and the other contains the UI design and renders the page. The former can be put in a model, and the latter can be used as a view.

After this division we have many models and views, but the question is how the framework knows which information each view needs and which model that information comes from. That is why we need a controller. In the controller, we handle the request and choose which model selects the information and which view displays it.
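The division of labour can be sketched in a few lines of JavaScript. Everything here (authorModel, authorListView, authorsController, the sample rows) is hypothetical and only illustrates the pattern; it is not CodeIgniter's API.

```javascript
// Model: the only place that knows how the data is stored.
const authorModel = {
  findByName(name) {
    const rows = [{ id: 1, name: "alice" }, { id: 2, name: "bob" }];
    return rows.filter((r) => r.name.includes(name));
  },
};

// View: turns data into markup and knows nothing about storage.
function authorListView(authors) {
  return "<ul>" + authors.map((a) => `<li>${a.name}</li>`).join("") + "</ul>";
}

// Controller: reads the request, asks the model, hands the result to the view.
function authorsController(request) {
  const authors = authorModel.findByName(request.name || "");
  return authorListView(authors);
}

console.log(authorsController({ name: "ali" }));
```

Because the controller is the only part that touches both sides, a model can be reused by several views and a view can be restyled without changing any query.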

This is how I completed most of my part. In the end, I have a controller that contains several methods:

<?php
class Academic extends CI_Controller {
    // Construct the object.
    public function __construct()
    {
        ...
    }

    // Show the home page.
    public function index()
    {
        ...
    }

    // Select from the database according to the author name and echo the
    // result in JSON; this implements the autocomplete function.
    public function hint()
    {
        ...
    }

    // Select all the possible authors according to the author name and sort
    // them by number of papers, descending.
    public function result()
    {
        ...
    }

    // Select all the relationships that the author has with other authors,
    // also echoed in JSON.
    public function relationship()
    {
        ...
    }

    // Select the teacher-student relationships and echo them in JSON.
    public function tree()
    {
        ...
    }

    // Select a given author and his or her paper information.
    public function author()
    {
        ...
    }

    // Select from the database according to the paper title and echo the
    // result in JSON; this implements the autocomplete function.
    public function hint1()
    {
        ...
    }

    // Select all the possible papers, sorted by number of citations,
    // descending.
    public function result1()
    {
        ...
    }

    // Select a given paper, its information, and some recommended papers.
    public function paper()
    {
        ...
    }

    // Show the conference that the user chooses.
    public function conference()
    {
        ...
    }

    // Select and show all the conferences and some important information
    // about them.
    public function index3()
    {
        ...
    }
}

Listing 15: An overview of the controller

I do not show the details of the methods here, because we will package the source code with the report. The same applies to the models.

<?php
class Academic_model extends CI_Model
{
    // Construct the object.
    public function __construct()
    {
        ...
    }

    // Select all the possible authors according to the author name and sort
    // them by number of papers, descending.
    public function get_authors()
    {
        ...
    }

    // Select a given author and his or her paper information.
    public function get_authorinf()
    {
        ...
    }

    // Select up to 10 possible author names from the database according to
    // the author name.
    public function hint()
    {
        ...
    }

    // Select all the relationships that the author has with other authors.
    public function get_relation()
    {
        ...
    }

    // Select the teacher-student relationships.
    public function get_TS()
    {
        ...
    }

    // Abandoned, because we now use Elasticsearch.
    public function get_papers()
    {
        ...
    }

    // Select the recommended papers according to the citation relationship.
    public function get_like1()
    {
        ...
    }

    // Select a given paper and some information about it.
    public function get_paperinf()
    {
        ...
    }

    // Select the possible paper titles according to the title the user
    // entered.
    public function hint1()
    {
        ...
    }

    // Select a given conference.
    public function get_conferenceinf()
    {
        ...
    }

    // Select all the conferences and their information.
    public function get_all_conference()
    {
        ...
    }
}

Listing 16: An overview of models

Here is a list of all the views:

1. author.php shows the information of a given author;

2. conference.php shows the information of a given conference;

3. footer.php

4. header.php

5. home.php shows the home page;

6. home3.php shows all the conferences;

7. paper.php shows the page of a given paper;

8. result.php shows the results of an author search;

9. result1.php shows the results of a paper search.

You may wonder why there is a home3 but no home1 or home2. We used to have home1 and home2, which searched by paper title and by conference name, but we have since moved all of these functions into home, which originally searched only by author name.

You will also notice two PHP files that I have not described: header.php and footer.php. As their names suggest, they contain the head and the foot of the HTML page, which all the views share. I only need to load them whenever I load a new page, instead of writing them again in each PHP file. I also put the CSS styles and the JavaScript in the header and footer.

5.2 Routes

It is inconvenient to have to type .../index.php/controller/method..., and URLs such as academic/index or academic/index1 are unattractive when pages are shown. Users are not interested in which controller handles the request, and method names such as index are too ambiguous to show them. That is why we need to change the routes, which makes the URLs clearer and cleaner.

I can change the routes in routes.php of the CI framework. Here is how I changed them.

<?php
...
$route['default_controller'] = 'academic';
$route['404_override'] = '';
$route['translate_uri_dashes'] = FALSE;
$route['authors'] = 'academic/result';
$route['author'] = 'academic/author';
$route['papers'] = 'academic/result1';
$route['paper'] = 'academic/paper';
$route['conferences'] = 'academic/index3';
$route['conference'] = 'academic/conference';
$route['search_author_name'] = 'academic/index';
$route['search_paper_title'] = 'academic/index1';
$route['search_conference'] = 'academic/index2';
$route['relationships'] = 'academic/relationship';
$route['tree'] = 'academic/tree';

Listing 17: Define routes in routes.php
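Conceptually, the route table is a lookup from a public URI to a controller/method pair. The sketch below shows that lookup in JavaScript for three of the entries above; resolveRoute is a hypothetical helper, and letting unknown URIs pass through unchanged is an assumption of the sketch rather than a description of CodeIgniter's router.

```javascript
// Hypothetical route table mirroring a few entries of Listing 17.
const routes = {
  authors: "academic/result",
  author: "academic/author",
  papers: "academic/result1",
};

// Map a public URI to "controller/method"; URIs without a route entry are
// returned unchanged (an assumption of this sketch).
function resolveRoute(uri) {
  const path = uri.replace(/^\/+|\/+$/g, ""); // trim leading/trailing slashes
  return Object.prototype.hasOwnProperty.call(routes, path) ? routes[path] : path;
}

console.log(resolveRoute("/authors"));
```

The user only ever sees the short key on the left, while the right-hand side stays an internal detail.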

Removing index.php from the URL takes some additional work. First I changed one line in the httpd.conf of my web server:

LoadModule rewrite_module modules/mod_rewrite.so

Listing 18: Modify httpd.conf

Here I simply removed the # in front of the line to uncomment it. Then I wrote a file named .htaccess and put it in the root directory.

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond $1 !^(index\.php|images|js|img|css|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]

Listing 19: The .htaccess file

Finally, open the config.php of CI and change one line.

<?php

$config['index_page'] = '';

Listing 20: Change the route of the index page

5.3 Pagination

This part may seem not to belong to CI, or even to the backend. Admittedly, pagination could be done in JavaScript, which is really frontend work. But when I read the official CI documentation, I was surprised to find a pagination class: CodeIgniter has its own built-in pagination! Using the built-in class could make our work easier and our code clearer. So, although I knew that a group member had already finished a pagination function on one page, I decided to try the built-in one. Following the official documentation, I added this code to the controller method (I tried it on the authors page first):

<?php
$this->load->library('pagination');
$config['base_url'] = 'authors';
$b = $data['result']['lines'];
$config['total_rows'] = $b;
$config['next_link'] = '&raquo;';
$config['prev_link'] = '&laquo;';
$config['first_link']= 'First';
$config['last_link']= 'Last';
$config['per_page'] = 10;
$config['page_query_string'] = TRUE;
$config['enable_query_strings'] = TRUE;
$this->pagination->initialize($config);
$data['page'] = $this->pagination->create_links();

Listing 21: Pagination in CodeIgniter

Here $data['result']['lines'] is the total number of results. Then I need to print the pagination links in the corresponding view.

<?php

echo $page;

echo "<br>";

Listing 22: Output the paging device

I tried to use the pagination class this way, but it failed. I then found out what really happens: when the user clicks a pagination link, the controller method is called again, so it has to select the data from the database again and show the corresponding rows. However, the link carries only one variable, per_page, so the controller does not receive the author name and cannot query the database. After discovering this, I added the name to base_url, which solved the problem.

<?php
$name = $this->input->get('name');
$this->load->library('pagination');
$config['base_url'] = 'authors?name='.$name;
...
$this->pagination->initialize($config);
$data['page'] = $this->pagination->create_links();

Listing 23: Improving the pagination
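The idea behind the fix, that every page link must carry the author name together with the offset, can be sketched as follows. pageLinks is a hypothetical helper written for this illustration; the parameter names name and per_page follow the listings, and the URL encoding is an addition of the sketch.

```javascript
// Build one link per page so that the author name travels with the offset.
function pageLinks(name, totalRows, perPage) {
  const links = [];
  for (let offset = 0; offset < totalRows; offset += perPage) {
    // URLSearchParams also encodes characters such as spaces in the name.
    const query = new URLSearchParams({ name, per_page: String(offset) });
    links.push("authors?" + query.toString());
  }
  return links;
}

console.log(pageLinks("xiaoou", 25, 10));
```

Each generated link contains both parameters, so the controller can repeat the database query for whichever page is requested.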

Then I only need to use per_page to decide which rows to show in the view:

<?php
// per_page carries the offset of the first row on the current page.
if(isset($_GET['per_page']))
{
    $pages=$_GET['per_page'];
}
else
{
    $pages = 0;
}
// $ofset is the index just past the last row to show.
if($pages+10>=$result['lines'])
{
    $ofset=$result['lines'];
}
else
{
    $ofset=$pages+10;
}
for($i = $pages;$i<$ofset;++$i)
{
    echo "<tr style=\"cursor: pointer\" onclick=\"clickableRow('author', ".$result[$i]['auid'].")\">";
    echo "<td>".$result[$i]['auname']. "</td>";
    echo "<td>" . $result[$i]['afname'] . "</td>";
    echo "<td>" . $result[$i]['num'] . "</td>";
    echo "</tr>";
}

Listing 24: Showing the pagination buttons
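The row selection in the view amounts to taking a window of at most ten rows that starts at the offset. A compact JavaScript sketch of the same logic is given below; pageRows is a hypothetical name, and treating a missing or invalid offset as 0 is an assumption.

```javascript
// Return the rows for the page that starts at `offset`.
function pageRows(rows, offset, perPage) {
  const start = Math.max(0, Number(offset) || 0); // missing offset means first page
  return rows.slice(start, start + perPage);      // slice stops at the end of the array
}

const rows = Array.from({ length: 25 }, (_, i) => "author " + i);
console.log(pageRows(rows, 20, 10));
```

The last page simply comes out shorter, which corresponds to the branch that clamps the end index to the number of results.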

As I hoped, it works. This makes our work easier, because my group members do not need to write pagination for the other pages. So after discussion, we chose the built-in class.

5.4 Connect Elasticsearch

We use Elasticsearch when we search by paper title and in other searches. To do so, we need to build a query in the form that Elasticsearch expects, like this:

{
  "query": {
    "multi_match": {
      "query": "xiaoou"
    }
  },
  "aggs": {
    "years": {
      "histogram": {
        "field": "publish_year",
        "interval": 3
      }
    }
  }
}

Listing 25: An example of query

Because a GET request is normally sent without a request body, we decided to pass the data using the POST method. And because Elasticsearch expects the data in JSON format, we need to convert it to JSON.

<?php
$name = $this->input->get('title');
$post_data['size'] = 10000;
$post_data['aggs']['years']['histogram']=array('field'=>'publish_year','interval'=>3);
$post_data['aggs']['conference']['terms']['field']='conference.keyword';
$from=$this->input->get('from');
$to=$this->input->get('to');
if($from===NULL || $to===NULL)
{
    $post_data['query']['multi_match']['query']=$name;
}
else
{
    $post_data['query']['bool']['must']['multi_match']['query']=$name;
    $post_data['query']['bool']['filter']['range']['publish_year']=array('from'=>$from,'to'=>$to);
} //construct a query sentence
$post_dd = json_encode($post_data);

Listing 26: Convert request data to JSON format
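The same request body can be sketched in JavaScript, which makes the two branches easy to compare. buildSearchBody is a hypothetical function; note that the sketch uses the range bounds gte and lte, which are the documented Elasticsearch names, whereas the PHP code above uses from and to.

```javascript
// Build the search body: a plain multi_match, or a bool query that adds a
// publish_year range filter when both bounds are given.
function buildSearchBody(title, from, to) {
  const match = { multi_match: { query: title } };
  const body = {
    size: 10000,
    aggs: {
      years: { histogram: { field: "publish_year", interval: 3 } },
      conference: { terms: { field: "conference.keyword" } },
    },
  };
  if (from == null || to == null) {
    body.query = match;
  } else {
    body.query = {
      bool: {
        must: match,
        filter: { range: { publish_year: { gte: from, lte: to } } },
      },
    };
  }
  return body;
}

console.log(JSON.stringify(buildSearchBody("xiaoou", 2000, 2010), null, 2));
```

Putting the year range in filter rather than must keeps it from influencing the relevance score, so the ranking still depends only on the keyword match.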

After building the query, it is time to post the request and get the results.

<?php
$curl = curl_init();
$url = 'http://127.0.0.1:9200/papers/_search';
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($post_dd))
);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post_dd);
$result = curl_exec($curl); // send the query and receive the response
curl_close($curl);

Listing 27: POST the data

However, the results arrive in JSON format and contain a lot of unnecessary information, so we need to convert them to an ordinary array and pick out the information we need.

<?php
$result = json_decode($result,TRUE);
$result1 = array(); // stays empty when there are no hits
$i = 0;
foreach ($result['hits']['hits'] as $a_paper)
{
    $j = 0;
    $result1[$i]['paperid']=$a_paper['_source']['id'];
    $result1[$i]['title']=ucfirst($a_paper['_source']['title']);
    $result1[$i]['year']=$a_paper['_source']['publish_year'];
    $result1[$i]['conference']=$a_paper['_source']['conference'];
    $result1[$i]['conferenceid']=$a_paper['_source']['conference_id'];
    foreach ($a_paper['_source']['authors_id'] as $auid)
    {
        $result1[$i]['author'][$j]['authorid']=$auid;
        $j++;
    }
    $j=0;
    foreach ($a_paper['_source']['authors'] as $au)
    {
        $result1[$i]['author'][$j]['authorname']=$au;
        $j++;
    }
    $i++;
}
if(empty($result1))
{
    show_404();
}
$data['result'] = $result1;
...

Listing 28: Construct the JSON to return
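For comparison, the same flattening can be written compactly in JavaScript. flattenHits is a hypothetical function; it assumes a response shaped like Listing 13 and that authors_id and authors list the authors in the same order.

```javascript
// Pick the fields the views need out of an Elasticsearch response.
function flattenHits(response) {
  return response.hits.hits.map((hit) => {
    const src = hit._source;
    return {
      paperid: src.id,
      title: src.title.charAt(0).toUpperCase() + src.title.slice(1),
      year: src.publish_year,
      conference: src.conference,
      conferenceid: src.conference_id,
      // Pair each author ID with the name at the same position.
      author: src.authors_id.map((authorid, j) => ({
        authorid,
        authorname: src.authors[j],
      })),
    };
  });
}

const sample = {
  hits: {
    hits: [{
      _source: {
        id: 2124166527,
        title: "boosting versus covering",
        publish_year: 2003,
        conference: "NIPS",
        conference_id: 1127325140,
        authors_id: ["2112608526", "2215741427"],
        authors: ["kohei hatano", "manfred k warmuth"],
      },
    }],
  },
};
console.log(flattenHits(sample)[0].title);
```

An empty hits array yields an empty result, which is the case in which the PHP code above shows the 404 page.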

6 Paper Recommendation

We designed two kinds of paper recommendation: one on the individual paper page and the other on the individual author page.

6.1 In The Single Paper Page

On the individual paper page, we show the recommended papers after the main information of the paper. The question is how to produce the recommendations, that is, which relationship we should use to find the papers to recommend. After listing all the possible relationships, we chose the papers that cite the paper or are cited by it. We rank them by the number of times they are cited and show the top ten.
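The ranking step can be sketched as follows. recommendPapers and the field name citations are assumptions made for this illustration; the real selection is done in the model with SQL.

```javascript
// `related` holds the papers that cite, or are cited by, the current paper.
// Return the `limit` most cited ones without modifying the input array.
function recommendPapers(related, limit = 10) {
  return [...related]
    .sort((a, b) => b.citations - a.citations)
    .slice(0, limit);
}

const related = [
  { title: "paper a", citations: 3 },
  { title: "paper b", citations: 12 },
  { title: "paper c", citations: 7 },
];
console.log(recommendPapers(related, 2).map((p) => p.title));
```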

6.2 In The Single Author Page

On the individual author page, we show the recommended papers after the main information of the author and before the author's papers. We put them in one row, and users can drag the slider horizontally to see all the recommendations, so they do not take up much space. The idea behind the recommendations is a little different from that on the individual paper page. First we find all the possible teachers of the author according to the model we trained. Then we choose the ten teachers with the most papers and show one paper from each. If there are no possible teachers, we look for possible students instead. If there are no possible students either, we show no recommendations.
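The fallback from teachers to students can be sketched like this. recommendForAuthor and the shape of the candidates ({ name, papers }) are assumptions; the sketch also assumes that students are ranked the same way as teachers and that the first paper in each list is the one shown.

```javascript
// Prefer teachers; fall back to students; otherwise recommend nothing.
function recommendForAuthor(teachers, students, limit = 10) {
  const pool = teachers.length > 0 ? teachers : students;
  return [...pool]
    .filter((p) => p.papers.length > 0)
    .sort((a, b) => b.papers.length - a.papers.length) // most papers first
    .slice(0, limit)
    .map((p) => ({ from: p.name, paper: p.papers[0] }));
}

const students = [
  { name: "s1", papers: ["t1"] },
  { name: "s2", papers: ["t2", "t3"] },
];
console.log(recommendForAuthor([], students));
```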

7 Student Hierarchical Tree

The other task is to display an author's relationships, including his or her teachers and students, in a visual way. For this we use d3 to draw a tree that shows the relationships between generations. Thanks to the model from lab3, we can derive a table of relationships between each pair of authors. We add this table to our database so that we can search for the relevant information about the authors. The task thus divides into two parts: searching the data and drawing the tree. One PHP file and one HTML file are designed to accomplish it.

7.1 mytree.html

Let us begin with SVG. Scalable Vector Graphics is a format for describing two-dimensional vector graphics, based on the Extensible Markup Language (a subset of the Standard Generalized Markup Language). It was developed by the World Wide Web Consortium and is an open standard.

In this part we first need to create an SVG object and set its parameters in our HTML:

var marge = {top:-10, bottom:0, left:80, right:0};
var svg = d3.select("svg");
var width = svg.attr("width");
var height = svg.attr("height");
var g = svg.append("g")
    .attr("transform","translate("+marge.left+","+marge.top+")");

Listing 29: Creating an SVG object

In the above steps we have selected the SVG and read its size. Selecting the SVG means that we can manipulate it. The next step is to obtain the relevant information, which is achieved by the following statement:

var authorid = getParameterByName("id");

Listing 30: Obtain the id query string
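The helper getParameterByName is not shown in this report. A minimal sketch of such a helper, assuming it simply reads one parameter from the query string of a URL, could look like this; in a browser the URL would typically be window.location.href.

```javascript
// Assumed implementation of the helper; the real one is not part of this report.
function getParameterByName(name, url) {
  // The second argument is only a base for relative URLs.
  const query = new URL(url, "http://localhost/").searchParams;
  return query.get(name); // null when the parameter is absent
}

console.log(getParameterByName("id", "http://localhost/author?id=1336878"));
```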

This call gets the author's ID and stores it in the variable authorid. Next we send this ID to the PHP file mytree.php as a parameter to obtain the data:

var dataset = {};
$.getJSON("mytree.php?",{authorid:authorid}).then(function (response) {
    dataset = response;

Listing 31: Gain teacher–student relationships

Then we receive the response from the PHP file and can use the data. Note that so far our data is still raw JSON, so it cannot yet be used to draw the tree. As we all know, a tree is hierarchical, so our data needs to be transformed into a similar form:

var hierarchyData = d3.hierarchy(dataset);

Listing 32: Convert to hierarchical data

The hierarchical data has now been produced, but that is still not enough. Next we create a tree generator:

var tree = d3.tree()
    .size([height,0.7*width])
    .separation(function(a,b){
        return (a.parent==b.parent?1:3)/a.depth;
    });

Listing 33: Create a tree generator

Here we also set the size and the separation between two nodes. With this tree generator we can produce data better suited to drawing the tree:

var treeData = tree(hierarchyData);
var nodes = treeData.descendants();
var links = treeData.links();
var thetree = d3.linkHorizontal()
    .x(function(d) { return d.y; })
    .y(function(d) { return d.x; });

Listing 34: Generate nodes and links
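To make clear what the nodes and links arrays contain, here is a plain-JavaScript approximation that needs no d3. flattenTree is a hypothetical function; unlike d3 it does not compute x and y positions, and the order of its nodes may differ from the order d3 uses.

```javascript
// Walk nested { name, children } data and collect every node plus one
// { source, target } link for each parent-child pair.
function flattenTree(root) {
  const nodes = [];
  const links = [];
  (function visit(data, depth, parent) {
    const node = { data, depth, parent };
    nodes.push(node);
    if (parent) links.push({ source: parent, target: node });
    for (const child of data.children || []) visit(child, depth + 1, node);
  })(root, 0, null);
  return { nodes, links };
}

const family = {
  name: "teacher",
  children: [{ name: "student", children: [{ name: "grand-student", children: [] }] }],
};
console.log(flattenTree(family).links.length);
```

A tree with n nodes always yields n - 1 links, because every node except the root has exactly one parent.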

The variable thetree is a link generator. It produces the path data for horizontal links. Next we connect the data to the picture.

var gr = g.append("g")
    .selectAll("g")
    .data(links)
    .enter()
    .append("g")
    .attr("visibility","hidden")
    .attr("parent",function (d) {
        return d.source.data.id;
    })
    .attr("trigger","off");

Listing 35: Visualising data

Here a g element is a group that contains several elements that can be manipulated together. In this task we put the links into one g and the nodes and texts (the authors' names) into another.

var gr = g.append("g")
    .selectAll("g")
    .data(links)
    .enter()
    .append("g")
    .attr("visibility","hidden")
    .attr("parent",function (d) {
        return d.source.data.id;
    })
    .attr("trigger","off");

Listing 36: Organize links and nodes into two g elements

Now we have created the selection gr of g elements and bound the links' data to them. Note that, in case there are not enough g elements, we use

    .enter()
    .append("g")

This means that if there are more data items than g elements, g elements are appended until there are enough. After this we also set some attributes, which I will introduce below.

Having bound the links' data to these g elements, we now append the links (path elements) to them.

gr.append("path")
    .attr("d",function(d){
        var start = {x:d.source.x,y:d.source.y};
        var end = {x:d.target.x,y:d.target.y};
        return thetree({source:start,target:end});
    })
    .attr("fill","none")
    .attr("stroke","black")
    .attr("stroke-width",1);

Listing 37: Set attributes of links

We then set the attributes of these links, of which d is the main one. The d attribute of a path holds the instructions that the line follows. For instance, if we give the start point and the end point, we instruct the line to go from the start point to the end point.

The steps for the nodes and texts are mostly similar, with one addition: a call to on. We attach a click event handler to the nodes and texts through on. In this way we can control whether our tree is expanded or collapsed. Let us look at the code first:

.on("click",function () {
    d3.selectAll("g[parent=\"" + this.id + "\"]")
        .attr("visibility", function () {
            if (this.getAttribute("trigger") === "on")
            {
                this.setAttribute("trigger", "off");
                return "hidden";
            }
            else if (this.getAttribute("trigger") === "off") {
                this.setAttribute("trigger", "on");
                return "";
            }
        });
    d3.selectAll("path[from=\"" + this.id + "\"]")
        .attr("visibility",function () {
            if (this.getAttribute("trigger") === "on")
            {
                this.setAttribute("trigger", "off");
                return "hidden";
            }
            else if (this.getAttribute("trigger") === "off") {
                this.setAttribute("trigger", "on");
                return "";
            }
            console.log(this.id);
        });
});

Listing 38: The onclick function

This onclick function is long but simple. It consists of two parts: one controls the nodes and texts, and the other controls the links.

Our idea is to first draw the whole tree on the author page. (Through some experiments we found that most authors have relationships spanning fewer than three generations, so we draw a three-layer tree.) At first, however, we show only one node, which stands for this author. If the node is clicked, his or her students are shown, and if it is clicked again, they are hidden. In this way we open or close the tree. This is the basic principle.

Above I mentioned several attributes such as trigger and visibility. The function uses them to achieve the effect. I illustrate it with the nodes; the links are handled similarly.

First of all we select all nodes whose parent is the clicked node (the author) and set their visibility:

d3.selectAll("g[parent=\"" + this.id + "\"]")

.attr("visibility", function () {

Listing 39: Select specific nodes

Then we need to determine whether these child nodes are currently shown, because there are two situations to handle: the child nodes are shown, in which case we need to hide them, or the child nodes are hidden, in which case we need to show them. For this we set the trigger attribute. The trigger has two states: on and off. on means that the child nodes are currently shown, so a click hides them. In addition, the trigger is set to off so that the next click works correctly.

d3.selectAll("g[parent=\"" + this.id + "\"]")
    .attr("visibility", function () {
        if (this.getAttribute("trigger") === "on")
        {
            this.setAttribute("trigger", "off");
            return "hidden";
        }

Listing 40: Set trigger

Similar to above if the trigger is off we have:

        else if (this.getAttribute("trigger") === "off") {
            this.setAttribute("trigger", "on");
            return "";
        }

Listing 41: Set trigger when it is off
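Stripped of the DOM calls, the toggle is a two-state machine. The sketch below models it with plain objects; toggle is a hypothetical function, and leaving any other state untouched is an assumption.

```javascript
// DOM-free model of the toggle: "on" becomes "off" and hides the element,
// "off" becomes "on" and shows it again.
function toggle(element) {
  if (element.trigger === "on") {
    return { trigger: "off", visibility: "hidden" };
  }
  if (element.trigger === "off") {
    return { trigger: "on", visibility: "" };
  }
  return element; // any other state is left untouched
}

let child = { trigger: "off", visibility: "hidden" }; // initial state of each g
child = toggle(child); // first click: shown
child = toggle(child); // second click: hidden again
console.log(child);
```

Two clicks always bring an element back to its initial state, which is what lets the tree be opened and closed repeatedly.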

The links are handled in a similar way, and with that our work is nearly done. In the next part I describe how the data is transferred to the HTML file.

7.2 mytree.php

This PHP file receives the ID of an author from the individual author page and searches for his or her teacher–student information. But let us begin with a statement in the HTML file:

$.getJSON("mytree.php?",{authorid:authorid}).then(function (response)

Listing 42: Get information in JSON

This statement sends a request to the PHP file, with authorid as the parameter, to get the relevant information. To receive the parameter, the PHP file has this:

<?php

$id=$_GET['authorid'];

Listing 43: Get the query string in PHP

Next we look up the name that belongs to this ID, and the second layer of the tree.

<?php
$sql1 = "select a.name,a.id from author a join relationship b on a.id=b.target where (b.is_advisor=1 and b.source='$id')";
$result1 = mysqli_query($conn,$sql1);
$sql2 = "select name from author where id='$id'";
$authorname = mysqli_fetch_row(mysqli_query($conn,$sql2))[0];
$allchildren = [];
while ($row = mysqli_fetch_row($result1)){
    $child = (object)array('parent'=>$id,'name'=>$row[0],'id'=>$row[1],'children'=>[]);

Listing 44: Search for teacher–student information in database

Based on the second layer we continue to search the third layer:

<?php
    $sql3 = "select a.name,a.id from author a join relationship b on a.id=b.target where (b.is_advisor=1 and b.source='$row[1]')";
    $result2 = mysqli_query($conn,$sql3);
    $sql4 = "select name from author where id='$row[1]'";
    $authorname1 = mysqli_fetch_row(mysqli_query($conn,$sql4))[0];
    while ($row1 = mysqli_fetch_row($result2)){
        $child1 = (object)array('parent'=>$row[1],'name'=>$row1[0],'id'=>$row1[1],'children'=>[]);
        array_push($child->children,$child1);
    }

Listing 45: Search the third layer

After searching, we put the third-layer nodes into the children of the second-layer nodes, and finally put the second-layer nodes into the children of the root node.

<?php
    array_push($allchildren,$child);
}
$tree = array(
    'parent'=>11111, // placeholder: the root has no real parent
    'name'=>$authorname,
    'id'=>$id, // the requested author is the root of the tree
    'children'=>$allchildren
);

Listing 46: Third layer as second layer’s children

Of course the data should be in JSON form:

<?php

echo json_encode($tree);

Listing 47: Encoding data as JSON
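The layer-by-layer search above can also be expressed recursively. The sketch below builds the same nested structure in JavaScript from an in-memory list of rows instead of SQL; buildTree, the row shape { source, target, is_advisor } and the names lookup are assumptions, with each row read as "source advises target" in line with the queries above.

```javascript
// Build a nested { parent, name, id, children } tree for author `id`.
// `depth` limits how many layers below the root are expanded, which also
// guards against cycles in the relationship data.
function buildTree(id, names, advisorRows, depth = 2) {
  const children = depth === 0 ? [] : advisorRows
    .filter((r) => r.is_advisor === 1 && r.source === id)
    .map((r) => ({ ...buildTree(r.target, names, advisorRows, depth - 1), parent: id }));
  return { parent: null, name: names[id], id, children };
}

const names = { 1: "a", 2: "b", 3: "c", 4: "d" };
const advisorRows = [
  { source: 1, target: 2, is_advisor: 1 },
  { source: 2, target: 3, is_advisor: 1 },
  { source: 3, target: 4, is_advisor: 1 },
];
console.log(JSON.stringify(buildTree(1, names, advisorRows)));
```

With the default depth of 2 the result has three layers, matching the three-layer tree drawn on the author page; author 4 is left out because it would be a fourth layer.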

8 Machine Learning Improvement

8.1 Teacher–Student Relationship

In the data visualisation, we use a force-directed graph and a tree diagram. Both visualisations use the teacher–student relationship, so we used the experimental results of lab3. What we need to do is to make the most of the algorithms in order to achieve the highest accuracy in distinguishing teachers from students.

8.2 Improvement

To promote the accuracy rate, we have three ways. The first way is to selectthe different algorithms to calculate in different ways. To add the new featuresbetween the two authors is the second way. The third way is to change thealgorithm function parameter.

At first, I use the lab3 results. We compared several common algorithmswhich represent different methods of calculation in the labe3. In these algo-rithms, I choose the LogisticRegression, whose accuracy rate can reach 75% byusing those nine features. Therefore, I used LogisticRegression as my preferredalgorithm.

The following tables show that LogisticRegression has the highest accuracy among the algorithms compared. TensorFlow is mainly suited to larger data sets. We only have 5297 records, far fewer than TensorFlow needs to perform well, so its results are not listed in the tables.

Table 1: Results of logistic regression

              Precision  Recall  F1 Score  Support
0             0.72       0.79    0.76      547
1             0.84       0.79    0.81      777
avg / total   0.79       0.79    0.79      1324

Table 2: Results of SVC, linear kernel

              Precision  Recall  F1 Score  Support
0             0.72       0.78    0.76      551
1             0.84       0.79    0.81      773
avg / total   0.79       0.78    0.79      1324

Next, I added two new features, both of which increase the accuracy. The first is the affiliation feature. For each paper the two authors wrote together, I look up the affiliation of each author and check whether the two affiliations are the same. The first figure is the percentage of co-authored papers on which the affiliations match. When neither author belongs to any affiliation, the two affiliations also compare as equal, so I add a second figure: the percentage of co-authored papers on which the affiliations match because both are None. These two figures make up the new feature.
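The two figures of the affiliation feature could be computed along the following lines. This is a minimal sketch rather than the project's code: the function name `affiliation_feature` and the input format (two lists of affiliation ids, one entry per co-authored paper, with `None` for a missing affiliation) are assumptions made for illustration.

```python
def affiliation_feature(affiliations_a, affiliations_b):
    """Return (share of papers with equal affiliations,
               share of papers where both affiliations are None)."""
    pairs = list(zip(affiliations_a, affiliations_b))
    if not pairs:
        return 0.0, 0.0
    # None == None is True, so papers without any affiliation count as "same" here.
    same = sum(1 for a, b in pairs if a == b)
    # The second figure lets the classifier separate those cases from real matches.
    both_none = sum(1 for a, b in pairs if a is None and b is None)
    return same / len(pairs), both_none / len(pairs)

# Hypothetical data: four co-authored papers.
same_share, none_share = affiliation_feature(
    ["X", None, "Y", "X"],
    ["X", None, "Z", "X"],
)
```

In this example three of the four papers have equal affiliations, and one of those three is a match only because both entries are missing.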

The second new feature is the author-sequence feature. For each co-authored paper, I look up the position of each of the two authors in the author list and check which of them comes first. The feature is the percentage of co-authored papers on which the first author of the pair appears in the earlier position.
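A possible implementation of this percentage is sketched below. The name `author_sequence_feature` and the inputs (two lists of author positions, aligned by paper, with 1 meaning first author of the paper) are hypothetical and chosen only for illustration.

```python
def author_sequence_feature(sequence_a, sequence_b):
    """Share of co-authored papers on which scholar A is listed before scholar B."""
    pairs = list(zip(sequence_a, sequence_b))
    if not pairs:
        return 0.0
    # A smaller sequence number means an earlier position in the author list.
    earlier = sum(1 for a, b in pairs if a < b)
    return earlier / len(pairs)

# Hypothetical positions of the two scholars on four co-authored papers.
share_a_first = author_sequence_feature([1, 2, 1, 3], [2, 1, 3, 4])
```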

Each of the two new features increases the accuracy by 2 percentage points, so my final accuracy reaches 79%.

I use two inner joins to select the papers that the two scholars wrote together. Note that the order of the two joins matters: the author filtered in the outer query determines which scholar the extracted data belongs to. The following code extracts the AffiliationID and AuthorSequence that belong to scholarA, so I run it twice, with the scholars swapped, to obtain the information for both scholars.

Table 3: Results of Naive Bayes, func = GaussianNB

              Precision  Recall  F1 Score  Support
0             0.74       0.72    0.73      616
1             0.76       0.78    0.77      708
avg / total   0.76       0.76    0.76      1324


sql10 = '''
SELECT RESULT.AffiliationID,
       RESULT.AuthorSequence
FROM papers
INNER JOIN
    (SELECT paper_author_affiliation.PaperID,
            paper_author_affiliation.AffiliationID,
            paper_author_affiliation.AuthorSequence
     FROM paper_author_affiliation
     INNER JOIN
         (SELECT PaperID
          FROM paper_author_affiliation
          WHERE AuthorID = "{b}") AS B
     ON paper_author_affiliation.PaperID = B.PaperID
     WHERE paper_author_affiliation.AuthorID = "{a}")
    AS RESULT
ON papers.PaperID = RESULT.PaperID
ORDER BY papers.PaperPublishYear ASC
'''.format(a=scholarA, b=scholarB)

Listing 48: Retrieve information from database with Python
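Formatting author ids directly into the SQL string works for trusted ids, but passing them as query parameters is generally safer. The sketch below runs the same join with placeholders. It uses Python's built-in `sqlite3` module and a tiny in-memory data set purely so that it is self-contained; the sample rows, the alias `paa` and the helper `shared_paper_rows` are assumptions, and the project's actual database and driver may differ (MySQL drivers typically use `%s` instead of `?`).

```python
import sqlite3

# In-memory stand-in for the project's database, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (PaperID TEXT, PaperPublishYear INTEGER);
CREATE TABLE paper_author_affiliation (
    PaperID TEXT, AuthorID TEXT, AffiliationID TEXT, AuthorSequence INTEGER);
INSERT INTO papers VALUES ('p1', 2010), ('p2', 2008), ('p3', 2012);
INSERT INTO paper_author_affiliation VALUES
    ('p1', 'A', 'aff1', 2), ('p1', 'B', 'aff1', 1),
    ('p2', 'A', NULL,   1), ('p2', 'B', 'aff2', 2),
    ('p3', 'A', 'aff1', 1);
""")

QUERY = """
SELECT RESULT.AffiliationID, RESULT.AuthorSequence
FROM papers
INNER JOIN
    (SELECT paa.PaperID, paa.AffiliationID, paa.AuthorSequence
     FROM paper_author_affiliation AS paa
     INNER JOIN
         (SELECT PaperID FROM paper_author_affiliation WHERE AuthorID = ?) AS B
     ON paa.PaperID = B.PaperID
     WHERE paa.AuthorID = ?) AS RESULT
ON papers.PaperID = RESULT.PaperID
ORDER BY papers.PaperPublishYear ASC
"""

def shared_paper_rows(conn, scholar_a, scholar_b):
    """Rows of scholar_a on the papers co-authored with scholar_b, oldest first."""
    # Placeholders are filled in textual order: the inner subquery (scholar_b) comes first.
    return conn.execute(QUERY, (scholar_b, scholar_a)).fetchall()

rows_a = shared_paper_rows(conn, "A", "B")
rows_b = shared_paper_rows(conn, "B", "A")
```

Calling the helper twice with the arguments swapped mirrors running Listing 48 once per scholar; paper `p3` is excluded from both results because only one of the two scholars appears on it.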

Finally, I tried to modify the parameters of the algorithm. I changed one parameter at a time, but the accuracy either stayed the same or decreased. I also wanted to adjust several parameters together, but I do not fully understand the mathematical meaning of each parameter, so I could not choose a combination that would reliably improve the accuracy.
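One common way to adjust several parameters together without reasoning about each one is an exhaustive grid search: every combination is evaluated and the best-scoring one is kept. The sketch below shows the idea in plain Python. The helper `grid_search` and the scoring function `toy_score` are hypothetical; in practice the scoring function would train the classifier with the given parameters and return its validation accuracy, and scikit-learn offers `GridSearchCV` for this purpose. The parameter names `C` and `tol` are used only as examples.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring function, used only so the sketch runs without training data.
def toy_score(C, tol):
    return -abs(C - 1.0) - abs(tol - 1e-4)

best_params, best_score = grid_search(
    toy_score,
    {"C": [0.1, 1.0, 10.0], "tol": [1e-3, 1e-4]},
)
```

The cost grows with the product of the list lengths, so the grid is usually kept small.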

The following table is my final result.

Table 4: Final machine learning results

              Precision  Recall  F1 Score  Support
0             0.72       0.79    0.76      547
1             0.84       0.79    0.81      777
avg / total   0.79       0.79    0.79      1324
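The "avg / total" row of Table 4 is consistent with a support-weighted average of the two per-class rows. The short check below recomputes it from the rounded values in the table; the helper `weighted_average` is illustrative, and the assumption that the row was produced as a weighted average is inferred from the numbers rather than stated in the report.

```python
# Per-class rows copied from Table 4: (precision, recall, F1 score, support).
rows = {
    "0": (0.72, 0.79, 0.76, 547),
    "1": (0.84, 0.79, 0.81, 777),
}

def weighted_average(rows, column):
    """Average one metric column over the classes, weighted by support."""
    total_support = sum(r[3] for r in rows.values())
    return sum(r[column] * r[3] for r in rows.values()) / total_support

avg_precision = weighted_average(rows, 0)
avg_recall = weighted_average(rows, 1)
avg_f1 = weighted_average(rows, 2)
total_support = sum(r[3] for r in rows.values())
```

Rounded to two decimals, all three averages agree with the last row of the table, and the supports add up to the 1324 test records.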

9 Conclusion

Through our collaboration, a fully functional website is up and running. Our final project combines efforts in different areas: website design, frontend and backend coding, machine learning and data visualisation. The final report is here presented for assessment.
