a report on info-h-415: advanced databases. project. · 2017-12-19 · introduction nowadays, we...

24
A report on INFO-H-415: Advanced Databases. Project. By Aleksei Karetnikov (000455065)

Upload: others

Post on 22-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Areporton

INFO-H-415:AdvancedDatabases.Project.

By

AlekseiKaretnikov(000455065)

Page 2: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Content

INTRODUCTION......................................................................................................................3

GRAPHDATABASE..................................................................................................................4Whatisit?..................................................................................................................................................4Differencesbetweenothertechnologies...................................................................................................4Querylanguages.........................................................................................................................................5

ARANGODB.............................................................................................................................7

Features..................................................................................................................................................7BenefitswithsameDB...............................................................................................................................9DBarchitecture..........................................................................................................................................9Storageengines........................................................................................................................................10Hardware/Softwarerequirements...........................................................................................................10Installation................................................................................................................................................10

Pricingschema......................................................................................................................................12

Interactions..........................................................................................................................................13Benchmark...............................................................................................................................................13

USECASE...............................................................................................................................16

Problem................................................................................................................................................16

Dataset.................................................................................................................................................19

Prototype.............................................................................................................................................22

Otherusecases.....................................................................................................................................23

CONCLUSION........................................................................................................................24

REFERENCES.........................................................................................................................24

Page 3: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

IntroductionNowadays,weareinteractingwithvariablyrepresentedinformation:plaintexts,tables,

trees and others. Sometimes, it is impossible to imagine how it is possible to represent theinformation,whichisexistsasatreeinrealworld,inrelativedatabase.Itisclearthatonecanimplementlotsofmany-to-manyconnectionanddifferentattributesinsuchsystembutittakesmore time to apply lots of joins in queries. The graph database can be a solution for suchproblems.Itconsistsnoopportunitiestojoindifferentdatasetsandproblemswithperformance–itisanantherNoSQLapproach,whichprovidesdirectconnectionswithdata.

There are lots of different databases which deal with structured data: neo4j, Oracle,Postgres,ArangoDB,OrientDBandother.Allofthemprovidessomecommonandsomeuniquefeatures:querylanguages,storeengine,systemrequirements,pricing,overallperformant,etc.ItisclearthatNoSQLsolutionalsorequiresaspeciallanguageforinteraction(e.g.AQL,Cypher,SPARQLandotherlanguages).

Forthisresearchproject,theArangoDBwasselectedasagraphdatabase.Itlooksalittlebitdifferentthantherivals:itisreallyyoungtechnology–itwasdevelopedin2011byGermancompanyArangoDBGmbH,butitalreadyhasitscustomers.Accordingtothedevelopers’release,ArangoDB has the best performance in comparison with competitors with impressiveopportunitiesindatainteraction.Moreover,supportofthistechnologyisreallysimple.

Furthermore, any technologymust have a commercial advantage. In other way, it isineffective.Accordingtothevendors’ information,ArangoDBisthebestsolutionbecauseit isfreeforeveryoneandnotsoexpensivewhenweneedsomeadditionalfeaturesandsupport.

So, the main aim of this research project is to explain the graph database features,advantagesanddisadvantagesinArangoDBanditscommercialopportunities.

Page 4: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

GraphdatabaseWhatisit?GraphDatabaseisadatabasetypewhichstoresdatainverticesandedgesofthegraph.

Itbasedonthemathematicalgraphtheory.Inaddition,eachelementofthegraphdatabasecanconsistanynumberofadditionalproperties(additionalkey-valuepairs).Themainfeatureofsuchtypeofstorageishighlyconnecteddataindependentlyofthevolumeofthedataset.Itmeansthatall“join-like”operationsareprocessingbyusingpersistentconnectionsbetweennodes,soittakesconstant-time[2].

Nowadaysthisapproachisbecomingmoreandmorepopular.Itisclearthatwearelivingintheworldofconnecteddata,sosuchtypeofdatastructuressignificantlybetterrepresentstherealinformationinthedatabase.

Thecommonplaceofdatainsuchtypeofdatabasesisnodes(vertex)ofthegraph.Itcaninclude different properties (usually, schema-less), label of the node and some additionalmetadata(suchasindexorconstrains).Inothertypeofdatastructuresnodesarerepresentedasrecordsinrelationaldatabasesordocumentsindocumentstorage.

All the relations in graph databases (which are represented by edges) are named forsemantic connectionbetweennode-entitiesandhave the startand theendnode.So,all therelations are directed. Normally, properties of the edges include information about distancebetweennodes,weight,cost,etc.Ontheonehand,itimprovesinteractionwithallthedirectlyconnecteddatabutontheotherhandsimpledeletionofconnectednodefollowsfulldeletionoftherelationship[1].

Therearemanydifferentapproachesofstoragemechanisms ingraphdatabase.SomeDBMS store data in tables, others in document or key-value storages,which provides bettercompatibilitywithNoSQLapproach.

Themostfamousgraphdatabasesare:ArangoDB,Neo4j,Oracle,OrientDB,SAPHANA,Teradata,SQLServer,GraphDB.

DifferencesbetweenothertechnologiesThemainaimofgraphdatabases isstoringofconnecteddatato improve interactions

betweenalltheconnectednodes.So,themaindifferencesofsuchapproachincomparisonwithothertypesofdatabasescorrespondtodifferentaimsofthedatabasesandso,differentdatastructureswhichareusedtostoreit.Inthetablebelowthedatatypeswhichcouldberecognizedlikethesameinthemostpopulartypeofthedatabasesarepresented[1].

Typeofthedatabase NameoftheelementRelational(alsotemporal,key-valueandspatial) Row/recordortupleDocument DocumentGraph NodeObject Object

Page 5: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Oneofthemainfeaturesofthegraphdatabasesincomparisonwithothertypeisbetter

performanceofinteractionwithconnecteddata.Eachofmentioneddatabasetypeisthemostrelevantforparticularcase.Forexample,relationaldatabasesarethebestsolutionforstoringtable valuesand its aggregation,document store for storingof thewholedocuments,objectdatabasesforfilestoring.Graphdatabasesaresimultaneouslyorientedforspecificproblems.Initscase,itisfindingoftheshortestpath,connecteddatastoring,etc.So,themostrelevantcasesforusingthistechnologyare:

• real-timerecommendations;• decisionmakingengines;• socialnetworks;• accessmanagement;• routeplanning;• dataandknowledgemanagement;• frauddetection.

It isclear that thevastmajorityofotherNoSQLdatabasesareaggregateorientedbut

frequently suchoperationsare veryexpensivebecauseof largevolumesofdata. In this casegraphdatabasesarethebestsolutionforsuchproblemsbecause it isnotnecessarytospendresourcesforjoiningofdifferentrecords,whichareconnectedbydefault.

QuerylanguagesOneofthefeaturesofGraphdatabaseisanewproblemofthequerylanguageselection.

Unlike traditional relational databaseswhichuse SQL as a query languages, there are lots ofdifferentquery languages for this typeofdatabases,whichareaimedfor resolvingofcertainproblems.Themostfamousquerylanguagesforgraphdatabasesare[2]:

• SQL(itstillcanbeused);• AQL(ArangoDBQueryLanguage);• Cypher(themostpopulardeclarativelanguageforNeo4j);• GraphQL(thelanguagewhichcreatedbyFacebook);• Gremlin(languageforApacheTinkerPop™project.CanbeusednativelyinJava,Scala,JS,

Groovy,ClojureandPython);• SPARQL(SQL-likelanguageforRDFgraphs).

SQLSQLforgraphdatabasesdoesnotcontainsomespecificoperatorssoitisnotnecessary

toreviewitindetails.

Page 6: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

AQLThemostsimilarforSQLlanguageisAQL,whichwascreatedespeciallyforArangoDB.In

comparisonwithSQL,“AQLdoesnotsupportdatadefinitionoperations,suchascreatinganddropping databases, collection and indexes” [1]. Sometimes it overlaps SQL keywords, forexampleFILTERclause,whichisequivalenttoWHEREclauseinSQL,butwhenallthequeriesinAQL are executed from left to right, so position of this clause in the query determines theprecedence.

Forexample,theAQLcode:“FOR user IN users FILTER user.active == true RETURN user” is the following SQL query: “SELECT * FROM users WHERE active=true”. CypherCypherisalsoSQL-inspiredquerylanguagewhichwascreatedforNeo4jdatabase.The

mainadvantageofthislanguageisopportunitytodescribetheresultwhichisnecessarytoobtainwithoutdescribingofthenecessaryprocedures,aimedtoobtaintheresult.

Forexample,thecode“MATCH (s:Sales {items:’Beer’}) RETURN s.items, s.price” corresponds to the following SQL query: “SELECT s.items,s.price FROM sales AS s WHERE s.items=’Beer’). GraphQLGraphQLisaspecialquerylanguageforFacebookAPI.So,itisnotdirectlyspecifiedfor

graphdatabases.Thequeryofthislanguagesspecifiesonlyobject’sstructure,forexample:“{ product(quantity>100) { productName productPrice } }”. ThemainusersofthislanguageareFacebook,Pinterest,Dailymotion,GitHub,Coursera,

IntuitandShopify.

GremlinThemain featureof this language isnative supportof themost famousprogramming

languages.Eachqueryconsistsasequenceofatomicoperations(steps)ofthedatastream.For

Page 7: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

example,thenextcodetakesvertexwithname“Austria”,movestocountriesaroundandselectstheirnames.

g.V().has(“name”,”Austria”).out(“border”).values(“name”) SPARQLThislanguagewascreatedbyW3CconsortiumforRDF(ResourceDescriptionFramework,

technologyfordataserializationandknowledgemanagement,whichisoriginallydesignedasametadatamodel)graphs.ThefollowingcoderepresentsselectionofthetitleinthegivenRDFtriple:

SELECT ?title WHERE { <http://example.com/book/book1>

<http://purl.org/dc/elements /1.1/title> ?title . }

ArangoDBFeaturesArangoDBhaslotsofdifferentinterestingandimportantfeatures:

• Themostimportantoneismulti-model(graph,key/valueanddocumentmodels)implementationwhichprovidesanopportunitytointeractwithdifferentmodelsinsinglequery.Suchapproachhelpstoimprovetheperformanceofthedatabase.

• Reduced operational complexity, which improves the selection of the mostappropriatemodelforgivendata.

• SupportofACID(Atomicity,Consistency,Isolation,Durability)transactions.• Modulararchitecturewhichprovidesfaulttolerance.• Lowcostofownership.• IntegratedframeworkArangoDBFoxx,whichprovidesnativeaccesstoin-memory

databyusingJavaScript.• Horizontalandverticalscalability.• WEBUIforgraphvisualizationandAQLqueries.

Page 8: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure1.ExampleofArangoweb-interface.Statistics.

Figure2.Exampleofqueryviewinweb-interface.Source:www.arangodb.com

Page 9: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure3.Exampleofgraphviewinweb-interface.Source:www.arangodb.com

BenefitswithsameDBThere are lots of different graph databases which provide almost the same range of

functions.ArangoDBcanbecomparedwithNeo4jandOrientDBbutitofferssomesupplementalopportunities (according to the research and information, provided by DB developers,vschart.comanddb-engines.comwebsites).

ThemostvaluableadvantagesofArangoDBare[1]:1. It ismulti-model DBMSwhenNeo4j is single-model graph database. If the project

requiresadifferentstoragetype,itisnecessarytouseanotherdatabasetostoreit.Atthesametime,ArangoDBprovidesgraph,documentandkey/valuemodels.

2.ArangoDBprovidesthebestperformanceininteractionswithdatafromdifferentdatamodelqueries.

3.ArangoDBoffersimpressivescalabilityopportunities.4. In comparison with Neo4j, all the transactions can be encrypted by TSL or SSL.

Moreover,inconsistrole-basedcontrol(viaFoxxframework)andauditinginEnterpriseversion.5. it compatible with larger number of NoSQL query languages than OrientDB, so it

providesbetteropportunitiesfordatainteractions.DBarchitectureA cluster of ArangoDB consist of a number of database instances which can play 4

differentroles:agents,coordinators,primaryandsecondaryservers.

Page 10: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Agents can form the Agency in the cluster, which is the main part of the clusterconfiguration.Itprovidessynchronizationservicesforthewholecluster.

CoordinatorsareworkingwithqueriesandFoxxservices.Primaryserversaretheplaceswheredataishosted.Secondary servers are asynchronous replicas of primaryones. There areoneormore

secondaryserversforeachprimary.Theyaresuitableforbackupactions.StorageenginesArangoDBprovidestwodifferentstorageengines:traditionalmemory-mappedfilesand

RocksDBbasedengine.Theselectionoftheenginedependsontherealcase.Thetraditionalmemory-mappedfilesenginehassignificantlylongerstartuptimebutit

makes impressive performance with in-memory data queries. So, this engine is suited fordatasetswhichcanberepresentedinthemainmemory.

Thenew“rocksdb”enginewasdevelopedforlargedatasetswhichcannotbekeptinthemainmemory.Incomparisonwiththetraditionalapproach,thisenginesavesalltheindexesonthediskthatmakesveryfaststartup.

Unfortunately,itisnotpossibletocombinedifferentenginesintheoneinstallationofthedatabase.So,itmustbeselectedforthewholeserverorclusteratthetimeofinstallation.

Hardware/SoftwarerequirementsArangoDB runs on Linux, OS X andMicrosoft Windows both 32bit (according to the

architecturelimits,itcanusenomorethan2-3GBofRAM)and64bitsystems.TherearenocriticalrequirementsforthisDBMS.Forexample,accordingtotheofficial

manual,itispossibletorunitonRaspberryPi.Furthemore,ArangoDBcanbe installedasaDockercontainer, inDC/OS,onWindows

AzureandAmazoncloudservices.InstallationArangoDBcanbeinstalledindifferentoperationsystems.So,thisprocessisvariousand

dependsontheenvironment.MacOSXTherearetwowaystoinstallArangoDBinOSX:1.Graphicalapplication.Inthiscasethe.appcontainershouldbedownloadedandmoved

totheApplicationfolder.2.InstallationthroughHomebrew.ItispossibletoinstallthelatestversionofArangoDB

bypackagemanagerHomebrewwiththecommand:“brew install arangodb”.Thenitcanbe started from the default installation folder /USR/LOCAL/OPT/ARANGODB/SBIN/ by usingcommand“arangod”.Consequently,weneedtoinitializethedatabasebyexecutingapplication“arango-secure-installation”andsetarootpassword.

Page 11: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

LinuxTherearefewdifferentprecompiledpackagesforRedhat,Fedora,Debian,Ubuntuand

SuseLinuxdistributives.Normally, it isnecessary todownload thepackage for thenecessaryLinuxversion.

Anotherwayofinstallation–usingofpackagemanagers.Theinstallationcodeforapt-get,yumandzypperpackagemanagersispresentedbelow.

Apt-getcurl -O

https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/Debian_9.0/Release.key

sudo apt-key add - < Release.key ECHO 'DEB

HTTPS://WWW.ARANGODB.COM/9C169FE900FF79790395784287BFA82F0DC0059375A34A2881B9B745C8EFD42E/ARANGODB32/DEBIAN_9.0/ /' | SUDO TEE /ETC/APT/SOURCES.LIST.D/ARANGODB.LIST

SUDO APT-GET INSTALL APT-TRANSPORT-HTTPS SUDO APT-GET UPDATE SUDO APT-GET INSTALL ARANGODB3E=3.2.8 YUMcd /etc/yum.repos.d/ curl -O

https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/CentOS_7/arangodb.repo

yum -y install arangodb3e-3.2.8 Zypperzypper --no-gpg-checks --gpg-auto-import-keys addrepo

https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/openSUSE_13.2/arangodb.repo

zypper --no-gpg-checks --gpg-auto-import-keys refresh zypper -n install arangodb3e=3.2.8 CompilingFurthermore, if theprecompiledpackage isnotavailable, it ispossibletocompileand

buildArangoDBfromrawsource.Forexample,forusingitonRaspberryPi.ArangoDBwastestedwithGNUC/C++andclang/clang++compilerswithC++11-enabledargument.Thismethodisnotverypopularfornormalusing,soallthenecessaryinformationaboutcompilingcanbefoundontheArangoDBdocumentationwebsite:https://docs.arangodb.com/.

Page 12: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

WindowsFor Microsoft Windows the Windows Installer package is available. At the time of

installationusercanselectthe installationpath,makesingle-ormultiuser installation,keepabackup,createadesktopiconandotherparameterswhichtheInstallationWizardofferstotheuser.

PricingschemaArangoDBislicensedunderApache2.0publiclicense.So,fromtheenduser’spointof

view(excludingdevelopment,distributing,etc.), itmeansthatArangoDB is fully free fornon-commercialandcommercialuse.

Moreover,thereare2commercialsubscriptions:BasicandEnterprise.

Features Community Basic Enterprise

CommunityEditionFeatures ✓ ✓ ✓SatelliteCollections ✓SmartGraphs ✓EncryptionatRest ✓EnhancedUserManagementwithLDAP

Training Free,onlineeducation ✓ ✓ ✓Private,on-demandtraining ✓Support SLA none 9×5 24×7ResponseTime criticalissues noguarantee 12hours* 2hourslevel2issues noguarantee 16hours* 5hourslevel3issues noguarantee 40hours* 16hoursNumberofissues 10

permonthunlimited

Supportcontacts google-grouponly

1email,web

4email,web,phone

Technicalalerts ✓ ✓Hotfixes general

release-cyclegeneralrelease-cycle

criticalissues noguarantee 12hours* 2hourslevel2issues noguarantee 16hours* 5hoursLicense Type ApacheV2 ApacheV2 Commercial

Page 13: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

In order to estimate the possible expenditures for commercial subscription to theArangoDB,theSalesdepartmentofArangoDBGmbHwascontacted.AccordingtotheArangoDBGmbHpolitics,pricemodelisvariablefordifferentprojectsandtheaveragefigurescannotbeprovided.

InteractionsArangoDB uses AQL language as amain query tool. The query can be executed from

CommandLineandWebUI.Furthermore,thereisanimpressivenumberofdriversfordifferentlanguages:

• NodeJS;• PHP;• Java;• JavaScript;• .NET;• Go;• Python;• Scala;• Railo;• D;• Dart;• Vert-X;• Gremlin;• RubyonRails;• Clojure;• Elixir;• C++.

All the native drivers are presented on the project’s GitHubhttps://github.com/arangodb/withallthenecessarydocumentation.

Benchmark Whenitcomestotheperformanceofthedatabase,wecanusethespecialbenchmarkfordifferentNoSQLdatabases (https://github.com/weinberger/nosql-tests)which isbasedonNodeJS.Thetestcollectionconsists100000elementstoread,write,findtheneighbor,makeaggregationsandfindtheshortestway.Wecansimply install itontheUNIXbasedoperationsystembysimplesteps: git clone https://github.com/weinberger/nosql-tests.git npm install . npm run data

Page 14: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Tostartallavailabletests,weneedtostarttheserverwithadditionalparameters:./bin/arangod /mnt/data/arangodb/data-2.7 --server.threads 16 --wal.sync-interval 1000 --config etc/relative/arangod.conf --javascript.v8-contexts 17

As soon as ArangoDBwas updated to version 3, scheduler.threads parameter is nowobsoleteandhavatobeexcluded.Wehavetotypethefollowingcommandtostartallthepossibletests.node benchmark arangodb -a 127.0.0.0 -t all

Itisrecommendedtostarttheserverwhere127.0.0.1–IPaddressoftheinstalledandrunArangoDB.

Unfortunately,theavailablebenchmarkwasdeveloped2yearsagoforArangoDB2,whenthecurrentversionis3.So,itwasnecessarytofixitbutitunfortunatelyitstillnotpossibletotesttheperformance

Accordingtotheexecutedtest,wehavereceivedtheresultsforthewholecollectionof100.000elements.TheresultsofthistestincomparisonwiththeresultsfromArangoDBwebsitearepresentedbelow:

Test SingleRead SingleWrite Aggregation Shortest

pathNeighbors(1+2deg)

Neighbors(withprofiledata)

Result(sec.) 12.672 20.194 - - - -Vendor’sresults

16.962 20.530 1.250 0.061 0.464 4.327

Asaresult,wecanseethatatleastsinglereadoperationbecamequickerinnewversion

ofArangoDB.It isclearthattheresults(whichareavailable)arealmostthesamethattheDBvendorprovides.So,wecanusetheirresultsforfurthercomparison,whicharepresentedinthetablebelow.

Page 15: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

SuchresultsshowusthatArangoDBhasbetterperformancethanthevastmajorityofotherGraphdatabases.

Page 16: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

UsecaseProblemNowadaysproblemoftheloadofpublictransportisbecomingmoreandmoreimportant.

Governmentsareinterestedtomoveallthepeoplefromprivatetopublictransporttoimproveroadsituationincities,atthesametimepeopleareinterestedinmaximalconvenienceofthetrip: distance from the real destination from the closest bus/tram-stop, non-overcrowdedvehiclesandoptimaltimeofthetrip.Toresolvethefirstproblems,wehavetoincreasenumberofvehiclesbuttwootherwecanresolvebydataresearch.Moreover, it isclearthatthevastmajorityofcurrentroute-plannersconsideronlynominaltrip-timebutinrealliferoadtrafficcansignificantlyeffectonthetriptime.So,thequickestroutecanbecomethelongestbecauseofsuchproblems.

Asaresult,wehaveacolossalnumberofdifferentobjectswithlotsofdifferent1:1or1:nrelations.Wecanstoreitinhabitualrelationaldatabasewhichlotsofdifferentbridges.So,tobuild the routewe need to consider lots of connections between themby using lots of joinoperations.Itisclearthatbuildingoftheroutewithsomechangescanbecomereallydifficultandhardtocompute.Fortunately,wecanusegraphdatabasetosimplifythisproblemandmakeitmoreoptimizedbecause,itIsclear,thatnormaltransportnetworkisagraph.So,wecanstoreinformationaboutourobject(thenetwork)initsnativeform–arrayofverticesandedgeswithsomeattributes.

Let’slookatthemapsofVienna’stram(Straßenbahn)andmetro(U-Bahn)maps().Wecanseethatbothimagesarethegraphs.So,wecansimplystoreitintheArangoDBgraph.Asaresult,wecanreceiveaunitedgraphoftramandmetrostationswithinformationaboutpossibletrafficproblemsfortrams(sometimes,tramrailsareseparatedfromtheroadnetworkandwedo not have to recognize road traffic for such path segments, it ismentioned on the imagebelow).

Conclusively,ourtaskistostoreourdataandbuildthequickestwaybypathfindingandrecognizingoftransportsituationontheroadwhereitisapplicablewiththebestperformancebyusingArangoDB.

Page 17: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Situation(a) Situation(b)

Figure4.Differentssituationswithtramrails

Page 18: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure5.SchemeofVienna’stram.Informationfromhttp://www.stadt-wien.at

Page 19: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure6.MapofVienna'smetro.Informationfromhttp://www.stadt-wien.at

DatasetTheproblemtouchespublictransportinVienna,sowewilluseinformationfromtheweb-

portal“AustrianOpenData”https://www.data.gv.at/.Unfortunately,allthenecessarydatasetsarerepresentedingeographicallyappropriateformatandwehavetousefewdatasetstobuildthenecessaryone,whichconsistsstopsandconnectionsbetweenthem.

Wealsoneedtostoresomeadditionalinformationwhichcan’tberepresentedasagraph(orwithlotsofdifficulties).

Wewillusethefollowingsets:• U-BahnnetzBestandWien• ÖffentlichesVerkehrsnetzHaltestellenWien

AllthedataarepresentedinWFS,soitisnecessarytomakeapreprocessingprocedure

tomakeallthenecessaryvertexesandedgesofthefinalgraph.Forthisreason,asmallconverter,basedonJava,wasdeveloped.ToimportdatatoArangoDB,onecanusejsonorcsvformatofdata.Itispossibletoimportdatabythreeways:

www.wienerlinien.at

SCHNELLVERBINDUNGEN IN WIEN

Infostelle derWiener Linien

Ticketstelle derWiener Linien

U-Bahn-Linie

S-Bahn-Linie

Lokalbahn Wien-Baden

City Airport Train(Eigener Tarif,VOR-Tickets ungültig)

Vienna InternationalBusterminal

Kundenzentrumder Wiener Linien(U3 Erdberg)

Park & Ride

Troststraße

Altes Landgut

Alaudagasse

Neulaa

Oberlaa

Nußdorf

Oberdöbling

Leopoldau

Krottenbachstr.

Gersthof

Hernals

Breitensee

Penzing

Weidling

au

Purkers

dorf-S

anato

rium

Haders

dorf

Speising

Hetzendorf

Atzgersdorf

Liesing Blumental

QuartierBelvedereMatzleinsdorfer

Platz

Schedifkaplatz

Schöpfwerk

Gutheil-Schoder-Gasse

Inzersdorf LokalbahnNeu Erlaa

Schönbrunner Allee

Vösendorf-Siebenhirten

Grillgasse

Kledering

RennwegBiocenter ViennaSt. Marx

Geisel-bergstr.

Zentralfriedhof

Kaiserebersdorf

Schwechat

Haidestraße

Praterkai

Stadlau

Erzherzog-Karl-Straße

Süßenbrunn

Gerasdorf

Siemensstraße

Brünner Straße

Jedlersdorf

Strebersdorf

Traisengasse

Franz-Josefs-Bahnhof

MessePrater

Krieau

Kaisermühlen VICWähringer Straße Volksoper

Schottentor

Schwedenplatz

Kagraner Platz

Aderklaaer Straße

Großfeldsiedlung

Kardinal-Nagl-PlatzGumpendorferStraße

Michelbeuern AKH

JosefstädterStraße

BurggasseStadthalle

Kendlerstraße

Ottakring

John-straße

Schwegler-straße

Niederhof-straße

Thaliastraße

Alser Straße

Nußdorfer Straße

Jäger-straße

Friedensbrücke

Dresdner Straße

Handelskai

Floridsdorf

Stadt-park

Karlsplatz

Stadion

Prater-stern

Keplerplatz

Reumannplatz

Landstraße (Bhf. Wien Mitte)

Rochusgasse

Schlachthausgasse

Erdberg

Gasometer

Enkplatz

Zipperer-straße

Simmering

Donauinsel

Alte Donau

Kagran

Rennbahnweg

Rathaus

Bahnhof Meidling

HütteldorferStraße

Volks-theater

Roßauer Lände

Taubstummen-gasse

Südtiroler PlatzHauptbahnhof

Taborstraße

Nestroypl.

Neue Donau

Heiligenstadt

Spittelau

Vorgartenstraße

Museums-quartier

Hütteld

orf

Ober S

t. Veit

Unter S

t. Veit

Braunsc

hweig

-

gasse

Hietzing

Schön

brunn

Meidling

Haupts

traße

Läng

enfeld

g.

Margare

ten-

gürte

l

Ketten

brücke

n-

gasse

Herren

gasse

Stuben

tor

Schottenring

Pilgram

-

gasse

Ziegle

rg.

Neuba

ug.Westbahnhof

Wolf in d

er Au

StephansplatzDonaumarina

Donaustadt-brücke

Hardeggasse

Donauspital

Aspern Nord

Am Schöpfwerk

Perfektastraße

Alterlaa

Tscherttegasse

Siebenhirten WLB Wiener Neudorf, Baden (Endstation)

Erlaaer Straße

Seestadt

Aspernstraße

Hausfeld-straße

Hirschstetten

© W

iene

r Lin

ien,

Sep

tem

ber 2

017

Page 20: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

• Web-interface,which is availableon the localwebpagehttp://127.0.0.1:8529/.Wehavetologintothisinterfaceandselectanecessarydatabase.Toimportthedatasetitisnecessarytocreateanewcollectionofdata,openitandimportthefile(byclickingImport->Selectfile->ImportJSON).Unfortunately,itisimpossibletoimportCSVfilebythisway;

• By command-line request through arangosh app. To import data we need toaccessit,enterourpassword.Toimportfilewecanusethefollowingline:arangoimp --file data.json --collection commits --create-collection true

Toimportcsvfileoneshouldadd--typecsvparameter.Incaseofnecessitywecanselectthedatabase,hostandotherparameters.Onecanreadthefulldescriptionofallpossibleoptionsintheintegratedmansupportarticlebytyping“arangoimp–help”.

• Thethirdopportunityisdirectinteractionbetweendatabaseandapplicationthroughtheappropriatedriver.

Figure7.Exampleoftherawdataset

Page 21: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure8.ExampleofthefinaldatasetformetrolineU1.

Figure9.Exampleofedgesdataset.

Page 22: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

AccordingtotheArangoDBarchitecture,alltheverticesshouldconsist“_key”attributeaswellasedges“_from”,“_to”and“_key”attributes.Sometimesthereweresomeproblemswith data import, so it was necessary to convert data in csv format to json by webtoolhttp://www.csvjson.com/csv2json.

Furthermore,someproblemswithcharsetalsoweremetbecauseoriginal languageofthe dataset is German. So, it was necessary to fix some damaged data, like missing specialsymbolslikeä,ö,ü,ë,ß.Moreover,keyattributesmustnotcontainsuchsymbolsandspaces.

PrototypeTomaketheprototypewewilluseJava1.8anddriverforArangoDBtohaveaconnection

betweenuserinterfaceandtheapplication.Afterexportingofthedatasetthefollowinggraphwasreceived:

Figure10.GraphofU-BahnnetworkinWien

Currentprototypeshowstheopportunitytomakeagraphsearchthroughtheshortest

pathalgorithmofthedatabase.TheAQLqueryis:FOR h1, h2 IN OUTBOUND SHORTEST_PATH '"+st+"' TO '"+en+"' GRAPH

'U' RETURN [h1._key,h2._key] Wherestanden–variablesofkeyforstartandendstations.Itisclearthatitisonlya

prototype. In future, there is anopportunity to extend functionality: addgraph visualization,additionalparameterstocomputeandimprovetheGUItomakeitmoreappropriateformobiledevicesanduser-experience.

Page 23: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

Figure11.Prototypescreenshot

OtherusecasesTheproposed idea isnotoneofthepossibleprojectswhichcouldbe implementedby

usingArangoDB.As itwasmentionedbefore,ArangoDBisanativegraphdatabase,soall thepossibleprojectwhichcouldbeexecutedbyusingsuchtechnology,canbedevelopedonitsbase.Forexample:

• Flightsanalysis;• Social-network analysis (e.g. for internal networks like in KPMG andMcKinsey

companies);So, according to the information from ArangoDB GmbH

(https://www.arangodb.com/why-arangodb/case-studies/thomson-reuters-fast-secure-single-view-everything-arangodb/),thekeyusersoftheArangoDBare:

• ThomsonReuters–analyticalcompanywhichprovides intelligence, technologyandhumananalysesandexpertise;

• FlightStats – company which deals with flight information: historical analysis,currentmonitoringandpredictiveservices;

• andlotsofothercompaniesandacademicprojectsbyOxforduniversity,KIT.

Page 24: A report on INFO-H-415: Advanced Databases. Project. · 2017-12-19 · Introduction Nowadays, we are interacting with variably represented information: plain texts, tables, trees

ConclusionFinally, graph database is a really important type of data storage which provides an

opportunitytostoreandrepresentconnecteddata,bymakingitsstoringmoresimilartorealworld.Forexample,relationsbetweenpeople,transport,financialoperations,decisionmakingsystems,etc.

ArangoDBisoneofthedatabaseswhichprovidesanopportunitytostoredatainsuchformat. Themain advantageof this database in comparisonwith rivals – it is a native graphstorage. Three types of store enginesmake the system very flexible for different situations.Furthermore, it is anexcellent technology todeploy a graph-depended system like transportrouting, social network and other net-oriented cases with impressive performance and lowsystemrequirements.Firstly,itwasproductedin2011,soithasnothingincommonwithsomeoutdatedtechnologies,whichcouldbereasonforanycompatibilityandperformanceproblemswithmoreagedsolutions.Fortunately,thelicensingsystemprovidestheopportunitytouseafullversionofthedatabaseforfreewithoutanylimitations.

TheperformanceofArangoDBlooksreallyimpressivewhenitdealswithbasicoperationssuchassinglereadandwrite,whensearchoftheshortestroutesometimestakesalittlebitmoretimethanthesamefigureofitsrivals(Postgres,Neo4jandMongoDB).Butthemostimportantadvantage of this technology is opportunity to store data at the same time in graph and indocument store. It is really interesting feature, because it significantly reduces time of thedevelopmentforprojectswhichstoredataindifferentstorages.

Moreover,ArangoDBhaveapowerfulclusteringsystemthatmakesitreallyimpressiveincaseofuseasadistributedstoragewithdifferentrolesofparticipants.

After the research, some undocumented features and problemswere discovered (forexample,driverconnection,performance,etc.),whichwereresolvedbycontactingthesupportorlookingforthesameproblemonStackoverflowwebsite.

As a part of the research, the prototype of routing application of public transport inViennawasdeveloped.Itusessearchingopportunityofthedatabasetofindtheshortestwaybetweenthebus-/tram-stops/metro-stations.ThedeveloperofthesystemclaimedthatnativelanguageAQLisveryusefulincaseofinteractingwithgraphdata.Itwasproofedinthisresearch.

References1. https://www.arangodb.com/2. http://neo4j.com3. https://github.com/weinberger/nosql-tests4. http://db-engines.com5. http://stackoverflow.com6. https://github.com/jgraph/jgraphx7. http://open.wien.gv.at8. http://wienerlinien.at9. http://www.csvjson.com/csv2json