A report on
INFO-H-415: Advanced Databases. Project.
By
Aleksei Karetnikov (000455065)
2017-12-19
Content
INTRODUCTION
GRAPH DATABASE
    What is it?
    Differences between other technologies
    Query languages
ARANGODB
    Features
        Benefits with same DB
        DB architecture
        Storage engines
        Hardware/Software requirements
        Installation
    Pricing schema
    Interactions
        Benchmark
USE CASE
    Problem
    Dataset
    Prototype
    Other use cases
CONCLUSION
REFERENCES
Introduction
Nowadays we interact with information represented in many forms: plain text, tables, trees and others. Sometimes it is hard to see how information that exists as a tree in the real world can be represented in a relational database. One can implement many many-to-many connections and various attributes in such a system, but applying many joins in queries takes more time. A graph database can be a solution for such problems. It does not need to join different datasets and so avoids the related performance problems: it is another NoSQL approach, which provides direct connections between data.
There are many databases which deal with structured data: Neo4j, Oracle, Postgres, ArangoDB, OrientDB and others. All of them provide some common and some unique features: query languages, storage engine, system requirements, pricing, overall performance, etc. A NoSQL solution also requires a special language for interaction (e.g. AQL, Cypher, SPARQL and other languages).
For this research project, ArangoDB was selected as the graph database. It looks a little different from its rivals: it is a really young technology, developed in 2011 by the German company ArangoDB GmbH, but it already has customers. According to the developers' release, ArangoDB has the best performance in comparison with its competitors, together with impressive opportunities for data interaction. Moreover, support of this technology is really simple.
Furthermore, any technology must have a commercial advantage; otherwise it is ineffective. According to the vendor's information, ArangoDB is the best solution because it is free for everyone and not very expensive when some additional features and support are needed.
So, the main aim of this research project is to explain the features, advantages and disadvantages of ArangoDB as a graph database, and its commercial opportunities.
Graph database
What is it?
A graph database is a database type which stores data in the vertices and edges of a graph. It is based on mathematical graph theory. In addition, each element of a graph database can contain any number of additional properties (key-value pairs). The main feature of this type of storage is highly connected data, independent of the volume of the dataset. It means that all "join-like" operations are processed by using persistent connections between nodes, so they take constant time [2].
Nowadays this approach is becoming more and more popular. We live in a world of connected data, so this type of data structure represents real information in the database significantly better.
The basic place for data in this type of database is the node (vertex) of the graph. It can include different properties (usually schema-less), the label of the node and some additional metadata (such as indexes or constraints). In other types of data stores, nodes correspond to records in relational databases or to documents in document stores.
All relations in graph databases (which are represented by edges) are named for the semantic connection between node entities and have a start node and an end node. So all relations are directed. Normally, properties of the edges include information about the distance between nodes, weight, cost, etc. On the one hand, this improves interaction with all directly connected data, but on the other hand, simple deletion of a connected node entails full deletion of the relationship [1].
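The node and edge model described above can be illustrated with a minimal in-memory sketch. It is not how any particular DBMS stores its data; the class name `PropertyGraph`, the sample vertices and the labels are hypothetical. It only shows that neighbours are reached by following stored connections (the cost depends on the edges of one vertex, not on the size of the dataset) and that deleting a vertex removes its relationships.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: vertices and directed edges carry key-value properties."""

    def __init__(self):
        self.vertices = {}                 # vertex key -> properties
        self.outgoing = defaultdict(list)  # vertex key -> [(label, end key, properties)]

    def add_vertex(self, key, **props):
        self.vertices[key] = props

    def add_edge(self, start, end, label, **props):
        # Every edge is directed: it has a start vertex and an end vertex.
        if start not in self.vertices or end not in self.vertices:
            raise KeyError("both endpoints must exist before an edge is added")
        self.outgoing[start].append((label, end, props))

    def neighbours(self, key, label=None):
        # Follows the stored connections of one vertex; no scan over all records.
        return [end for (lbl, end, _) in self.outgoing[key]
                if label is None or lbl == label]

    def remove_vertex(self, key):
        # Deleting a vertex also deletes every relationship that touches it.
        del self.vertices[key]
        self.outgoing.pop(key, None)
        for start in list(self.outgoing):
            self.outgoing[start] = [e for e in self.outgoing[start] if e[1] != key]


g = PropertyGraph()
g.add_vertex("alice", age=30)
g.add_vertex("bob", age=25)
g.add_vertex("vienna", kind="city")
g.add_edge("alice", "bob", "knows", since=2015)
g.add_edge("alice", "vienna", "lives_in")
print(g.neighbours("alice"))           # ['bob', 'vienna']
print(g.neighbours("alice", "knows"))  # ['bob']
g.remove_vertex("bob")
print(g.neighbours("alice"))           # ['vienna']
```

A relational design would typically answer the same neighbour question by joining a link table against the entity table.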
There are many different approaches to storage mechanisms in graph databases. Some DBMSs store data in tables, others in document or key-value stores, which provides better compatibility with the NoSQL approach.
The most famous graph databases are: ArangoDB, Neo4j, Oracle, OrientDB, SAP HANA, Teradata, SQL Server, GraphDB.
Differences between other technologies
The main aim of graph databases is to store connected data in order to improve interactions between all the connected nodes. So the main differences of this approach in comparison with other types of databases correspond to the different aims of the databases and, therefore, to the different data structures which are used to store the data. The table below presents the elements which can be regarded as equivalent in the most popular types of databases [1].
Type of the database                                  Name of the element
Relational (also temporal, key-value and spatial)     Row/record or tuple
Document                                              Document
Graph                                                 Node
Object                                                Object
One of the main features of graph databases in comparison with other types is better performance when interacting with connected data. Each of the mentioned database types is the most relevant for a particular case. For example, relational databases are the best solution for storing tabular values and aggregating them, document stores for storing whole documents, and object databases for storing files. Graph databases are likewise oriented towards specific problems, such as finding the shortest path or storing connected data. So the most relevant cases for using this technology are:
• real-time recommendations;
• decision-making engines;
• social networks;
• access management;
• route planning;
• data and knowledge management;
• fraud detection.
The vast majority of other NoSQL databases are aggregate-oriented, but such operations are frequently very expensive because of large volumes of data. In this case graph databases are the best solution because it is not necessary to spend resources on joining different records, which are connected by default.
Query languages
One consequence of using a graph database is a new problem: the selection of a query language. Unlike traditional relational databases, which use SQL as the query language, this type of database has many different query languages, each aimed at solving certain problems. The most famous query languages for graph databases are [2]:
• SQL (it still can be used);
• AQL (ArangoDB Query Language);
• Cypher (the most popular declarative language, used by Neo4j);
• GraphQL (a language created by Facebook);
• Gremlin (the language of the Apache TinkerPop™ project; it can be used natively in Java, Scala, JS, Groovy, Clojure and Python);
• SPARQL (an SQL-like language for RDF graphs).
SQL
SQL for graph databases does not contain any specific operators, so it is not necessary to review it in detail.
AQL
The language most similar to SQL is AQL, which was created especially for ArangoDB. In comparison with SQL, "AQL does not support data definition operations, such as creating and dropping databases, collections and indexes" [1]. Some of its keywords overlap with SQL in meaning, for example the FILTER clause, which is equivalent to the WHERE clause in SQL. However, all queries in AQL are executed from left to right, so the position of this clause in the query determines its precedence.
For example, the AQL code:
FOR user IN users FILTER user.active == true RETURN user
is equivalent to the following SQL query:
SELECT * FROM users WHERE active=true
Cypher
Cypher is also an SQL-inspired query language, which was created for the Neo4j database. The main advantage of this language is the opportunity to describe the result which has to be obtained without describing the procedures necessary to obtain it.
For example, the code
MATCH (s:Sales {items:'Beer'}) RETURN s.items, s.price
corresponds to the following SQL query:
SELECT s.items, s.price FROM sales AS s WHERE s.items='Beer'
GraphQL
GraphQL is a query language for APIs which was created by Facebook, so it is not designed specifically for graph databases. A query in this language specifies only the structure of the requested object, for example (schematically):
{ product(quantity>100) { productName productPrice } }
The main users of this language are Facebook, Pinterest, Dailymotion, GitHub, Coursera, Intuit and Shopify.
Gremlin
The main feature of this language is native support in the most famous programming languages. Each query consists of a sequence of atomic operations (steps) on the data stream. For example, the following code takes the vertex with the name "Austria", moves to the neighbouring countries and selects their names.
g.V().has("name","Austria").out("border").values("name")
SPARQL
This language was created by the W3C consortium for RDF graphs (RDF, the Resource Description Framework, is a technology for data serialization and knowledge management which was originally designed as a metadata model). The following code selects the title from the given RDF triple:
SELECT ?title WHERE { <http://example.com/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
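The remark in the AQL subsection that clauses are evaluated from left to right can be mimicked with a small sketch. It is only a Python analogy, not ArangoDB code: the sample `users` list is invented, and LIMIT is an AQL keyword that the text above does not discuss. The sketch shows why filtering before limiting differs from limiting before filtering.

```python
from itertools import islice

# Hypothetical sample collection.
users = [
    {"name": "ann", "active": False},
    {"name": "bob", "active": True},
    {"name": "cat", "active": True},
    {"name": "dan", "active": True},
]


def filter_then_limit(rows, n):
    # Mirrors: FOR u IN users FILTER u.active == true LIMIT n RETURN u.name
    active = (u for u in rows if u["active"])
    return [u["name"] for u in islice(active, n)]


def limit_then_filter(rows, n):
    # Mirrors: FOR u IN users LIMIT n FILTER u.active == true RETURN u.name
    first_n = islice(rows, n)
    return [u["name"] for u in first_n if u["active"]]


print(filter_then_limit(users, 2))  # ['bob', 'cat']
print(limit_then_filter(users, 2))  # ['bob']
```

In SQL the WHERE clause always applies before the row limit, whereas here the order of the clauses decides the result.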
ArangoDB
Features
ArangoDB has many interesting and important features:
• The most important one is its multi-model implementation (graph, key/value and document models), which provides an opportunity to interact with different models in a single query. Such an approach helps to improve the performance of the database.
• Reduced operational complexity, which makes it easier to select the most appropriate model for the given data.
• Support of ACID (Atomicity, Consistency, Isolation, Durability) transactions.
• Modular architecture which provides fault tolerance.
• Low cost of ownership.
• The integrated framework ArangoDB Foxx, which provides native access to in-memory data by using JavaScript.
• Horizontal and vertical scalability.
• Web UI for graph visualization and AQL queries.
Figure 1. Example of the Arango web interface. Statistics.
Figure 2. Example of the query view in the web interface. Source: www.arangodb.com
Figure 3. Example of the graph view in the web interface. Source: www.arangodb.com
Benefits with same DB
There are many graph databases which provide almost the same range of functions. ArangoDB can be compared with Neo4j and OrientDB, but it offers some supplemental opportunities (according to the research and the information provided by the DB developers and the vschart.com and db-engines.com websites).
The most valuable advantages of ArangoDB are [1]:
1. It is a multi-model DBMS, whereas Neo4j is a single-model graph database. If a Neo4j project requires a different storage type, it is necessary to use another database for it. ArangoDB, in contrast, provides graph, document and key/value models.
2. ArangoDB provides the best performance for queries which interact with data from different data models.
3. ArangoDB offers impressive scalability opportunities.
4. In comparison with Neo4j, all the transactions can be encrypted by TLS or SSL. Moreover, it includes role-based access control (via the Foxx framework) and auditing in the Enterprise version.
5. It is compatible with a larger number of NoSQL query languages than OrientDB, so it provides better opportunities for data interaction.
DB architecture
An ArangoDB cluster consists of a number of database instances which can play four different roles: agents, coordinators, primary servers and secondary servers.
Agents form the Agency of the cluster, which is the main part of the cluster configuration. It provides synchronization services for the whole cluster.
Coordinators handle queries and Foxx services.
Primary servers are the places where data is hosted.
Secondary servers are asynchronous replicas of primary ones. There are one or more secondary servers for each primary. They are suitable for backup actions.
Storage engines
ArangoDB provides two different storage engines: the traditional memory-mapped files engine and a RocksDB-based engine. The choice of engine depends on the actual use case.
The traditional memory-mapped files engine has a significantly longer startup time, but it delivers impressive performance for queries on in-memory data. So this engine is suited for datasets which fit in main memory.
The new "rocksdb" engine was developed for large datasets which cannot be kept in main memory. In comparison with the traditional approach, this engine saves all the indexes on disk, which makes startup very fast.
Unfortunately, it is not possible to combine different engines in one installation of the database. So the engine must be selected for the whole server or cluster at installation time.
Hardware/Software requirements
ArangoDB runs on Linux, OS X and Microsoft Windows, on both 32-bit systems (where, because of the architecture limits, it can use no more than 2-3 GB of RAM) and 64-bit systems.
There are no critical requirements for this DBMS. For example, according to the official manual, it is possible to run it on a Raspberry Pi.
Furthermore, ArangoDB can be installed as a Docker container, in DC/OS, and on the Windows Azure and Amazon cloud services.
Installation
ArangoDB can be installed on different operating systems, so the process varies and depends on the environment.
Mac OS X
There are two ways to install ArangoDB on OS X:
1. Graphical application. In this case the .app container should be downloaded and moved to the Applications folder.
2. Installation through Homebrew. It is possible to install the latest version of ArangoDB with the package manager Homebrew by using the command "brew install arangodb". Then it can be started from the default installation folder /usr/local/opt/arangodb/sbin/ by using the command "arangod". Afterwards, we need to initialize the database by executing the application "arango-secure-installation" and set a root password.
Linux
There are several precompiled packages for the Red Hat, Fedora, Debian, Ubuntu and SUSE Linux distributions. Normally, it is enough to download the package for the required Linux version.
Another way of installation is to use a package manager. The installation code for the apt-get, yum and zypper package managers is presented below.
Apt-get
curl -O https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/Debian_9.0/Release.key
sudo apt-key add - < Release.key
echo 'deb https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/Debian_9.0/ /' | sudo tee /etc/apt/sources.list.d/arangodb.list
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install arangodb3e=3.2.8
Yum
cd /etc/yum.repos.d/
curl -O https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/CentOS_7/arangodb.repo
yum -y install arangodb3e-3.2.8
Zypper
zypper --no-gpg-checks --gpg-auto-import-keys addrepo https://www.arangodb.com/9c169fe900ff79790395784287bfa82f0dc0059375a34a2881b9b745c8efd42e/arangodb32/openSUSE_13.2/arangodb.repo
zypper --no-gpg-checks --gpg-auto-import-keys refresh
zypper -n install arangodb3e=3.2.8
Compiling
Furthermore, if a precompiled package is not available, it is possible to compile and build ArangoDB from source, for example for use on a Raspberry Pi. ArangoDB was tested with the GNU C/C++ and clang/clang++ compilers with C++11 support enabled. This method is not very common in normal use, so all the necessary information about compiling can be found on the ArangoDB documentation website: https://docs.arangodb.com/.
Windows
For Microsoft Windows, a Windows Installer package is available. During installation the user can select the installation path, choose a single-user or multi-user installation, keep a backup, create a desktop icon and set the other parameters which the Installation Wizard offers.
Pricing schema
ArangoDB is licensed under the Apache 2.0 license. So, from the end user's point of view (excluding development, distribution, etc.), ArangoDB is fully free for non-commercial and commercial use.
Moreover, there are 2 commercial subscriptions: Basic and Enterprise.
Features                               Community                Basic                    Enterprise
Community Edition Features             ✓                        ✓                        ✓
Satellite Collections                  -                        -                        ✓
SmartGraphs                            -                        -                        ✓
Encryption at Rest                     -                        -                        ✓
Enhanced User Management with LDAP     -                        -                        ✓
Training
Free, online education                 ✓                        ✓                        ✓
Private, on-demand training            -                        -                        ✓
Support
SLA                                    none                     9×5                      24×7
Response time, critical issues         no guarantee             12 hours*                2 hours
Response time, level 2 issues          no guarantee             16 hours*                5 hours
Response time, level 3 issues          no guarantee             40 hours*                16 hours
Number of issues                       -                        10 per month             unlimited
Support contacts                       google-group only        1 (email, web)           4 (email, web, phone)
Technical alerts                       -                        ✓                        ✓
Hotfixes                               general release-cycle    general release-cycle    ✓
License
Type                                   Apache V2                Apache V2                Commercial
In order to estimate the possible expenditure for a commercial subscription to ArangoDB, the Sales department of ArangoDB GmbH was contacted. According to the policy of ArangoDB GmbH, the price model varies between projects and average figures cannot be provided.
Interactions
ArangoDB uses the AQL language as its main query tool. A query can be executed from the command line and from the Web UI. Furthermore, there is an impressive number of drivers for different languages:
• NodeJS;
• PHP;
• Java;
• JavaScript;
• .NET;
• Go;
• Python;
• Scala;
• Railo;
• D;
• Dart;
• Vert-X;
• Gremlin;
• Ruby on Rails;
• Clojure;
• Elixir;
• C++.
All the native drivers are available on the project's GitHub page https://github.com/arangodb/ with all the necessary documentation.
Benchmark
When it comes to the performance of the database, we can use a special benchmark for different NoSQL databases (https://github.com/weinberger/nosql-tests), which is based on NodeJS. The test collection consists of 100 000 elements to read, write, find neighbours, make aggregations and find the shortest path. We can install it on a UNIX-based operating system in a few simple steps:
git clone https://github.com/weinberger/nosql-tests.git
cd nosql-tests
npm install .
npm run data
To run the available tests, we first need to start the server with additional parameters:
./bin/arangod /mnt/data/arangodb/data-2.7 --server.threads 16 --wal.sync-interval 1000 --config etc/relative/arangod.conf --javascript.v8-contexts 17
Since ArangoDB was updated to version 3, the scheduler.threads parameter is obsolete and has to be excluded. We have to type the following command to start all the possible tests:
node benchmark arangodb -a 127.0.0.1 -t all
Here 127.0.0.1 is the IP address of the machine where ArangoDB is installed and running.
Unfortunately, the available benchmark was developed 2 years ago for ArangoDB 2, whereas the current version is 3. So it was necessary to fix it, but unfortunately it is still not possible to run the complete set of performance tests.
From the tests which could be executed, we received results for the whole collection of 100 000 elements. The results of this test in comparison with the results from the ArangoDB website are presented below:
Test                Single Read   Single Write   Aggregation   Shortest path   Neighbors (1+2 deg)   Neighbors (with profile data)
Result (sec.)       12.672        20.194         -             -               -                     -
Vendor's results    16.962        20.530         1.250         0.061           0.464                 4.327
As a result, we can see that at least the single read operation became quicker in the new version of ArangoDB. The results which are available are almost the same as those the DB vendor provides. So we can use the vendor's results for further comparison, which are presented in the table below.
These results show that ArangoDB has better performance than the vast majority of other graph databases.
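The comparison of the two measured operations with the vendor's figures can be made explicit with a short calculation. This is only a worked arithmetic sketch over the numbers from the table above; the function name `relative_speedup` is chosen here for illustration.

```python
# Timings in seconds, copied from the results table above.
measured = {"single read": 12.672, "single write": 20.194}
vendor = {"single read": 16.962, "single write": 20.530}


def relative_speedup(ours, theirs):
    # Positive value: our run took less time than the vendor's published run.
    return (theirs - ours) / theirs * 100.0


for test in measured:
    saving = relative_speedup(measured[test], vendor[test])
    print(f"{test}: {saving:.1f}% less time than the vendor's result")
# single read: 25.3% less time than the vendor's result
# single write: 1.6% less time than the vendor's result
```

The two runs were made on different hardware and different ArangoDB versions, so the percentages indicate only rough agreement, not a controlled comparison.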
Use case
Problem
Nowadays the problem of the load on public transport is becoming more and more important. Governments are interested in moving people from private to public transport to improve the road situation in cities. At the same time, people are interested in the maximal convenience of the trip: a short distance between the real destination and the closest bus or tram stop, vehicles which are not overcrowded, and an optimal trip time. To resolve the first of these problems we have to increase the number of vehicles, but the other two can be resolved by data analysis. Moreover, the vast majority of current route planners consider only the nominal trip time, but in real life road traffic can significantly affect the trip time. So the nominally quickest route can become the longest one because of such problems.
As a result, we have a colossal number of different objects with many 1:1 or 1:n relations. We can store them in a conventional relational database with many bridge tables. Then, to build a route, we need to consider many connections between the objects by using many join operations. Building a route with some changes can become really difficult and hard to compute. Fortunately, we can use a graph database to simplify this problem and make it more efficient, because a transport network is naturally a graph. So we can store the information about our object (the network) in its native form: a set of vertices and edges with some attributes.
Let's look at the maps of Vienna's tram (Straßenbahn) and metro (U-Bahn) networks (Figures 5 and 6). We can see that both images are graphs, so we can simply store them in an ArangoDB graph. As a result, we obtain a united graph of tram and metro stations with information about possible traffic problems for trams (sometimes tram rails are separated from the road network, and we do not have to consider road traffic for such path segments, as shown in Figure 4).
In conclusion, our task is to store our data and to build the quickest route by pathfinding, taking into account the traffic situation on the road where it is applicable, with the best performance by using ArangoDB.
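The idea of a route that is quickest by nominal time but not under traffic can be sketched as follows. This is a minimal illustration, not the prototype's code: the stops A to D, the travel times and the single `traffic_factor` are hypothetical, and Dijkstra's algorithm is used here as a standard choice for weighted shortest paths.

```python
import heapq

# (from, to, nominal minutes, shares the road with cars?) -- hypothetical data
SEGMENTS = [
    ("A", "B", 4, True),
    ("B", "D", 4, True),
    ("A", "C", 5, False),
    ("C", "D", 5, False),
]


def build_graph(segments, traffic_factor):
    graph = {}
    for start, end, minutes, on_road in segments:
        # Only segments that share the road with cars are slowed down by traffic.
        cost = minutes * (traffic_factor if on_road else 1.0)
        graph.setdefault(start, []).append((end, cost))
        graph.setdefault(end, []).append((start, cost))
    return graph


def quickest_route(graph, source, target):
    # Dijkstra's algorithm with a binary heap; returns (total minutes, path).
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


print(quickest_route(build_graph(SEGMENTS, 1.0), "A", "D"))  # (8.0, ['A', 'B', 'D'])
print(quickest_route(build_graph(SEGMENTS, 1.5), "A", "D"))  # (10.0, ['A', 'C', 'D'])
```

With free roads the route over the shared segments wins. Once those segments are slowed by 50%, the route over the separated rails becomes quicker, which is the situation distinguished in Figure 4.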
Situation (a) Situation (b)
Figure 4. Different situations with tram rails
Figure 5. Scheme of Vienna's tram. Information from http://www.stadt-wien.at
Figure 6. Map of Vienna's metro. Information from http://www.stadt-wien.at
Dataset
The problem concerns public transport in Vienna, so we will use information from the web portal "Austrian Open Data" https://www.data.gv.at/. Unfortunately, all the necessary datasets are provided in a geographic format, and we have to use several datasets to build the one we need, which consists of stops and the connections between them.
We also need to store some additional information which cannot be represented as a graph (or only with great difficulty).
We will use the following sets:
• U-Bahnnetz Bestand Wien
• Öffentliches Verkehrsnetz Haltestellen Wien
All the data are provided via WFS, so a preprocessing procedure is necessary to produce all the vertices and edges of the final graph. For this reason, a small Java-based converter was developed. To import data into ArangoDB, one can use the JSON or CSV format. It is possible to import data in three ways:
[Figure: network map "Schnellverbindungen in Wien" (rapid transit connections in Vienna, showing the U-Bahn and S-Bahn lines and the Lokalbahn Wien-Baden). © Wiener Linien, September 2017. Source: www.wienerlinien.at]
• The web interface, which is available on the local web page http://127.0.0.1:8529/. We have to log in to this interface and select the necessary database. To import a dataset it is necessary to create a new collection, open it and import the file (by clicking Import -> Select file -> Import JSON). Unfortunately, it is impossible to import a CSV file this way;
• A command-line request through the arangoimp tool. To import data we need to start it and enter our password. To import a file we can use the following line:
arangoimp --file data.json --collection commits --create-collection true
To import a CSV file one should add the --type csv parameter. If necessary, we can also specify the database, the host and other parameters. The full description of all possible options is available in the integrated help by typing "arangoimp --help".
• The third opportunity is direct interaction between the database and an application through the appropriate driver.
Figure 7. Example of the raw dataset
Figure 8. Example of the final dataset for metro line U1.
Figure 9. Example of the edges dataset.
According to the ArangoDB data model, every vertex should contain a "_key" attribute, and every edge should contain "_from", "_to" and "_key" attributes. Sometimes there were problems with data import, so it was necessary to convert the data from CSV format to JSON with the web tool http://www.csvjson.com/csv2json.
Furthermore, some problems with the character set were also encountered because the original language of the dataset is German. So it was necessary to fix some damaged data, such as missing special symbols like ä, ö, ü, ë, ß. Moreover, key attributes must not contain such symbols or spaces.
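The preprocessing step described above (producing vertices with "_key" and edges with "_from", "_to" and "_key", with keys free of umlauts and spaces) might look roughly like the sketch below. It does not reproduce the Java converter used in the project: the transliteration table, the collection name `stations`, the edge-key scheme and the sample stop list are all assumptions made for illustration.

```python
import json
import re

# Hypothetical transliteration table; the original converter's rules are not given.
TRANSLITERATION = {"ä": "ae", "ö": "oe", "ü": "ue", "ë": "e", "ß": "ss",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def make_key(name):
    for char, replacement in TRANSLITERATION.items():
        name = name.replace(char, replacement)
    # Collapse anything that is not an ASCII letter, digit, '_' or '-' into '_'.
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")


def to_documents(line, stops, collection="stations"):
    # One vertex per stop and one directed edge per pair of consecutive stops.
    vertices = [{"_key": make_key(s), "name": s, "line": line} for s in stops]
    edges = []
    for a, b in zip(stops, stops[1:]):
        ka, kb = make_key(a), make_key(b)
        edges.append({"_key": f"{line}_{ka}_{kb}",
                      "_from": f"{collection}/{ka}",
                      "_to": f"{collection}/{kb}",
                      "line": line})
    return vertices, edges


vertices, edges = to_documents(
    "U1", ["Südtiroler Platz", "Taubstummengasse", "Karlsplatz"])
print(json.dumps(vertices[0], ensure_ascii=False))
print(json.dumps(edges[0], ensure_ascii=False))
```

The original station name is kept in a separate `name` attribute, so the special characters survive for display while the key stays ASCII-only. Each pair of stops yields one edge in the listed direction only, so a second pass over the reversed list would be needed if both directions should be traversable with OUTBOUND queries.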
Prototype
To make the prototype we will use Java 1.8 and the ArangoDB driver to provide the connection between the user interface of the application and the database. After loading the dataset, the following graph was obtained:
Figure 10. Graph of the U-Bahn network in Wien
The current prototype shows the opportunity to make a graph search with the shortest path algorithm of the database. The AQL query is:
FOR h1, h2 IN OUTBOUND SHORTEST_PATH '"+st+"' TO '"+en+"' GRAPH 'U' RETURN [h1._key,h2._key]
where st and en are variables which hold the keys of the start and end stations. Clearly, it is only a prototype. In the future, there is an opportunity to extend its functionality: to add graph visualization and additional parameters for the computation, and to improve the GUI to make it more appropriate for mobile devices and better for the user experience.
Figure 11. Prototype screenshot
Other use cases
The proposed idea is not the only possible project which could be implemented by using ArangoDB. As mentioned before, ArangoDB is a native graph database, so any project which could be carried out with such technology can be developed on its base. For example:
• Flight analysis;
• Social-network analysis (e.g. for internal networks like those in the KPMG and McKinsey companies).
According to the information from ArangoDB GmbH (https://www.arangodb.com/why-arangodb/case-studies/thomson-reuters-fast-secure-single-view-everything-arangodb/), the key users of ArangoDB are:
• Thomson Reuters, an analytical company which provides intelligence, technology and human analysis and expertise;
• FlightStats, a company which deals with flight information: historical analysis, current monitoring and predictive services;
• and many other companies and academic projects, for example at Oxford University and KIT.
Conclusion
Finally, the graph database is a really important type of data storage which provides an opportunity to store and represent connected data in a way which is more similar to the real world: for example, relations between people, transport, financial operations, decision-making systems, etc.
ArangoDB is one of the databases which provide an opportunity to store data in such a format. The main advantage of this database in comparison with its rivals is that it is a native graph store. Two storage engines make the system very flexible for different situations. Furthermore, it is an excellent technology for deploying a graph-dependent system, such as transport routing, a social network or other network-oriented cases, with impressive performance and low system requirements. It was first developed in 2011, so it has nothing in common with outdated technologies, which could be a reason for compatibility and performance problems in older solutions. Fortunately, the licensing system provides the opportunity to use a full version of the database for free without any limitations.
The performance of ArangoDB looks really impressive when it deals with basic operations such as single read and write, whereas the search for the shortest route sometimes takes a little more time than with its rivals (Postgres, Neo4j and MongoDB). But the most important advantage of this technology is the opportunity to store data in a graph and in a document store at the same time. It is a really interesting feature, because it significantly reduces the development time for projects which store data in different kinds of storage.
Moreover, ArangoDB has a powerful clustering system, which makes it really impressive when used as a distributed store with different roles of participants.
During the research, some undocumented features and problems were discovered (for example, with the driver connection, performance, etc.), which were resolved by contacting the support or by looking for the same problem on the Stack Overflow website.
As a part of the research, a prototype of a routing application for public transport in Vienna was developed. It uses the search capabilities of the database to find the shortest way between bus stops, tram stops and metro stations. The developer of the system claims that the native language AQL is very useful for interacting with graph data. This was confirmed in this research.
References
1. https://www.arangodb.com/
2. http://neo4j.com
3. https://github.com/weinberger/nosql-tests
4. http://db-engines.com
5. http://stackoverflow.com
6. https://github.com/jgraph/jgraphx
7. http://open.wien.gv.at
8. http://wienerlinien.at
9. http://www.csvjson.com/csv2json