
  • 1/19

    Intro. Non-SQL DBMS Hands-on Q & A Conclusion References Files

    Big Data: Data Analysis Boot Camp
    Non-SQL and R

    Chuck Cartledge, PhD

    31 March 2019

    © Old Dominion University

  • 2/19

    Table of contents (1 of 1)

    1 Intro.

    2 Non-SQL DBMS
        Classic Non-SQL databases

    3 Hands-on
        Airport connections as a graph database
        Summary
            Strengths and weaknesses
            Applicabilities

    4 Q & A

    5 Conclusion

    6 References

    7 Files

  • 3/19

    What are we going to cover?

    1 Brief overview of different Non-SQL technologies

    2 Revisit our airport service data

    3 Ask, and answer, some questions about the airports

  • 4/19

    Classic Non-SQL databases

    Words from the past.

    Bring up the attached Polyglot Persistence presentation. We’ll be looking at pages 1 to 22.

    Attached file.

  • 5/19


    Classic Non-SQL databases

    Same image.

    Attached file.


  • 6/19

    Classic Non-SQL databases

    Finding a “friend of a friend”

    A common question is: who is a friend of a friend? It comes up in all sorts of relationship-type questions, not only interpersonal ones but also organizational, system analysis, law, etc. It is easily answered in some languages, harder in others.

    Image from [1].
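
    A minimal sketch of the "friend of a friend" question in base R. The names and the friends table are made up for illustration; nothing here comes from the attached scripts. Each row is treated as an undirected friendship, and the question is answered by following the edges twice.

```r
# Hypothetical friendship data: each row is an undirected "is a friend of" edge.
friends <- data.frame(
  a = c("Ann", "Ann", "Bob", "Cat", "Dan"),
  b = c("Bob", "Cat", "Dan", "Eve", "Eve"),
  stringsAsFactors = FALSE
)

# Direct friends of a person, looking at both ends of each edge.
direct_friends <- function(person, edges) {
  unique(c(edges$b[edges$a == person], edges$a[edges$b == person]))
}

# Friends of friends: follow the edges twice, then drop the person and
# anyone who is already a direct friend.
friends_of_friends <- function(person, edges) {
  first  <- direct_friends(person, edges)
  second <- unique(unlist(lapply(first, direct_friends, edges = edges)))
  sort(setdiff(second, c(person, first)))
}

fof_ann <- friends_of_friends("Ann", friends)
print(fof_ann)
```

    In a graph database the same question is a two-hop traversal; in a relational table it usually takes a self-join, which is one reason it feels harder there.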


  • 7/19


    Classic Non-SQL databases

    Same image.

    Image from [1].

  • 8/19

    Airport connections as a graph database

    Revisit our airport data.

    We’re going to look at the airport data in a different way.

    Airports become nodes (or vertices)

    Services become edges (or arcs)

    Load the attached file: chapter-06-nosql-R.R
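
    Before loading the attached script, the node/edge view can be sketched in base R without any database. The service table below is invented for illustration (it is not the boot camp's flight data), and the column names src and dest are assumptions that mirror the CSV header used later in the deck.

```r
# Hypothetical service records: one row per scheduled flight.
service <- data.frame(
  src  = c("ORF", "ORF", "ATL", "ATL", "ORF", "JFK"),
  dest = c("ATL", "JFK", "ORF", "JFK", "ATL", "ATL"),
  stringsAsFactors = FALSE
)

# Nodes (vertices): every airport that appears at either end of a flight.
nodes <- sort(unique(c(service$src, service$dest)))

# Edges (arcs): one directed edge per distinct origin/destination pair.
edges <- unique(service)

# Adjacency matrix: entry [i, j] is 1 when airport i has service to airport j.
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$src, edges$dest)] <- 1

print(adj)
```

    The repeated ORF to ATL flight collapses into a single edge, which is the same effect the uniqueness constraint on Airport nodes has on the node side.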


  • 9/19


    Airport connections as a graph database

    Overview of the program

    1 By default the database is reset each time main() is executed

    main(resetDB = TRUE)

    ...

    if (resetDB == TRUE)

    {d

  • 10/19


    Airport connections as a graph database

    A few database initialization details

    1 Need to ensure that Airport nodes are unique

    addConstraint(graph, "Airport", "name")

    2 Create the airport location file and load a subset into the database

    createTextFile(airportLocationFile, airportLocationURL, overwrite=TRUE)

    temp

  • 11/19


    Airport connections as a graph database

    A few database initialization details

    1 Resulting in:

    [1] "Creating airport info nodes -- Dumping the object: system.time(cypher(graph, command)) (of type: double, class: proc_time)"

       user  system elapsed
      0.008   0.004   0.675

    2 The origin and destination data are loaded and cleaned

    unzip(flightDataZipFileName, files=flightDataFileName, exdir=tempDir)

    unzipFileName

  • 12/19


    Airport connections as a graph database

    A few database initialization details

    1 Chunks of data are loaded

    for (i in 1:(length(chunks) - 1)) {
      write(x=c('"src","dest"',
                paste0(df$ORIGIN[chunks[i]:(chunks[i+1] - 1)], ",",
                       df$DEST[chunks[i]:(chunks[i+1] - 1)])),
            file=tempFile)

    command

  • 13/19

    Airport connections as a graph database

    Ways to modify the Airport program.

    A Cypher¹ statement or R² can be used to query or modify the database, and R can be used for the numeric heavy lifting.

    Use distance between airports as a metric to find the “diameter” of the graph.

    Find the connectivity (degree) distribution of the airports.

    Use an airport’s connections (degree) to identify the “most important” airport (may not be the one with the highest degree).

    Find the path between “interesting” airports, and then remove an airport along the path. Is there another path from the source to the destination?

    Update the missing location information.
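
    Two of these exercises, the degree distribution and the "remove an airport and look for another path" question, can be tried in plain R before touching the database. This is only a sketch under stated assumptions: the edge list is invented, shortest_path is a hypothetical helper (a breadth-first search), and none of it uses RNeo4j or Cypher.

```r
# Hypothetical directed service edges between airports.
edges <- data.frame(
  src  = c("ORF", "ATL", "ORF", "CLT", "ATL", "LAX"),
  dest = c("ATL", "LAX", "CLT", "LAX", "ORF", "ATL"),
  stringsAsFactors = FALSE
)

# Out-degree plus in-degree of every airport, and the degree distribution.
airports   <- sort(unique(c(edges$src, edges$dest)))
out_degree <- table(factor(edges$src,  levels = airports))
in_degree  <- table(factor(edges$dest, levels = airports))
degree     <- out_degree + in_degree
degree_distribution <- table(as.vector(degree))

# Breadth-first search: a path with the fewest hops from `from` to `to`,
# or NULL when no path exists.
shortest_path <- function(edges, from, to) {
  previous <- setNames(from, from)   # visited airports and how we reached them
  queue <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current == to) {
      path <- to
      while (path[1] != from) path <- c(previous[[path[1]]], path)
      return(path)
    }
    for (nxt in edges$dest[edges$src == current]) {
      if (!(nxt %in% names(previous))) {
        previous[nxt] <- current
        queue <- c(queue, nxt)
      }
    }
  }
  NULL
}

path_before <- shortest_path(edges, "ORF", "LAX")

# Remove the intermediate airport on that path and search again.
removed    <- path_before[2]
remaining  <- edges[edges$src != removed & edges$dest != removed, ]
path_after <- shortest_path(remaining, "ORF", "LAX")

print(degree_distribution)
print(path_before)
print(path_after)
```

    The same pattern scales poorly in base R; on the real data the traversal is better left to the graph database, with R used for the counting and plotting.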

    ¹ https://neo4j.com/docs/developer-manual/current/cypher/

    ² ls("package:RNeo4j")

  • 14/19

    Summary

    Good and not so good

    Strengths:

    A graph database: typeless, schemaless, unstructured relationships

    Large capacity (~34.4 billion nodes and relationships)

    RESTful interfaces, which means support from many different languages

    Weaknesses:

    Graph terminology is not consistent: node vs. vertex, arc vs. edge, etc.

    Sharding is not supported

    Licensing may be an issue for production applications

  • 15/19

    Summary

    Good for, and not so good for

    Good fit:

    Anything that can be represented as a “social graph”

    Any “link rich” domain

    Routing, dispatch, and location based services (getting from A to B)

    Recommendation engines (“also bought” statements)

    Not so good fit:

    When updating “all” items in a DB (requires total graph traversal)

  • 16/19

    Q & A time.

    Q: How many marketing people does it take to change a light bulb?
    A: I’ll have to get back to you on that.

  • 17/19

    What have we covered?

    Talked about different types of Non-SQL database technologies and what they are good for

    “Played” with the airport service data as a graph database

    Asked and answered some questions geared towards graph database technology

    Next: Looking at crime data

  • 18/19

    References (1 of 1)

    [1] Marko A. Rodriguez, Problem-Solving using Graph Traversals, https://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation/88-Searching_Friends_SQLMySQL_vs_GremlinNeo4jWhat, 2010.

  • 19/19

    Files of interest

    1 Neo4j Airport connection script

    2 R library script file

    3 Polyglot persistence (a PDF presentation)

    4 Making spinnable globes with airport data

    5 Code snippets

    rm(list=ls())

    ## http://nick.readthedocs.io/en/latest/Big_Data/neo4j_examples/

    ## https://neo4j.com/docs/developer-manual/current/cypher/

    ## https://neo4j.com/docs/operations-manual/current/configuration/file-locations/

    options(java.parameters = "-Xmx8192m")

    library(RNeo4j)
    library(sp)
    library(rworldmap)
    library(rworldxtra)

    source("library.R")

    source("iataParsing.R")

    source("chapter-06-library.R")

    main

  • 1/37

    A little history A change in the air Database layouts CRUDy stuff Databases that I/we use Conclusion References

    CS-695 NoSQL Database

    Polyglot Persistence; Or, The Many Ways We Store Data

    Dr. Chuck Cartledge

    27 Aug. 2015

  • 2/37


    Table of contents I

    1 A little history

    2 A change in the air

    3 Database layouts

    4 CRUDy stuff

    5 Databases that I/we use

    6 Conclusion

    7 References

  • 3/37

    Hammer and nails . . .

    “. . . it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”

    Abraham H. Maslow [8]

  • 4/37

    Miscellanea

    Origin of “polyglot . . . ”

    Popularized by Neal Ford [4]:

    Talked about software development

    How things are evolving (SQL, XML, .NET, etc.)

    How multi-threading is hard (concurrency, coordination, etc.)

    Promoted the idea of enterprise development via Java and .NET

    Take away: choose the right tool for the job.

    Different languages will continue to exist because each is good at something and all are necessary.

  • 5/37

    Miscellanea

    The world BC (Before Codd).

    Databases existed before Edgar Codd.

    Hierarchical approach – alive and well in our file system

    Network approach – currently underpinning ideas for graph databases

    These suffered because people had to know lots of details about how the database was implemented.

  • 6/37

    Miscellanea

    The world after Codd.

    Separate representation from implementation

    Changes in the database for optimization needn’t affect data queries

    User interactions aren’t cluttered by “construction noise” (including indexing and sorting)

    Codd’s relational data bank hides all implementation information.

    Relational database management systems (RDBMS) hide information about how data is stored. The data language is independent of how data is stored [3].

  • 7/37

    Miscellanea

    The world according to RDBMS.

    Everything is neat and tidy

    Everything can be defined in a set of tables that have relationships between them

    If you make the database large enough, you can store anything and ask any question

    Image from [10].

    RDBMS reigned supreme for 30 to 40 years (starting in 1970). And then reality and Big Data started to hit.

  • 8/37

    How we turned and started to get to now.

    And then things started changing.

    Can’t point a finger at a specific incident; it might have been a critical mass.

    The Internet made it easier to collect data.

    A new generation of people thought about things in a different way.

    The new data had three attributes: velocity, volume, variety [7].

    New ways of looking at data encouraged new questions.

    People wanted answers faster.

    Many of these items couldn’t be supported by an RDBMS.

  • 9/37

    Make things faster.

    Simple and complex ways

    How to get more processing power to answer database questions?? Basically:

    Scale up – buy a faster CPU and more RAM

    Scale out – buy more CPUs and get them to work in parallel

    Scaling up with custom CPUs gets expensive very, very quickly.

    Image from [9].

    Commodity CPUs are almost a dime a dozen, leading to clusters, network services, distributed applications, etc.

  • 10/37

    Make things faster.

    Amdahl’s Law [1]

    Division and measurement of serial and parallel operations appear time and again. (Shades of Mandelbrot.)

    “Make the common fast.”

    “Make the fast common.”

    Understand what parts have to be done serially.

    Understand what parts can be done in parallel.

    Need to factor in “overhead” costs when computing speed up.

  • 11/37

    Make things faster.

    Amdahl’s Law (A summary)

    Time for serial execution $\overset{\text{def}}{=} T(1)$

    Portion that is NOT parallelizable $\overset{\text{def}}{=} B \in (0, 1]$

    Number of parallel resources $\overset{\text{def}}{=} n$

    $T(n) = T(1)\left(B + \frac{1}{n}(1 - B)\right)$

    Speed up $\overset{\text{def}}{=} S(n)$

    $S(n) = \frac{T(1)}{T(n)} = \frac{1}{B + \frac{1}{n}(1 - B)}$

    Dr. Gene Amdahl (circa 1960)
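
    The speed-up formula is easy to explore numerically. The sketch below is a direct transcription of S(n) = 1 / (B + (1 - B)/n) into R; the function name amdahl_speedup and the 10% serial fraction are illustrative choices, and overhead costs are ignored.

```r
# Amdahl's Law: speed-up for n parallel resources when a fraction B of the
# work must run serially.
amdahl_speedup <- function(n, B) {
  stopifnot(all(n >= 1), B > 0, B <= 1)
  1 / (B + (1 - B) / n)
}

# Speed-up for a job that is 10% serial, on 1 to 1024 resources.
n <- c(1, 2, 8, 64, 1024)
speedup <- amdahl_speedup(n, B = 0.10)
print(round(speedup, 2))

# The speed-up can never exceed 1 / B, no matter how many resources are added.
limit <- 1 / 0.10
```

    The serial portion sets the ceiling: adding resources keeps helping, but by less each time, and the curve flattens out below 1/B.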

  • 12/37

    The questions changed.

    We knew that we didn’t know.

    Our questions and our data changed. RDBMS had limitations:

    Supported ad hoc questions on predefined data

    Didn’t support undefined or unstructured data

    Could scale up but not out, so database size was practically limited

    SQL predicate calculus made logic awkward

    RDBMS are very, very good at some things, but user needs were changing.

  • 13/37

    The questions changed.

    What happens when we ask a different question??

    When the RDBMS database was designed, we thought we knew what we wanted to know. That was then.

    Now suppose we want to look at family relationships (parent, child, sibling, extended family, etc.):

    We can add a column to the table for up/down relationships

    We can add a column for side to side relationships

    We can add a column for extended family relationships

    The database doesn’t look like how we think about the problem.

    When the data representation doesn’t match how we think, then something has to change.

  • 14/37

    A collection of different database layouts.

    An RDBMS

    Can add well formed data easily

    Difficult to add new data fields or types

    Each row is expected to have the same data

    Supports unknown (ad hoc) queries well

    Scales up, not out

    Popular RDBMS: Oracle, MySQL, MS SQL Server, PostgreSQL

    The “King of the World” for a very long time. (A version lives in your phone.)

  • 15/37

    A collection of different database layouts.

    A columnar database

    Takes the idea of a row oriented database and turns it on its side.

    Can add new columns easily

    Each row can use different columns

    Scales up and out

    Popular column oriented databases: IBM DB2, Sybase IQ, Teradata

    Image from [2].

  • 16/37

    A collection of different database layouts.

    A Key-Value design

    A number (called the key) locates all other data (the value[s]).

    Use math on some data (may be more than one piece)

    The math (hash function) returns one value (the key)

    Use the key to find the rest of the data

    Locating data can be fast

    Hash function should return unique values

    Popular Key-Value DBMS: Redis, Memcached, Amazon DynamoDB, Riak

    Key-value databases are fast when using the hash function. Not so fast if you aren’t.
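
    A toy version of the idea in base R, as a sketch only. The hash function toy_hash, the helpers kv_put and kv_get, and the session keys are all hypothetical; real key-value stores use far stronger hash functions and handle collisions, persistence, and concurrency.

```r
# Toy hash function (an illustrative assumption, far weaker than a real one):
# turn a string into a bucket number between 1 and n_buckets.
toy_hash <- function(text, n_buckets = 16) {
  sum(utf8ToInt(text)) %% n_buckets + 1
}

buckets <- vector("list", 16)

# Create/Update: hash the key data, then store the value in that bucket.
kv_put <- function(buckets, key, value) {
  slot <- toy_hash(key, length(buckets))
  if (is.null(buckets[[slot]])) buckets[[slot]] <- list()
  buckets[[slot]][[key]] <- value
  buckets
}

# Report: hash the key again and look only in that one bucket.
kv_get <- function(buckets, key) {
  buckets[[toy_hash(key, length(buckets))]][[key]]
}

buckets <- kv_put(buckets, "session-42", c("book", "pen"))
buckets <- kv_put(buckets, "session-77", c("lamp"))

cart <- kv_get(buckets, "session-42")

# Without the key, finding "who has a lamp?" means scanning every bucket.
owners <- character(0)
for (bucket in buckets) {
  for (k in names(bucket)) {
    if ("lamp" %in% bucket[[k]]) owners <- c(owners, k)
  }
}

print(cart)
print(owners)
```

    The lookup by key touches one bucket; the search by value touches all of them, which is the "not so fast" case.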

  • 17/37

    A collection of different database layouts.

    An Online Analytical Processing (OLAP) design

    A way to visualize and analyze data using a “data cube” and basic functions.

    Basic functions:

    1 Consolidation (roll-up) of the multi-dimensional data

    2 Drill-down into the data

    3 Slicing and dicing

    Fast execution time

    Incorporates aspects of navigational, hierarchical, and relational databases

    Popular OLAP databases: Hyperion Solutions, Cognos, MicroStrategy, Applix

    Image from [15].

    Target users are business analysts and business process management.
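
    The three basic functions can be imitated on a small array in base R. This is a sketch under stated assumptions: the sales table is invented, and an in-memory xtabs array stands in for a real OLAP cube.

```r
# Hypothetical sales facts with three dimensions: region, product, quarter.
sales <- data.frame(
  region  = c("East", "East", "West", "West", "East", "West"),
  product = c("A", "B", "A", "B", "A", "A"),
  quarter = c("Q1", "Q1", "Q1", "Q1", "Q2", "Q2"),
  amount  = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# The "data cube": total amount for every region x product x quarter cell.
cube <- xtabs(amount ~ region + product + quarter, data = sales)

# Consolidation (roll-up): collapse product and quarter, keep region
# (region is the first dimension of the cube).
by_region <- apply(cube, 1, sum)

# Drill-down: break one region's total back out by product and quarter.
east_detail <- cube["East", , ]

# Slice: fix one dimension (quarter = Q1) and keep the rest.
q1_slice <- cube[, , "Q1"]

print(by_region)
print(east_detail)
print(q1_slice)
```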

  • 18/37

    A collection of different database layouts.

    A Graph design

    A very different way to think about data.

    Consists of two parts:

    1 Nodes (something that exists as an entity in the database)

    2 Arcs (something that describes a relationship between nodes)

    You can have nodes without arcs. You cannot have arcs without nodes. Arcs can be unidirectional.

    Popular graph databases: Neo4j, OrientDB, Titan, Giraph

    Image from [6].

    Questions are driven by the relationships between nodes rather than the nodes themselves.

  • 19/37

    A collection of different database layouts.

    A document design

    Document oriented databases can be “viewed,” and can have internal document databases (recursively).

    Database is organized based on “tags”

    Tag’s meaning is instance dependent

    Tags can be nested (recursively)

    Database structure may be XML based and represented in different ways

    Popular document databases: MongoDB, CouchDB, Couchbase, MarkLogic

    Sometimes document databases show up in unexpected places.

  • 20/37

    Which design to use?

    If I had a hammer, . . .

    Questions to ask:

    1 How much data will be in the database??

    2 Will I be reading mostly??

    3 Will I be writing mostly??

    4 How accurate must the data be??

    5 How many simultaneous readers and writers??

    6 How robust/resilient must the database be??

    7 How will the database be accessed??

    8 What about ACID vs. BASE??

    So many choices.

  • 21/37

    Which design to use?

    ACID vs. BASE

    One is a design principle, the other is counter marketing.

    ACID [5]

    1 A – Atomicity - all or nothing
    2 C – Consistency - database is always valid
    3 I – Isolation - concurrent operations behave like serial operations
    4 D – Durability - the database is written to disk

    A database action will complete completely.

    BASE [12]

    1 BA – Basically Available
    2 S – Soft state - user guarantees consistency
    3 E – Eventually consistent

    A database action will probably complete eventually.

    ACID comes with SQL. BASE comes with NoSQL.

  • 22/37

    Which design to use?

    Consistency, Availability, Partition tolerance (CAP) Theorem

    Sharing data in distributed systems is hard.

    Data can be consistent across the system

    Data can be available across the system

    The system can continue to function if partitioned/split

    You only get to choose two.

    Image from [17].

    RDBMS on a single machine means partition is undefined. Distributed systems only get two.

  • 23/37


    Create — darkness was on the face of the deep.

    Ex nihilo nihil fit (out of nothing, nothing comes).

    The CRUD approach doesn’t say what happened before the C.

    RDBMS

    CREATE DATABASE db_name;

    CREATE TABLE table_name (column_name1 data_type(size), column_name2 data_type(size), . . . );

    Columnar

    CREATE DATABASE

    CREATE table_name, column_name1, column_name2, ...;

    Key-Value, Graph, Document

    CREATE DATABASE

    CREATE table_name

    Graph, Document

    CREATE DATABASE

    Image from [11].

    Implementation agnostic.

  • 24/37


    Create — darkness was on the face of the deep.

    Create an entry

    RDBMS

    INSERT INTO table_name VALUES (value1,value2,value3,...);

    Columnar

    PUT table_name, row_name, column_name1:, “value”;

    Key-Value

    ADD table_name, key_value, value;

    Graph

    CREATE relationship_name, vertex_name1, vertex_name2

    Document

    INSERT table_name (GML/XML/JSON “marked up” data)

  • 25/37


    Report — databases aren’t much good if you can’t get stuff out.

    Report/Retrieve an entry from the database

    RDBMS

    SELECT column_name,column_name FROM table_name;

    Columnar

    GET table_name, row_name1:, column_name:;

    Key-Value

    GET table_name, key_value;

    Graph (pipe operations)

    GET VERTEX|EDGE FILTER(expression) (. . . )

    Document

    FIND document_id

  • 26/37


    Update — things change.

    Update an entry

    RDBMS

    UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value;

    Columnar

    DELETE FROM table_name WHERE [expression];

    PUT table_name, row_name, column_name1:, “value”;

    Key-Value

    SET table_name, key_value, value;

    Graph

    GET VERTEX | EDGE FILTER(expression) (. . . ) REMOVE property ADD property

    Document

    UPDATE document_id value (same format as CREATE)

  • 27/37


    Delete — to remove that which once was.

    Delete an entry

    RDBMS

    DELETE FROM table_name WHERE some_column=some_value;

    Columnar

    DELETE FROM table_name WHERE [expression];

    Key-Value

    DROP table_name, key_value;

    Graph

    GET VERTEX|EDGE FILTER(expression) (. . . ) REMOVE

    Document

    REMOVE document_id value

  • 28/37

    Lots and they are hidden.

    Shopping as an example

    Firefox – SQLite for browser history

    Shopping cart – Key-Value based on session ID

    Recommended purchases – graph database

    Credit card payment – SQL database

    Excel record purchase – document

    Save Excel file – hierarchical database

  • 29/37

    A continuum.

    Things from a 50,000 foot perspective

    [Figure: database types placed on two axes. Data runs from messy to neat and tidy; queries run from rigid to ad-hoc. Plotted: Free text, K-V, Doc., OLAP, Col., RDBMS.]

  • 30/37

    A continuum.

    Notional strengths and weaknesses

    [Table: database types (RDBMS, K-V, Col., Doc., Graph) against ACID, BASE, ad-hoc queries, ∆ hardware, and hardware failure. Legend: Supported; Not supported by data model; No statement.]

  • 31/37

    Where can I get these things??

    Popular open source databases

    RDBMS – MySQL, PostgreSQL, SQLite

    Key-Value – Redis, Memcached, Riak

    Columnar – HBase, Accumulo, Hypertable

    Document – MongoDB, CouchDB, Couchbase

    Graph – Neo4j, OrientDB, Titan

    Image from [16].

    Open source does not mean free; your time costs money.

  • 32/37

    In summary . . .

    What can we say??

    1 Each type of database design fills a specific need/niche.

    2 Each type could do the work of the others

    1 Each type has a data model tailored to its problem domain

    2 Performance is tied to the hardware (CPU and I/O)

    RDBMS has been the King for a long time. Expect it to remain so due to inertia.

  • 33/37

    NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence

    by Sadalage and Fowler [14].

    Book to be used and referred to during the course, ISBN 9780321826626.

  • 34/37

    Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

    by Redmond and Wilson [13].

    A very nice and graspable tour of various NoSQL database types. Examples of each type are presented with exercises that can be completed in a weekend. Book to be used and referred to during the course, ISBN 9781934356920.

  • 35/37

    References I

    [1] Gene M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, ACM, 1967, pp. 483–485.

    [2] Dale Anderson, Column oriented database technologies, http://www.dbbest.com/blog/column-oriented-database-technologies/, 2012.

    [3] Edgar F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (1970), no. 6, 377–387.

    [4] Neal Ford, Polyglot programming, http://memeagora.blogspot.com/2006/12/polyglot-programming.html, 2006.

    [5] Jim Gray, The transaction concept: Virtues and limitations, Very Large Databases, vol. 81, 1981, pp. 144–154.

  • 36/37

    References II

    [6] Andy Hogg, Whiteboard it the power of graph databases, http://www.computerweekly.com/feature/Whiteboard-it-the-power-of-graph-databases, 2013.

    [7] Doug Laney, 3D data management: Controlling data volume, velocity and variety, META Group Research Note 6 (2001).

    [8] Abraham H. Maslow, The psychology of science, Henry Regnery, 1966.

    [9] Andrea Mauro, Storage scale-up vs. scale-out, http://vinfrastructure.it/2014/06/scale-out-vs-scale-in/, 2014.

    [10] David Mertz, XML matters: Putting XML in context with hierarchical, relational, and object-oriented models, http://www.ibm.com/developerworks/library/x-matters8/, 2001.

  • 37/37

    References III

    [11] Brian Panulla, If libraries were like relational databases, http://ghostednotes.com/2010/12/31/if-libraries-were-like-relational-databases, 2010.

    [12] Dan Pritchett, BASE: An ACID alternative, Queue 6 (2008), no. 3, 48–55.

    [13] Eric Redmond and Jim R. Wilson, Seven databases in seven weeks, Pragmatic Bookshelf, 2012.

    [14] Pramod J. Sadalage and Martin Fowler, NoSQL distilled, Pearson Education, 2012.

    [15] DatabaseJournal Staff, Examples of SQL Server implementations, Database Journal (2010).

    [16] Wikipedia Staff, Database, https://en.wikipedia.org/wiki/Database, 2015.

    [17] Saeid Zebardast, Said experts, http://blog.zebardast.ir/, 2015.


    ## https://www.r-bloggers.com/how-to-draw-connecting-routes-on-map-with-r-and-great-circles/

    rm(list=ls())

    library(tidyverse)
    library(maps)
    library(geosphere)
    library(rgl)
    library(png)

    library(RNeo4j)
    library(igraph)

    source("library.R")
    source("chapter-06-library.R")

    plot_my_connection =0), ...) lines(subset(inter, lon