atomidb dr ashis banerjee reviews

AtomicDB

Review

Dr Ashis Banerjee

GE— Research Scientist for

Robotics and

Artificial intelligence

Figure 1

Data Ingestion

Data ingestion refers to the process of importing and processing raw data from different sources into a common location for future usage and storage. Datafi-cation, on the other hand, denotes the process of converting physical events, either in the forms of measurements (quantitative data) or qualitative models (analytical, computational, or empirical), to a common, computable represen-tation. Thus, data ingestion is usually the precursor to datafication. It has large-ly been done using relational and hierarchical databases or tuple stores con-sisting of linked 2-D tables or tuples with table entries or tuple entities repre-senting the data attributes of interest. The attributes can be automatically ex-tracted once they are defined by the domain experts.

This technique has been effective in managing homogeneous (identical modali-ty) and stationary data sets of similar spatio-temporal scales. However, it be-comes extremely inefficient and often infeasible for complex cyber-physical sys-tems (CPS), where the data sources are continuously operational, inherently heterogeneous, and involve multiple spatio-temporal scales with varying attrib-utes for each modality-scale combination. All of our use cases provide such challenges, necessitating the development of a novel data ingestion, and sub-sequently datafication paradigm.

The Human Brain

One such paradigm already exists in Nature in the form of human long-term memory that is able to process, encode, consolidate, retrieve, recollect, and forget information from the constant influx of data streams via the sensory or-gans. We draw inspiration from this paradigm to propose using an associative memory (AM) tool, developed by Atomic DB Corp, which can address all the four Vs of CPS big data to realize domain-agnostic data ingestion and datafica-tion.

We first replace the data items stored in the tables or tuples with elementary “atoms” of information that reside in a common n-dimensional (n ~ 1 billion) associative vector space of contextual associations among the data attributes. The information atoms are then stored in the physical memory using another organizational vector space with 2120 distinct item locations. In this way, the AtomicDBTM AM tool (Figure 1) provides a unified and compact representation that can accommodate any data type, dynamic or stationary, structured or un-structured, of arbitrary size, origin, and granularity.

Single Instance

The atomic pieces of information are represented as byte arrays of arbitrary siz-es where the maximum size limit is determined by policy or, in its absence, en-forced by the operating system constraints. The associations among the data items are naturally formed based on all the attributes such as the names, counts, hierarchical relationships, and both quantitative and qualitative proper-ties of the items. Of course, these attributes will vary widely among items rep-resenting fundamentally different physical entities such as battery cells and tur-bine blades, but will be typically identical, albeit with different values, for the same entity type. For example, two battery cells may have different voltage specifications but will have exactly the same set of properties like open circuit voltage, maximum current, heat dissipation rate, charge/discharge rate, dimen-sions, etc. Thus, various instances of identical entity types will lie in the same sub-spaces of the n-dimensional associative vector space, whereas instances of different entities will occupy different sub-spaces therein.

Multiple occurrences of the same entity instance are automatically unified and represented as the same atomic piece of information and the transactional in-tegrity of the relationships of each instance is maintained. The provision to add more attributes, and, thereby increase the dimensionality of the occupying sub-space is always available. Note that the sub-spaces for the different entities may overlap, indicating the presence of common attributes among the entities under consideration. This occurrence builds associative “bridges” at a contextu-al level enabling automated discovery and correlation. Furthermore, all the items of a particular entity type will be contextually connected to each other, and so are coerced to be physically co-located for access efficiency.

A cluster of related attributes effectively defines a sub-space. Items within a cluster with a high degree of similarity will typically have a high density of inter-relatedness. Items belonging to different clusters may also have some connec-tions between them, but those will be much sparser. These connections are al-ways bi-directional and dynamic enabling the additions of new associations, au-tomatically re-qualifying the sub-space as more data is ingested and models are dataficated.

(N) Dimentional

While commonly contextualized data entities will be physically co-located in the organizational vector space, a virtual pointer-like token, (the basic relation-ship construct representing a relationship in an associative dimension in the vector space) provides the means to “connect” anything existing in vector space to anything anywhere else in vector space. Each token uniquely identifies a particular atom of information, is the virtual location of the item in vector space, and is the logical address of where the item exists on the physical stor-age medium. This capability ameliorates the requirement for physical co-location to articulate sub-spaces and uses instead, “associative nearness” or “dimensional proximity”, allowing endless clustering possibilities of data ele-ment collections in an unlimited number of sub-spaces that are not restricted by the physical localizations of the items in the storage medium. This multi-faceted holographic-like synthesis is key to the complex mapping required in CPS systems as it allows for viewing of data from virtually any perspective with-out the need to additionally search and process the data.

Mapping of anything to anything in any of a billion associative dimensions is an inherent capability of AtomicDBTM. These dimensions have three distinct forms to discriminate functional perspectives and qualify what can be acted upon and by whom. Specifically, one form is used to model and coordinate record-oriented data sets and data streams, one to segregate the activities of specific users, applications, or CPS system components, and one to model any number of semantic namespaces and / or taxonomies and / or ontologies. This discrimi-nation is valuable when trying to model, map, and coordinate numerous differ-ent datasets and data sequences with distributed capabilities, APIs, and pro-cessing functions within a CPS system, where tabular and node-graph (tuple-based) systems are inherently inadequate. AtomicDBTM handles this easily.

Context

Thus, AtomicDBTM provides a natural and convenient way to encode contexts among seemingly disparate data sources and models while providing a com-mon framework to house and represent all forms of physical activities, events and entities, and simultaneously provide a means to transparently add meta-information to any single piece of data or data set. A key novelty of the tool is that one no longer needs to search for identifying any form of associations amongst the data items as all such associations are maintained as tokens, co-incident with the data items related. Thus, they can be obtained merely be ref-erencing the items of interest and using the aforementioned tokens found therein to directly (through a contextual mapping algorithm) index the refer-enced items in the storage medium. This novelty, coupled with other associa-tive memory functionality, provides the capability for real-time correlation of semantic information enabling one to build an adaptable and evolvable knowledge representation framework directly within the tool itself.

Contact Info

Jean Michel LeTennier [email protected]

John Carroll [email protected]

Dr Phil Templeton [email protected]

http://www.atomicdb.net

mailto:[email protected]



atomidb dr ashis banerjee reviews

Documents

data ingestion data

data attributes

data items

data sources

data type

raw data

novel data ingestion

domainagnostic data