presentation for turkot v2 0 (dh)


Upload: artiom-tsyganok

Post on 25-May-2015


TRANSCRIPT

Page 1: Presentation for turkot v2 0 (dh)

Moscow

PRESENTATION

Addendum to the Grant Application from

Innovation project: Cloud platform for the development and provision of semantic services (Semantic PaaS, SPaaS) that make it possible to extract and process textual information in natural language.

Company name: Avicomp Services, LLC

Page 2: Presentation for turkot v2 0 (dh)


1. Summary of the innovation project (hereafter, the Project)

Current market issues that the Project is supposed to address

The challenge remains how to give content meaning and how to link relevant content together.

The number of Web pages totals more than 50 million (Google).

Avalanche-like growth in document volumes: in 2002 large enterprises processed up to 18 000 documents per year; in 2003 that amount doubled; in 2004 they handled about 46 000 documents on average; in 2008 the number of corporate documents grew to 80 000; and in 2011 it exceeded 400 000 (Forrester Research).

The total number of Internet users now exceeds 2 bln, and the total amount of data is estimated at over 1 800 exabytes (1 exabyte = 10^18 bytes).

How the Project solves the problem

Avicomp Services has worked in the semantic field for over 10 years. One of the company's key achievements in this area is a powerful linguistic engine, based on in-depth semantic research, that automatically produces “semantic-aware & ready” content on the Internet and supports new semantic services. These services in turn make unstructured information easy and flexible to use in the following ways:

A set of services that lets users enrich the meta-information (semantic data) of documents published on the Web or held in corporate archives. Extra meta-information attached to a document improves search accuracy and quality, as well as information categorization and combination.

A set of services that lets users integrate the extra meta-information with existing information when performing BI/OLAP analysis.

A set of services that lets users publish their own semantic data sets in a semantic archive and link them with existing (e.g. Web) Linked Open Data (LOD) sets.

A set of services for identifying and linking semantic data sets across different languages.

A set of services that lets users create their own applications on top of the established archive of semantic data.

These and other services will be available from a single software platform, Semantic PaaS (SPaaS), built on technology with a strong foundation of semantic and morphological rules.

Today’s users are not able even to begin analyzing unstructured information on the Internet, let alone make well-weighed decisions based on such analysis. The user gets swamped at the information-gathering stage.

Page 3: Presentation for turkot v2 0 (dh)


2. The current market situation in search

The problem with any search system

Google and other search systems based on keyword matching produce no results at all for queries that require matching concepts.

Page 4: Presentation for turkot v2 0 (dh)

3. Target market

Landscape of Semantic Applications

Market estimate (volume)

1. The market today is more than $100 bln
2. Impact of semantic technology:
- 20-80% fewer labour hours
- 20-75% lower operating costs
- 30-60% lower inventory levels
- 20-85% lower development costs

Source: TopQuadrant

Page 5: Presentation for turkot v2 0 (dh)

4. Competition (Extract)

Analogues | Stage (market / development) | Price | Parameter 1 (NLP) | Parameter 2 (RDF Store) | Parameter 3 (Apps/Service)
OntoText | Production | License model, from 50 K€ to 250 K€ | Based on GATE | OWL Store | Search, Sort
OpenCalais | Production | Free and subscription (price not known) | Pure NLP service | No store | Limited set of mash-ups
GATE | Research and API service | Small subscription fee | NLP as open source or via API | No store | No services
Ontoprise | Production | License model and consulting service, starting at 100 K€ | Only text mining without information extraction | No RDF store, only RDBMS for indexes | Various specific apps for ontology engineering and modelling

Comparative analysis

Analogues | Functional Area | Stage
PowerSet | NLP Engine | Bought by Microsoft
FAST | Text Mining | Bought by Microsoft
Freebase | RDF Knowledge Base in the LOD | Bought by Google

Page 6: Presentation for turkot v2 0 (dh)


5. Market segments the product focuses on

Potential users of the Project's product (Russian market only, as it will serve as a test bed for fine-tuning the business model)

Russian Accounting Chamber

Russian Ministry of Education

RIA Novosti

Moscow City Government

Rusnano

President’s Administration

All of these prospects are currently engaged in discussions about their information handling and processing needs.

Business model

1. B2B

• Government – use SPaaS to build Linked Open Data within government bodies (licenses & deployment consulting)

• Large enterprises – an instrument for extracting knowledge (licenses & deployment consulting)

• Small business – an instrument for producing semantic content (SaaS)

2. B2C

• Satisfy the information search needs of individual users (including via mobile applications)

Page 7: Presentation for turkot v2 0 (dh)


6. Technology of the Project – Semantic PaaS architecture

High-level view of the SPaaS architecture integrating the ecosystem of complementors and their customers

Application Services comprises modules to manage the RDF life cycle, various interfaces to search, retrieve and store data, core analytical functions (OLAP for RDF) and prediction modelling based on algorithmic game theory. This stack will also include a set of core modules that support requests from external applications.

Harvesting and Crawling uses a heuristic approach that can integrate various sources (not only RSS feeds) and a planarization method that automatically extracts the plain text from a Web page.
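The plain-text extraction step can be illustrated with a minimal sketch. It uses only the Python standard library and is not the Project's planarization method; the class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_plain_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page, for illustration only.
page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>News</h1><p>Avicomp builds <b>semantic</b> services.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_plain_text(page))
```

A production crawler would also need heuristics to drop navigation, advertising and other page furniture, which this sketch does not attempt.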

NLP Service is based on a multi-agent, multilingual architecture that allows it to scale. The service will also incorporate an ontology- and rule-based approach to information extraction (IE), enriched with statistical methods and with a method that can use existing background knowledge, for example in the Linked Open Data (LOD) cloud or inside Web pages (e.g. RDFa, schema.org or HTML5 metadata).
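As a rough illustration of combining background knowledge with extraction rules, the sketch below pairs a small gazetteer with one regular-expression rule. The gazetteer entries, the `ex:` identifiers and the rule are assumptions made for the example and do not describe the actual NLP Service.

```python
import re

# Hypothetical background knowledge: label -> (entity type, identifier).
GAZETTEER = {
    "Moscow": ("Place", "ex:Moscow"),
    "RIA Novosti": ("Organization", "ex:RIA_Novosti"),
}

# Illustrative rule: capitalised words followed by a legal-form suffix.
ORG_RULE = re.compile(r"\b((?:[A-Z][\w-]*\s)+(?:LLC|AG))\b")


def extract_entities(text):
    """Return (offset, surface form, type, identifier) tuples in text order."""
    found = []
    # Step 1: look up labels already known from background knowledge.
    for label, (etype, uri) in GAZETTEER.items():
        for match in re.finditer(re.escape(label), text):
            found.append((match.start(), label, etype, uri))
    # Step 2: apply the rule to catch organizations the gazetteer lacks.
    for match in ORG_RULE.finditer(text):
        found.append((match.start(), match.group(1), "Organization", None))
    return sorted(found, key=lambda item: item[0])


sentence = "Avicomp Services LLC in Moscow works with RIA Novosti."
entities = extract_entities(sentence)
for offset, surface, etype, uri in entities:
    print(offset, surface, etype, uri)
```

Entities found by the rule carry no identifier yet; assigning one is the job of the identification and merging step described next.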

Knowledge Generation Process, mainly for handling unique object identification and merging, ontology alignment, data authoring and interlinking.
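Object merging can be sketched as grouping identifiers that are connected by "same as" links. The example below uses a union-find structure; the identifiers and the choice of technique are assumptions for illustration, not a description of the platform's own method.

```python
def merge_entities(same_as_pairs):
    """Group identifiers connected by 'same as' links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every identifier under the root of its group.
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical identifiers from three sources.
links = [
    ("news:RIA_Novosti", "lod:RIA_Novosti"),
    ("lod:RIA_Novosti", "gov:RIA"),
    ("news:Moscow", "lod:Moscow"),
]
print(merge_entities(links))
```

Each resulting group can then be given a single unique identifier, with the original identifiers kept as interlinks.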

Scalable RDF store for storing the extracted knowledge as semantic graphs, using the latest technology and methods for handling RDF triples. The store will also include a plain SPARQL interface as well as a layer for intelligent, easy-to-use access (Data Access API). Given the expected growth of digital data, the RDF store architecture will also include other database storage mechanisms in order to address the problem of “Big Data”.
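To make the idea of RDF triples and pattern queries concrete, here is a toy in-memory sketch. The `TripleStore` class and the `ex:` names are hypothetical; a real deployment would use a dedicated RDF store queried through SPARQL.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ex:Avicomp", "rdf:type", "ex:Company")
store.add("ex:Avicomp", "ex:locatedIn", "ex:Moscow")
store.add("ex:RIA_Novosti", "ex:locatedIn", "ex:Moscow")

# Roughly equivalent to: SELECT ?s WHERE { ?s ex:locatedIn ex:Moscow }
located_in_moscow = [s for s, _, _ in store.match(p="ex:locatedIn", o="ex:Moscow")]
print(located_in_moscow)
```

The linear scan in `match` is fine for a demonstration; a scalable store would index triples by subject, predicate and object.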

Page 8: Presentation for turkot v2 0 (dh)

04/12/2023 8

7. Use Case – Linked Government Data (LGD)

Our SPaaS offer for 5-star data:
• Pipeline/WF to create RDF (LOD)
• Government vocabulary (ontology)
• Scalable RDF (LOD) store
• UID or controlled named entity name server

Later adapt LGD to Linked Enterprise Data

Enable applications and an ecosystem for e-Citizen

Page 9: Presentation for turkot v2 0 (dh)


8. Use Case – Online News

Our SPaaS for Online News:
• Pipeline/WF for tagging and NE extraction
• RDFa/Microformat injection into web pages
• Scalable RDF store
• Knowledge Engineering
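The RDFa injection item can be illustrated with a short sketch that wraps recognised entity mentions in annotated spans. The function name, the sample sentence and the example.org IRIs are hypothetical; the `vocab`, `typeof`, `resource` and `property` attributes are standard RDFa.

```python
from html import escape


def inject_rdfa(text, entities):
    """Wrap the first mention of each entity in an RDFa-annotated <span>.

    entities: list of (surface form, schema.org type, resource IRI).
    """
    out = escape(text)
    for label, schema_type, iri in entities:
        safe_label = escape(label)
        span = (
            f'<span vocab="https://schema.org/" typeof="{schema_type}" '
            f'resource="{escape(iri, quote=True)}">'
            f'<span property="name">{safe_label}</span></span>'
        )
        # Naive replacement: assumes labels do not occur inside earlier markup.
        out = out.replace(safe_label, span, 1)
    return out


# Hypothetical extraction result for one sentence of a news article.
annotated = inject_rdfa(
    "RIA Novosti reports from Moscow.",
    [
        ("RIA Novosti", "Organization", "http://example.org/id/ria-novosti"),
        ("Moscow", "Place", "http://example.org/id/moscow"),
    ],
)
print(annotated)
```

In a CMS integration, the entity list would come from the extraction pipeline, and the replacement would work on character offsets rather than string search.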

[Figure: Architecture (simplified). Components shown: CMS; external user/app; RESTful API (Semantic Platform); Topic & Entity Extraction (SPaaS); triple store for entities (SPaaS); learning corpora (topics); OntoDix (SPaaS); Topics Manager; Tagging System API; Delivery Server (nodeJS, Fugue, SocketIO, RabbitMQ) + Routes DB; HDB (Mongo); Desktop (Sencha); nginx + Apache; metadata sync.]

Page 10: Presentation for turkot v2 0 (dh)

9. Intellectual property

Existing patents

Patent for an invention № 2242048 «Method of automated processing of text-based information materials». Owner: «Ontos AG (Switzerland)».

Patent for an invention № 2399959 «Method of automated processing of text using natural language by semantic indexing, method of processing of text collection using natural language by semantic indexing and machine-readable media». Owner: «Ontos AG (Switzerland)».

Computer software certificate of registration № 2006610704 «OntosMiner. Russian version». Owner: «Avicomp Services».

Computer software certificate of registration № 2008613021 «Ontos RDF Store Server. Russian version». Owner: «Avicomp Services».

Computer software certificate of registration № 2009611560 «Ontos SOA Server. Russian version». Owner: «Avicomp Services».

Computer software certificate of registration № 2009611559 «Ontos AS Processing Server. Russian version». Owner: «Avicomp Services».

Computer software certificate of registration № 2009611558 «Ontos AS Delivery Server. Russian version». Owner: «Avicomp Services».

Computer software certificate of registration № 2009611557 «Ontology Dictionary. Russian version». Owner: «Avicomp Services».

Page 11: Presentation for turkot v2 0 (dh)


10. Project’s Team (1)

Brief summary of key team members

Victor Klintsov
Shareholder & General Director
More than 20 years of experience in the IT industry
Chief ideologist and chief architect
Graduated in 1977 from Moscow Chemical Engineering Institute
Author of numerous papers
Director of the Russian W3C office
Took part in the following projects: a public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; an analytical search and processing system for letters sent by citizens to the President of the Russian Federation, using semantic and linguistic methods of information extraction; etc.

Daniel Hladky
COO – Chief Operating Officer
More than 20 years in IT, including SAP and iXOS (OpenText)
Responsible for regional development, marketing, sales and operations
Holds an MBA from Strathclyde University
Author of numerous papers; invited expert, e.g. EU FP7, ISWC, Triplify-Challenge
Speaker at conferences such as SemTech, ESTC, I-Semantics

Dr Sören Auer
CRO – Chief Research Officer
Researcher and Professor since 2003. Coordinator of various EU FPx projects
Responsible for research and innovation
Studied Mathematics and Computer Science at University Dresden, Hagen and Yekaterinburg (Russia). PhD at University Leipzig
Leader of the research group AKSW at University Leipzig
Author of numerous papers; invited expert (e.g. EU); co-organiser of several workshops; programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010 and ICWE 2011, WWW2012; area editor of the Semantic Web Journal; serves as an expert for industry, the European Commission and the W3C; member of the advisory board of the Open Knowledge Foundation

Page 12: Presentation for turkot v2 0 (dh)


11. Project’s Team (2)

Brief summary of key team members

Grigory Drobyazko
CTO – Chief Technology Officer
More than 20 years in IT, including RDBMS and custom development
Responsible for R&D, including architecture design, UI design and software support
Co-author of scientific papers on Semantic Web solutions and on technologies for data extraction and text analysis of information resources for analytical processing
Took part in the following projects: a public LOD resource in the field of science and technology, integrated into the international LOD space of knowledge; an analytical search and processing system for letters sent by citizens to the President of the Russian Federation, using semantic and linguistic methods of information extraction; etc.

• Analysts - 10 persons

• Linguist-Developers - 9 persons

• Programmers-Developers - 15 persons

• Programmers-Developers of Linguistic Software - 7 persons

Page 13: Presentation for turkot v2 0 (dh)


12. The current status

The key steps
• Non-stop platform development for more than 10 years
• Built the initial platform «Alfa» of Semantic PaaS
• Current platform is based on the experience gained in several customer projects and research projects (see the table below)
• Completed proof of concept for tagging, aggregation and news visualisation
• Experience from law enforcement, media and portals

Past and current financing
• Shareholders supported development
• Execution of research and development activities

Sales proceeds (R&D work)

Customer | 2010 (fact) | 2011 (fact) | 2012-2013 (plan)
Total | 46,1 mln RUB | 46,6 mln RUB | 60+ mln RUB
Ministry of Education | - | 21,7 mln RUB | 20+ mln RUB
RIA Novosti | 46,1 mln RUB | 8,9 mln RUB | 40+ mln RUB
Others | - | 16,0 mln RUB | -

• Develop NLP module for media
• Develop a portal
• Research and create linguistic rules

• Develop a concept for IKB
• Develop a concept for RDF storage

Page 14: Presentation for turkot v2 0 (dh)


13. Project’s co-investor

Fund raising plan

Current phase fund raising

Co-investor 1

Ministry of Education of Russian Federation – up to 90 mln RUB

Co-investment – signed contract to perform R&D

Co-investor 2

VEB Innovation Fund – up to 90 mln. RUB

Co-investment – equity / debt type of financing

Exit for VEB Innovation Fund – sale to a strategic investor or MBO at an agreed rate

Follow-on fundraising

Stage name | Expected grant financing | Expected investment from co-investor | Timing
Core platform development | 90 mln RUB | 90 mln RUB | 2012-2013
Development of semantic services | 20 mln RUB | 60 mln RUB | 2013-2014
Start selling platform and services | 20 mln RUB | 2014-2015
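As a quick arithmetic check, the per-stage amounts above can be summed. The sketch below assumes the figures are in mln RUB as listed and uses the single amount given for the last stage; the variable names are illustrative.

```python
# Amounts in mln RUB, copied from the fundraising table above.
stages = {
    "Core platform development": [90, 90],
    "Development of semantic services": [20, 60],
    "Start selling platform and services": [20],
}

# Sum the listed amounts for each stage, then across all stages.
stage_totals = {name: sum(parts) for name, parts in stages.items()}
grand_total = sum(stage_totals.values())

for name, total in stage_totals.items():
    print(f"{name}: {total} mln RUB")
print(f"All stages: {grand_total} mln RUB")
```

The stage totals (180, 80 and 20 mln RUB) agree with the amounts shown on the project development plan slide.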

Page 15: Presentation for turkot v2 0 (dh)


14. Project development plan

2012 2013 2014 2015

Enhance the NLP system (WP1)

Large Scale Data Management (WP2) and the deployment of the solution to the cloud

Access to the system via SQL Lite and SPARQL

Lifecycle Management of Data and Knowledge (WP4 and 5)

Enrichment (WP3)

Have use cases ready for eGov, Oil & Gas (WP6)

Performance optimization and scalability.

Work on Big Data analytics and Predictive Analysis (WP7)

Develop eCitizen Service Applications as showcases.

Cloud Platform optimization

NLP for Asian languages

Stage budgets: 180 mln RUB; 80 mln RUB; 20 mln RUB