Challenges in Closing Information and Records Management Capability Gaps in SharePoint
Challenges in Closing Information & Records Management Capability Gaps
• Welcome and Introductions
• Dave Sanchez of Concept Searching
• Juan Celaya of COMPU-DATA
• Case Study
• Questions and Wrap Up
• Company founded in 2002
• Product launched in 2003
• Focus on management of structured and unstructured information
• Privately held and profitable, with no outside funding
• Growth rate of 35% in 2008 and in excess of 100% for 2009
• Founders and management team with company since inception
Technology
• Automatic concept identification, content tagging, auto-classification, taxonomy management
• Only statistical vendor that can extract conceptual metadata
2009 and 2010 ‘100 Companies that Matter in KM’ (KM World Magazine)
KMWorld ‘Trend Setting Product’ of 2009
Locations: US, UK, & South Africa
Client base: Fortune 500/1000 organizations
Managed Partner under Microsoft global ISV Program - “go to partner” for Microsoft for auto-classification and taxonomy management
Microsoft Enterprise Search ISV, FAST Partner
Concept Searching • Don Miller • (408) 828-3400
Concept Searching, Inc.
David Sanchez * 1 (713) 893-1743
Information & Records Management Capability Gaps that Increase Costs
Lack of Information Transparency: e-Discovery and FOIA
• Government and private sector directives to tag content for retrieval
• Untagged data assets = untapped resources
• Time gap between information requests and discovery is directly proportional to volume of data assets
Non-Compliance with Records Management Policies
• Sarbanes-Oxley and government RM retention schedules
• Record declaration process is manual
• Data stored in wrong location and information not preserved in accordance with regulatory guidelines
Increasing Volume of Unplanned Data Exposure Events
• Privacy Act Program (PII), Protected Health Information (PHI), HIPAA, Payment Card Industry (PCI), etc.
• Organizational confidential and sensitive information
Problems
Data Privacy and Security Exposure Events by Sector
• Business: 48%
• Education: 21%
• Government: 19%
• Medicine: 12%
Source: Open Security Foundation
Data Privacy and Security Exposure Events by Type
• Disposal: 6%
• Email: 4%
• Fraud: 8%
• Hack: 16%
• Lost/Stolen Computers and Documents: 45%
• Unknown: 3%
• Snail Mail: 4%
• Virus: 1%
• Web: 13%
Source: Open Security Foundation
Data Privacy and Security Exposure Events: Government
• Disposal: 6%
• Email: 4%
• Fraud: 6%
• Hack: 8%
• Lost/Stolen Computers and Documents: 49%
• Unknown: 4%
• Snail Mail: 7%
• Virus: 0%
• Web: 16%
Source: Open Security Foundation
Why Is This Difficult? Human Factors
Human factors: physical or cognitive properties of an individual, or human social behavior, which influence the functioning of technological systems.
[Diagram components: metadata tagging, records retention code, access rights, templates, Document Libraries 1-4, and server content with appropriate metadata, retention codes, and rights management.]
Limiting Factor = Human Behavior
Alternatives: How Do Organizations Typically Address These Capability Gaps?
• Customize the system interface to force manual application of metadata
  Pros: data assets now have metadata
  Cons: high customization costs, increased end-user labor costs, lower end-user productivity, non-standardized application of metadata across the enterprise
• Hire temporary staff to add metadata to data assets
  Pros: data assets now have metadata
  Cons: temporary staff are expensive ($$$$$) and tagging is non-standardized
• Acknowledge that it is a problem and do nothing
Solution: conceptClassifier for SharePoint
Concept Searching: Addressing the Technology Gap, Not the Behavior
[Diagram components: conceptClassifier for SharePoint; Document Libraries 1-4; semantic metadata tagging; records retention code tagging; automatic content type updating; SharePoint Security Services & Windows Rights Management; appropriate storage & preservation; increased information retrieval precision for e-Discovery.]
Taxonomy Management & Automatic Metadata Tagging in SharePoint
• e-Discovery & FOIA (moss.conceptsearching.com): auto-classification to multiple vocabularies, faceted searching, taxonomy browsing
• Records Management: aligning vocabulary to records retention codes; record declaration process (tagging documents with retention codes)
• Information Management, Data Privacy & Security Compliance: PII, PHI, and PCI tagging; sensitive content (FOUO, Secret, Internal Use Only: contracts, labor rates, etc.)
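conceptClassifier performs the PII, PHI, and PCI tagging listed above with its own concept-based classification, which the slides do not detail. To make the idea of automatic sensitive-content tagging concrete, here is a minimal Python sketch that swaps in a much simpler technique, pattern matching: it flags U.S. Social Security number formats and card numbers that pass the Luhn checksum. The tag names `PII` and `PCI` and the functions `luhn_valid` and `tag_sensitive` are hypothetical and are not part of any Concept Searching API.

```python
import re

# Hypothetical rules standing in for concept-based tagging (not a Concept Searching API).
# SSN format ddd-dd-dddd, excluding area numbers 000, 666, and 9xx and all-zero groups.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_sensitive(text: str) -> set[str]:
    """Return the hypothetical sensitivity tags ('PII', 'PCI') that apply to text."""
    tags = set()
    if SSN_PATTERN.search(text):
        tags.add("PII")
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            tags.add("PCI")
            break
    return tags
```

The checksum test cuts down false positives from arbitrary long digit runs, but rules like these still miss values written in other formats.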
Live Demonstration
conceptClassifier: We Make Metadata Work for You
• Automatic Conceptual Metadata Generation
• Automated Classification
• Taxonomy Development & Management
  • Proven to reduce taxonomy development by 80%
• Microsoft Integration
  • Runs natively in SharePoint 2007 and SharePoint 2010, Microsoft Office applications, SharePoint Search and FAST, Windows Server 2008 R2 FCI
  • Fully integrated with SharePoint Content Types
• Content Type Updater
  • Automatically changes the Content Type based on the presence of organizationally defined metadata found within the document
  • Identification of confidential/privacy data
  • Ability to identify records based on the records retention schedule and route them to the records center
• Technology
  • Downloadable in 30 minutes; no programming required
  • Fully SOA compliant, delivered as Web Parts, based on open standards
  • Highly scalable
Concept Classifier for SharePoint
Closing Information & Records Management Capability Gaps
Uses Taxonomy Manager to create and manage organizational taxonomies, ontologies, and metadata environment;
Employs conceptClassifier for SharePoint as an Automated Metadata Population Service;
Applies content types based on metadata;
Uses content types derived from metadata to drive individual and group access to data assets using inherent SharePoint Security;
Uses content types derived from metadata to drive migration of data assets to proper document libraries where RMS templates are automatically applied to restrict data asset usage.
Leveraging Metadata as an Enabling Asset
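The chain described above (metadata determines the content type, and the content type determines access and the target document library) can be sketched as an ordered rule table. This is a minimal Python illustration under stated assumptions: the tags, content type names, library names, group names, and the `route` function are all hypothetical, and in SharePoint itself this behavior would be configured through content types, permissions, and RMS templates rather than written as standalone code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    content_type: str        # content type applied to the document
    library: str             # document library the document migrates to
    groups: tuple[str, ...]  # groups granted access


# Hypothetical rules, checked in priority order: the first matching tag wins.
RULES = [
    ("PII", Route("Privacy Record", "Restricted Library", ("Privacy Officers",))),
    ("Contract", Route("Contract Record", "Contracts Library", ("Legal", "Procurement"))),
]
# Hypothetical fallback for documents with no matching tag.
DEFAULT = Route("Document", "General Library", ("All Staff",))


def route(tags: set[str]) -> Route:
    """Pick a content type, target library, and access groups from metadata tags."""
    for tag, target in RULES:
        if tag in tags:
            return target
    return DEFAULT
```

Listing the most restrictive rule first means a document tagged both `PII` and `Contract` goes to the restricted library rather than the contracts library.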
Preserving the World's Knowledge - Available Anytime Anywhere℠
©2010 COMPU-DATA International, LLC, All Rights Reserved
COMPU-DATA International, LLC
Juan J. Celaya
President/CEO, Senior Business & IT Consultant
Office: 281.292.1333
www.cdlac.com
blog.cdlac.com
Company Overview
Who are we?
CDI is a successful information management integrator based in Spring, Texas (north of Houston), with offices in Miami, FL and Stafford, VA. We have been in business for over 22 years, 18 of them focused on Content and Data Integration (CADI™), enterprise search, classification, capture, and data management. We are a small business and a certified Texas HUB contractor.
What do we do?
We are an integrator, software developer, and reseller of best-of-breed products for ECM solutions focused on search, automatic classification, capture, and business automation (workflows). We work with government and private industry customers to deliver successful departmental and enterprise solutions.
Who do we serve?
Medium to large organizations in the government, health care, manufacturing, and oil industries.
Presentation Overview
During this presentation we will:
For the case study:
1. Summarize the issues facing U.S. Army researchers and records managers.
2. Describe our approach to resolving those issues within the constraints of a DoD environment and discuss the software tools that comprise the solution.
3. Discuss the challenges in identifying and managing millions of documents.
4. Review how automatic classification and metadata tagging enhance search in this environment.
5. Address the business outcomes and benefits of automating processes.
For conceptClassifier:
6. Describe how conceptClassifier is being applied as part of the JSRRC project.
7. Present Concept Searching's technologies working outside of the SharePoint® environment as well.
U.S. Army Challenges: Records Management
Army Records Management
• Provide oversight and program management for the Army's Records Management Program.
• Establish programs for records collection and preservation from garrison, training, contingency, and wartime operations.
• Operate and sustain the Army Electronic Archive, which provides the means to identify, collect, index, and retrieve important Army records in hard copy and electronic media.
• Management Information Control (AR 335-15)
Records Schedule
• Hundreds of record series with around 4,000 individual record instructions.
• End users faced with myriad choices when categorizing records:
  • Results in improper classification.
  • Users neglect to use the schedule at all.
  • Affects retention durations.
  • Reduces impetus to retain record materials.
  • Reduces consistency in tagging records to schedules.
• New rules and procedural training.
• Hundreds of locations and data environments.
U.S. Army Challenges: JSRRC
Joint Service Records Research Center (JSRRC)
• Validates veterans' war-related claims for the Veterans Administration.
• Primarily Post-Traumatic Stress Disorder (PTSD) claims, but also Agent Orange exposure and others.
• Reviews cases for ALL services from WWII to the present day.
• Requires research of DoD field documents, among other sources, that relate to the specific individual and event.
• Literally tens of millions of documents.
• No categorization or indexing of documents.
• Plethora of data sources.
• Today, millions of electronic files in multiple formats are generated daily.
• Usefulness of data: not determined.
• Manual identification: not feasible.
For JSRRC: finding a needle in a haystack!
Goal: standardize and consolidate field and internally generated data, providing a common research interface.
CDI's Solution Philosophy
For JSRRC
• Develop the ability to integrate documents and data from myriad disparate sources using the CADI™ framework.
• Use conceptClassifier to classify Army documents into discrete, searchable segments.
• Leverage the classification implementation to enhance search, giving end users better results.
• Implement infrastructure that can be leveraged to move forward with the Records Management, FOIA, and Declassification organizations at RMDA.
For Records Management
• A combination of Army process changes and implementation of technology tools.
• Streamline records into fewer functional series.
• End user has minimal or no role in categorizing records.
• Use the Army's ARIMS and SharePoint to attribute initial metadata.
• Use conceptClassifier and conceptTaxonomyManager to correctly identify the appropriate disposition based on content and metadata.
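Identifying a disposition "based on content and metadata" amounts to mapping classification results onto the records schedule. Below is a minimal Python sketch, assuming the classifier returns a confidence score between 0 and 1 for each taxonomy node; the node names, series labels, retention periods, the 0.6 threshold, and the `disposition` function are hypothetical placeholders, not values from the Army records schedule or the conceptClassifier API.

```python
from typing import Optional

# Hypothetical mapping from taxonomy node to (record series, retention in years).
# Real values would come from the Army's records schedule, not from this sketch.
SCHEDULE = {
    "Operations Reports": ("Series A", 25),
    "Training Records": ("Series B", 6),
}


def disposition(scores: dict[str, float], threshold: float = 0.6) -> Optional[tuple[str, int]]:
    """Return (series, retention_years) for the best-scoring scheduled node.

    Returns None when no scheduled node reaches the threshold, meaning the
    document should go to a records manager for review.
    """
    candidates = {
        node: score
        for node, score in scores.items()
        if node in SCHEDULE and score >= threshold
    }
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return SCHEDULE[best]
```

Returning `None` below the threshold keeps a records manager in the loop for low-confidence documents instead of forcing an automatic choice.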
Primary Solution Components
• Base infrastructure: conceptClassifier, conceptTaxonomyManager, conceptSearch
• Application infrastructure: DigitalAsset Finder™
• Professional services for integration and implementation of the solution.
JSRRC Solution
Consolidate existing data sources:
• Access databases
• Applications
• Network shared drives
Prepare for future data sources:
• Identify possible origins
• Volume and formats
• No standards in data delivery
• Support special security needs
• Stored in different locations
Identify types of metadata and documents that:
• Must be standardized
• Derive concepts and content
• Are used to identify data-to-information relationships
• Create taxonomies
JSRRC Solution
Infrastructure built to support two initial environments:
Environment #1
• Windows-based three-server group.
• DigitalAsset Finder™, conceptSearch with Distributed Query Server, conceptClassifier & conceptTaxonomyManager.
• Initial configuration supports 200 terabytes of indexable data:
  • Microsoft Office and other text-based files.
  • PDFs and searchable PDFs.
  • Image files (TIFF, JPG, and others).
Environment #2
• Windows-based server.
• DigitalAsset Finder™, conceptSearch, conceptClassifier & conceptTaxonomyManager.
• Currently supporting over 5 million records and growing:
  • Microsoft Office and other text-based files.
  • PDFs and searchable PDFs.
  • Image files (TIFF, JPG, and others).
  • Structured data with no file reference.
JSRRC Solution
Creation of taxonomies used to:
• Enhance search
• Categorize or identify documents
Some of the taxonomies created include:
• Unit names
• Dates
• Document types (names and content)
• Locations
Results include:
• Consolidation of information into distinct groups, allowing a focused approach to the required research.
• A controlled vocabulary that can be applied to the data sets as requirements evolve.
• Access to information that was previously impossible to reach because of the resources needed to collate the raw data.
• Increased collaboration among researchers as they share information by contributing their knowledge to existing data for future reference and retrieval.
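Taxonomy-driven tagging of the kind listed above can be pictured as matching each document against preferred terms and their synonyms, grouped by facet. The sketch below is a deliberately simple keyword-and-synonym matcher in Python, not conceptClassifier's statistical concept extraction; the taxonomy entries, the ISO date format, and the function `tag_document` are hypothetical.

```python
import re

# Hypothetical taxonomy: facet -> {preferred term: [synonyms to match]}.
TAXONOMY = {
    "Unit Names": {"1st Infantry Division": ["1st Infantry Division", "1st ID"]},
    "Locations": {"Da Nang": ["Da Nang", "Danang"]},
    "Document Types": {"After Action Report": ["After Action Report", "AAR"]},
}
# Hypothetical date facet: ISO-style dates only (field documents use many other formats).
DATE_PATTERN = re.compile(r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b")


def tag_document(text: str) -> dict[str, set[str]]:
    """Return the preferred taxonomy terms found in text, grouped by facet."""
    found: dict[str, set[str]] = {}
    lowered = text.lower()
    for facet, terms in TAXONOMY.items():
        for preferred, synonyms in terms.items():
            # Whole-word, case-insensitive match on any synonym.
            if any(re.search(r"\b" + re.escape(s.lower()) + r"\b", lowered) for s in synonyms):
                found.setdefault(facet, set()).add(preferred)
    dates = {m.group() for m in DATE_PATTERN.finditer(text)}
    if dates:
        found["Dates"] = dates
    return found
```

Storing the preferred term rather than the matched synonym is what makes the vocabulary controlled: "Danang" and "Da Nang" both land under the same location value.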
Data Process Pipeline
[Diagram components:]
• File filter process: file type, file size, folder name, archive content processing
• File Synchronizer
• PDF Generation
• conceptClassifier and conceptSearch
• Classification db and search indexes
• DigitalAsset Finder™
Classification metadata assigned; search indexes independent of the data location.
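The file filter stage above names its criteria (file type, file size, folder name, archive content) but not how they are applied. A minimal Python sketch of such a filter follows, assuming a local directory tree and zip archives; the allowed extensions, the 100 MB limit, the excluded folder names, and the functions `accept` and `filter_files` are hypothetical choices for illustration, not CDI's configuration.

```python
import zipfile
from pathlib import Path
from typing import Iterator, Optional

# Hypothetical filter settings: the slide names the criteria, not their values.
ALLOWED_TYPES = {".doc", ".docx", ".pdf", ".txt", ".tif", ".tiff", ".jpg"}
MAX_BYTES = 100 * 1024 * 1024            # skip files larger than 100 MB
EXCLUDED_FOLDERS = {"temp", "recycler"}  # folder names to skip, compared case-insensitively


def accept(name: str, size: int) -> bool:
    """File-type and file-size checks for a single file or archive entry."""
    return Path(name).suffix.lower() in ALLOWED_TYPES and size <= MAX_BYTES


def filter_files(root: Path) -> Iterator[tuple[Path, Optional[str]]]:
    """Yield (path, member) for every file under root that passes the filter.

    member is None for ordinary files and the entry name for files inside a .zip archive.
    """
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Folder-name check: skip anything below an excluded folder.
        folders = path.relative_to(root).parts[:-1]
        if any(part.lower() in EXCLUDED_FOLDERS for part in folders):
            continue
        if path.suffix.lower() == ".zip":
            # Archive content processing: apply the same checks to each entry.
            with zipfile.ZipFile(path) as archive:
                for info in archive.infolist():
                    if not info.is_dir() and accept(info.filename, info.file_size):
                        yield path, info.filename
        elif accept(path.name, path.stat().st_size):
            yield path, None
```

Files that pass would then move on to synchronization, PDF generation, and classification; those later stages are not sketched here.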