Text Mining: Opportunities and Barriers
John McNaught
Deputy Director
National Centre for Text Mining
John.McNaught@manchester.ac.uk
Topics
• What is text mining? (briefly)
• What can it offer? (selectively)
• What are the obstacles? (mostly)
NaCTeM
• First publicly-funded (JISC) national text mining centre in the world
• Remit: provide services to the research community
• Initial focus on biology, then social sciences, medicine, chemistry, …
• Processing on a large scale, e.g. for UKPMC (Wellcome Trust + 17 other funders)
• www.nactem.ac.uk
What is text mining?
• Goal: discover new knowledge from old
• How:
  – Process very large amounts of text
    • Millions of documents: the more, the better
  – Identify and extract information
  – (Link extracted information to already curated knowledge)
  – Mine to discover implicit, significant associations
  – Flag (unknown) associations for the researcher to investigate further
  – Spin-off along the way: information is rendered explicit
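The pipeline above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not NaCTeM's method: a hypothetical toy lexicon (`LEXICON`) stands in for real entity recognition, "mining" is reduced to counting gene-disease pairs that co-occur in the same document, and the "curated knowledge" is a hypothetical hand-made set of known pairs. All function names and example sentences are invented for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real named-entity recogniser.
LEXICON = {
    "brca1": "GENE",
    "tp53": "GENE",
    "breast cancer": "DISEASE",
    "glioma": "DISEASE",
}


def extract_entities(text: str) -> set[str]:
    """Identify and extract: naive case-insensitive dictionary lookup (illustrative only)."""
    lowered = text.lower()
    return {term for term in LEXICON if term in lowered}


def mine_associations(docs: list[str], min_count: int = 2) -> Counter:
    """Mine: count gene-disease pairs co-occurring in a document; keep pairs seen >= min_count times."""
    counts: Counter = Counter()
    for doc in docs:
        ents = extract_entities(doc)
        genes = sorted(e for e in ents if LEXICON[e] == "GENE")
        diseases = sorted(e for e in ents if LEXICON[e] == "DISEASE")
        for gene in genes:
            for disease in diseases:
                counts[(gene, disease)] += 1
    return Counter({pair: n for pair, n in counts.items() if n >= min_count})


def flag_unknown(associations: Counter, curated: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Link to curated knowledge, then flag mined pairs that are not already recorded there."""
    return sorted(pair for pair in associations if pair not in curated)


# Hypothetical example corpus and curated knowledge base.
docs = [
    "BRCA1 mutations are frequent in breast cancer.",
    "Loss of BRCA1 function was observed in breast cancer cell lines.",
    "TP53 variants were detected in glioma samples.",
    "TP53 expression correlates with glioma grade.",
    "TP53 was also studied in breast cancer.",
]
curated = {("brca1", "breast cancer")}

print(flag_unknown(mine_associations(docs), curated))
```

Here only the TP53-glioma pair is flagged: BRCA1-breast cancer is already in the curated set, and TP53-breast cancer falls below the (arbitrary) `min_count` threshold. A real system would need proper entity recognition, synonym normalisation and statistical scoring rather than raw counts.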
From text to new knowledge
What does it offer?
• Finds unsuspected knowledge
  – E.g. disease-gene associations
• Enables discoveries that human effort alone could not achieve (information overload; information overlooked)
• Enables better search and navigation of the literature
  – Semantic search via extracted semantic metadata
• Reduces time spent searching
  – 15-48% of researcher time is spent on classic search, and 20-50% of classic searches leave the user unsatisfied
• E.g. systematic reviews: from months to weeks
What does it offer?
• Text mining boosts research
  – Makes research possible that would otherwise be impossible or unfeasible
• Research drives growth and innovation
• Research produces more information
• More information is available for text mining
• Text mining boosts research …
Barriers
• Access to the literature
• Format issues (tied to next point…)
  – "PDF is evil" (Lynch)
• Main blocks: copyright and licensing issues
  – <8% of the scientific claims found in a full article appear in its abstract (Blake)
  – Abstracts are deficient in argumentation, discussion, methods, background, …
  – Full texts are needed to realise the full benefits of TM
Barriers
• Need to copy documents in order to analyse them
• Licences are typically not favourable to TM
• Licences are established on a per-institution basis
  – Prevents community-oriented services
• Results are only for internal use by institutional users
  – Hinders mining over collections of content from different providers
• Inconsistency: a human can search and manually analyse, but cannot use a machine to do the same job on the same data already subscribed to
Barriers
• Problem even with liberal OA licences
  – Author attribution is required
• Author attribution in a data mining environment is impossible/unfeasible
  – Association finding: cannot track the positive, negative or neutral contributions of individual authors
• Derived works in a TM environment
  – Every author of every text processed to produce new derived knowledge may have a claim…
  – Rights clearance is thus an effective barrier
Barriers
• Laudable effort 1: NESLi2 model licence (JISC Collections) allows TM
  – Licence is between a publisher and a single institution
  – But how many publishers retain the TM provisions?
  – But annotations produced by TM cannot be displayed on the document itself
• Laudable effort 2: NPG licence for self-archived content allows TM
  – But "content must be destroyed when experiment complete" is vague. So what about services for the community?
Conclusion
• Copyright and licensing restrictions block full realisation of the benefits of TM
  – Economic savings and potential for growth are stifled
• Japan has introduced an information analysis exception to copyright law
  – The National Diet Library (Japan's equivalent of the British Library) has recently changed its motto to: "Through knowledge we prosper"
  – Can we say the same in the UK?
Extras
Info = degree of surprise
Finding unknown associations: reproducing a discovery reported 5 days ago in Nature Medicine
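"Info = degree of surprise" echoes the information-theoretic view that the information carried by an event grows as its probability falls: the self-information of an event with probability p is -log2 p bits. One common way to apply that idea to associations is pointwise mutual information (PMI), which measures how much more often two terms co-occur than independence would predict. The sketch below is a swapped-in illustration assuming simple document-level counts; it is not necessarily how the demo on this slide scored its associations, and the function names are hypothetical.

```python
import math


def self_information(p: float) -> float:
    """Surprise of an event with probability p, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def pmi(n_ab: int, n_a: int, n_b: int, n_docs: int) -> float:
    """Pointwise mutual information (in bits) of terms a and b from document counts.

    n_a, n_b: documents mentioning a (resp. b); n_ab: documents mentioning both;
    n_docs: collection size. Positive PMI means a and b co-occur more often than
    expected if they were independent; zero means exactly as often as expected.
    """
    if min(n_ab, n_a, n_b) <= 0 or n_docs <= 0:
        raise ValueError("counts must be positive (PMI is undefined for zero co-occurrence)")
    p_ab = n_ab / n_docs
    p_a = n_a / n_docs
    p_b = n_b / n_docs
    return math.log2(p_ab / (p_a * p_b))


# A rare event is more surprising: p = 1/8 carries 3 bits.
print(self_information(0.125))
# Terms in 10 and 20 of 1,000 documents, together in 5: 25 times more than chance.
print(round(pmi(5, 10, 20, 1000), 2))
```

Under independence PMI is zero: `pmi(2, 10, 20, 100)` is 0 up to floating-point rounding, because 2 co-occurrences are exactly what 10 × 20 / 100 predicts. Raw PMI favours very rare terms, so practical systems often combine it with a minimum-count threshold.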
UKPMC EvidenceFinder by NaCTeM: Questions generated by deep analysis, with known answers
Click on a question to see relevant extracted evidence (from the OA subset of the archive)