Post on 13-Jul-2020
Max Planck Institute for Innovation and Competition | Munich
Linked Inventor Biography Data - A new method of inventor disambiguation
ORCID-OECD-Crossref Workshop on Identifiers and Intellectual Property
Paris, June 22nd 2017
Matthias Dorner
Max-Planck Institute for Innovation and Competition (MPI-IC) and Institute for Employment Research (IAB)
Motivation
§ Patent data are the main data source for economic analyses of innovation and IP (Griliches 1990)
§ No unique inventor id across patents → “who-is-who” problem
§ Quality of disambiguation is crucial for the quality of any research and policy advice
§ Naive name disambiguation yields inflated patent portfolios due to common names
§ More adequate disambiguation approaches consider additional features
1. “Internal”
- Only (internal) information from patent register data
- Deterministic/probabilistic assignment rules to group patent-inventor records into unique persons (see e.g., Trajtenberg et al. 2006, Raffo & Lhuillery 2009, Li et al. 2014, Pezzoni et al. 2014, Ventura et al. 2015, Morrison et al. 2017)
2. “External”
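The effect of naive name matching on portfolio size can be illustrated with a minimal sketch. The records, names, and the choice of city as the additional feature are hypothetical and only serve to show the mechanism.

```python
from collections import defaultdict

# Hypothetical patent-inventor records: (patent_id, inventor_name, city).
records = [
    ("P1", "Thomas Müller", "Munich"),
    ("P2", "Thomas Müller", "Munich"),
    ("P3", "Thomas Müller", "Hamburg"),
    ("P4", "Anna Schmidt", "Berlin"),
]

def group_records(records, key):
    """Group patent ids by the value of key(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[0])
    return dict(groups)

# Naive rule: an identical name is treated as an identical person.
by_name = group_records(records, key=lambda r: r[1])

# Rule with one additional feature: name and city must both agree.
by_name_city = group_records(records, key=lambda r: (r[1], r[2]))

print(by_name)
print(by_name_city)
```

The name-only rule assigns three patents to one "Thomas Müller", while the stricter rule yields two candidate persons with that name. The stricter rule can in turn split a single inventor who moved, which is one reason to combine several features instead of relying on a single one.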
“External” inventor disambiguation
§ Motivation (see Dorner et al. 2014)
• Big data approach: use structure of external data and record linkage for disambiguation
• Merge additional variables from external data (→ research)
§ Requirements
• Biographical data with unique person id, name and address information
• Data must be regularly updated and include time stamps
• Data availability (open vs. confidential data) and coverage (subpopulation vs. global)
Ø Administrative labor market data collected within the German social security system
§ Two-step approach
• (1) Identification: Identify subset of inventors in the external data
• (2) Grouping: Completion of inventor biographies for the matched inventors
(1) Identification step
Patstat: Inventors (ambiguous) i) listed on EP patents 1999-2011, ii) residential address in Germany
IAB: Full population of employees in Germany (subject to social security 1999-2011)
→ Disambiguated inventors/employees (I): COMPLETE employment biography data (1980-2015), INCOMPLETE patent biography data (only 1999-2011)
(2) Grouping step
Patstat: Full population of patent-inventor-assignee records in Germany (1980-2015)
→ Disambiguated inventors/employees (II): COMPLETE inventor biography data covering patents and employment 1980-2015
Data sources
(A) (B) External data
+ more detailed address information from inventor designation files of EP patents
+ more detailed address information from assignee file of EP patents
Variables
§ Social security ID (= persistent identifier)
§ Last name
§ First name
§ Residential address
§ Firm address
§ Year of employment
§ Others: education, employment status, occupation, wage, establishment (id, size, industry, ...)
[Figure: sample form entry with name and address fields (Putzker, Rene, Hauptstraße 12, 42579 Heiligenhaus, DE)]
Identification step
A - Inventor-patent records 1999-2011
First name, last name, street, house number, city, zip code, year (patent application)
B - Employees 1999-2011
First name, last name, street, house number, city, zip code, year (employment episode)
Pairwise record linkage: probabilistic string matching, blocking by year
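The linkage of the two record sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a fixed threshold on `difflib.SequenceMatcher` similarity stands in for probabilistic string matching, and all field names, records, and the threshold value are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

FIELDS = ("first", "last", "street", "house_no", "zip", "city")

def record_key(rec):
    """Concatenate name and address fields into one lowercase comparison string."""
    return " ".join(rec[f] for f in FIELDS).lower()

def link(patent_records, employee_records, threshold=0.85):
    """Compare records pairwise, but only within the same year (blocking)."""
    blocks = defaultdict(list)
    for emp in employee_records:
        blocks[emp["year"]].append(emp)
    matches = []
    for pat in patent_records:
        for emp in blocks.get(pat["year"], []):
            score = SequenceMatcher(None, record_key(pat), record_key(emp)).ratio()
            if score >= threshold:
                matches.append((pat["patent_id"], emp["person_id"], score))
    return matches

# Hypothetical example records (A: inventor-patent record, B: employees).
patents = [{"patent_id": "EP1", "first": "Erika", "last": "Mustermann",
            "street": "Hauptstr.", "house_no": "12", "zip": "42579",
            "city": "Heiligenhaus", "year": 2005}]
employees = [
    # Same person, street spelled differently, same year.
    {"person_id": "S1", "first": "Erika", "last": "Mustermann",
     "street": "Hauptstrasse", "house_no": "12", "zip": "42579",
     "city": "Heiligenhaus", "year": 2005},
    # Different person in the same year.
    {"person_id": "S2", "first": "Max", "last": "Beispiel",
     "street": "Bahnhofstrasse", "house_no": "3", "zip": "80331",
     "city": "München", "year": 2005},
    # Identical fields but another year: never compared because of blocking.
    {"person_id": "S3", "first": "Erika", "last": "Mustermann",
     "street": "Hauptstr.", "house_no": "12", "zip": "42579",
     "city": "Heiligenhaus", "year": 2003},
]

matches = link(patents, employees)
print([(p, e) for p, e, _ in matches])
```

Blocking by year keeps the number of pairwise comparisons manageable, since only records from the same year are scored against each other.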
Identification step
[Figure: timeline with the marks 1975, 1994, 1999, 2011 and 2015, showing patents P1-P5 and employment spells in establishments E1, E3, E4 and E5]
Legend: Employment spell of inventor I in establishment E
Grouping step (1)
A1 - Biography prior to 1999
First name, last name, year of employment, municipality code workplace (IEB), industry code (current establishment)
A2 - Biography after 2011
First name, last name, year of employment, municipality code workplace (IEB), municipality code residence (IEB), industry code (current establishment)
B - Inventor-patent records prior to 1999 and after 2011
First name, last name, application year, municipality code workplace (= geocoded assignee address), municipality code residence (= geocoded inventor address), technology area of patent (Schmoch 2008)
Pairwise record linkage: probabilistic string matching, blocking by year
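For each candidate pair of a biography record and a patent record, the fields listed on this slide could be turned into a comparison vector. The sketch below is an assumption about how that might look; the municipality codes, the distance values, the default distance, and the field names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical distance matrix in km between municipality codes (made-up values).
DISTANCE_KM = {
    ("00001", "00001"): 0.0,
    ("00001", "00002"): 15.0,
}

def municipality_distance(a, b, default=500.0):
    """Look up the distance in either direction; fall back to a default if unknown."""
    return DISTANCE_KM.get((a, b), DISTANCE_KM.get((b, a), default))

def comparison_features(bio, pat):
    """Build a comparison vector for one (biography record, patent record) pair."""
    name_bio = f"{bio['first']} {bio['last']}".lower()
    name_pat = f"{pat['first']} {pat['last']}".lower()
    return {
        "name_similarity": SequenceMatcher(None, name_bio, name_pat).ratio(),
        "workplace_distance_km": municipality_distance(
            bio["workplace_muni"], pat["workplace_muni"]
        ),
    }

# Hypothetical records from the same year block.
bio = {"first": "Erika", "last": "Mustermann", "workplace_muni": "00002"}
pat = {"first": "Erika", "last": "Mustermann", "workplace_muni": "00001"}

features = comparison_features(bio, pat)
print(features)
```

Because the pairs are already blocked by year, the year itself does not need to enter the vector. Industry and technology area would require a concordance table and are left out of this sketch.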
Grouping step (2)
§ Classification features
• Name similarity score (previous slide)
• Geographical proximity (residence/workplace) increases likelihood of belonging to the same entity → municipality distance matrices
• Technology profile of industries → industry-technology concordances (Dorner & Harhoff 2017)
• Common names (based on administrative labor market data of IAB)
• Additional features to be considered in the future: e.g., shared co-inventors/co-assignees
§ Supervised classification
• Extract and label training data
• Classify full set of records based on labeled training data (currently: logistic regression model)
Ø Out-of-sample prediction of record pairs based on previously labeled data, disregard false positive records
Ø Result: Complete inventor biography (= list of patents per disambiguated inventor)
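The supervised classification on this slide can be sketched with a small logistic regression fitted by gradient descent. This is an illustration only: the two features, the labeled training pairs, the learning rate, and the 0.5 decision threshold are hypothetical and not taken from the project.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that a record pair refers to the same person."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical labeled training pairs: [name_similarity, geographic_proximity],
# label 1 = same person, label 0 = different persons.
X_train = [[0.95, 0.9], [0.90, 0.8], [1.00, 1.0], [0.85, 0.9],
           [0.60, 0.1], [0.70, 0.2], [0.50, 0.0], [0.65, 0.3]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X_train, y_train)

# Out-of-sample prediction for two unlabeled record pairs.
p_match = predict_proba(w, b, [0.92, 0.85])
p_nonmatch = predict_proba(w, b, [0.55, 0.10])
print(p_match > 0.5, p_nonmatch > 0.5)
```

Pairs with a predicted probability above the chosen threshold would be added to the inventor's biography; the rest would be discarded.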
Supervised classification – A training example
KIRSTEN KUHLE
KRISTIN KUHLE
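Assuming the training pair reads as first name plus last name (KIRSTEN/KRISTIN and KUHLE), a name similarity score for it can be sketched as below. The slides do not state which string metric is used; a normalized Levenshtein distance is chosen here purely as an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def name_similarity(a, b):
    """Similarity in [0, 1]: 1 minus the distance relative to the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

first_name_score = name_similarity("KIRSTEN", "KRISTIN")
last_name_score = name_similarity("KUHLE", "KUHLE")
print(round(first_name_score, 3), last_name_score)  # 0.571 1.0
```

The last names are identical while the first names are three edits apart, so name similarity alone leaves such a pair ambiguous, which is why a human label and further features are needed.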
Identification step and grouping step
[Figure: timeline with the marks 1975, 1994, 1999, 2011 and 2015, showing patents P1-P7 and employment spells in establishments E1, E3, E4 and E5]
Discussion: Transferability to ORCID
Biographical information (→ similar to IAB but with lower time resolution; international coverage is an advantage)
Unique person id and name(s) (→ generate multiple accounts per person)
CVs recorded in social/career networks (→ use APIs, SiSOB crawling (Geuna et al. 2015))
Further information on employment context (→ use concordances to match keywords with scientific fields or industries)
Summary and outlook
§ Disambiguation is essential for microeconomic research focussing on inventors
• Internal vs. external disambiguation
• External approach identifies unique inventors based on structured external data
Ø Approach could be adapted to ORCID id or other persistent identifiers
§ Limitations
• Subject to availability and quality/precision of external data
• Only feasible for subpopulation of disambiguated inventors (vs. global scale)
§ Further steps in our project
• Finalize data generation and quality assessment of disambiguation
• Release anonymized research data set via IAB-FDZ and update data on a regular basis
• Research: topics include e.g., labor mobility of inventors and productivity, knowledge spillovers in inventor teams and within firms.
References
§ Dorner, M., Bender, S., Harhoff, D., Hoisl, K., P. Scioch (2014). The MPI-IC-IAB-Inventor data 2002 (MIID 2002): Record-linkage of patent register data with labor market biography data of the IAB. FDZ-Methodenreport 06/2014. Nürnberg: IAB.
§ Dorner, M., D. Harhoff (2017). A novel technology-industry concordance based on linked inventor-establishment data. Unpublished working paper. Research Policy - revise and resubmit.
§ Geuna, A., Kataishi, R., Toselli, M., Guzmán, E., Lawson, C., Fernandez-Zubieta, A., B. Barros (2015). SiSOB data extraction and codification: A tool to analyze scientific careers. Research Policy, 44(9), 1645–1658.
§ Griliches, Z. (1990). Patents as Economic Indicators: A Survey. Journal of Economic Literature, 28(4), 1661–1707.
§ Li, G.-C. et al. (2014). Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010). Research Policy, 43(6), 941–955.
§ Morrison, G., Riccaboni, M., F. Pammolli (2017). Disambiguation of patent inventors and assignees using high-resolution geolocation data. Nature – Scientific Data, 4:170064, DOI: 10.1038/sdata.2017.64.
§ Pezzoni, M., Lissoni, F., G. Tarasconi (2014). How to kill inventors: testing the Massacrator algorithm for inventor disambiguation. Scientometrics, 101(1), 477–504.
§ Raffo, J., S. Lhuillery (2009). How to play the ‘names game’: Patent retrieval comparing different heuristics. Research Policy, 38(10), 1617–1627.
§ Schmoch, U. (2008). Concept of a Technology Classification for Country Comparisons. Final Report to the World Intellectual Property Office (WIPO), Karlsruhe: Fraunhofer ISI.
§ Trajtenberg, M., G. Shiff, R. Melamed (2006). The Names Game: Harnessing Inventors’ Patent Data for Economic Research. NBER Working Paper No. 12479. Cambridge/MA.
§ Ventura, S., Nugent, R., E. Fuchs (2015). Seeing the non-stars: (some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records. Research Policy, 44(9), 1672–1701.
Thank you!
Matthias Dorner
Max-Planck Institute for Innovation and Competition (MPI-IC) and Institute for Employment Research (IAB)
matthias.dorner@ip.mpg.de