Perl Dist::Surveyor 2011
DESCRIPTION
Slides from my lightning talk at the London Perl Workshop, November 2011.
TRANSCRIPT
Dist::Surveyor
“what’s in that lib directory?”
Tim Bunce - Nov 2011
Creative Commons BY-NC-SA 3.0
The Context
Perl 5.8
CPAN modules
Applications
Business modules
The Context
• A large library of CPAN distributions
- In a local::lib style dir .../cpan-5.008/{man,bin,lib}/
- Installed over many years
- No external record of what has been installed
- Almost 5000 modules
- In production in many systems on many machines
The Itch
• Want to upgrade from perl 5.8
- so need to clone our local library of CPAN modules
- to .../cpan-5.012/{man,bin,lib}/
- with recompiled perl extensions
• Want the exact set of distribution versions
- so when testing “nothing but perl changed”
“What’s in that lib directory?”
Innocence and Hope
• Vague memory of something called ‘packlists’
• Vague memory of perllocal.pod install log
• Vague memory of some work by brian d foy
• Usual hope that someone’s already done this
• “How hard can it be?”
/.packlist
• Records only what files were installed
• Doesn’t record the origin distribution
• Useless for my needs
what_dists.pl
• Chris Williams’s github.com/bingos/throwaway
• Matches installed modules to distributions
• Only matches to the latest distributions
• Looked like a good place to start
• I hacked it to use perllocal.pod data and a bunch of heuristics.
• It worked, mostly. Annoying edge cases.
• Lots of hacks, heuristics, and blind luck.
perllocal.pod
• Records a “name” and “version”
• Name is the Makefile.PL NAME
- can be the module or distribution name
- or something else entirely
• Version is the Makefile.PL VERSION
- not always the version in the distribution filename
• Incomplete!
- Not written by Module::Build based distributions
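To make the "name" and "version" fields concrete, here is a minimal sketch that pulls them out of perllocal.pod text. It assumes the entry layout that ExtUtils::MakeMaker typically appends (a `=head2` line naming the module, followed by `C<VERSION: ...>` among the items). The sample entry, including its install path, is made up for illustration, and `parse_perllocal` is a hypothetical helper, not part of any tool named in the talk.

```python
import re

# Assumed entry layout (ExtUtils::MakeMaker style):
#   =head2 <date>: C<Module> L<Name|Name>
#   ... C<VERSION: 1.23> ...
HEAD_RE = re.compile(r"^=head2 .*?: C<Module> L<([^|>]+)")
VERSION_RE = re.compile(r"C<VERSION: ([^>]*)>")

def parse_perllocal(text):
    """Return a list of (name, version) pairs, one per install entry."""
    entries = []
    name = None
    for line in text.splitlines():
        head = HEAD_RE.match(line)
        if head:
            name = head.group(1)  # the Makefile.PL NAME, not necessarily a dist name
            continue
        ver = VERSION_RE.search(line)
        if ver and name is not None:
            entries.append((name, ver.group(1)))
            name = None
    return entries

# Illustrative entry only; the path is hypothetical.
SAMPLE = """\
=head2 Tue Nov  1 10:15:02 2011: C<Module> L<DBI|DBI>

=over 4

=item *

C<installed into: /opt/cpan-5.008/lib/perl5>

=item *

C<LINKTYPE: dynamic>

=item *

C<VERSION: 1.616>

=back
"""

print(parse_perllocal(SAMPLE))
```

Even when parsing succeeds, the result only says what Makefile.PL reported, which is why the name and version still need matching against real distribution filenames.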
BackPAN::Version::Discover
• “Figure out exactly which dist versions you have installed”
• Based on BackPAN::Index
• Incomplete and “very alpha”
• Matching logic not very robust
• Just doesn’t work very well for us
DPAN
• “start with an existing Perl distribution and work backward to the MiniCPAN that would re-install the same thing” - brian d foy
• Indexes MD5 and other metadata for all BackPAN modules and scripts
• Incomplete: doesn’t yet work out what distribution versions are installed.
GitPAN
• Git repo for every distribution on CPAN
• Includes all distro versions on BackPAN
• Pondered using git hashes and the github API
• But GitPAN isn’t being maintained
☹
MetaCPAN
• Repository for CPAN metadata
- ElasticSearch distributed database (Lucene)
- RESTful API
• CPAN and entire BackPAN fully indexed
• Very detailed metadata
• Full Of Awesome
MetaCPAN
• Find all releases that contain a particular version of a module:
curl -XPOST api.metacpan.org/v0/file/_search -d '{
  "query": { "filtered": {
    "query": { "match_all": {} },
    "filter": { "and": [
      { "term": { "file.module.name": "DBI::Profile" } },
      { "term": { "file.module.version": "2.014123" } }
    ] }
  } },
  "fields": ["release"]
}'
☺
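The same request body can be generated for any module and version. The sketch below only builds the JSON shown on the slide and makes no network call. `release_query` is a hypothetical helper name, and the query shape (and the v0 endpoint it targets) is taken from the slide as of 2011, so it may not match the current MetaCPAN API.

```python
import json

def release_query(module, version):
    """Build the search body from the slide for one module name and version."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": [
                    {"term": {"file.module.name": module}},
                    {"term": {"file.module.version": version}},
                ]},
            }
        },
        # Only the release name is needed to identify candidate distributions.
        "fields": ["release"],
    }

body = json.dumps(release_query("DBI::Profile", "2.014123"))
print(body)
```

The printed string is what would be passed to `curl -d` in the command above.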
The Method
• Get installed module names, versions, file sizes
• For every module:
- find “candidate distributions” that included that module version, ideally also matching the file size.
• For every candidate distribution:
- get all modules and versions shipped in that distro
- score each candidate by the proportion of its modules and versions which match what’s installed
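The scoring step can be sketched as follows. This is an illustration of the idea, not the script's actual code: the function names and the `Foo-Bar` data are hypothetical, versions are compared as plain strings, and the file-size check is left out.

```python
def score_candidate(dist_modules, installed):
    """Fraction of a distribution's modules whose versions match what is installed.

    dist_modules: {module_name: version} shipped in the candidate release.
    installed:    {module_name: version} found in the surveyed library.
    """
    if not dist_modules:
        return 0.0
    matches = sum(1 for name, version in dist_modules.items()
                  if installed.get(name) == version)
    return matches / len(dist_modules)

def best_candidate(candidates, installed):
    """Return (release_name, score) for the highest-scoring candidate."""
    scored = [(score_candidate(mods, installed), name)
              for name, mods in candidates.items()]
    score, name = max(scored)  # ties fall back to release name order
    return name, score

# Hypothetical data: both releases ship Foo::Bar 1.01, so that module alone
# cannot tell them apart; the second module settles it.
installed = {"Foo::Bar": "1.01", "Foo::Bar::Util": "0.03"}
candidates = {
    "Foo-Bar-1.01": {"Foo::Bar": "1.01", "Foo::Bar::Util": "0.03"},
    "Foo-Bar-1.02": {"Foo::Bar": "1.01", "Foo::Bar::Util": "0.04"},
}

print(best_candidate(candidates, installed))
```

Scoring by proportion rather than by count keeps a large distribution with a few coincidental matches from beating a small one that matches completely.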
An Example
Cloning From The List
• Can’t simply feed results to cpanm
- It’ll fetch the latest version of any prereqs
• Tried to put the list in dependency order
• Tried to use MiniCPAN::Inject
• Finally added a --makecpan dir option
- Fetches distro tarballs and writes index
- can be used as CPAN repo by cpanm
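For a sense of what "writes index" involves: a CPAN mirror maps module names to distribution tarballs through a packages index (on real mirrors, `modules/02packages.details.txt.gz`). The sketch below renders a minimal index in that style. It is not the script's code: `package_index` is a hypothetical helper, the header is a reduced set modelled on the PAUSE index (which fields a given client requires is an assumption), the distribution paths are invented, and compression and tarball fetching are omitted.

```python
def package_index(entries):
    """Render a minimal 02packages-style index as text.

    entries: iterable of (module_name, version, dist_path) tuples, where
    dist_path is relative to authors/id/.
    """
    rows = sorted(entries, key=lambda entry: entry[0])
    header = [
        "File:         02packages.details.txt",
        "Description:  Package names found in directory $CPAN/authors/id/",
        "Columns:      package name, version, path",
        "Line-Count:   %d" % len(rows),
        "",  # a blank line separates the header from the package rows
    ]
    body = ["%-30s %-8s %s" % (module, version or "undef", path)
            for module, version, path in rows]
    return "\n".join(header + body) + "\n"

# Hypothetical entries for illustration.
index_text = package_index([
    ("Foo::Bar::Util", "0.03", "A/AU/AUTHOR/Foo-Bar-1.01.tar.gz"),
    ("Foo::Bar", "1.01", "A/AU/AUTHOR/Foo-Bar-1.01.tar.gz"),
])
print(index_text)
```

Because the index lists only the surveyed releases, a client restricted to that mirror has no newer versions of prerequisites to pick up.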
Typical Usage
Survey what distributions are installed in a library:
$ dist_surveyor.pl --makecpan my_cpan \
    /a/perl/lib/dir > installed_dists.txt
Install exactly those distributions in a new library:
$ cpanm --mirror file:$PWD/my_cpan --mirror-only \
    -l new_lib < installed_dists.txt
Bonus: re-tests all distros with current prereqs
Status
• Currently a single script
• Ought to be turned into a module
• Looking for a maintainer