why a new cpan client cpm is fast

Why a new CPAN client

cpm is fast

Shoichi Kaji

Me

• Shoichi Kaji

• Tokyo, Japan

• pause/github: skaji

• Perl5: cpm, App::FatPacker::Simple, Mojo::SlackRTM

• Perl6: mi6, Frinfon, evalbot in Slack:)

Agenda

• What is cpm, and why?

• cpanm VS cpm

• The internals of cpm

• divide the installation process into pieces

• learn from the Go language

• Roadmap

Q: What is cpm?

A: It’s yet another CPAN client

Why a new CPAN client?

• Yes, I always use cpanm to install CPAN modules. It’s awesome!

• Because cpanm installs modules in series, it takes quite a lot of time to install a module that has many dependencies

I want to install CPAN modules

as fast as possible

Why a new CPAN client?

• So I created cpm

• Actually cpm is not an entirely new CPAN client: it runs cpanm in parallel, so it can install CPAN modules much faster

How fast? cpanm VS cpm

installing Plack

cpanm: 30 sec

cpm: 10 sec

cpm is 3x faster than cpanm!

Why is cpm so fast? The internals of cpm

First, let’s keep it simple

$ cat modules | xargs cpanm

Can we just use xargs to parallelize cpanm?

NO, WE CAN’T.

The problem with $ cat modules | xargs cpanm

• The modules to be installed are not determined in advance.

• Even if you have a list of modules to be installed, the cpanm workers will break each other unless you synchronize them

• So we have to

• (1) divide the installation process of a CPAN module into pieces that can be executed individually

• (2) synchronize cpanm workers in some way

(1) Divide the installation process of CPAN modules

sub installing_process {
    my $module = shift;

    # 1. resolve
    # query cpanmetadb
    my $dist_url = resolve($module);

    # 2. fetch (and extract)
    # wget && tar xzf && read META.json
    my ($dir, @configure_deps) = fetch($dist_url);
    install_module($_) for @configure_deps;

    # 3. configure
    # perl Makefile.PL/Build.PL && read MYMETA.json
    my @deps = configure($dir);
    install_module($_) for @deps;

    # 4. install
    # make install (or ./Build install)
    install($dir);
}

I divided the process into 4 jobs:

* resolve
* fetch
* configure
* install

Each job can be executed independently.
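As a rough illustration of that split, the four jobs could be modeled as a small value type that knows which step follows it. This is only a sketch: the names `Kind`, `Job`, `Target`, and `Next` are hypothetical and are not taken from cpm. It also ignores that the fetch and configure steps discover dependencies, which would need resolve jobs of their own.

```go
package main

import "fmt"

// Kind names one of the four jobs listed on the slide.
type Kind int

const (
	Resolve Kind = iota
	Fetch
	Configure
	Install
)

var kindNames = [...]string{"resolve", "fetch", "configure", "install"}

func (k Kind) String() string { return kindNames[k] }

// Job is a hypothetical unit of work: which step to run, and for what.
type Job struct {
	Kind   Kind
	Target string // a module name in this sketch
}

// Next returns the job that follows j for the same target.
// The second value is false once the install step has been reached.
func (j Job) Next() (Job, bool) {
	if j.Kind == Install {
		return Job{}, false
	}
	return Job{Kind: j.Kind + 1, Target: j.Target}, true
}

func main() {
	j := Job{Kind: Resolve, Target: "Plack"}
	for ok := true; ok; j, ok = j.Next() {
		fmt.Printf("%s %s\n", j.Kind, j.Target)
	}
}
```

Because each step is a separate value, any idle worker can pick up any step, which is what makes the later synchronization scheme possible.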

(2) Synchronize cpanm workers

Take a look at the Go language…

Go introduces two concurrency primitives:

* goroutines
* channels

They are very simple but powerful.

package main

import "fmt"

func work(in <-chan string, out chan<- string) {
    for {
        job := <-in
        // do work with job
        out <- "result of " + job
    }
}

func main() {
    in := make(chan string)
    out := make(chan string)
    go work(in, out)
    in <- "job"
    result := <-out
    fmt.Println(result)
}

Take a look at the Go language…

// work is the same function as on the previous slide
func main() {
    in1 := make(chan string)
    out1 := make(chan string)
    go work(in1, out1)

    in2 := make(chan string)
    out2 := make(chan string)
    go work(in2, out2)

    in1 <- "job1"
    in2 <- "job2"

    select {
    case result1 := <-out1:
        // do something with result1
        fmt.Println(result1)
    case result2 := <-out2:
        // do something with result2
        fmt.Println(result2)
    }
}

It is very easy to add more workers

You can use select to wait on multiple channels simultaneously
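A single select handles only whichever channel is ready first. To collect the results of both workers, the select can be placed in a loop. The following is a minimal, self-contained sketch of that idea; the helper name `runTwo` and the "done " prefix are made up for illustration.

```go
package main

import "fmt"

// work echoes each job back as a result; it stands in for real work.
func work(in <-chan string, out chan<- string) {
	for job := range in {
		out <- "done " + job
	}
}

// runTwo sends one job to each of two workers and collects both results.
// The order of the results depends on which worker finishes first.
func runTwo(job1, job2 string) []string {
	in1, out1 := make(chan string), make(chan string)
	in2, out2 := make(chan string), make(chan string)
	go work(in1, out1)
	go work(in2, out2)

	in1 <- job1
	in2 <- job2

	var results []string
	for len(results) < 2 {
		select { // handle whichever worker is ready
		case r := <-out1:
			results = append(results, r)
		case r := <-out2:
			results = append(results, r)
		}
	}

	// Closing the input channels lets the workers exit their loops.
	close(in1)
	close(in2)
	return results
}

func main() {
	for _, r := range runTwo("job1", "job2") {
		fmt.Println(r)
	}
}
```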

Can we apply this idea to Perl5?

Of course, we can.

go <-> Perl5

go          Perl5
goroutine   fork(2)
channel     pipe(2)
select      select(2)

The internals of cpm

[Diagram: a Master process connected to three cpanm workers, each through two pipes (pipe x 2); the Master waits on the workers with select]

cpanm worker

1. get job via pipe
2. work, work, work!
3. send result via pipe

Master

1. prepare pipes for workers by pipe(2)
2. launch workers by fork(2) and connect them with pipes
3. loop { calculate jobs and send jobs to idle workers. if all workers are busy, then wait for them and receive results by select(2) }
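The master loop can be sketched in Go, the language the design was borrowed from. This is an illustration under several assumptions, not cpm's actual code: goroutines and channels stand in for fork(2) and pipe(2), one shared result channel replaces select(2) over many pipes, and the `deps` table is made-up data standing in for what a worker would read from META.json or MYMETA.json. The sketch only shows how jobs discovered at run time are handed to idle workers; it does not order installs so that dependencies come before dependents.

```go
package main

import "fmt"

// deps is made-up dependency data for illustration only.
var deps = map[string][]string{
	"App":  {"Web", "Log"},
	"Web":  {"HTTP", "Log"},
	"HTTP": {},
	"Log":  {},
}

// result reports which worker finished which module, and what it discovered.
type result struct {
	worker int
	module string
	found  []string
}

func worker(id int, in <-chan string, out chan<- result) {
	for m := range in {
		out <- result{worker: id, module: m, found: deps[m]}
	}
}

// master processes root and everything reachable from it with n workers
// (n must be at least 1) and returns the modules in completion order.
func master(root string, n int) []string {
	out := make(chan result)
	ins := make([]chan string, n)
	idle := make([]int, 0, n)
	for i := range ins {
		ins[i] = make(chan string)
		go worker(i, ins[i], out)
		idle = append(idle, i)
	}

	queue := []string{root}
	seen := map[string]bool{root: true}
	busy := 0
	var processed []string

	for len(queue) > 0 || busy > 0 {
		// Send queued jobs to idle workers.
		for len(queue) > 0 && len(idle) > 0 {
			w := idle[len(idle)-1]
			idle = idle[:len(idle)-1]
			ins[w] <- queue[0]
			queue = queue[1:]
			busy++
		}
		// Every worker is busy, or nothing is left to hand out: wait for a result.
		r := <-out
		busy--
		idle = append(idle, r.worker)
		processed = append(processed, r.module)
		for _, d := range r.found {
			if !seen[d] { // queue each newly discovered module once
				seen[d] = true
				queue = append(queue, d)
			}
		}
	}

	for _, in := range ins {
		close(in)
	}
	return processed
}

func main() {
	fmt.Println(master("App", 2))
}
```

The `seen` map is what keeps two workers from being handed the same module, which is the kind of synchronization that plain xargs cannot provide.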

Roadmap

• Last year I talked with Tatsuhiko Miyagawa about cpanm 2.0 (menlo)

• Then he said “why don’t you merge cpm into cpanm itself?”

• I was very happy to hear that!

Roadmap

• So if you all find cpm useful and stable, then cpm should be merged into cpanm 2.0

• Before merging, there are some problems that need to be resolved:

• The log file is very messy

• I would greatly appreciate your feedback!

try cpm now

$ cpanm -nq App::cpm

thanks!
