simon murphy simon.murphy@codematic developer – codematic ltd


Post on 07-Jan-2016


Page 1: Simon Murphy simon.murphy@codematic Developer – Codematic Ltd

Management Paper - Comparison of Spreadsheets with other development tools (limitations, solutions, workarounds and alternatives)

Simon Murphy

simon.murphy@codematic

Developer – Codematic Ltd

Page 2

Spreadsheet background
• 5-30 MB size
• 20-200,000 formulas
• 1-10,000 unique formulas
• 5-10,000 lines of VBA
• £1M to Billions in values
• Often linked to other technologies such as OLAP, ADO, COM or .NET etc.
• Finance, Banking and Sales and Marketing areas
• Growth rate 500 pa pa (12k data items ph)

Page 3

Agenda
• 2 Apologies – Excel v Spreadsheet, Critical
• Motivation
• Definitions
• Brief Summary Of Main Points
• In Depth Analysis And Demos
  – Select by location
  – Worksheet & VBA insecurity
  – Formula complexity
• Culture
• Summary
• Conclusion
• Any Questions

Page 4

Motivation (For writing the paper)

• Spreadsheets seem more difficult to test effectively than databases and source code.

• Most mainstream/formally trained developers shun spreadsheets as a development tool.

• Most spreadsheet developers only work in spreadsheets, rarely databases, or procedural languages.

• Spreadsheets seem to be stuck on ‘Garbage in Garbage Out’ whilst mainstream development has moved to the much more robust ‘No Garbage In’ approach.

• Many observers recommend extra tools, methodologies or training to improve spreadsheet quality, but they miss the key point: people use spreadsheets because of their flexibility, not in spite of it.

• Is it realistic to comply with the spirit of Sarbanes-Oxley with spreadsheets?

• I believe spreadsheets should have a much more limited role in important information systems.

Page 5

Definitions

• Spreadsheet – powerful and flexible, single end user, analysis and presentation tool, optimised for speed of initial development.

• Spreadsheet Paradigm – normal reference based formulas, excludes lookups and pivots etc.

• Systems Development lifecycle – Requirements, Analysis, Design, <Technology Choice>, Construction, Test, Release, Maintain. In some shape or form.

• Spreadsheet Development lifecycle – “Oh! I need a model” – clickety-click, reasonableness check, release, (Test & Maintain in live environment).

Page 6

Brief Summary

• Spreadsheets are great for many jobs, but their flexibility makes them ...

    Issue                              Possible workaround
 1. Inherently fragile                 Focus on smaller models
 2. Not type safe                      Clear layout, data validation
 3. Only have global scope             Use blocks with spaces
 4. Lack of data/logic separation      Layout for understanding
 5. Insecure                           Use compiled language
 6. Don’t scale well                   Use in requirements phase
 7. Limited development tools          3rd Party tools from web
 8. VBA generally poorly written       Take training/coaching
 9. Often ad-hoc                       Sketch design on paper first
10. External links can be dangerous    Use auditable import routine

Page 7

External link example from a commercial model

• 1 workbook linked to 34 other workbooks,
  – 20 of which were found
  – 14 missing (one was ‘book1’ (i.e. not saved))
  – over 100 links found in total.

• Excel generally does not know or warn about circular references through external links.

• Links lock directory structure

• Note: Approx 1.4 Million data items
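
Since Excel generally does not warn about circular references that run through external links, a check has to happen outside the workbook. The following is a minimal sketch in Python, assuming the links have already been extracted into a simple mapping of workbook to linked workbooks. The function name, the mapping and the workbook names are all hypothetical, and the slides do not say how the links in this example were found.

```python
from typing import Dict, List, Optional


def find_link_cycle(links: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of workbook links, or None if there is none.

    `links` maps a workbook name to the workbooks its formulas refer to.
    Workbooks that are missing from the mapping are treated as having no links.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(book: str) -> Optional[List[str]]:
        state[book] = GREY
        path.append(book)
        for target in links.get(book, []):
            if state.get(target, WHITE) == GREY:
                # target is already on the current path: a circular reference
                return path[path.index(target):] + [target]
            if state.get(target, WHITE) == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[book] = BLACK
        return None

    for book in links:
        if state.get(book, WHITE) == WHITE:
            cycle = visit(book)
            if cycle:
                return cycle
    return None


# Hypothetical link map; the workbook names are invented for illustration.
example_links = {
    "budget.xls": ["sales.xls", "costs.xls"],
    "sales.xls": ["forecast.xls"],
    "forecast.xls": ["budget.xls"],  # closes the loop
    "costs.xls": [],
}
print(find_link_cycle(example_links))
# ['budget.xls', 'sales.xls', 'forecast.xls', 'budget.xls']
```

A depth-first search is used here because it reports the actual chain of workbooks, which is what a reviewer needs in order to break the loop.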

Page 8

In Depth and Demos

• Select by Location v Select by Value

• Worksheet and VBA Insecurity

• Formula complexity

Page 9

In Depth and Demo 1

Select by location not value

The original paper spreadsheet had only one mode of access: the human user, reading and writing the text and numbers.

Page 10

In Depth and Demo 1

Select by location not value

Electronic spreadsheets have two modes of access:
1. The human user, reading and typing the text and numbers, as before.
2. The spreadsheet itself, calculating formulas based on cell address (location).

This disconnect is great for flexibility but makes spreadsheets very fragile.
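
The fragility can be illustrated outside a spreadsheet. The sketch below, in Python, models a worksheet as a list of rows and compares a lookup fixed to a position with a lookup by label. It assumes a reference that does not adjust when the layout changes, as can happen when data is pasted, sorted or re-imported in a different order. The data and function names are invented for illustration.

```python
# A tiny "worksheet": each row is (label, amount). The layout is hypothetical.
sheet = [
    ("Sales", 1000),
    ("Cost of sales", -600),
    ("Overheads", -250),
]

# Select by location: a reference fixed to the first row, like a cell address
# that does not adjust when the data moves.
SALES_ROW = 0


def sales_by_location(rows):
    return rows[SALES_ROW][1]


def sales_by_value(rows):
    # Select by value: find the row whose label says "Sales".
    matches = [amount for label, amount in rows if label == "Sales"]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one 'Sales' row, found {len(matches)}")
    return matches[0]


# The data arrives again with a new line at the top.
edited = [("Other income", 50)] + sheet

print(sales_by_location(sheet), sales_by_value(sheet))    # 1000 1000
print(sales_by_location(edited), sales_by_value(edited))  # 50 1000
```

The location-based lookup returns a plausible-looking number without any error, which is why this kind of fault is hard to spot with a reasonableness check.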

Page 11

In Depth and Demo 1

Database approach

In a database the user issues queries based on the values in the actual data, and the database uses these same values to provide results. There is only one access mode, so there is no disconnect.

SELECT * FROM PL WHERE LineItem = 'Sales'
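
As a runnable illustration of the query above, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name PL and the column LineItem come from the slide. The Amount column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Amount is a hypothetical column; the slide only shows PL and LineItem.
conn.execute("CREATE TABLE PL (LineItem TEXT PRIMARY KEY, Amount REAL)")
conn.executemany(
    "INSERT INTO PL (LineItem, Amount) VALUES (?, ?)",
    [("Other income", 50.0), ("Sales", 1000.0), ("Cost of sales", -600.0)],
)

# The row is found by its value; where it is stored makes no difference.
rows = conn.execute("SELECT * FROM PL WHERE LineItem = ?", ("Sales",)).fetchall()
print(rows)  # [('Sales', 1000.0)]
conn.close()
```

Inserting further rows before or after 'Sales' does not change the result, and the primary key stops a second 'Sales' row from being added by mistake.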

Page 12

In Depth and Demo 1

Select by location not value

Demo 1: Which worksheet is correct

Demo 2: The database approach

Page 13

In Depth and Demo 2

Worksheet and VBA Insecurity

Key point 1: Security is measured in time and effort.

E.g. Safes – TXTL-60X6 – Torch, explosive and tool resistant for 60 minutes (all 6 walls)

In software – User skill, access and specialist tools are considerations

(Personal experience: Relying on spreadsheet security can make bad things happen.)

Demo 1: Break sheet protection faster than setting it.

Demo 2: Cracking VBA protection for fun.

Key point 2: Spreadsheet security is not weak or poor. If anything it is probably too strong (for a single end user tool).

Page 14

In Depth and Demo 3

Formula Complexity (1)

How many people are intermediate or better in spreadsheets?

How many people have any experience of C#?

Page 15

In Depth and Demo 3

Formula Complexity (2)

Who can tell us what this (array) formula does?

{=INDEX(Circulation!CU1:CU175,
  MAX(ROW(Circulation!CU71:CU175)
  *(Circulation!CU71:CU175<>0)))}

Page 16

In Depth and Demo 3

Formula Complexity (3)

Who can tell us what this custom worksheet formula does?

=LastNonZeroValueFromVerticalList(Circulation!CU71:CU175)
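
For readers who want to see the logic spelled out, the sketch below expresses it in Python rather than in the VBA or C++ the slides discuss. The first function is one plausible reading of what a function with this name would do. The second mirrors the array formula from the previous slide step by step. Both function names and the sample column are hypothetical, and the all-zero case is handled by choice here since the slides do not cover it.

```python
from typing import Optional, Sequence


def last_non_zero_value(values: Sequence[float]) -> Optional[float]:
    """Return the last non-zero entry of a vertical list, or None if all are zero."""
    for value in reversed(values):
        if value != 0:
            return value
    return None


def array_formula_steps(column: Sequence[float], first_row: int, last_row: int) -> float:
    """Mirror the array formula step by step, using 1-based row numbers."""
    rows = range(first_row, last_row + 1)
    # ROW(range) * (range <> 0): the row number where non-zero, otherwise 0
    flagged = [row * (column[row - 1] != 0) for row in rows]
    # MAX(...) picks the highest row number that holds a non-zero value
    target_row = max(flagged)
    if target_row == 0:
        # Assumption: an all-zero range is treated as an error in this sketch
        raise ValueError("no non-zero value in the range")
    # INDEX(column, target_row) reads that row from the full column
    return column[target_row - 1]


# Hypothetical column; rows 3 to 7 stand in for CU71:CU175.
column = [5, 0, 3, 0, 7, 0, 0]
print(array_formula_steps(column, 3, 7))  # 7
print(last_non_zero_value(column[2:7]))   # 7
```

Both routes give the same answer, but only the named function says what it is for.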

Page 17

In Depth and Demo 3

Formula Complexity (4)

Note: Although it is possible to write worksheet functions in .NET, C++ XLLs are still the standard for performance reasons.

Note: function defined once, used many times in workbook

Page 18

In Depth and Demo 3

Formula Complexity (4)

Performance (57,000 Calcs, 10 times)

Version   Time secs   % of Excel
Excel     27          100%
VBA       29          107%
C++       13          48%
C#        260         963%

(Uses XLL+ from Planatech)
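
The "% of Excel" column is each version's time divided by the native Excel time. A short Python check, using only the timings quoted in the table (the variable names are ours):

```python
# Timings in seconds, as quoted in the table above.
timings_secs = {"Excel": 27, "VBA": 29, "C++": 13, "C#": 260}

baseline = timings_secs["Excel"]
percent_of_excel = {
    name: round(100 * secs / baseline) for name, secs in timings_secs.items()
}
print(percent_of_excel)  # {'Excel': 100, 'VBA': 107, 'C++': 48, 'C#': 963}
```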

Page 19

Optional: Relative complexity – Spreadsheets v mainstream

Glass suggests that for each 25% increase in problem complexity the complexity of the required solution doubles.

[Chart: Relative Complexity - Spreadsheets v Mainstream tools. X axis: Problem Complexity. Y axis: Solution Complexity. Series: Spreadsheet; Database/Procedural Lang; Glass 1:4 ratio. Annotation: Effort required to learn data theory and/or structured programming.]
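
To give a feel for how quickly the rule attributed to Glass grows, the sketch below applies it repeatedly in Python. It assumes the rule compounds, so that two successive 25% increases in problem complexity quadruple the solution complexity. That reading, and the function name, are ours rather than something the slide states.

```python
import math


def solution_complexity(problem_ratio: float) -> float:
    """Relative solution complexity for a problem `problem_ratio` times as complex.

    Applies the rule quoted above: every 25% increase in problem complexity
    doubles the complexity of the solution (assumed to compound).
    """
    if problem_ratio <= 0:
        raise ValueError("problem_ratio must be positive")
    doublings = math.log(problem_ratio) / math.log(1.25)
    return 2.0 ** doublings


for ratio in (1.0, 1.25, 1.5625, 2.0):
    print(f"problem x{ratio}: solution x{solution_complexity(ratio):.1f}")
```

Under this reading, a problem twice as complex needs a solution between eight and nine times as complex.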

Page 20

Culture

• If Spreadsheets have such clear limits, why are they so regularly abused?

• Must be commercial pressure, preference for speed over accuracy. Considering current skills, time and cost to build, spreadsheets are a great tool. Considering Total Cost of Ownership they aren’t.

• If the research is right about error rates why no large scale business collapse?

• Spreadsheet errors must not be material (or not mission critical at least). Probably spreadsheets are only a part of the decision support system.

Page 21

Summary
• Although spreadsheets have many uses, they also have some features that make them inappropriate for certain types of development. We looked at 3 in depth:
  – Inherent weakness of the ‘Select by location’ approach compared to ‘Select by value’
  – No real security to protect model integrity or intellectual property.
  – Rather cryptic native syntax (even experienced developers sometimes have difficulty)
• Spreadsheet robustness and reliability can be increased by use of complementary technologies like .NET and databases.

Page 22

Conclusion

• The ‘Select by Location’ approach is a fundamental cause of spreadsheet complexity and fragility.

• Spreadsheets are a superb tool with many uses, mainly at the smaller and shorter-term end of system developments

• Be aware of their limitations

• You have a choice of tools – choose wisely.

Page 23

Questions?

• simon.murphy@codematic
  – Spreadsheet consulting, reviewing, maintaining, rescuing, migrating, add-in development etc.
• Websites
  – www.codematic.net
  – www.xlanalyst.co.uk