In this post I talk about my new paper, which is going to be published in the proceedings of the 2nd Workshop on Software Engineering Methods in Spreadsheets (SEMS ’15) in Florence, Italy. It is a position paper describing our aims to automatically extract business rules from spreadsheets. [Pre-print in Arxiv.org is here]
What are Business Rules?
In her book Business Rules Applied: Building Better Systems Using the Business Rule Approach, Barbara von Halle writes that a business rule is a “statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business”. Thus the term business rule is somewhat an over-encompassing term used to imply all sorts of assertions that govern the operations of a business. A purist’s definition of business rule has a specific syntax associated with it and fixed method of framing the rules, however the essence remains same i.e. such rules are all the assertions that unambiguously states the actions or operations that are necessary to keep a business running. Therefore business rules are what that contain key business knowledge of an organization or a domain.
What we know about businesses and organizations?
We know that a vast majority of businesses and organizations use spreadsheets for their operations. Therefore it is needless to say that spreadsheets are gold mines of crucial business knowledge with plethora of business rules implemented inside them. Inside spreadsheets they are usually hidden behind cryptic formulas and the heavy presence of numerical data make them difficult to be discerned.
What we can do about it?
By extracting the business rules from spreadsheets we can reap the business knowledge embedded inside them and make it available in a better readable format. And we intend to do that in an automated way with a software tool.
What are the potential benefits of doing that?
1) High Level Analysis of Spreadsheets – Extracting business rules enables generation of documentation for spreadsheets at a higher abstraction level than the spreadsheets themselves. This facilitates the following:
a) Comprehension – It becomes easier for end-users, who are typically not programmers, to understand the structure and operation of large and complex spreadsheets helping them efficiently work with or modify such spreadsheets with reduced errors and mistakes.
b) Comparison – Comparing spreadsheets becomes possible in order to estimate whether they implement same or similar functionalities, or even are identical behavior-wise only differing in data values. The latter cannot be done for example by an application that compares spreadsheets in data and formula level.
c) Validation – Organizations using set of well-formed and pre-laid business rules can validate whether the spreadsheets created by their employees accurately implement those rules or if there are errors in the logical level.
2) Understanding of Organizational Business Rationale – Some organization may not have their business strategies well laid out in business rule format; yet vital business knowledge of experts working in the company is hidden in spreadsheets. Extracting this knowledge would help to form a clear picture of how that organization works and its structure.
3) Support for Migration – IT architects need to understand the business logic when migrating functionalities and computations implemented in spreadsheets into conventional software. Furthermore business analysts need to ensure that the IT architects understood it correctly. This can be achieved through knowledge extraction and an automated process would largely help in this regard.
4) Safe Re-use and Replication of Spreadsheets – Often spreadsheets are created on ad-hoc basis by experts in an organization to implement their unique strategies for certain scenarios. Over time such spreadsheets grow in size and complexity and are used by several employees for similar scenarios but with different data sets. Invariably the users are forced to employ the method of copy-paste to replicate the original spreadsheet and customize it according to their needs by manipulating data and formula. However this process has been found to be extremely error-prone. It is probably safer to re-generate spreadsheets from scratch using the blueprint or structure of the original spreadsheet instead of copy-pasting. Automated business rule extraction can facilitate such blueprint generation and thus make replications of spreadsheets safer.
What goal we have set?
We want to devise an algorithm and subsequently an application that can automatically extract business rules from spreadsheets. Once that is done, we want to find answers to the two following research questions:
RQ1: How accurate the automatically extracted business rules will be as compared to those extracted manually by domain experts and spreadsheet users?
RQ2: How efficient is the automatic extraction process compared to manually extracting business rules from spreadsheets?
We aim to do user-studies and controlled experiments in which we will compare the results of automatic and manual extraction of business rules from spreadsheets.
What a typical problem will look like?
Typical spreadsheets implement business rules to calculate results. For example in the above figure the cell E19 contains the formula SUM(E13:E18). From this formula our algorithm has to infer the business rule “Total earned revenue = Admissions+…+Other earned revenue”.
Mapping E13:E18 to Admissions…Other earned revenue is straightforward. However there is more to determine as the Total Earned Revenue is divided into columns for Last Year, Current Year, etc. Thus the mapping becomes two dimensional.
Furthermore a parser will reach three blank rows and an auxiliary header row (actuals, budget, etc.) before it reaches the “Year” column header row.
Making things even more challenging, the whole structure is repeated into vertical blocks viz. Earned Revenue, Private Sector Revenue.
When mapping the rule “Total private sector revenue=…” the parser will encounter formulas in the 19th row instead of reaching the column headers! Thus, same formula repeated both vertically (in blocks) and horizontally (in year columns), yet being distinct semantically, is a considerable challenge.
What are the principal difficulties?
Spreadsheets do not impose fixed structural uniformity with regards to layout. This is an inherent flexibility of spreadsheets which at the same time is an advantage and also a disadvantage. This makes the mapping between data and labels vary to a large extent from spreadsheet to spreadsheet. Towards extracting business rules, this will be a very crucial challenge to overcome.
In my last blog post I talked about the Aragorn tool. I am happy to announce that since that post which was quite some time back, Aragorn has evolved quite a bit. It’s much more robust now and has lesser bugs. It has also gone through some functional changes!
Aragorn is currently available as Excel 2010 add-in.
Results of user-study for evaluation of Aragorn
- Category A Questions: Responses_All_141124
- Category B Questions:
- Scenario 1: Assume it is Friday afternoon and you are eagerly waiting for the week-end to start. Your own assigned tasks are pretty much done and so you are quite relaxed. However your team-lead/manager calls you up and gives you an urgent task. One of your colleagues who had been working on a model workbook is absent due to sickness. Your lead was working on it but suddenly has been called up for an urgent management level meeting so she requests you to finish the remaining modifications on the workbook as things can’t wait. So now you are left with a workbook which probably you have never laid your eyes upon and have to make changes in the span of a few hours.
- Scenario 2: Assume you are running a heavy schedule on a day full of meetings. You receive a call from the client side requesting that the model workbooks that were to be sent, be sent a week earlier; as due to forecasted market dynamics they really need it earlier than expected. You hurry to check work status from your team-members and find that work is almost ready but some errors have come up in testing. In order to hurry up the whole process you join in the activity of removing the errors from these workbooks which your junior team-members have created. Results: Responses_All_141122
A quick summary of what Aragorn provides:
- It gives you instantaneous warning about dependents of cells you are about modify; even when they are situated in different worksheets.
- It requires one-time processing of your Excel workbook and for workbooks of size of the order of 12 – 14MB the time required for processing is about half a minute.
Please go through our original post on Aragorn for a better picture about it.
Aragorn is a research initiative and is in experimental stage. Please make sure you have adequate back-up copies of your workbooks on which you want to try it out. Trying Aragorn on data or workbook of critical importance to your work is not recommended at present.
Once you activate Aragorn on a workbook, any changes you made before closing will be by default saved even if you do not save it pro-actively.
Also note that if you make some changes, you need to pro-actively save the workbook in order to see the reflection of the changes in the Arasense behavior in the same session. When you save, Arasense will re-process your workbook in order to consider the new changes.
When AraSense is in turned ON state, the ‘undo’ feature does not work. If you turn it OFF it works.
So what (who) is Aragorn?
We all know who Aragorn is! The mighty brave king of Men, Isildur’s heir, the skilled warrior, protector of the Fellowship of the Ring and the hobbits. Unfortunately we are not talking about him here. We are in the process of developing an Excel add-in tool, which quite similar to the real Aragorn, will help you protect yourself from orcs, goblins, and such nasty creatures (mistakes, errors, pitfalls) while you work on your spreadsheets.
What made us develop Aragorn?
We were trying to have a closer look at spreadsheet dependence tracing techniques. In my blog post I discussed some of our findings from a review of the research on this topic. We found that there was still enough need for good and reliable dependence tracing techniques which can be adequately and effectively adopted by the industry in a standardized and popular way. In 2013, in her dissertation, Dr. Felienne Hermans wrote about some responses from spreadsheet users –
We were particularly motivated by a scenario where you, as a spreadsheet user, have the task of coming up with some calculation or financial estimate after some data crunching in a fairly large complex spreadsheet which you have not created yourself. Imagine, to obtain the required results you need to change some parameter values, modify some cells, move data around here and there. However not knowing well the full design structure of the spreadsheet you definitely do not want your changes to affect the whole model so that the results you obtain turn out to be wrong! On the other hand perhaps it´s not even worth it for you to spend days in trying to understand the spreadsheet model as it may be for a one-off use in your case. This is exactly the type of scenario where a quick, real-time prompting system displaying cell dependency information can make things a lot easier. While this was our primary motivation, we also wanted to make sure that the tool was easy and lucrative enough for you. Therefore we also aimed to ensure that the following key features were there in our tool:
- Effortless: We did not want to burden you with additional effort of multiple mouse clicks or peering into the screen trying to distinguish between overlapping blue arrows which all seem to point towards the same cell or no-where.
- Real-time: We wanted our system to be a process that is part of using the spreadsheet itself and not a separate activity to be done at a different time.
- Instantaneous: We wanted our system to be quick and momentary so that you do not have to wait for the information.
- Specific: We did not want to burden you with more information than you really need; it’s not always necessary or worthwhile to understand the whole structure of a spreadsheet when all you need to know is whether a change in one or two cells are going to affect other cells.
So what have we come up with?
The core of Aragorn is the floating prompt system. With these localized text-boxes very close to active cells, we can display crucial information about the selected cell in a transient manner.
In addition to this we have developed the feature called AraSense particularly to cater the need of dependence tracking especially for scenarios like I described above. It is our answer to the questions raised by the users.
We all know how the real Aragorn is also an expert ranger with superior tracking skills. He could track the trail of hobbits when they almost gave up hope of finding them alive!
Similarly the AraSense feature is the tracking power of the Aragorn tool. It can pre-sense dependencies and thus its called AraSENSE. It can do a one-time analysis of an opened Excel workbook, after which it will display a list of dependent cells in the floating prompts for every cell you click on. If you are not trying to edit the cell, normally the prompt will disappear automatically after a delay. If you are trying to edit a cell, the prompt will persist next to it as long as you are editing the cell. Even if the dependent cells are in a different worksheet, they will be listed in the prompt. You can turn the feature on or off based on your preference. No warning is shown for cells that do not have any dependents. Thus when the feature is turned on, if there is no prompt for a cell, then you can safely modify its contents without worrying. And if there is a prompt then you know which are the dependent cells and on which sheet they are. Thus you can be cautious and forewarned about the changes you make. The AraSense feature is effortless, real-time, instantaneous and specific. It fulfills all the aims we had about the system we wanted to give you.
So how it works?
So what we would like from you?
We would like to hear your thoughts on Aragorn and AraSense.
If you are a professional who uses spreadsheet at work, then do you think this tool will be helpful for you?
If you are not a spreadsheet user, but interested about end-user computing and works with usability aspects then do you think our tool addresses some key usability aspects?
Whoever you are, if you have read this and found it worthwhile of your time, we would like to thank you and would really like your thoughts and opinions. Encouragements are obviously appreciated but we would also eagerly welcome criticisms and suggestions for improvements.
Hi all. In my blog post Dependence Tracing: Need for an Investigation I talked about what is dependence tracing in spreadsheets and what is the necessity to review existing techniques to do the same. We did a study of the research that has been done on the topic and I am happy to inform you that the paper documenting the study and its conclusions is going to be published in the proceedings of SEMS’14. In this post I briefly highlight some key aspects of our findings. I recommend my previous post Dependence Tracing: Need for an Investigation for those interested to have a little background information on the problem of dependence tracing.
We were interested particularly to understand the state of the art of research on dependence tracing. Our principal motivations arose from feedbacks we had obtained in previous user studies and surveys where there was a clear dissatisfaction among the industrial spreadsheet users about available dependence tracing tools and enhancements. To put very frankly it is an unanimously accepted fact that MS Excel is by far the most popular spreadsheet application used in the industry today. The feature that is available for dependence tracing in-built with MS Excel is the blue tracing arrows that points to precedent or dependent cells. This technique evidently is not the most convenient one for all types of scenarios that arise for an end user when she is creating or modifying spreadsheets. It has some known issues like:
- wastage of effort and time (one mouse click too many),
- ambiguity (arrows pointing to cells of the same row end up superimposing on each other creating confusion)
- interference (it does indicate the dependents but it also hampers the user’s visual clarity by creating cluttered arrows all across the worksheet) that makes it inconvenient to both work on the sheet and get idea about dependence structure simultaneously.
Therefore it was our hypothesis under the paper’s context that better dependence techniques or tools developed under scientific research was not adequately making it across to the industry to benefit real-life spreadsheet users. Based on this hypothesis we identified three research questions:
- Why the research ideas are not getting across to the industry as popular and user-friendly implementations?
- Is there any standardized metrics framework with which such implementations as above can be compared against each other in their effectiveness to real-life spreadsheet users?
- Is there any clear opportunity of improvement in the dependence tracing context?
In order to find answers for these questions, our first step was a review of the existing research literature on this topic and this preliminary review is practically what makes up this paper.
We shortlisted five published research papers for our study, each having description and details of some innovative idea for dependence tracing. Our approach was to characterize these innovations based on the following aspects:
- The basic technique/principle/strategy
- Tools or prototypes developed if any
- Comments or details available on testing, performance, and limitations of the tool
- Current status of the research and its implementation, and its perceived relevance or influence in the industrial scene
|Paper Published In||Technique||Prototype||Prototype Comments||Industrial Relevance|
||Fluid Visualization & Animation||Pad++ (now defunct) and Python for UNIX||Performance drastically worsens after 400 cells.No user study.||No implementation plan given. Some ideological influence can be observed in MS Excel 2007 onwards.|
||3D Interactive lift-up operation||OpenGL APIs and SLSC for UNIX||None available.||None.|
|3. 2000||Visual indications superimposed on Excel sheets like arrows, highlights, shading.||VBA plug-in for Excel.||None available.||No standardized definite influence. However similar techniques are used in ad hoc basis by expert spreadsheet users on individually or internally to organizations.|
||Different types of graphs and trees depicting spreadsheet dependency structure.||Java tool for Excel workbooks.||Successfully tested on a corpus of 259 Excel workbooks of undisclosed source.No user acceptance or comprehension study.||Authors apparently did not carry forward their work on this project.|
||Leveled dataflow diagrams||C# toolkit that accepts Excel workbooks for input.||Extensive user testing with 27 industrial spreadsheet users in a financial asset management company.||Is currently a part of service offered by the company Infotron.nl|
Thus, from this review we observe that:
- All 5 papers provide innovative ideas to simplify dependence tracing with visualization aids.
=> There is no lack of ideas in this context.
- All 5 papers provide either tools or their prototypes to demonstrate the techniques.
=> The ideas are quite conceivable.
- 2 out of 5 prototypes are based on defunct or obsolete technology. They have not been propagated to newer current technologies. One that is based on VBA (made in 2000) is supposed to work for all Excel versions but is actually subject to be tested for version incompatibility. Only 2 out of 5 are made with Java or C# and can be considered as modern enough to actually be working at present.
=> There is lack of recent work done based on current state of technology that can be used at present in the industry.
- 3 out of 5 were properly tested. One showed very poor performance as 400 cells is nothing compared to the number of cells that is found in spreadsheets used in the industry! No solution or approach was provided about how to scale this up.
=> All ideas that are conceived need to be tested for performance. If that is not done, then it is evident why it will not find itself across to the industry and actually help real-life spreadsheet users.
- Only 1 out of 5 was tested for useability and user-acceptance!
=> Innovative ideas are all exciting, but they are meaningless if actual users don’t find them helpful. Without user studies the effectiveness of ideas or research can not be established.
- Only 1 out of 5 actually ended up having industrial exposure. All the rest were practically aborted or left off by their creators.
=> There must be serious issues in making these tools actually popular to the real life spreadsheet users; or else the researchers working in the domain are not properly motivated to do anything about the problems.
We can now revisit our research questions taking into account the above observations:
Although still not quite clear as to what is the reason behind this, one reason seems to be just plain lack of motivation on part of researchers. Otherwise why not follow up on your innovation and see to it that it gets implemented. A reason behind this reason can also be the pressure in the academia which encourages writing tons of papers on new ideas but does not give equal limelight to follow-ups and industrial implementations!? This may cause researchers to abandon their projects at the conception stage and move on to newer projects.
Another reason seems to be difficulty in performance scaling. Spreadsheets are tricky objects with sizes varying from small to very large. The range of sizes is quite phenomenal. This might be a cause due to which innovative ideas are found difficult to be put into real practical use? This reason is indicated by the fact that most of the projects we covered in the review were not extensively tested.
Evidently there is none. And no wonder why only one of the projects was tested extensively with a group of real-life spreadsheet users. This is a very important necessity. To come up with a set of standards and metrics that can define useability or user-friendliness of spreadsheet aid applications. In that way it will be easy to establish the true worth of innovative techniques as laid against backdrop of effectiveness to real-life users.
- Is there any clear opportunity of improvement in the dependence tracing context?
Based on our review and the observations obtained, it is doubtless that this question is still a very much wide open one. There can be enough opportunity found to research and come up with new tools that actually make a difference in the spreadsheet industry in form of reliable and user-friendly implementations. And not only ones that are created in ad hoc basis for internal use in organizations but robust standardized ones that can be widely adopted through out the industry.
I had very recently visited the offices of a financial modeling company in Delhi. It is a British financial modeling company with their primary modeling team based out of Delhi, India. In this post I would be summarizing some of my findings during my visit. It should grant a perspective of the state of the art on use of spreadsheets for financial modeling purposes.
- What’s the FAST standard?
The FAST acronym stands for Flexible, Appropriate, Structured and Transparent. It is essentially a standard for financial modelling.
- Wait, what is it again?
Okay, just like in Java language we have several hoards of rules, guidelines and conventions to follow when we are coding, the FAST standard defines rules and conventions to follow while doing financial modeling. Financial modeling has been traditionally done with and in spreadsheets and thus the FAST standard basically set guidelines and best practices while working with spreadsheets with the purpose of various types of financial modeling. Furthermore as traditionally the most popular and widely used spreadsheet in modern times has been MS Excel, the FAST standard, although universal in intent, has become inevitably correlated with the Excel technology; as in it tries to work around the weaknesses and exploit the strengths of the Excel environment. Nevertheless the guidelines themselves are not Excel specific and it does not for example has Excel keyboard shortcuts or macro based shortcuts as part of it. However the recommendation to use shortcuts instead of mouse is certainly a part of it. So how well or whether FAST standard can be implemented on other environments than excel is an exploratory question.
- Where can one find it?
The FAST standard is an open standard and is maintained by an universal group at http://www.fast-standard.org/
- What is F1F9? (Apart from being a weird word, more of a cryptic code rather!)
F1F9 is a financial modeling company originating in UK with a considerable global presence across various regions of the world. Its their office that I had visited.
- What with the weird name?
F1 and F9 are two keyboard shortcuts in MS Excel: Help & Calculate…. that is the company’s business, to help people calculate. Thus the naming F1F9. (Cheeky eh?)
- F1F9 and FAST??
The FAST standard was conceived by John Richter and Morten Siersted around a decade back when they were both working in F1F9. It was later released as an open universal standard and has been ever since contributed to by many modelers from across the world. F1F9 has forever pioneered the cause of FAST and still believes in and follows it for its own modeling activities.
So there I was in the Delhi office of F1F9. Much of our work in the Spreadsheet Lab revolves around the fundamental research question of how to apply software engineering methodologies and good practices to the domain of spreadsheet programming to improve it and to resolve known and unknown issues; where the “programmers” are not really programmers but are all types of managers and business executives.
The idea was to learn how F1F9 is using spreadsheet programming to do their work of financial modeling; to identify areas where software engineering methods can be implemented; to perceive issues they might be facing with spreadsheets which they have accepted as part of life but which perhaps we at Spreadsheet Lab can solve.
So it was the FAST standard that I learned as the very first thing with a practical example of how it was being utilized to do a real life financial modeling.
Very soon I observed some obvious similarities between how financial modeling is being done in F1F9 and several widely used coding conventions and software engineering best practices. I compare the similarities based on two types, type one being similarities largely due to the FAST standard which is but analogous to best practices and conventions in coding; type two being similarities in the whole spreadsheet engineering processes and conventional software engineering processes and thus not being limited just to the domain of coding.
Note that these are only the similarities I have already identified based only on my current knowledge level of financial modeling, FAST standard and F1F9’s modeling style and therefore this list can surely grow in the future as I get to know more about these things.
What similarities can be identified with software engineering context:
Similarities due to the FAST standard: The FAST standard has several best practices and guidelines recommended in it to follow while doing financial modeling. It is similar to various standards and coding conventions that we follow in the software engineering world to come up with more mature and robust software. Certain parallels that are readily identifiable are the following: (Currently based only on the aspects of the FAST standard I have already learned)
- Variable Naming Conventions:
In most programming paradigm or language we have well-defined variable naming conventions. Following these conventions are not required by the language itself or the compiler; the programs can function correctly even if they are not followed. However these conventions are religiously followed in order to make code more readable. To make programmers easily understand purpose of a piece of code; so that even someone who has not written the code can understand it. Its purpose is to provide some sort of meta data to the programmer. Something similar has been recommended in the FAST standard. It is not really naming convention but it serves the same purpose; it expresses information about all the variables and data that is used in the spreadsheet.
How it is done?
Unlike naming conventions and using upper case, lower case, camel case and so on in FAST it is done using different types of cell highlight colors, and relative position (columns in Excel) of variables, constants, series, calculations etc. in the spreadsheet, and maintaining this positions uniformly across the spreadsheets. Just by looking at which column the particular data is located or the cell highlight color a modeler can understand the nature and purpose of that data.
- Addition of Comments
Like commenting is one of the most essential best practices for coding of programs, the same is also in use under the FAST standard. Cell comments are used to demarcate rows or areas of the spreadsheet which is in some way having some ambiguities or is unusual.
- Use of CONSTANT declaration
In programming logic it is often considered good practice to not use constants directly inside operation expressions as hard coded values. Instead it is recommended to use certain variables for this purpose and fixing their values by declaring them as constants. In various programming languages this is done in various ways.
Similarly the FAST standard recommends not using embedded hardcoded values inside formulas; rather it is suggested to input constant values in certain cells and use those cell references after proper anchoring inside of formulas used across the spreadsheets. Like most programming language naming conventions identify such constant entities from their UPPER CASE names here it is identified from their cell highlight color and column location.
- Driver and stub modules
In coding practice and in software engineering we have the culture of using driver and stub modules which are practically dummy modules that have been not coded yet, having some hardcoded values or values that are yet to be calculated in the software or program itself. These modules then work in conjunction with the modules of the program that have been actually completed in order to test or demonstrate the overall functionality of the program. Subsequently they are also coded and completed at a later stage. Similar thing is done in financial modeling in the FAST standard.
Use of placeholders are serving the same purpose like drivers or stubs. Placeholders are those values which are not originally supplied and are temporarily taken as assumptions. Based on the assumptions the rest of the model is completed and placeholders can then eventually be replaced with real values obtained at a later stage. Using placeholders enable the modelers to continue with modeling work and not wait for some values that have been not yet supplied.
- Coding line length limit
In coding, many standards, guidelines and editors recommend or enforce something called the coding line length cap. This limits the coders from writing a line of code which more than 100 or 120 characters long because after this length readability of the code is seriously messed up. Similar thing is recommended in the FAST standard namely the “Thumb Rule” for the cell formulas. This rule states that any formula should not be so complicated or long that it is more than the length of your thumb. When requirement is there for a longer formula surely in some way the task can be divided and done by two or more shorter formulas.
- Call by Reference and Call by Value
In programming we clearly distinguish between call by reference and call by value in context of method calls. Methods are smaller modules of a single program where some tasks or calculations are done. When methods are invoked, data is passed to them by the main program through inputs. These inputs can be variable references or pointers which means when the method will work with them the original variables may become modified. The inputs can also be temporary variables with values copied from the original main program variable, in which case the methods can return new value after processing but cannot directly modify the original main program variables. Both has its own merits but normally it is considered safer to do call by value i.e. using copied values than original variables themselves.
Similarly in financial modeling, under the FAST standard, it is encouraged to use range names and not direct cell references when working with them in VBA code inside Macros. Direct cell reference can become altered when rows or columns are added to the sheet; range names however stick to the data in the cell and shifts along with the cell if pushed by new rows or columns. Therefore using range names inside Macros ensure that they always get the correct data passed to them or the code need not be repeatedly changed whenever rows or columns are added.
- Use of checker tools:
Like in software engineering various types of checker tools and inbuilt checks in editors are used to ensure that best practices and conventions are maintained in coding, similarly automated checks for FAST standard compliance of the models is being done using various types of software checker tools.
Similarities due to model development cycle or process:
- Configuration Control:
In software engineering build maintenance and configuration control play a very important part. Although there these operations are large and elaborate, here in F1F9 an elementary form of it is observed in the way version is controlled and back up is archived of the Excel workbooks containing the models in a manual fashion.
My first point of entry into the world of research work in spreadsheet use/engineering was the work of Felienne Hermans of the Software Engineering Research Group at TU Delft. From my primary explorations of her work I arrived at the understanding of how vulnerable the spreadsheet programming paradigm is to horrendous levels of mistakes, errors and goof ups. Not even sure whether it can be called “programming” at all, because its mere use of spreadsheets governed by no rules or structures and moreover the “users” don’t know sh#% about programming. Yet the volume of this usage is humongous and so is the horror stories related to spreadsheet errors. Companies have lost billions of dollars, academics have lost reputation and credibility, CEO s have lost their jobs just due to spreadsheet errors. The Dirty Dozen book by F1F9 has quite a few examples of cases like this. This brings out how important it is to do something about it; to make some changes and bring some improvements so that the heavy dependence of the industry on spreadsheets can be sustained sans the inevitable hazardous risks related to spreadsheets use. The question that arises immediately is what is the reason behind such a risk?
Why are there so many problems?
The formalized answer to this question is perhaps beyond the scope of this article but in a simple way from what I understood, it all has to do with not following rules or good practices. From what I observe is that companies are functioning in three areas when it comes to working with spreadsheets.
By “rules” in the above diagram I mean rules, best practices, techniques and process flows. In other words we have to understand between spreadsheet use and spreadsheet engineering. Spreadsheets can be used in the way artisans make pottery or handicraft or a genius artist paints a masterpiece (though meaning of which people find difficult to understand). On the other hand spreadsheets can be used in the way automobile is mass manufactured in the assembly line and rolled out on the streets. Anybody who knows to drive a Toyota will also be able to drive a Merc, BMW, Jeep, Renault, Ford or whatever make it is of the car. It is important to distinguish between engineering and handicraft or artistry. If I try to extend the Venn diagram to the analogy of cars it would be something like this…
I hope I have been able to send the point across by now. I hope now it is a bit easier to understand as Why there will be so many accidents in the road? Or Why the spreadsheet use today is vulnerable to erroneous effects? Majority of companies using spreadsheets today are operating in the reddish shaded part of the Venn diagrams in regards to how they use spreadsheets. Only a minority is operating in the bluish shaded portion; F1F9 and any other organization following the FAST standard being inside that area.
So what is the solution?
Again the formalized answer to such a question may be one or more PhD theses themselves and hence out of scope of this article. However to approach in a simple way now I will move away from the automobile industry analogy to the domain of medical industry. What is disease and health? A disease is when a malfunction has already victimized a human body and must be treated or cured. On the other hand health is the human body’s inherent capability to resist against potential diseases n’est-ce pas? The healthier a person is the less prone he or she is to various types of diseases. Increasing or maintaining health is therefore a sort of prevention from possibilities of disease.
The issue of spreadsheets can be approached in the manner analogical to above. Spreadsheet masses that are already existing in the world need to be cured of problems, got rid of issues, re-worked, re-engineered, or simply used with additional augmentations like visualizations, analyzers or such tools just like wheel chairs, artificial legs, prosthetics and so on! And on the other hand efforts must be applied in order to create healthier spreadsheets (luckily, unlike human beings where we have no control over the genetic prototype as of yet, here we have control of their design); to ensure that spreadsheets are created following set of logical, reproducible, feasible set of rules, carefully planned processes and under the guidance of a set of best practices.
So what’s happening in F1F9?
From my stint of study in the F1F9 office at Delhi I have reached a conclusion. A conclusion that summarizes in brief whatever is going on there based on what all I learnt. What is happening is just that F1F9 is consistently and meticulously producing healthy babies. Oops sorry, what I meant is healthy spreadsheets! They have chosen the path of prevention and they strive to avoid running into situations where they might need cures. They are not mere spreadsheets users but they are spreadsheet engineers. Their principal niche is financial modeling but this primary activity they are doing with spreadsheets and they are using spreadsheets in a really commendable manner. And how they are doing this? By following the FAST standard template of course. The FAST standard in the domain of financial modeling based on spreadsheets use is almost equivalent a secret genetic template in the hands of God to create healthier human beings! (This is merely my personal opinion at present point of time and may sound a bit exaggerated but yet the exaggeration has good substance behind it, this I can guarantee). If you carefully consider the best practices laid down by the FAST standard that is rigidly followed at F1F9 and compare it to the myriad of problems highlighted in Felienne Hermans research work, you are bound to feel (and eventually logically conclude) that such problems are very unlikely to occur in spreadsheets created in F1F9 or anywhere else where FAST standard is being followed! The cases studied under the research were mostly replete with everything that the FAST standard categorically asks to avoid and devoid of everything the FAST standard tries to religiously enforce. So in one word F1F9 had definitely taken one step forward in the process of spreadsheets evolution and the FAST standard is a golden key leading towards a bright future for the domain.