
New paper: Reviewing Research on Spreadsheet Dependence Tracing Techniques


Hi all. In my blog post Dependence Tracing: Need for an Investigation I discussed what dependence tracing in spreadsheets is and why existing techniques for it need to be reviewed. We have since studied the research done on the topic, and I am happy to report that the paper documenting the study and its conclusions will be published in the proceedings of SEMS '14. In this post I briefly highlight some key aspects of our findings; for background on the problem of dependence tracing, I recommend the earlier post.

We were particularly interested in understanding the state of the art of research on dependence tracing. Our principal motivation arose from feedback obtained in previous user studies and surveys, which showed clear dissatisfaction among industrial spreadsheet users with the available dependence tracing tools and enhancements. It is universally accepted that MS Excel is by far the most popular spreadsheet application in industry today. Its built-in dependence tracing feature is the blue tracing arrows that point to precedent or dependent cells. This technique is evidently not the most convenient one for every scenario an end user faces when creating or modifying spreadsheets. It has some known issues:

  • wastage of effort and time (one mouse click too many),
  • ambiguity (arrows pointing to cells in the same row end up superimposed on each other, creating confusion), and
  • interference (it does indicate the dependents, but it also hampers the user's visual clarity by cluttering the worksheet with arrows), which makes it inconvenient to work on the sheet and keep track of the dependence structure at the same time.
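To make the notion of dependence tracing concrete, here is a minimal illustrative sketch (not any of the reviewed tools, and not how Excel implements its arrows): it extracts precedent cells from formula strings and inverts that relation to find dependents. The formula parsing is deliberately naive, a regex for plain A1-style references that ignores ranges, absolute references, and cross-sheet references.

```python
import re

# Naive A1-style cell reference pattern (illustrative only: no ranges,
# no $-absolute references, no cross-sheet references).
CELL_REF = re.compile(r"\b[A-Z]{1,2}[0-9]+\b")

def precedents(formulas):
    """Map each formula cell to the set of cells its formula reads from."""
    return {cell: set(CELL_REF.findall(formula))
            for cell, formula in formulas.items()}

def dependents(formulas):
    """Invert the precedent relation: which cells read from each cell?"""
    deps = {}
    for cell, prec in precedents(formulas).items():
        for p in prec:
            deps.setdefault(p, set()).add(cell)
    return deps

# A toy worksheet: cell -> formula text (without the leading '=').
sheet = {
    "C1": "A1+B1",
    "D1": "C1*2",
    "E1": "C1+D1",
}

print(sorted(precedents(sheet)["E1"]))  # ['C1', 'D1']
print(sorted(dependents(sheet)["C1"]))  # ['D1', 'E1']
```

A real tool would parse formulas with a proper grammar and then face exactly the visualization problem discussed above: how to show this graph to the user without cluttering the sheet.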



Our hypothesis, in the paper's context, was therefore that better dependence tracing techniques and tools developed in scientific research were not adequately making their way into industry to benefit real-life spreadsheet users. Based on this hypothesis we identified three research questions:

  1. Why are research ideas not making their way into industry as popular and user-friendly implementations?
  2. Is there a standardized metrics framework against which such implementations can be compared for their effectiveness for real-life spreadsheet users?
  3. Is there a clear opportunity for improvement in the dependence tracing context?

To find answers to these questions, our first step was a review of the existing research literature on the topic; this preliminary review is essentially what makes up the paper.

We shortlisted five published research papers for our study, each describing an innovative idea for dependence tracing. Our approach was to characterize these innovations based on the following aspects:

  • The basic technique/principle/strategy
  • Tools or prototypes developed if any
  • Comments or details available on testing, performance, and limitations of the tool
  • Current status of the research and its implementation, and its perceived relevance or influence in the industrial scene
| Paper | Published in | Technique | Prototype | Prototype comments | Industrial relevance |
| --- | --- | --- | --- | --- | --- |
| 1 | 1998 | Fluid visualization and animation | Pad++ (now defunct) and Python for UNIX | Performance drastically worsens beyond 400 cells. No user study. No implementation plan given. | Some ideological influence can be observed in MS Excel 2007 onwards. |
| 2 | 1999 | 3D interactive lift-up operation | OpenGL APIs and SLSC for UNIX | None available. | None. |
| 3 | 2000 | Visual indications superimposed on Excel sheets: arrows, highlights, shading | VBA plug-in for Excel | None available. | No definite influence; similar techniques are used on an ad hoc basis by expert spreadsheet users, individually or internally within organizations. |
| 4 | 2003 | Graphs and trees depicting the spreadsheet dependency structure | Java tool for Excel workbooks | Successfully tested on a corpus of 259 Excel workbooks of undisclosed source. No user acceptance or comprehension study. | The authors apparently did not carry the project forward. |
| 5 | 2011 | Leveled dataflow diagrams | C# toolkit that accepts Excel workbooks as input | Extensive user testing with 27 industrial spreadsheet users at a financial asset management company. | Currently part of a service offered by the company Infotron.nl. |


Thus, from this review we observe that:

  1. All 5 papers provide innovative ideas to simplify dependence tracing with visualization aids.

=>  There is no lack of ideas in this context.

  2. All 5 papers provide either tools or prototypes to demonstrate the techniques.

=> The ideas are implementable in practice.

  3. 2 out of 5 prototypes are based on defunct or obsolete technology and have not been ported to current technologies. The one based on VBA (built in 2000) is supposed to work for all Excel versions but remains to be tested for version incompatibilities. Only 2 out of 5, built with Java or C#, can be considered modern enough to still work at present.

=> There is lack of recent work done based on current state of technology that can be used at present in the industry.

  4. 3 out of 5 were properly tested. One showed very poor performance: 400 cells is nothing compared to the number of cells found in industrial spreadsheets! No solution or approach was provided for scaling it up.

=> Every conceived idea needs to be tested for performance. If that is not done, it is evident why it will not make it across to industry and actually help real-life spreadsheet users.

  5. Only 1 out of 5 was tested for usability and user acceptance!

=> Innovative ideas are exciting, but they are meaningless if actual users do not find them helpful. Without user studies, the effectiveness of an idea cannot be established.

  6. Only 1 out of 5 actually gained industrial exposure. All the rest were effectively abandoned by their creators.

=> Either there are serious obstacles to making these tools popular among real-life spreadsheet users, or researchers in this domain are not sufficiently motivated to address them.


We can now revisit our research questions taking into account the above observations:

  1. Why are research ideas not making their way into industry as popular and user-friendly implementations?

Although the reason is still not entirely clear, one factor seems to be plain lack of motivation on the part of researchers to follow up on their innovations and see them implemented. An underlying cause may be the pressure in academia that rewards writing many papers on new ideas but gives little credit to follow-ups and industrial implementations. This may lead researchers to abandon projects at the conception stage and move on to newer ones.

Another reason seems to be the difficulty of performance scaling. Spreadsheets are tricky objects, with sizes ranging from small to phenomenally large, and this may make innovative ideas hard to put into practical use. The fact that most of the projects we reviewed were not extensively tested points in this direction.

  2. Is there a standardized metrics framework against which such implementations can be compared for their effectiveness for real-life spreadsheet users?

Evidently there is none, and it is no wonder that only one of the projects was tested extensively with a group of real-life spreadsheet users. A set of standards and metrics defining the usability or user-friendliness of spreadsheet aids is an important necessity: with it, the true worth of innovative techniques could be established against the backdrop of effectiveness for real-life users.

  3. Is there a clear opportunity for improvement in the dependence tracing context?

Based on our review and observations, this question is doubtless still wide open. There is ample opportunity to research and build new tools that actually make a difference in the spreadsheet industry, in the form of reliable and user-friendly implementations: not just ones created on an ad hoc basis for internal use within organizations, but robust, standardized ones that can be widely adopted throughout the industry.

SEMS accepted papers published

The following papers have been accepted for publication at SEMS 14:

  • Birgit Hofer, Dietmar Jannach, Thomas Schmitz, Kostyantyn Shchekotykhin and Franz Wotawa. Tool-supported fault localization in spreadsheets: Limitations of current research practice
  • Dietmar Jannach, Thomas Schmitz and Kostyantyn Shchekotykhin. Toward Interactive Spreadsheet Debugging
  • Raymond Panko. Improving Methods in Spreadsheet Error Research
  • Richard Paige, Dimitris Kolovos and Nicholas Matragkas. Spreadsheets are models too
  • Birgit Hofer and Franz Wotawa. On the Usage of Dependency-based Models for Spreadsheet Debugging
  • Daniel Kulesz. A Spreadsheet Cell-Type Model for Testing
  • Pablo Palma. SBBRENG: Spreadsheet Based Business Rule Engine
  • Peter Sestoft. End-user development via sheet-defined functions
  • Sohon Roy and Felienne Hermans. Dependence Tracing Techniques for Spreadsheets: An Investigation
  • Jácome Cunha, João Fernandes, Jorge Mendes, Rui Pereira and João Saraiva. MDSheet – Model-Driven Spreadsheets
  • Thomas Levine. Automatically assessing the relevance, quality, and usability of spreadsheets
  • Thomas Levine. Sheetmusic: Making music from spreadsheets
  • Raymond Panko. Are We Overconfident in Our Understanding of Overconfidence?
  • Joeri van Veen and Felienne Hermans. Anonymizing Spreadsheet Data and Metadata with AnonymousXL
  • Bas Jansen and Felienne Hermans. Using a Visual Language to Create Better Spreadsheets

First workshop on Software Engineering methods in Spreadsheets


Co-located with the annual conference of the European Spreadsheet Risk Interest Group in Delft, the First workshop on Software Engineering methods in Spreadsheets (SEMS '14) will take place on Wednesday, July 2nd.

We aim for an interactive workshop with a program that allows us to discuss the future directions of spreadsheet research.

For the workshop, we invite researchers and practitioners interested in the field of spreadsheets to submit. Topics include, but are not limited to:

  • Design and implementation of spreadsheet systems
  • Integration of spreadsheet systems with other types of systems
  • Managing large legacy spreadsheets
  • Spreadsheet testing
  • Error-finding in spreadsheets
  • Spreadsheet refactoring
  • Methods for spreadsheet engineering like scrum or domain-driven design


We solicit three types of papers for the workshop:

  • Research papers (4 pages) that present novel research ideas and early results,
  • Position papers (2 pages) that raise open issues with the potential to stimulate discussion at the workshop, and
  • Tool demos (2 pages) that present exciting spreadsheet tools that bring insight into the questions addressed by the workshop.

All papers are to be submitted via EasyChair and should follow the ACM SIG guidelines.

Deadline: March 7th (extended from February 28th), anywhere on earth
Notification: March 28th
Workshop: July 2nd


Peter Sestoft, IT University of Copenhagen
Richard Paige, University of York
Felienne Hermans, Delft University of Technology

If you have any questions, comment below or contact us on sems@spreadsheetlab.org