15 November, 2015
In February 2015, Open Data Kosovo released e-prokurimi, an online application constructed with procurement data obtained through cooperation with various municipalities across Kosovo. E-prokurimi provides users with a simple way to interact with procurement data and observe trends over time and across municipalities. It also allows users to identify contracts that raised ‘red flags’ or had characteristics that made them appear suspicious – a common methodology used in compliance and anti-fraud efforts across the world.
Building on the experience gained creating e-prokurimi, Open Data Kosovo has been taking a deeper dive into the procurement data. The result is an interactive treemap which allows users to look at procurement contracts across the seven municipalities that provided data, and see where overspend occurred. This treemap allows you to group the data by a range of variables, as well as look at different years.
In addition to this new visualization, we have prepared some high-level observations on the data, as well as some recommendations for how the collection of procurement data can be improved going forward.
Before getting into the analysis, several caveats need to be placed on the numbers and charts. As with any analysis, the accuracy and reliability of the results depend on the accuracy and reliability of the underlying data. In the case of the procurement data received by Open Data Kosovo, the quality of the data is lacking in several key respects. Open Data Kosovo has attempted to clean and standardize the data as far as possible, but some of the assumptions made in this process may not be accurate. As a result, before any conclusions are drawn from this analysis, any figures or other information should be cross-referenced with the original datasets.
Below are some of the key issues identified with the data as provided to Open Data Kosovo.
One of the biggest issues with the procurement data received by Open Data Kosovo was the information that wasn’t provided. A significant proportion of records are missing information about the expected cost of the contract, the amount actually spent, or both. Chart 1 shows the percentage of records missing cost information by municipality and year.
Chart 1 – Percentage of Records Missing Cost Information
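As a rough illustration of how a figure like the one in Chart 1 can be computed, the sketch below counts, for each municipality and year, the share of records missing the estimated cost, the actual cost, or both. The field names (`municipality`, `year`, `estimated_cost`, `actual_cost`) and the sample records are hypothetical and are not taken from the actual datasets.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are assumptions,
# not the schema or contents of the real procurement data.
records = [
    {"municipality": "Municipality A", "year": 2012, "estimated_cost": 1000.0, "actual_cost": 1200.0},
    {"municipality": "Municipality A", "year": 2012, "estimated_cost": None, "actual_cost": 500.0},
    {"municipality": "Municipality A", "year": 2013, "estimated_cost": None, "actual_cost": None},
    {"municipality": "Municipality B", "year": 2012, "estimated_cost": 800.0, "actual_cost": 800.0},
]


def missing_cost_share(rows):
    """Return {(municipality, year): percent of rows missing either cost field}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        key = (row["municipality"], row["year"])
        totals[key] += 1
        # A record counts as missing if either cost (or both) is absent.
        if row.get("estimated_cost") is None or row.get("actual_cost") is None:
            missing[key] += 1
    return {key: 100.0 * missing[key] / totals[key] for key in totals}


shares = missing_cost_share(records)
for (municipality, year), pct in sorted(shares.items()):
    print(f"{municipality} {year}: {pct:.0f}% missing cost information")
```

Counting a record as missing when either field is absent matches the description above; a stricter or looser definition would change the percentages.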
Another issue that arises in the data is the quality of the figures that are provided. In numerous cases, figures have the decimal point in the wrong place or contain extra digits, resulting in a project appearing to be greatly over or under budget. In many cases, the projects that show up as bright red in the visualizer were, in all likelihood, on budget but with the information incorrectly recorded. This inaccuracy is not limited to the monetary amounts. Many of the dates are also incorrectly recorded, or provided in a variety of formats. This makes the process of standardizing the dates laborious, and in some cases impossible.
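The two problems described above can be illustrated with a small sketch. It is not the cleaning procedure Open Data Kosovo used. The list of date formats, the function names, and the tolerance are all assumptions made for illustration. The first function tries several date formats in turn. The second flags cost pairs whose ratio is close to a power of ten, which is what a misplaced decimal point or an extra digit would tend to produce.

```python
import math
from datetime import datetime

# Assumed date formats; the real data may contain others.
DATE_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")


def parse_date(text):
    """Return a date for the first format that matches, or None if none do."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # left for manual review


def looks_like_decimal_shift(estimated, actual, tolerance=0.05):
    """True when actual/estimated is close to 10, 100, 0.1, 0.01, ... (but not 1)."""
    if estimated <= 0 or actual <= 0:
        return False
    log_ratio = math.log10(actual / estimated)
    nearest = round(log_ratio)
    return nearest != 0 and abs(log_ratio - nearest) <= tolerance


print(parse_date("15.11.2015"), parse_date("2015-11-15"), parse_date("sometime in 2013"))
print(looks_like_decimal_shift(1000.0, 10000.0), looks_like_decimal_shift(1000.0, 1200.0))
```

A flag from `looks_like_decimal_shift` is only a prompt for manual review: a contract that genuinely cost ten times its estimate would be flagged as well.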
Large projects are often split into various parts (or ‘Lots’) for contractors to compete for. In the data this is often observed in the project description where a Lot number will be provided (e.g. Lot I, Lot II). These Lots can also be split over different years. While this splitting of the project into Lots reflects the way the contract was put out to tender, the way the data is currently recorded makes it very difficult to put the various pieces back together for a given project. This splitting also introduces ambiguity into the costs for those projects. Is the estimated cost for the entire project or for that Lot specifically? Is the actual cost for this year only, or for work across multiple years?
Due to the aforementioned data quality issues, in-depth analysis of the data is difficult and risks producing misleading conclusions (garbage in, garbage out). As a result, we have limited the analysis of the data to high-level aggregates. Below are some of these observations.
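One way to begin putting Lots back together is to pull the Lot number out of the project description and group records by what remains. The sketch below is a heuristic only. It assumes Lots are written as the word "Lot" followed by a small Roman numeral or a number, and the sample descriptions are invented. It does not resolve the cost ambiguity described above.

```python
import re
from collections import defaultdict

# Assumed notation: "Lot I", "LOT 2", "lot iv". Other spellings will not match.
LOT_PATTERN = re.compile(r"\bLot\s+([IVX]+|\d+)\b", re.IGNORECASE)
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral):
    """Convert a small Roman numeral (I, V, X only) to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def split_lot(description):
    """Return (description without the Lot marker, Lot number or None)."""
    match = LOT_PATTERN.search(description)
    if match is None:
        return description.strip(), None
    token = match.group(1)
    number = int(token) if token.isdigit() else roman_to_int(token)
    base = description[:match.start()] + description[match.end():]
    base = re.sub(r"\s+", " ", base).strip(" -,")
    return base, number


def group_lots(descriptions):
    """Group Lot numbers under their shared base description."""
    groups = defaultdict(list)
    for description in descriptions:
        base, lot = split_lot(description)
        groups[base].append(lot)
    return dict(groups)


print(group_lots(["Road repair - Lot I", "Road repair - Lot II", "Office supplies"]))
```

Grouping on the remaining text only works when the base description is recorded identically for every Lot, which the data quality issues above suggest will not always be the case.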
One of the first trends that emerges from the procurement data is that spending on procurement has declined significantly since 2011 and 2012. Total procurement spending has fallen by around 44%, from over €58 million in 2012 to just under €33 million in 2014. This fall is evident both at an absolute level (see Chart 2) and on a per-project level (see Chart 3).
Chart 2 – Total Procurement Spend by Municipality and Year
Chart 3 – Average Spend per Contract by Municipality and Year
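The percentage decline quoted above can be checked with a one-line calculation. The sketch below uses the rounded endpoints from the text (€58 million and €33 million), so it gives roughly 43%. Because the 2012 total was "over" €58 million and the 2014 total "just under" €33 million, the exact totals would give a slightly larger fall, consistent with the figure of around 44%.

```python
def percent_change(old, new):
    """Percentage change from old to new; negative values indicate a decline."""
    return 100.0 * (new - old) / old


# Rounded totals in millions of euros, as quoted in the text (not the exact figures).
change = percent_change(58.0, 33.0)
print(f"Change from 2012 to 2014: {change:.1f}%")
```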
Many of the procurements provided in the data provide a source for the contract funding, whether the local municipal budget, donations, the national budget or some combination of these sources. Chart 4 shows the breakdown of procurement spending by funding source from 2010 to 2014.
Chart 4 – Procurement Spending by Funding Source
Procurement data is also provided with a flag as to what type of contract the procurement is. This can be a contract for work (i.e. construction), the supply of food and beverages, or a range of other services. The breakdown in spending based on the type of contract is provided in Chart 5.
Chart 5 – Procurement Spending by Contract Type
Procurement data can be an extremely valuable resource in any country, and not just for the fight against corruption. It can provide important insights for the citizenry into how public funds are being spent, how much the government is paying for its projects, and even provide incentives for increased competition for government contracts. All these things improve confidence in government activities and encourage more competitive tender processes.
However, for the procurement data being collected to be a valuable resource, the data needs to meet some basic standards. When procurement data is lacking certain data points or the figures provided are unreliable, using the data becomes far more difficult, and it is often impossible for analysts to draw meaningful conclusions from it.
Having spent a significant amount of time analyzing procurement data, Open Data Kosovo would make the following recommendations for future collections of procurement data to ensure full transparency:
Expanding on the notion of linking datasets above, one of the most effective ways to increase the usefulness and power of data is to link existing datasets together. Data linkage, or record linkage, is one of the emerging areas of data science and machine learning due to the valuable insights that can be revealed by joining existing data. When considering the collection of a new dataset, one of the most important decisions to be made is how that new dataset should be linked with any existing datasets.
In the case of procurement data, there are several existing databases and data points that would greatly improve the utility of the data. Linking to the business register would not only help to standardize company names, but also allow future researchers to look beyond company names to the people who benefit from the economic activities of the company. Linking to the business cases for each procurement would not only provide richer information about the project, but also information on who was involved in creating the business case and who signed off on it. These linkages and others that may not even be apparent at present could prove to be of great importance in the future.
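To make the idea of linking to a business register concrete, the sketch below normalizes company names before matching them. Everything in it is hypothetical: the register entries, the identifiers, the supplier names, and the list of legal-form abbreviations to strip are assumptions for illustration, and exact matching on normalized names is only the simplest form of record linkage.

```python
import re
import unicodedata

# Assumed legal-form abbreviations to ignore when comparing names.
LEGAL_FORMS = {"shpk", "ntp", "npt", "nsh", "llc"}


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and drop legal-form tokens."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace(".", "")  # "Sh.p.k." becomes "shpk"
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    return " ".join(token for token in tokens if token not in LEGAL_FORMS)


def link_to_register(supplier_names, register):
    """Map each supplier name to a register ID, or None when no match is found."""
    index = {normalize_name(name): reg_id for reg_id, name in register.items()}
    return {name: index.get(normalize_name(name)) for name in supplier_names}


# Hypothetical register and supplier names as they might appear in contracts.
register = {
    "REG-001": "Ndërtimi Shembull Sh.p.k.",
    "REG-002": "Furnizimi Test N.T.P.",
}
suppliers = ["NDERTIMI SHEMBULL SHPK", "furnizimi test ntp", "Unknown Firm"]

links = link_to_register(suppliers, register)
for supplier, reg_id in links.items():
    print(f"{supplier!r} -> {reg_id}")
```

Recording a stable register identifier with each contract at collection time would make this kind of name matching unnecessary, which is the stronger form of the linkage described above.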