Reverse-engineering price lists for fun and profit

Margin-scaled price sheets are vulnerable to a rounding exploit based on defaults and social factors, and in certain circumstances this allows you to reverse-engineer the markup or margin level purely from the price data itself.

Suppose you're a customer looking to purchase widgets from a broker. The broker sends you an Excel spreadsheet detailing the available widgets and their prices. You're pretty sure the broker is adding some factor to her prices to cover costs, and you realise you'd be in a better negotiating position if you knew how much she was inflating the price.

You're reasonably sure she's not doing anything sophisticated, and suspect she's probably done something simple in a spreadsheet like:

customerPrice = brokerPrice * 1.08

This would mark prices up by 8%. It's not quite the same thing as adding 8% margin, which would be brokerPrice / 0.92, but it turns out that with a couple of assumptions this kind of price inflation is pretty easy to detect.
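To make the distinction concrete, here's a quick sketch in JavaScript (the figures are made up for illustration):

```javascript
// Quick sketch of markup vs. margin, with prices rounded to two decimal
// places as a spreadsheet would. The figures are made up for illustration.
const round2 = x => Math.round(x * 100) / 100;

const brokerPrice = 493.56;

// 8% markup: add 8% on top of the broker's cost.
const withMarkup = round2(brokerPrice * 1.08); // → 533.04

// 8% margin: price the widget so 8% of the *selling* price is profit.
const withMargin = round2(brokerPrice / 0.92); // → 536.48

console.log(withMarkup, withMargin);
```

Both produce plausible-looking prices, which is why a single figure tells you nothing; the technique works for either, as long as the scale factor is constant across the sheet.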

All your margin are belong to us

To find out how your price data has been manipulated, you're going to need a few things:

  1. A reasonably large data set (around 300 unique prices or more) that has a single margin/markup level applied.
  2. Knowledge of the rounding algorithm used: typically round but sometimes ceiling or floor.
  3. Prices provided to a specified number of decimal places. For financial amounts in most currencies, this would be two.
  4. Knowledge or a reasonable expectation that the margin level has been set by a human and therefore has a similar precision, i.e. something like two decimal places.
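On point 2, the rounding mode matters because it changes which customer prices are reachable from a given supplier price. A sketch of the three common modes at two decimal places (example figures are mine):

```javascript
// The three common rounding modes at two decimal places, roughly
// matching Excel's ROUND, ROUNDUP and ROUNDDOWN with 2 digits.
const round2 = x => Math.round(x * 100) / 100;
const ceil2  = x => Math.ceil(x * 100) / 100;
const floor2 = x => Math.floor(x * 100) / 100;

const raw = 493.56 * 1.08; // 533.0448 before rounding

console.log(round2(raw)); // → 533.04
console.log(ceil2(raw));  // → 533.05
console.log(floor2(raw)); // → 533.04
```

Note that under ceiling the same supplier price lands on £533.05, a figure that's unreachable under standard rounding, so getting the rounding mode wrong invalidates the whole analysis.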

The exploit hinges on the fact that the price-inflation formula rounds its result. Here's what the broker's spreadsheet might look like in Excel:

[Image: the broker's spreadsheet in Excel. Caption: "Yay ima add 7 percent and be mint"]

So that's all great, but if you're the customer and all you receive is the £533.05 figure, how could you possibly know the supplier price and markup? A supplier price of £512.55 with a markup of 4% would give the same figure of £533.05, and you would have no idea which markup was correct.

But at scale, you do. Because at 8% markup, there is literally no way to get a price of £533.05.

[Image: checking candidate supplier prices against an 8% markup in Excel. Caption: "Huh, they didn't add 8%"]

Whether the price was £493.56 or £493.57, there is no way your broker could have applied a flat 8% markup and arrived at £533.05. This illustrates the technique pretty well: it rests on currency conventions, on how people usually use Excel, and on the fact that humans rarely pick arbitrary decimal values to scale prices.
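You can check this mechanically. Here's a small sketch (the `reachable` function is my own illustration, not part of decompose.js) that asks whether any supplier price, to the penny, can produce a given customer price under a given markup with standard rounding:

```javascript
// Can *any* supplier price (to the penny) produce this observed customer
// price under this markup factor, assuming standard rounding to 2dp?
function reachable(observedPrice, factor) {
  // The only supplier prices worth checking are the ones that land
  // nearest the observed price before rounding.
  const candidatePence = Math.round((observedPrice / factor) * 100);
  for (const pence of [candidatePence - 1, candidatePence, candidatePence + 1]) {
    const price = Math.round((pence / 100) * factor * 100) / 100;
    if (price === observedPrice) return true;
  }
  return false;
}

console.log(reachable(533.05, 1.08)); // → false: no supplier price works at 8%
console.log(reachable(533.05, 1.04)); // → true: £512.55 * 1.04 rounds to £533.05
```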

So if you ever see £533.05 as a unit price, you know the supplier isn't using 8% markup. With a large enough price list you can apply this same logic repeatedly to rule out most markup (or margin) levels down to one or two decimal places.
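At scale the same idea becomes a sieve: a candidate factor is eliminated the moment even one observed price is unreachable under it. A sketch, with the reachability check included inline (restricting candidates to whole percentages is an assumption about the broker, not something the data guarantees):

```javascript
// A candidate markup factor survives only if *every* observed price is
// reachable under it from some to-the-penny supplier price.
function reachable(observedPrice, factor) {
  const candidatePence = Math.round((observedPrice / factor) * 100);
  for (const pence of [candidatePence - 1, candidatePence, candidatePence + 1]) {
    if (Math.round((pence / 100) * factor * 100) / 100 === observedPrice) return true;
  }
  return false;
}

const survivingMarkups = (prices, factors) =>
  factors.filter(f => prices.every(p => reachable(p, f)));

// Simulate a broker applying a 7% markup to made-up supplier prices.
const round2 = x => Math.round(x * 100) / 100;
const supplier = [12.34, 99.99, 493.56, 512.55, 1000.01, 250.75];
const observed = supplier.map(p => round2(p * 1.07));

// Candidate factors: 1.01, 1.02, ..., 1.20.
const factors = Array.from({ length: 20 }, (_, i) => (100 + i + 1) / 100);

console.log(survivingMarkups(observed, factors)); // 1.07 always survives
```

Six prices aren't enough to pin the factor down uniquely, but with a few hundred the survivor list typically collapses to the true factor.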

Decompose JS

I've implemented the algorithm in JavaScript for Node.js; you can find it on GitHub. Use it as follows:

node decompose.js {filename.csv} {columnNumber}

As it's only a couple of lines of code it's reasonably self-explanatory, though I will go through and add a few code comments at some point. I may also add a CLI parameter library so that you don't have to tweak the script itself to deal with alternative rounding algorithms, additional precision, etc.

There are three sets of sample CSV data in the repo if you want to try it out.

Defending your price lists

If you're in the unlucky position of being a broker, what can you do to avoid your customers discovering your supplier's prices?

  1. Use more than one margin level
  2. Avoid making it easy to extract large data sets with your prices in them
  3. Randomly add small values to each price
  4. Clamping to *.99 or *.49 endings will likely prevent the exploit from working effectively
  5. Test with the exploit code defined above to find out if your data is vulnerable
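Points 3 and 4 are easy to sketch (the function names are mine, purely illustrative):

```javascript
// Two of the defences above, sketched: random jitter (point 3) and
// clamping to *.99 endings (point 4).
const round2 = x => Math.round(x * 100) / 100;

// Add a few random pence to each price, so no single factor fits the data.
function jitter(price, maxPence = 5) {
  const extraPence = Math.floor(Math.random() * (maxPence + 1)); // 0..maxPence
  return round2(price + extraPence / 100);
}

// Clamp every price up to the next *.99 ending.
function clamp99(price) {
  return (Math.floor(price) * 100 + 99) / 100;
}

console.log(clamp99(533.05)); // → 533.99
console.log(jitter(533.05));  // somewhere between 533.05 and 533.10
```

Jitter destroys the rounding fingerprint entirely; clamping maps many supplier prices onto one customer price, so the elimination step can no longer distinguish factors.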

Further analysis

I'm unlikely to spend much more time on this; it was as much a mathematical curiosity as anything, and I don't have any practical case studies that demonstrate its value as an "exploit". However, I am left with a few questions:

  • Does additional precision (say, 9dp?) make it easier or harder to discover the scale factor?
  • Can the algorithm be extended to do some kind of categorisation and automatically discover multiple markup levels in a file?
  • Can the algorithm be tweaked to rank the probability of different margin levels being used?
  • Can it figure out rounding algorithms automatically?

I may get around to investigating these; a day stuck inside with an ill child might do it, especially if the ill child is intent on watching a Hey Duggee marathon and I need something to stop me from being slowly lobotomised!