Here's what I learned after combing through 1000+ sales records

Sorry for the clickbait title haha. I recently skimmed through around 1400 hand-filled-out sales disclosure records in a relatively well-organized county, in order to get a feel for market pricing.

This quest started after I tried to use Zillow and Realtor to gather sales data in order to derive informed offers. When I dug into the actual sales records behind the data, I found problems with the alleged previous sale prices on those platforms. For example, a listing would say a parcel sold for $X, but when I looked it up, it was actually a sale with multiple parcels, and Zillow/Realtor had just divided the total sale price evenly by the number of parcels in the sale, even though the parcels were wildly different sizes. This is also why so many small vacant lots are reported as sold for half a million dollars: they were paired with large single-family home parcels.
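To make the even-split problem concrete, here is a minimal Python sketch with made-up numbers. The parcel names, acreages, and price are hypothetical, and neither function is anything those platforms publish. It contrasts dividing a bundled sale price evenly per parcel with allocating it by acreage. Acreage is itself only a rough proxy, since it ignores the house and any other improvements.

```python
def even_split(total_price, parcels):
    """Divide a bundled sale price evenly across parcels (the behavior described above)."""
    share = total_price / len(parcels)
    return {pid: share for pid in parcels}


def acreage_split(total_price, parcels):
    """Allocate a bundled sale price in proportion to each parcel's acreage."""
    total_acres = sum(parcels.values())
    return {pid: total_price * acres / total_acres for pid, acres in parcels.items()}


# Hypothetical two-parcel sale: a home site plus a small vacant strip.
parcels = {"home-site": 9.5, "vacant-strip": 0.5}  # acres, made up
total_price = 1_000_000  # dollars, made up

print(even_split(total_price, parcels))     # each parcel gets the same share
print(acreage_split(total_price, parcels))  # the strip gets a share matching its size
```

With these numbers the even split credits the half-acre strip with half of the total price, which is the kind of figure that then shows up as a "comp".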

I decided to dig deeper, and evaluate over 1000 county sales records over the past decade, which these real estate platforms apparently use for their data. What an eye-opener it was. Have any of you gathered the intuition to guess roughly (without much context) how many of the 1411 sales involved only one parcel? Mind you I filtered out all sales from districts within any city limits. The answer is 555 of 1411. That's fewer than 2 out of 5. Maybe you're not surprised, but I was. 490 of the sales involved 2 parcels, and 168 involved 3. The remainder were sales involving more than 3 (except for one sale disclosure, where the recorder wrote "0" for some reason, even though a parcel id was included).
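For anyone who wants the breakdown as percentages, a quick Python sketch using only the counts quoted above (the variable names are just for illustration):

```python
total_sales = 1411
by_parcel_count = {1: 555, 2: 490, 3: 168}  # counts quoted above

# Everything else: sales with more than 3 parcels, plus the one record marked "0".
remainder = total_sales - sum(by_parcel_count.values())

for n, count in by_parcel_count.items():
    print(f"{n} parcel(s): {count} sales ({count / total_sales:.1%})")
print(f"other: {remainder} sales ({remainder / total_sales:.1%})")
```

Single-parcel sales come out to roughly 39% of the total, and the remainder works out to 198 records.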

It would have been easy enough to get accurate comps by just omitting the multiple-parcel sales, but the problem didn't stop there. I skimmed over the sales descriptions, and whoa was it all over the place. Lots of times people are only buying a small interest in a property, or a lot of times, the recorder says there's only one parcel in the sale, but they write down a price that reflects a different property that was part of the same sale, and they wrote it in a different sales disclosure (also with only one reported parcel). Or the sale price was just paying off a contract, where it didn't indicate the previous equity. Or the sale involved eminent domain, where part of the sale was tied in some elusive description of "damages". Or the "sale" was part of a divorce settlement. Or the sale was a neighbor's exchange of parcels, where one party paid an additional amount on top since the property they were getting was more valuable. Or it was part of some kind of even more complicated exchange of personal property and land. And there's the host of all sorts of sales under duress, such as divorce, estate, and even tax sales (where the sale disclosure is recorded after the redemption period is up). The number of "clean" single-parcel sales ended up being very small compared to the number of sale disclosures.

tl;dr: I learned that getting accurate vacant land comps from sales records, especially on sites like Zillow and Realtor, is a bit of a crapshoot. I'd probably only rely on those under very specific circumstances, such as a particular newer subdivision with a whole bunch of nearly-identical lots, where you're sure that the sales were for individual lots and the sales were all very recent.

I love posts like this, because they dispel all the oversimplified “solutions” people love to talk about when addressing a very complicated issue like valuing land.

I wouldn’t say I’m surprised by your findings, but I do find them very fascinating, as I’ve never had the patience to go through that many records individually.

This is also part of why most people don’t even bother with sold comps on land, because even when they exist, they’re just not reliable.

Of course, listed properties aren’t that reliable either, but they seem to have fewer problems lurking beneath the surface like this. Usually if there’s a wrinkle, you can find it without looking too hard, whereas those issues you pointed out in the sold comps sound like they practically require a title search to find them!

Thanks again for posting this. I wish you had also included a magic bullet to solve the problem, but I suppose we’ll have to keep looking. 😂


Very true, @oranjoose. When I notice comps on Zillow that just seem off, I go directly to the assessor website and look for similarly featured properties in the near vicinity of the subject property. Then I click on each one I find to see (a) whether it was recently sold and (b) at what price. That has helped me get much more accurate data on true valuations. I really like Redfin when possible because, from what I understand, they use straight assessor data. Unfortunately, Redfin covers such a tiny percentage of the United States and definitely does not cover 99% of any area that is more rural.


Wow. I've never used sold comps on Zillow or Realtor because, just glancing over them, I've always known something was up. Prices absolutely all over the place. 150 acres for $500??? What??? This explains it well, and I will continue to use only for-sale comps. Thank you for that informative post!

To thicken the plot a bit, I gathered all the currently-for-sale data points in a rural area of a county and calculated a trend equation for price against acreage. The fit has an r-squared value around 60%, which isn't spectacular, but shows a clear correlation. I then compared that data against approximately the same number of the most recent actual-sold data points within the same rural area of the same county, the oldest sales dating back to 2016. I filtered out super-obvious outliers (sales of $0, for example), as well as multiple-parcel sales. Using the same kind of equation against acreage, the calculated trend had, as expected, a weaker fit, with an r-squared value closer to 40%, but still a clear correlation.
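For readers who want to try something similar, here is a minimal Python sketch of one way to fit a trend and get an r-squared value. The post doesn't say what form the trend equation took, so the straight-line fit below is an assumption, and the data points are made up rather than taken from the county. r-squared here is the share of price variation the line explains.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r-squared = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up (acres, price) pairs, NOT the county data discussed above.
acres = [1, 2, 5, 10, 20, 40]
prices = [12_000, 15_000, 30_000, 42_000, 95_000, 150_000]

slope, intercept, r2 = fit_line(acres, prices)
print(f"price ~ {slope:.0f} * acres + {intercept:.0f}, r-squared = {r2:.2f}")
```

Real land prices often flatten out per acre as parcels get bigger, so a curve (for example, a fit on the logarithms of price and acreage) may describe the data better than a straight line.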

I then used this information to extrapolate expected for-sale market prices at a range of acreage values, and compared that against extrapolated prices based on actual-sales trends, and the result was somewhat interesting, but not incredibly useful. The data showed that realtors were pricing land in the county at roughly double the actual sales trends on average.
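The comparison described above could be sketched roughly like this in Python. The two trends are written as hypothetical straight lines with invented coefficients, chosen only so the ratio comes out near the "roughly double" figure; they are not the fitted values from the county.

```python
def predict(trend, acres):
    """Evaluate a straight-line trend given as (slope, intercept)."""
    slope, intercept = trend
    return slope * acres + intercept


# Hypothetical coefficients (dollars per acre, base dollars); not fitted from real data.
listing_trend = (6_000, 20_000)  # for-sale listings
sold_trend = (3_000, 10_000)     # recorded sales

for acres in (1, 5, 10, 20, 40):
    asking = predict(listing_trend, acres)
    sold = predict(sold_trend, acres)
    print(f"{acres:>3} acres: asking ~ ${asking:,.0f}, sold ~ ${sold:,.0f}, ratio {asking / sold:.2f}")
```

Looking at the ratio across several acreage values, instead of one average, shows whether the gap between asking and sold prices is steady or concentrated in small or large parcels.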

It would be interesting to compare these inconclusive findings against other counties to see if there are any patterns. I feel like maybe I wasted my time with this, but then again, this is much more repeatable for me now, and I've grown a bit more confident in pricing. From here, I think I'm going to gather some more data, figure out a tagging system for common confounding variables (such as water feature, no road access, etc.), and use what I've learned and am learning to quickly get as-close-to-accurate market pricing for any vacant parcel in question. While it is foolish to rely on some kind of automated market price based on complicated data, at this point in my journey (the beginning), I figure it might be even more foolish to rely too much on my own green intuition.


Honestly, I'm in way over my head with the level of statistics you're working with. I love the idea, though, and it's interesting because I ran into a data analysis issue, myself, caused by previous sales involving multiple parcels in a single transaction. A couple months ago I figured out how to run a web scraping tool to grab data on tax delinquent properties in a few different counties from a website open to the public. It generated literally thousands of entries, but it was completely unfiltered so most of them needed to be cut for one reason or another. After the easy stuff like removing improved lots, I wanted to come up with a quick way to numerically score the remaining properties so I could quickly trim the duds (e.g. total back taxes exceeds the current value) and prioritize whatever was left, to optimize spending on mail. So I added some formulas to the spreadsheet (just basic math) to calculate a ratio of the parcel's assessed value (best proxy for actual value available within the data) against total back taxes plus the last sale price, with the thought being that most people don't want to sell a property for less than they acquired it for, generally speaking, so last sale price is some indicator of likely willingness to accept my objectively low offer.
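A minimal Python sketch of that scoring idea, for anyone who would prefer a script over spreadsheet formulas. The field names and sample rows are hypothetical, and ranking the highest ratio first is an assumption about how the score would be used.

```python
def score(assessed_value, back_taxes, last_sale_price):
    """Assessed value divided by (back taxes + last sale price).

    Returns None for parcels that should be trimmed instead of scored.
    """
    if back_taxes > assessed_value:
        return None  # dud: back taxes exceed the value proxy
    basis = back_taxes + last_sale_price
    if basis <= 0:
        return None  # nothing to divide by (e.g. a $0 recorded sale and no taxes owed)
    return assessed_value / basis


# Hypothetical rows, standing in for the scraped spreadsheet.
rows = [
    {"parcel": "A", "assessed": 20_000, "back_taxes": 1_500, "last_sale": 8_500},
    {"parcel": "B", "assessed": 5_000, "back_taxes": 6_000, "last_sale": 2_000},
    {"parcel": "C", "assessed": 12_000, "back_taxes": 2_000, "last_sale": 22_000},
]

scored = []
for row in rows:
    s = score(row["assessed"], row["back_taxes"], row["last_sale"])
    if s is not None:
        scored.append({**row, "score": s})

# Highest ratio first: the owner's cost basis is small relative to the assessed value.
ranked = sorted(scored, key=lambda r: r["score"], reverse=True)
for r in ranked:
    print(r["parcel"], round(r["score"], 2))
```

A last sale price that actually covered several parcels inflates the denominator and pushes a parcel down the ranking, so bundled sales can hide good prospects.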

One of the issues I ran into with this approach, though, as you pointed out, was that my target counties had a surprising number of previous transactions involving multiple parcels. The county's last sale price was the total paid for all lots in the sale, whether it was 1 or 10, and there was often no indication that the purchase was for more than one property (without pulling and reviewing the deed). I haven't yet found a good way to avoid manually reviewing all the outliers. The one very limited exception is that, for efficiency, I just assumed that any anomalously high purchase prices from around 2005 or so were probably for a single parcel, due to the overheated market back then.

In a dream scenario, I envisioned being able to automate the process one day, from scraping the data, analysis, and scoring the best prospects (not that I have any database or software skills sufficient to actually build that). Either way, I now suspect the real world might be too messy and random for that type of automated approach to work. Case in point, after spending weeks longer than I should have trying to build some mega optimized list, I finally mailed about twice as many owners as I have with my previous, less scientifically arrived at mailings...and so far it's flopped. Several calls from some dud properties that slipped through the cracks. Made three offers to some legitimate prospects that responded, but all 3 were rejected or ignored. The only remaining active prospect from that mailing at the moment is an elderly guy that lives in Europe, hardly speaks any English, and doesn't have a working computer, so my interactions with him have unfortunately been limited to transatlantic snail mail (I understood he said "send me an offer by post" in the first voicemail he left), and one very painful phone conversation this afternoon actually (speaking very slowly and enunciating way more than normal). So far, slow going coming out of the gate with this master plan. :-)


Good for you doing all that work to provide this insight! Really appreciate you sharing your findings. The problem you describe, which we all intuit, is one of the reasons I develop relationships with local realtors wherever I'm buying.


@oranjoose My brain just exploded trying to follow along with this.



Thanks for all the information. If you don't mind me asking: Where did you find those sales disclosure records? I have found some states that have those records online (like Indiana). Or did you get those records from the county? Do you know if it's common for those records to be available online?

@johannes You might not prefer to hear that I manually went through individual sales records accessible from the county's GIS, sorry.

This county (as well as many others in my experience) has a tool that allows you to export a spreadsheet of real estate sale events. I used that to look up each scan of the actual signed sale record document to get the nitty-gritty details.


Got it. Thanks for clarifying.