Predicting French real estate sales from open data
27 Apr 2019
PostgreSQL
Data
MachineLearning
As the French government recently opened its real estate sales data,
I was curious to see whether accurate value predictions could be made from simple features such as geolocation and surface area.
We will consider two kinds of property:
Apartments: where the location and the surface will influence the value.
Houses: where the location, the built surface and the ground surface will influence the value.
Apartments
Let's say we want to predict the sale price of a 29-square-meter apartment located at 31 rue Poissonnière, Paris:
The first thing we need to do is geolocate the address to get its coordinates.
To achieve this, we will query the Nominatim API:
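A minimal sketch of such a lookup, using only the Python standard library. The public `nominatim.openstreetmap.org` endpoint and the `real-estate-demo` User-Agent value are assumptions made for illustration (Nominatim's usage policy asks clients to identify themselves), and the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Nominatim instance; a self-hosted one works the same way.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(address):
    """Build a Nominatim search URL returning at most one JSON result."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH_URL}?{query}"


def parse_coordinates(payload):
    """Extract (latitude, longitude) from a Nominatim JSON response body."""
    results = json.loads(payload)
    if not results:
        raise ValueError("address not found")
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(address, user_agent="real-estate-demo"):
    """Geolocate an address; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        build_search_url(address), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_coordinates(response.read().decode("utf-8"))
```

Calling `geocode("31 rue Poissonnière, Paris")` would then return a `(latitude, longitude)` tuple, provided the service is reachable and finds the address.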
Let's check that we located the address correctly:
We now need to extract the nearest sales from our database to see whether the features we have correlate with the transaction value:
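One way to write this extraction, assuming the sales live in a PostgreSQL table with the PostGIS extension enabled. The table name `sales`, its columns (`surface`, `price`, `kind`, `geom`), the 500 m radius and the row limit are all hypothetical, as is the choice of a psycopg2-style connection:

```python
# Assumption: a "sales" table with a PostGIS geometry column "geom" in SRID 4326.
NEAREST_SALES_SQL = """
    SELECT surface, price
    FROM sales
    WHERE kind = %(kind)s
      AND ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
            %(radius_m)s)
    ORDER BY ST_Distance(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography)
    LIMIT %(limit)s
"""


def nearest_sales(conn, lat, lon, kind="apartment", radius_m=500, limit=100):
    """Return (surface, price) rows for the sales closest to a point.

    `conn` is expected to behave like a psycopg2 connection: named
    placeholders and cursors usable as context managers.
    """
    params = {
        "kind": kind,
        "lat": lat,
        "lon": lon,  # ST_MakePoint takes longitude first, then latitude
        "radius_m": radius_m,
        "limit": limit,
    }
    with conn.cursor() as cur:
        cur.execute(NEAREST_SALES_SQL, params)
        return cur.fetchall()
```

Casting to `geography` makes `ST_DWithin` and `ST_Distance` work in meters rather than degrees, which keeps the radius easy to reason about.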
In order to choose the algorithm we'll use for the prediction, we simply plot the values:
As we can see, the graph looks pretty linear, so a simple linear regression model will do the job:
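To make the idea concrete, here is a dependency-free sketch of ordinary least squares on a single feature (surface against price). It is not necessarily the library or code used to produce the figures in this post, the function names are hypothetical, and the numbers at the bottom are synthetic placeholders rather than real sales:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, surface):
    """Apply a fitted (slope, intercept) pair to a surface."""
    slope, intercept = model
    return slope * surface + intercept


# Synthetic, illustrative values only (not taken from the dataset).
toy_surfaces = [20.0, 35.0, 50.0, 65.0]
toy_prices = [210.0, 360.0, 510.0, 660.0]
toy_model = fit_line(toy_surfaces, toy_prices)
```

With real data, `xs` would be the surfaces of the neighboring sales and `ys` their transaction values; `predict(model, 29)` then gives the estimate for our apartment.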
It’s time to predict our value!
The model predicted a sale price of €309,027.
We now add the sales that occurred within the neighborhood to the map:
Houses
As previously mentioned, for houses we will use an extra feature (the ground surface) to fit our model.
The whole process can be rewritten as:
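The regression part with two features can be sketched as follows. This is an illustrative, dependency-free version that solves the normal equations directly; it is an assumption about one possible implementation, not the code behind the results in this post, and all function names are hypothetical:

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_plane(rows, targets):
    """Least squares fit of price = a * built + b * ground + c.

    `rows` is a list of (built_surface, ground_surface) pairs.
    """
    design = [[built, ground, 1.0] for built, ground in rows]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict_house(weights, built, ground):
    """Apply fitted weights (a, b, c) to a built and a ground surface."""
    a, b, c = weights
    return a * built + b * ground + c
```

Solving the normal equations keeps the example short, but it is less numerically robust than the QR or SVD based solvers found in numerical libraries, so a library routine is the safer choice on real data.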
We're now able to predict the value of a 100-square-meter house on a 900-square-meter lot
located at 172 Rue des Candinières, 34160 Castries:
Displaying the results:
Looks like its value is about €360,540.
We display the neighborhood to take a closer look:
Going further
The algorithm here is quite simple and takes really basic features as inputs.
There is room for improvement:
detect neighborhood types within countryside areas (building density…)
qualify the dataset (check for swimming pools from satellite layers, distance from the sea…)