Correlates of Representation Errors in Internet Data Sources for Real Estate Market

Maciej Beręsewicz


In this article, we focus on detecting correlates of the selection mechanism that underlies Internet data sources for the secondary real estate market in Poland and results in representation errors (frame and selection errors). To identify characteristics of properties offered online, we linked data collected from the two largest advertising services in Poland with the Register of Real Estate Prices and Values, which covers all transactions made in Poland. Quarterly data for 2016 were linked at a domain level defined by local administrative units (LAU1), the urban/rural distinction and usable floor area (UFA), categorized into four groups. To identify correlates of representation error, we used a generalized additive mixed model based on almost 5,500 domains (including quarters). The results indicate that properties not advertised online differ significantly from those shown on the Internet in terms of UFA and location. A non-linear relationship with the average price per m² can be observed, which diminishes after accounting for LAU1 units.
Author: Maciej Beręsewicz (WIiGE / KS), Department of Statistics
Journal series: Journal of Official Statistics, ISSN 0282-423X, e-ISSN 2001-7367, (N/A 100 points)
Issue year: 2019
Publication size in sheets: 1
Keywords in Polish: big data, błędy nielosowe (non-random errors), INLA
Keywords in English: big data, non-ignorable missing data, representation error, self-selection error, INLA
ASJC Classification: 2613 Statistics and Probability
Language: en (English)
Score (nominal): 100
Score source: journalList
Score: Ministerial score = 100.0, 02-04-2020, ArticleFromJournal
Publication indicators: WoS Citations = 0; Scopus SNIP (Source Normalised Impact per Paper): 2018 = 1.02; WoS Impact Factor: 2018 = 0.837 (2) - 2018 = 0.934 (5)
Citation count*: 1 (2020-09-11)

* The citation count shown is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.