
Re: [Maposmatic-dev] daily update stats

From: David Decotigny
Subject: Re: [Maposmatic-dev] daily update stats
Date: Thu, 07 Jan 2010 15:01:40 +0100
User-agent: Thunderbird (X11/20090817)


Jeroen van Rijn wrote:
On Thu, Jan 7, 2010 at 09:58, David Decotigny <address@hidden> wrote:
All in all, that's roughly 60% penalty on both sides.

We'll hit a major show-stopper when osm2pgsql takes more than 24h to
complete. Any bets on /when/ it /will/ happen ? :) Any offers of higher-end
hosting ? :)

Hello David,

A 60% penalty is substantial enough to try to reduce the impact of
running these things concurrently. The use of daily update diffs
tells me that osm2pgsql is being run in slim mode, which should mean
that the daily diffs themselves could be split into chunks after
downloading them, then running osm2pgsql on the resulting smaller
diffs. These could then be scheduled at times of lower load, with a
deadline to start any remaining updates if not completed by then, to
ensure all updates finish in time.

From what I understood, you are proposing to split the diff updates into chunks, and to schedule the renderings in between the processing of these chunks, effectively serializing things "manually" in order to control the impact of the renderings on the diff update.

The idea is nice, indeed; it could allow us to survive a little longer. Still, I'm afraid this solution could add significant overhead to the diff updates (i.e. the cost of parsing the diff, splitting it, etc.). Furthermore, doing so would remove the benefit of having 2 CPUs available, not to mention the pain of implementing it (we would need to synchronize django with the diff update, with all the fault-tolerance mess when a process crashes, etc.).
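For concreteness, the chunk-splitting Jeroen proposes could look roughly like this. This is only a sketch, assuming the daily diffs are plain osmChange XML; the function name and chunk size are illustrative, not anything from the maposmatic code:

```python
# Sketch: split an osmChange daily diff into smaller chunks, each of
# which can then be applied separately with osm2pgsql in slim/append
# mode, leaving gaps in which renderings can be scheduled.
import xml.etree.ElementTree as ET

def split_diff(osc_path, chunk_size=10000):
    """Yield paths of smaller osmChange files, each holding at most
    chunk_size top-level <create>/<modify>/<delete> blocks."""
    tree = ET.parse(osc_path)
    root = tree.getroot()            # <osmChange version="0.6" ...>
    blocks = list(root)
    for i in range(0, len(blocks), chunk_size):
        chunk_root = ET.Element(root.tag, root.attrib)
        chunk_root.extend(blocks[i:i + chunk_size])
        out = "%s.chunk%d" % (osc_path, i // chunk_size)
        ET.ElementTree(chunk_root).write(out, encoding="utf-8")
        yield out
```

Note that this is exactly where the overhead David mentions comes from: the whole diff has to be parsed and rewritten before any of it is applied.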

For the same strategy, a lighter (imho) solution I was thinking of is to keep the parallelism we have, but to control it: regulate the flow of renderings so that the penalty on the diff updates stays below 60%. That is, when the rendering queue is populated, we don't render maps continuously while the diff update is running (which is what we do now). Instead, we control when renderings are allowed (think of some "fluid" scheduling technique), while osm2pgsql runs to completion. That way, we don't have to bother about osm2pgsql (it runs continuously), but we regulate the renderings so that the overhead they impose on the diff update stays controlled and moderate.
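A minimal sketch of such a "fluid" regulator: a duty-cycle gate the rendering daemon could consult before popping the queue while a diff update is in progress. The class name, duty cycle, and period are all hypothetical, not part of the existing code:

```python
# Sketch of a rendering regulator: while the diff update runs, allow
# renderings to start only during a fraction of each period, bounding
# the load they impose on osm2pgsql. Values are illustrative.
import time

class RenderingRegulator:
    def __init__(self, duty_cycle=0.3, period=600):
        # duty_cycle: fraction of each period during which renderings
        # may start (0.3 = at most ~30% of wall time spent rendering).
        self.duty_cycle = duty_cycle
        self.period = period

    def may_render(self, now=None):
        """Return True if a rendering may start at this instant."""
        now = time.time() if now is None else now
        return (now % self.period) < self.duty_cycle * self.period
```

The rendering daemon would call may_render() (only when a diff update is running) and sleep until the next window otherwise; osm2pgsql itself is never touched.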

But both solutions have their limits: at some point, the diff update, even alone on the machine, will take more than 24h to process, assuming OSM keeps gaining in popularity. So at best we will eventually not be able to render anything, and at worst we will not even be able to update the DB... Of course, this will happen later with the strategy above than if we keep the current scheme. But it will eventually happen; these solutions will just allow us to survive a few weeks or months longer. That's the main reason why I would recommend an "easy" technical implementation if we decide to adopt this strategy in the meantime.

In the longer run, either we find the right way to tune the whole system (pgsql, nice, etc.) so that the diff updates become significantly less painful to run; or we enjoy a higher-end machine; or we optimize osm2pgsql and/or the DB indexes in postgis. Or all of the above.

While I don't have higher-end hosting to offer, I'd be more than happy
to investigate tuning the update process on my local development
server, and to submit patches and findings where applicable. I'll be
installing a copy of the maposmatic codebase this weekend as it is;
once I have it up and running I'll start paying attention to what's
what as far as these updates are concerned.

That is: is contention for disk I/O slowing things down, or does
osm2pgsql dominate the CPU? What happens when we change the
priority of the update and/or rendering tasks, and so on. It may take
me some time to get down and dirty with this codebase, as it's new to
me, but I hope to be of some use to the project in due time. ;)
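One cheap experiment along those lines: launching osm2pgsql at a reduced CPU priority and measuring whether rendering times improve. A sketch, where the helper name and the command line are illustrative, not the actual maposmatic invocation:

```python
# Sketch: run a command (e.g. the osm2pgsql update) with a higher
# nice value, i.e. lower CPU priority, to measure how scheduling
# priority affects rendering vs. update times. POSIX only.
import os
import subprocess

def run_niced(cmd, niceness=19):
    """Run cmd with an increased nice value; returns its exit code."""
    return subprocess.call(cmd,
                           preexec_fn=lambda: os.nice(niceness))

# Hypothetical usage:
# run_niced(["osm2pgsql", "--slim", "--append", "daily.osc"])
```

If the bottleneck turns out to be disk I/O rather than CPU, nice alone won't help much, which is itself a useful data point.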

To answer your first question, I didn't personally investigate. But my intuition is that it's either I/O-bound, or lacking some index that would speed things up, or inefficiently sending several queries serially that could be grouped. Having more RAM should help anyhow (imho). The OSM people would probably know a lot more about this subject, and I'd be interested to hear from them.

As for the second point, you first have to follow the instructions in the INSTALL file for ocitysmap. We recommend postgres 8.3. These instructions have been followed several times by several people running ubuntu jaunty, karmic, and debian sid (both 32 and 64 bits). Then you follow the INSTALL file in maposmatic.

The box in question is an AMD Athlon64 X2 6000 (@ stock 3GHz) with
4 GB of DDR2; basically my old workstation, now converted to a server.

I take it you've already looked into the following (from the same page):

"Large imports into PostGIS are very sensitive to maintenance and
monitoring configuration: it is smart to increase the value of
checkpoint_segments so that autovacuum tasks don't slow down imports."


We are very interested in any postgres/system parameter we could tune.
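As a starting point, the postgresql.conf parameters usually mentioned for osm2pgsql imports look like the following. These values are purely illustrative for a ~4 GB postgres 8.3 box, not tested recommendations; they would need benchmarking on the actual machine:

```
# postgresql.conf -- illustrative values for an osm2pgsql update box
# with ~4 GB RAM (postgres 8.3); actual numbers need benchmarking.
shared_buffers = 512MB          # more cache for index-heavy updates
checkpoint_segments = 20        # fewer, larger checkpoints during imports
maintenance_work_mem = 256MB    # speeds up index builds and vacuum
autovacuum = on                 # see the checkpoint_segments note above
```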

Best regards,
