[Maposmatic-dev] [PATCH maposmatic 1/9] Improve the file cleanup mechanism

From:	Maxime Petazzoni
Subject:	[Maposmatic-dev] [PATCH maposmatic 1/9] Improve the file cleanup mechanism
Date:	Sun, 24 Jan 2010 14:43:15 +0100

Previously, when the rendering directory was over the defined threshold,
files were removed progressively, oldest first, to free up some space.
No information was kept about jobs whose files were removed, making it
harder to keep track of valid jobs with files available.

This new mechanism brings the following features:

  * files are now sorted by content modification time (st_mtime) and not
    inode change time (st_ctime), which a simple chmod could alter;
  * when a file needs to be removed, all files from its parent job are
    removed, and the job is marked as no longer having files (status 3,
    'Done w/o files'). It does not make a lot of sense to keep partial
    renderings anyway (the map without the index? in different formats?
    Not good.);
  * thumbnails of jobs in the database are always kept, so we can
    display what the map looked like even if we don't have the files
    around anymore;
  * if no parent job can be found, it's an orphaned file and can be
    safely removed. Files starting with a '.' are preserved though
    (.htaccess for example);
  * better logging of the cleanup process.
---
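
The policy listed above can be sketched as a pure planning function, kept
separate from the database and the filesystem. This is only an illustration
written for Python 3, not the code in this patch: plan_cleanup, job_of and
the sample paths are made-up names, and the real daemon asks the
MapRenderingJob model for the parent job instead.

```python
import os
from collections import defaultdict

THUMBNAIL_SUFFIX = '_small.png'  # suffix used by get_thumbnail() in models.py


def plan_cleanup(files, threshold, job_of):
    """Return the paths to delete, oldest first and grouped by job, until
    the total size is at or below threshold.

    files:  list of (path, mtime, size) tuples
    job_of: hypothetical callable returning a job key for a path, or None
            when the file is an orphan
    """
    # Dotfiles (.htaccess and friends) are neither counted nor removed.
    files = [f for f in files if not os.path.basename(f[0]).startswith('.')]
    size = sum(f[2] for f in files)

    # Group removable files by parent job; thumbnails of known jobs stay.
    by_job = defaultdict(list)
    for f in files:
        key = job_of(f[0])
        if key is not None and not f[0].endswith(THUMBNAIL_SUFFIX):
            by_job[key].append(f)

    doomed, seen = [], set()
    for path, mtime, fsize in sorted(files, key=lambda f: f[1]):
        if size <= threshold:
            break
        key = job_of(path)
        # An orphan goes alone; a job file takes its whole job with it.
        group = [(path, mtime, fsize)] if key is None else by_job[key]
        for gpath, _, gsize in group:
            if gpath not in seen:
                seen.add(gpath)
                doomed.append(gpath)
                size -= gsize
    return doomed


def job_of(path):
    # Assumption for the example: the job ID is the leading number.
    head = os.path.basename(path).split('_', 1)[0]
    return int(head) if head.isdigit() else None


sample = [
    ('/r/.htaccess', 1, 10),
    ('/r/1_a.png', 10, 100),
    ('/r/1_a.pdf', 50, 100),
    ('/r/1_a_small.png', 10, 5),
    ('/r/2_b.png', 20, 100),
    ('/r/stray.tmp', 5, 50),
]
doomed = plan_cleanup(sample, 200, job_of)
print(doomed)
```

With a threshold of 200 bytes, the orphan and both output files of job 1 are
selected, while the thumbnail of job 1, the dotfile and job 2 are left alone.
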
 scripts/maposmaticd               |   74 ++++++++++++++++++++++++++++++-------
 www/maposmatic/models.py          |   70 +++++++++++++++++++++++++++++++----
 www/media/style.css               |    4 ++
 www/templates/maposmatic/job.html |    2 +-
 4 files changed, 127 insertions(+), 23 deletions(-)
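
As a side note on the switch from st_ctime to st_mtime, the short Python 3
script below shows, on a scratch file, that a chmod leaves the modification
time untouched. It is a standalone illustration, not part of the patch; the
helper name and the backdated timestamp are arbitrary.

```python
import os
import tempfile


def times_after_chmod(old_mtime=1000000000):
    """Create a scratch file, backdate it, chmod it, and return the stat
    results taken before and after the chmod."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.utime(path, (old_mtime, old_mtime))  # backdate the content timestamp
        before = os.stat(path)
        os.chmod(path, 0o644)                   # metadata-only change
        after = os.stat(path)
        return before, after
    finally:
        os.remove(path)


before, after = times_after_chmod()
print(before.st_mtime == after.st_mtime)  # mtime survives the chmod
print(after.st_ctime > after.st_mtime)    # ctime reflects a recent change
```

Sorting on st_mtime therefore keeps the "oldest first" order stable even
when permissions are adjusted in the rendering directory.
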

diff --git a/scripts/maposmaticd b/scripts/maposmaticd
index 5f6cdbd..064690f 100755
--- a/scripts/maposmaticd
+++ b/scripts/maposmaticd
@@ -148,25 +148,71 @@ def render_job(job):
             job.end_rendering(resultmsg)
             return
 
-# This function checks that the total size of the files in
-# RENDERING_RESULT_PATH does not exceed 80% of
-# RENDERING_RESULT_MAX_SIZE_GB. If it does, the function removes as
-# many files as needed, oldest first
 def cleanup_files():
-    files = [ os.path.join(RENDERING_RESULT_PATH, f) for f in os.listdir(RENDERING_RESULT_PATH)]
-    files = [(f, os.stat(f).st_ctime, os.stat(f).st_size) for f in files]
+    """This cleanup function checks that the total size of the files in
+    RENDERING_RESULT_PATH does not exceed 80% of the defined threshold
+    RENDERING_RESULT_MAX_SIZE_GB. If it does, files are removed until the
+    constraint is met again, oldest first, and grouped by job."""
+
+    def get_formatted_value(v):
+        return '%.2f MiB' % (v/1024.0/1024.0)
+    def get_formatted_details(saved, size, threshold):
+        return 'saved %s, now %s/%s' % \
+                (get_formatted_value(saved),
+                 get_formatted_value(size),
+                 get_formatted_value(threshold))
+
+    files = [os.path.join(RENDERING_RESULT_PATH, f)
+                for f in os.listdir(RENDERING_RESULT_PATH)
+                if not f.startswith('.')]
+    files = map(lambda f: (f, os.stat(f).st_mtime, os.stat(f).st_size), files)
+
+    # Compute the total size occupied by the renderings, and the actual 80%
+    # threshold, in bytes
     size = reduce(lambda x, y: x + y[2], files, 0)
     threshold = 0.8 * RENDERING_RESULT_MAX_SIZE_GB * 1024 * 1024 * 1024
+
+    # Stop here if we are below the threshold
     if size < threshold:
         return
-    files.sort(lambda x, y: cmp(x[1], y[1]))
-    for f in files:
-        os.remove(os.path.join(RENDERING_RESULT_PATH, f[0]))
-        size -= f[2]
-        LOG.debug("remove '%s', %f GB consumed over a %f GB threshold" % \
-                            (f[0], (size / 1024 / 1024 / 1024), (threshold / 1024 / 1024 / 1024)))
-        if size < threshold:
-            break
+
+    LOG.info("%s consumed for a %s threshold. Cleaning..." %
+            (get_formatted_value(size), get_formatted_value(threshold)))
+
+    # Sort files by timestamp, oldest last, and start removing them by
+    # pop()-ing the list
+    files.sort(lambda x, y: cmp(y[1], x[1]))
+
+    while size > threshold:
+        if not len(files):
+            LOG.error("No files to remove and still above threshold! Something's wrong!")
+            return
+
+        # Get the next file to remove, and try to identify the job it comes
+        # from
+        f = files.pop()
+        name = os.path.basename(f[0])
+        job = MapRenderingJob.objects.get_by_filename(name)
+        if job:
+            removed, saved = job.remove_all_files()
+            size -= saved
+
+            # If files were removed, log it. If not, it means only the
+            # thumbnail remained, and that's fine.
+            if removed:
+                LOG.info("Removed %d files from job #%d (%s)." %
+                         (removed, job.id, get_formatted_details(saved, size, threshold)))
+
+
+        else:
+            # If we didn't find a parent job, it means this is an orphaned
+            # file, and we can safely remove it to get back some disk space.
+            os.remove(f[0])
+            saved = f[2]
+            size -= saved
+            LOG.info("Removed orphan file %s (%s)." %
+                     (name, get_formatted_details(saved, size, threshold)))
+
 
 if not os.path.isdir(RENDERING_RESULT_PATH):
     LOG.error("ERROR: please set RENDERING_RESULT_PATH ('%s') to an existing directory" % \
diff --git a/www/maposmatic/models.py b/www/maposmatic/models.py
index dc3b3ce..7c871d4 100644
--- a/www/maposmatic/models.py
+++ b/www/maposmatic/models.py
@@ -42,12 +42,29 @@ class MapRenderingJobManager(models.Manager):
     # has its thumbnail present.
     def get_random_with_thumbnail(self):
         fifteen_days_before = datetime.now() - timedelta(15)
-        maps = MapRenderingJob.objects.filter(status=2).filter(submission_time__gte=fifteen_days_before).order_by('?')[0:10]
+        maps = (MapRenderingJob.objects.filter(status=2)
+            .filter(submission_time__gte=fifteen_days_before)
+            .order_by('?')[0:10])
         for m in maps:
             if m.get_thumbnail():
                 return m
         return None
 
+    def get_by_filename(self, name):
+        """Tries to find the parent MapRenderingJob of a given file from its
+        filename. Both the job ID found in the first part of the prefix and the
+        entire files_prefix are used to match a job."""
+
+        try:
+            jobid = int(name.split('_', 1)[0])
+            job = MapRenderingJob.objects.get(id=jobid)
+            if name.startswith(job.files_prefix()):
+                return job
+        except (ValueError, IndexError, MapRenderingJob.DoesNotExist):
+            pass
+
+        return None
+
 SPACE_REDUCE = re.compile(r"\s+")
 NONASCII_REMOVE = re.compile(r"[^A-Za-z0-9]+")
 
@@ -57,6 +74,7 @@ class MapRenderingJob(models.Model):
         (0, 'Submitted'),
         (1, 'In progress'),
         (2, 'Done'),
+        (3, 'Done w/o files')
         )
 
     maptitle = models.CharField(max_length=256)
@@ -98,6 +116,7 @@ class MapRenderingJob(models.Model):
                              self.startofrendering_time.strftime("%Y-%m-%d_%H-%M"),
                              self.maptitle_computized())
 
+
     def start_rendering(self):
         self.status = 1
         self.startofrendering_time = datetime.now()
@@ -116,7 +135,7 @@ class MapRenderingJob(models.Model):
         return self.status == 1
 
     def is_done(self):
-        return self.status == 2
+        return self.status == 2 or self.status == 3
 
     def is_done_ok(self):
         return self.is_done() and self.resultmsg == "ok"
@@ -137,24 +156,59 @@ class MapRenderingJob(models.Model):
         return os.path.join(www.settings.RENDERING_RESULT_PATH, self.files_prefix() + "_index." + format)
 
     def output_files(self):
+        """Returns a structured dictionary of the output files for this job.
+        The result contains two lists, 'maps' and 'indeces', listing the output
+        files. Each file is reported by a tuple (format, path, title, size)."""
+
         allfiles = {'maps': [], 'indeces': []}
 
         for format in www.settings.RENDERING_RESULT_FORMATS:
             # Map files (all formats but CSV)
-            if format != 'csv' and os.path.exists(self.get_map_filepath(format)):
-                allfiles['maps'].append((format, self.get_map_fileurl(format),
-                    _("%(title)s %(format)s Map") % {'title': self.maptitle, 'format': format.upper()}))
+            map_path = self.get_map_filepath(format)
+            if format != 'csv' and os.path.exists(map_path):
+                allfiles['maps'].append((format, map_path,
+                    _("%(title)s %(format)s Map") % {'title': self.maptitle,
+                                                     'format': format.upper()},
+                    os.stat(map_path).st_size))
+
             # Index files
-            if os.path.exists(self.get_index_filepath(format)):
-                allfiles['indeces'].append((format, self.get_index_fileurl(format),
-                    _("%(title)s %(format)s Index") % {'title': self.maptitle, 'format': format.upper()}))
+            index_path = self.get_index_filepath(format)
+            if os.path.exists(index_path):
+                allfiles['indeces'].append((format, index_path,
+                    _("%(title)s %(format)s Index") % {'title': self.maptitle,
+                                                       'format': format.upper()},
+                    os.stat(index_path).st_size))
 
         return allfiles
 
     def has_output_files(self):
+        """Tells if this job still has its output files present in the
+        RENDERING_RESULT_PATH. Their actual presence is checked even if
+        has_files is True."""
+
+        if not self.is_done() or self.status == 3:
+            return False
+
         files = self.output_files()
         return len(files['maps']) + len(files['indeces'])
 
+    def remove_all_files(self):
+        """Removes all the output files from this job, and returns the space
+        saved in bytes (Note: the thumbnail is not removed)."""
+
+        files = self.output_files()
+        saved = 0
+        removed = 0
+
+        for f in (files['maps'] + files['indeces']):
+            saved += f[3]
+            removed += 1
+            os.remove(f[1])
+
+        self.status = 3
+        self.save()
+        return removed, saved
+
     def get_thumbnail(self):
         thumbnail_file = os.path.join(www.settings.RENDERING_RESULT_PATH, self.files_prefix() + "_small.png")
         thumbnail_url = www.settings.RENDERING_RESULT_URL + "/" + self.files_prefix() + "_small.png"
diff --git a/www/media/style.css b/www/media/style.css
index f24525e..7dd9432 100644
--- a/www/media/style.css
+++ b/www/media/style.css
@@ -269,6 +269,10 @@ table.jobinfo td.info {
   vertical-align: top;
 }
 
+p.nofiles {
+  font-style: italic;
+}
+
 div.mapsearch {
   float: right;
   font-style: italic;
diff --git a/www/templates/maposmatic/job.html b/www/templates/maposmatic/job.html
index cd181d6..1e270b3 100644
--- a/www/templates/maposmatic/job.html
+++ b/www/templates/maposmatic/job.html
@@ -64,7 +64,7 @@
       <li>{% trans "Index: " %} {% for file in job.output_files.indeces %}<a href="{{ file.1 }}" title="{{ file.2 }}">{{ file.0|upper }}</a>{% if not forloop.last %}, {% endif %}{% endfor %}.</li>
     </ul>
     {% else %}
-      {% trans "The generated files are no longer available." %}
+      <p class="nofiles">{% trans "The generated files are no longer available." %}</p>
     {% endif %}
     {% endif %}
   </td>
-- 
1.6.3.3.277.g88938c
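
For readers following the orphan detection in get_by_filename(), here is a
minimal standalone sketch of the same matching idea in Python 3. It is not
the Django code from the patch: find_job is a made-up name, the jobs dict
stands in for the database lookup, and the prefix value is invented for the
example.

```python
def find_job(name, jobs):
    """Return the ID of the job owning file `name`, or None for an orphan.

    jobs: hypothetical dict mapping a job ID to its files prefix, used
          here in place of the MapRenderingJob table.
    """
    try:
        # The job ID is expected before the first underscore.
        jobid = int(name.split('_', 1)[0])
    except ValueError:
        return None  # no numeric ID, so no parent job
    prefix = jobs.get(jobid)  # None when no such job exists
    if prefix is not None and name.startswith(prefix):
        return jobid
    return None


jobs = {42: '42_2010-01-24_14-43_paris'}
print(find_job('42_2010-01-24_14-43_paris.pdf', jobs))
print(find_job('42_something_else.pdf', jobs))
print(find_job('7_2010-01-24_14-43_lyon.png', jobs))
print(find_job('readme.txt', jobs))
```

Only the first name matches both the ID and the full prefix; the other three
are treated as orphans, including the one whose ID has no job behind it.
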

Current Thread

[Maposmatic-dev] MapOSMatic daemon rewrite, Maxime Petazzoni, 2010/01/24
- [Maposmatic-dev] [PATCH maposmatic 1/9] Improve the file cleanup mechanism, Maxime Petazzoni <=
  - [Maposmatic-dev] [PATCH maposmatic 2/9] Cornerstones for a new MapOSMatic daemon, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 3/9] Update the .gitignore list, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 4/9] Add MAPOSMATIC_LOG_LEVEL to the environment wrapping, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 5/9] Merge the StandaloneMapOSMaticDaemon into the base MapOSMaticDaemon class, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 6/9] Revamp the job renderer, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 7/9] Rework the daemon to use the new JobRenderers, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 8/9] Frequency parameters passing improvement, Maxime Petazzoni, 2010/01/24
    - [Maposmatic-dev] [PATCH maposmatic 9/9] Provide a map_areas prefix to the TimingOutJobRenderer, Maxime Petazzoni, 2010/01/24
