Performance questions: workspace_size default value and temp file direct
From: Stefan Tzeggai
Subject: Performance questions: workspace_size default value and temp file directory
Date: Fri, 15 Mar 2013 12:57:46 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
Hi everybody and thanks for this powerful piece of free software.
I use GNU pspp 0.7.9 (Fri Jun 29 19:31:48 UTC 2012) to batch convert CSV
to SAV files. The script basically does:
GET DATA /TYPE=TXT
VARIABLE LABELS
VALUE LABELS
SAVE OUTFILE /COMPRESSED
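For illustration, a minimal sketch of what such a conversion script might look like in full. The file names, variable names, formats and labels below are made up for the example and are not taken from the real data:

```
* Hypothetical file and variable names, for illustration only.
GET DATA /TYPE=TXT
  /FILE='input.csv'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS=','
  /QUALIFIER='"'
  /FIRSTCASE=2
  /VARIABLES=id F8.0
             gender F1.0
             income F10.2.
* Attach descriptive labels to the variables.
VARIABLE LABELS id 'Respondent ID'
  /gender 'Gender of respondent'
  /income 'Monthly income'.
* Attach labels to the coded values of one variable.
VALUE LABELS /gender 1 'female' 2 'male'.
* Write the result as a compressed system file.
SAVE OUTFILE='output.sav' /COMPRESSED.
```

The real script follows the same four steps, only with about 300 variables in the /VARIABLES list.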
My "bigger" CSV files are between 100 MB and 1 GB in size, with 300
columns and 3,000,000 rows, mostly numeric. PSPP performance is pretty bad
on the big files: the process uses only about 20% of a single CPU core,
while top's I/O wait (wa) flickers up to 20%.
I started to investigate solutions and came up with these questions:
SET WORKSPACE=workspace_size
The maximum amount of memory that PSPP will use to store data being
processed. If memory in excess of the workspace size is required,
then PSPP will start to use temporary files to store the data.
Setting a higher value will, in general, mean procedures will run
faster, but may cause other applications to run slower. On platforms
without virtual memory management, setting a very large workspace
may cause PSPP to abort.
1. Question: Is this amount in BYTES? Are there any further recommendations
for this setting? Is the memory reserved on demand (a bit more, a bit
more, a bit more) while processing, or in full as soon as the command is
executed?
What is the default value and how can I query the present setting? "SHOW
workspace;" did not work.
When I set workspace=268435456 (256 MB), the process uses 100% CPU and I/O
wait goes down. So this looks like one way to get more performance.
When I provide a low WORKSPACE, the disk I/O increases. Where are these
temporary files stored? I could not find any hints in the documentation,
and I could not see any files being created in /tmp. Is there an option to
set this directory?
Any more ideas on performance? Can the SAVE output be piped directly to a
zip command, so that some more disk I/O could be saved?
Many thanks in advance,
Steve