discuss-gnustep

Re: NSMutableString -initWithFormat appends to existing text


From: David Chisnall
Subject: Re: NSMutableString -initWithFormat appends to existing text
Date: Thu, 19 Apr 2018 09:12:30 +0100

On 19 Apr 2018, at 07:56, Mick Bert <micbert75@gmail.com> wrote:
> 
> 2018-04-18 17:15 GMT+02:00 David Chisnall <gnustep@theravensnest.org>:
>> On 18 Apr 2018, at 10:23, Mick Bert <micbert75@gmail.com> wrote:
>>> 
>>> Is it the preferable way?
>> 
> 
>> That depends a bit on what you mean by ‘preferable’.  If you mean
>> ‘simpler’ or ‘cleaner code’, then I don’t think so.  If you mean
>> ‘faster’ or ‘lower memory’, then you may find it better to use
>> NSString’s -getBytes:… method to copy into an on-stack buffer and
>> then write that using the lower-level C / C++ APIs.  In most cases,
>> the overhead of the I/O is
>> going to be sufficient that this won’t make a noticeable difference,
>> but if you’re processing a lot of data and have NVMe storage then
>> you might consider this.
> 
> 
> Sometimes I have to process files several dozen GBytes large, 200
> billion lines (it took a couple of minutes just to count them :-D ).

That implies that your storage is quite slow.  On modern NVMe storage, I’d 
expect to be able to process a few TBs in that time.  As such, it’s not worth 
optimising the CPU side too much, because you’re mostly waiting for I/O.  
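
To illustrate why the CPU side is rarely the bottleneck for a job like counting lines, here is a minimal C sketch that reads a stream in large blocks and counts newline bytes with memchr. The function name count_lines and the 1 MiB block size are illustrative choices, not anything from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count '\n' bytes in an already-open stream, reading in large blocks.
 * Returns the count, or -1 on a read error.
 * count_lines and the block size are illustrative assumptions. */
long long count_lines(FILE *in)
{
    static char buf[1 << 20];   /* 1 MiB block; the size is arbitrary */
    long long lines = 0;
    size_t n;

    assert(in != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        /* memchr does the per-byte scan; each hit is one line ending */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;                /* continue just past the newline found */
        }
    }
    return ferror(in) ? -1 : lines;
}
```

A loop like this does very little work per byte, so on slow storage most of the wall-clock time is spent waiting in fread.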

> I have successfully written perl scripts to process them, and it was
> interesting. Now I would like to do it in a gnustep tool, just to
> practice with base classes, and the language itself.

It’s probably good practice, but don’t expect to see much of a speedup, if any 
(though you might see the CPU load and temperature go down a bit).
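
For the lower-level route mentioned earlier in the thread (fill an on-stack buffer, then write it with the C APIs), the writing side might look roughly like the sketch below. It is plain C rather than Objective-C: snprintf stands in for the step that would copy bytes out of an NSString, and write_record, its tab-separated layout and the 256-byte buffer are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Format one record into an on-stack buffer and write it with a single
 * fwrite call. write_record and its field layout are hypothetical.
 * Returns 0 on success, -1 if the record does not fit or the write fails. */
int write_record(FILE *out, long id, const char *name)
{
    char buf[256];              /* on-stack: no heap allocation per record */
    int len;

    assert(out != NULL && name != NULL);
    len = snprintf(buf, sizeof buf, "%ld\t%s\n", id, name);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;              /* encoding error, or output was truncated */
    return fwrite(buf, 1, (size_t)len, out) == (size_t)len ? 0 : -1;
}
```

Because stdio already buffers output, the gain over a higher-level string API comes mainly from avoiding per-record object allocation, which fits the point that any speedup is likely to be small.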

>>> Are there any other classes to work with
>>> text-oriented files?
>> 
>> Note that text-oriented files don’t really exist as an abstraction
>> on most *NIX systems (though the C standard still likes to pretend
>> that they do).  GNUstep / Cocoa don’t provide useful abstractions for
>> this, though the C++ standard streams library does (not very good
>> ones though, so I don’t really suggest using it).  David
> 
> Here I don't follow you any more. Whenever I have to write information
> in a file, I always prefer readable form, so that I can access them
> with a text editor, without the need of any particular tool (of any
> particular version). At least as long as performance is concerned
> (i.e. random seeking is needed, or syntax interpretation
> is computationally too heavy)

Some file systems have a concept of a text file as a distinct thing from a 
binary file.  The low-level APIs handle things like character set conversions, 
breaking into records, and so on.  The C and Windows APIs and, to a lesser 
extent, even POSIX have some vestiges of this, but most modern systems don’t 
differentiate files at that layer: they’re just files and it’s up to the reader 
to understand them.
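
One visible vestige in the C API is the "b" flag to fopen. The sketch below writes the same string in a given mode and measures how many bytes actually land on disk. On POSIX systems the flag has no effect, so "w" and "wb" store identical bytes; on Windows, text mode expands each "\n" to "\r\n". The helper name stored_size and the scratch-file handling are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Write `data` to `path` using fopen mode `wmode`, then return the number
 * of bytes actually stored, measured by reading the file back in binary
 * mode. The scratch file is removed afterwards. Returns -1 on I/O failure.
 * stored_size is a hypothetical helper for illustration. */
long stored_size(const char *path, const char *wmode, const char *data)
{
    FILE *f;
    long size = 0;

    assert(path != NULL && wmode != NULL && data != NULL);
    f = fopen(path, wmode);
    if (f == NULL)
        return -1;
    if (fputs(data, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");      /* binary mode: see the raw bytes */
    if (f == NULL)
        return -1;
    while (fgetc(f) != EOF)
        size++;
    fclose(f);
    remove(path);
    return size;
}
```

Calling stored_size with "w" and then "wb" for the string "a\nb\n" should give the same result on a *NIX system, which is the sense in which the file is just bytes and the interpretation is left to the reader.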

Off topic now, but note that one of the downsides of not having a format that 
supports random access is that it is very difficult to process in parallel.  
Given your system, I’d also recommend looking at compressing the data on disk.  
For the traces that our processor generates, I moved from a human-readable text 
representation to a structured binary format that can be stored xz-compressed.  
Even on machines with relatively fast (non-NVMe) flash storage, the code that 
reads the binary format and generates human-readable text can stream a 
moderately large trace (around 100GB in text format) to /dev/null faster than 
cat can do the same with the text file.  In both cases, disk I/O is the 
limiting factor.  The xz decompression can run in a separate thread to the 
processing (so can run on ahead filling up a buffer to process as fast as the 
disk can give you data, until the consumer catches up) and on a moderately fast 
CPU can decompress faster than the SSD can give you data.
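
The "run on ahead filling up a buffer" arrangement is a bounded producer/consumer queue. Below is a minimal pthreads sketch of that shape, not the actual trace tool: the producer merely fabricates filler bytes where a real one would call an xz decoder, and every name (ring, producer, stream_through_ring, SLOTS, CHUNK) is made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SLOTS 4          /* how many chunks the producer may run ahead */
#define CHUNK 4096       /* bytes per chunk; arbitrary */

struct chunk { size_t len; unsigned char data[CHUNK]; };

struct ring {
    struct chunk slot[SLOTS];
    size_t head, tail, count;
    int done;                            /* producer has finished */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
};

struct job { struct ring *ring; size_t total; };

/* Producer: stands in for the decompressor. It fabricates `total` bytes
 * of filler, blocking whenever all slots are in use. */
static void *producer(void *arg)
{
    struct job *job = arg;
    struct ring *r = job->ring;
    size_t left = job->total;

    while (left > 0) {
        pthread_mutex_lock(&r->lock);
        while (r->count == SLOTS)
            pthread_cond_wait(&r->not_full, &r->lock);
        struct chunk *c = &r->slot[r->head];   /* free: not yet counted */
        pthread_mutex_unlock(&r->lock);

        c->len = left < CHUNK ? left : CHUNK;  /* fill outside the lock */
        for (size_t i = 0; i < c->len; i++)
            c->data[i] = (i % 64 == 63) ? '\n' : 'x';
        left -= c->len;

        pthread_mutex_lock(&r->lock);
        r->head = (r->head + 1) % SLOTS;
        r->count++;                            /* publish the chunk */
        pthread_cond_signal(&r->not_empty);
        pthread_mutex_unlock(&r->lock);
    }
    pthread_mutex_lock(&r->lock);
    r->done = 1;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Consumer: runs in the calling thread and returns the number of bytes it
 * received, or (size_t)-1 if the producer thread could not be started. */
size_t stream_through_ring(size_t total)
{
    struct ring r = {0};
    struct job job = { &r, total };
    pthread_t tid;
    size_t seen = 0;

    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.not_full, NULL);
    pthread_cond_init(&r.not_empty, NULL);
    if (pthread_create(&tid, NULL, producer, &job) != 0)
        return (size_t)-1;

    for (;;) {
        pthread_mutex_lock(&r.lock);
        while (r.count == 0 && !r.done)
            pthread_cond_wait(&r.not_empty, &r.lock);
        if (r.count == 0) {                    /* drained and finished */
            pthread_mutex_unlock(&r.lock);
            break;
        }
        struct chunk *c = &r.slot[r.tail];
        pthread_mutex_unlock(&r.lock);

        assert(c->len > 0 && c->len <= CHUNK);
        seen += c->len;                        /* "processing" step */

        pthread_mutex_lock(&r.lock);
        r.tail = (r.tail + 1) % SLOTS;
        r.count--;                             /* release the slot */
        pthread_cond_signal(&r.not_full);
        pthread_mutex_unlock(&r.lock);
    }
    pthread_join(tid, NULL);
    pthread_cond_destroy(&r.not_empty);
    pthread_cond_destroy(&r.not_full);
    pthread_mutex_destroy(&r.lock);
    return seen;
}
```

Both the filling and the processing happen outside the lock, so the two threads overlap; the slot count bounds how far the producer can get ahead before it has to wait for the consumer.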

David



