[GNUnet-SVN] r18222 - gnunet


From: gnunet
Subject: [GNUnet-SVN] r18222 - gnunet
Date: Sat, 19 Nov 2011 21:26:33 +0100

Author: grothoff
Date: 2011-11-19 21:26:33 +0100 (Sat, 19 Nov 2011)
New Revision: 18222

Removed:
   gnunet/RATIONALE
Log:
moved to drupal

Deleted: gnunet/RATIONALE
===================================================================
--- gnunet/RATIONALE    2011-11-19 19:09:10 UTC (rev 18221)
+++ gnunet/RATIONALE    2011-11-19 20:26:33 UTC (rev 18222)
@@ -1,316 +0,0 @@
-This document is a summary of the changes made to GNUnet for version
-0.9.x (from 0.8.x) and what this major redesign tries to address.
-
-First of all, the redesign does not (intentionally) change anything
-fundamental about the application-level protocols or how files are
-encoded and shared.  However, it is not protocol-compatible due to
-other changes that do not relate to the essence of the application
-protocols.  This choice was made since productive development and
-readable code were considered more important than compatibility at
-this point.
-
-The redesign tries to address the following major problem groups
-describing issues that apply more or less to all GNUnet versions
-prior to 0.9.x:
-
-
-PROBLEM GROUP 1 (scalability):
-* The code was modular, but bugs were not.  Memory corruption
-  in one plugin could cause crashes in others and it was not
-  always easy to identify the culprit.  This approach
-  fundamentally does not scale (in the sense of GNUnet being
-  a framework and a GNUnet server running hundreds of 
-  different application protocols -- and the result still
-  being debuggable, secure and stable).
-* The code was heavily multi-threaded resulting in complex
-  locking operations.  GNUnet 0.8.x had over 70 different
-  mutexes and almost 1000 lines of lock/unlock operations.
-  It is challenging for even good programmers to program or 
-  maintain good multi-threaded code with this complexity.
-  The excessive locking essentially prevents GNUnet 0.8 from
-  actually doing much in parallel on multicores.
-* Despite efforts like Freeway, it was virtually 
-  impossible to contribute code to GNUnet 0.8 that was not
-  written in C/C++.
-* Changes to the configuration almost always required restarts
-  of gnunetd; the existence of change-notifications does not
-  really change that (how many users are even aware of SIGHUP,
-  and how few options worked with that -- and at what expense
-  in code complexity!).
-* Valgrinding could only be done for the entire gnunetd
-  process.  Given that gnunetd does quite a bit of 
-  CPU-intensive crypto, this could not be done for a system
-  under heavy (or even moderate) load.
-* Stack overflows with threads, while rare under Linux these
-  days, result in really nasty and hard-to-find crashes.
-* structs of function pointers in service APIs were
-  needlessly adding complexity, especially since in 
-  most cases there was no actual polymorphism
-
-SOLUTION:
-* Use multiple, loosely-coupled processes and one big select
-  loop in each (supported by a powerful util library to eliminate
-  code duplication for each process).  
-* Eliminate all threads, manage the processes with a 
-  master-process (gnunet-arm, for automatic restart manager) 
-  which also ensures that configuration changes trigger the 
-  necessary restarts.
-* Use continuations (with timeouts) as a way to unify
-  cron-jobs and other event-based code (such as waiting
-  on network IO).
-  => Using multiple processes ensures that memory corruption
-     stays localized.  
-  => Using multiple processes will make it easy to contribute
-     services written in other language(s). 
-  => Individual services can now be subjected to valgrind
-  => Process priorities can be used to schedule the CPU better
-  Note that we cannot just use one process with a big
-  select loop because we have blocking operations (and the
-  blocking is outside of our control, thanks to MySQL,
-  sqlite, gethostbyaddr, etc.).  So in order to perform
-  reasonably well, we need some construct for parallel
-  execution.  
-
-  RULE: If your service contains blocking functions, it
-        MUST be a process by itself.  If your service
-        is sufficiently complex, you MAY choose to make
-        it a separate process.
-* Eliminate structs with function pointers for service APIs;
-  instead, provide a library (still ending in _service.h) API
-  that transmits the requests nicely to the respective
-  process (easier to use, no need to "request" service
-  in the first place; API can cause process to be started/stopped
-  via ARM if necessary).
-
-
-PROBLEM GROUP 2 (UTIL-APIs causing bugs):
-* The existing logging functions were awkward to use and
-  their expressive power was never really used for much.
-* While we had some rules for naming functions, there
-  were still plenty of inconsistencies.
-* Specification of default values in configuration could 
-  result in inconsistencies between defaults in
-  config.scm and defaults used by the program; also,
-  different defaults might have been specified for the
-  same option in different parts of the program.
-* The TIME API did not distinguish between absolute
-  and relative time, requiring users to know which
-  type of value some variable contained and to
-  manually convert properly.  Combined with the
-  possibility of integer overflows this is a major
-  source of bugs.
-* The TIME API for seconds has a theoretical problem
-  with a 32-bit overflow on some platforms which is
-  only partially fixed by the old code with some
-  hackery.
-
-SOLUTION:
-* Logging was radically simplified.
-* Functions are now more consistently named.
-* Configuration has no more defaults; instead,
-  we load a global default configuration file
-  before the user-specific configuration (which 
-  can be used to override defaults); the global
-  default configuration file will be generated 
-  from config.scm.
-* Time now distinguishes between
-  struct GNUNET_TIME_Absolute and
-  struct GNUNET_TIME_Relative.  We use structs
-  so that the compiler won't coerce for us 
-  (forcing the use of specific conversion
-  functions which have checks for overflows, etc.).
-  Naturally the need to use these functions makes
-  the code a bit more verbose, but that's a good
-  thing given the potential for bugs.
-* There is no more TIME API function to do anything
-  with 32-bit seconds
-* There is now a bandwidth API to handle 
-  non-trivial bandwidth utilization calculations
-
-
-PROBLEM GROUP 3 (statistics):
-* Databases and others needed to store capacity values
-  similar to what stats was already doing, but
-  across process lifetimes ("state"-API was a partial
-  solution for that, but using it was clunky)
-* Only gnunetd could use statistics, but other
-  processes in the GNUnet system might have had
-  good uses for it as well
-
-SOLUTION:
-* New statistics library and service that offer
-  an API to inspect and modify statistics
-* Statistics are distinguished by service name
-  in addition to the name of the value
-* Statistics can be marked as persistent, in
-  which case they are written to disk when
-  the statistics service shuts down.
-  => One solution for existing stats uses,
-     application stats, database stats and
-     versioning information!
-
-
-PROBLEM GROUP 4 (Testing):
-* The existing structure of the code with modules
-  stored in places far away from the test code
-  resulted in tools like lcov not giving good results.
-* The codebase had evolved into a complex, deeply
-  nested hierarchy often with directories that
-  then only contained a single file.  Some of these
-  files had the same name making it hard to find
-  the source corresponding to a crash based on 
-  the reported filename/line information.
-* Non-trivial portions of the code lacked good testcases,
-  and it was not always obvious which parts of the code 
-  were not well-tested.
-
-SOLUTION:
-* Code that should be tested together is now
-  in the same directory.
-* The hierarchy is now essentially flat, each
-  major service having one directory under src/;
-  naming conventions help to make sure that
-  files have globally-unique names
-* All code added to the new repository must
-  come with testcases with reasonable coverage.
-
-
-PROBLEM GROUP 5 (core/transports):
-* The new DV service requires session key exchange
-  between DV-neighbours, but the existing
-  session key code cannot be used to achieve this.
-* The core requires certain services
-  (such as identity, pingpong, fragmentation,
-   transport, traffic, session) which makes it 
-  meaningless to have these as modules
-  (especially since there is really only one
-  way to implement these)
-* HELLO's are larger than necessary since we need
-  one for each transport (and hence often have
-  to pick a subset of our HELLOs to transmit)
-* Fragmentation is done at the core level but only
-  required for a few transports; future versions of
-  these transports might want to be aware of fragments
-  and do things like retransmission
-* Autoconfiguration is hard since we have no good
-  way to detect (and then use securely) our external IP address
-* It is currently not possible for multiple transports
-  between the same pair of peers to be used concurrently
-  in the same direction(s)
-* We're using lots of cron-based jobs to periodically
-  try (and fail) to build and transmit
-
-SOLUTION:
-* Rewrite core to integrate most of these services
-  into one "core" service.
-* Redesign HELLO to contain the addresses for
-  all enabled transports in one message (avoiding
-  having to transmit the public key and signature
-  many, many times)
-* With discovery being part of the transport service,
-  it is now also possible to "learn" our external
-  IP address from other peers (we just add plausible
-  addresses to the list; other peers will discard 
-  those addresses that don't work for them!)
-* New DV will consist of a "transport" and a 
-  high-level service (to handle encrypted DV
-  control- and data-messages).
-* Move expiration from one field per HELLO to one
-  per address
-* Require signature in PONG, not in HELLO (and confirm
-  one address at a time)
-* Move fragmentation into helper library linked
-  against by UDP (and others that might need it)
-* Link-to-link advertising of our HELLO is transport
-  responsibility; global advertising/bootstrap remains
-  responsibility of higher layers
-* Change APIs to be event-based (transports pull for
-  transmission data instead of core pushing and failing)
-
-
-PROBLEM GROUP 6 (FS-APIs):
-* As with gnunetd, the FS-APIs are heavily threaded,
-  resulting in hard-to-understand code (slightly
-  better than gnunetd, but not much).
-* GTK in particular does not like this, resulting 
-  in complicated code to switch to the GTK event
-  thread when needed (which may still be causing
-  problems on Gnome, not sure).
-* If GUIs die (or are not properly shut down), state
-  of current transactions is lost (FSUI only
-  saves to disk on shutdown)
-* FILENAME metadata is killed by ECRS/FSUI to avoid
-  exposing HOME, but what if the user set it manually?
-* The DHT was a generic data structure with no
-  support for ECRS-style block validation
-
-SOLUTION:
-* Eliminate threads from FS-APIs
-* Incrementally store FS-state always also on disk using many
-  small files instead of one big file
-* Have API to manipulate sharing tree before
-  upload; have auto-construction modify FILENAME
-  but allow user-modifications afterwards
-* DHT API was extended with a BLOCK API for content
-  validation by block type; validators for FS and
-  DHT block types were written; BLOCK API is also
-  used by gap routing code.
-
-
-PROBLEM GROUP 7 (User experience):
-* Searches often do not return a sufficient / significant number of
-  results
-* Sharing a directory with thousands of similar files (image/jpeg)
-  creates thousands of search results for the mime-type keyword
-  (problem with DB performance, network transmission, caching,
-   end-user display, etc.)
-* Users that wanted to share important content had no way to
-  tell the system to replicate it more; replication was also
-  inefficient (this desired feature was sometimes called
-  "power" publishing or content pushing)
-
-SOLUTION:
-* Have option to canonicalize keywords (see suggestion on mailing list end of
-  June 2009: keep consonants and sort those alphabetically); not
-  fully implemented yet
-* When sharing directories, extract keywords first and then
-  push keywords that are common in all files up to the
-  directory level; when processing an AND-ed query and a directory
-  is found to match the result, do an inspection on the metadata
-  of the files in the directory to possibly produce further results
-  (requires downloading of the directory in the background);
-  needs more testing
-* A desired replication level can now be specified and is tracked
-  in the datastore; migration prefers content with a high
-  replication level (which decreases as replicas are created)
-  => datastore format changed; we also took out a size field 
-     that was redundant, so the overall overhead remains the same
-* Peers with a full disk (or disabled migration) can now notify 
-  other peers that they are not interested in migration right
-  now; as a result, less bandwidth is wasted pushing content
-  to these peers (and replication counters are not generally
-  decreased based on copies that are just discarded; naturally,
-  there is still no guarantee that the replicas will stay
-  available) 
-
-
-
-SUMMARY:
-* Features eliminated from util:
-  - threading (goal: good riddance!)
-  - complex logging features [ectx-passing, target-kinds] (goal: good riddance!)
-  - complex configuration features [defaults, notifications] (goal: good riddance!)
-  - network traffic monitors (goal: eliminate)
-  - IPC semaphores (goal: d-bus? / eliminate?)
-  - second timers
-* New features in util:
-  - scheduler
-  - service and program boot-strap code
-  - bandwidth and time APIs
-  - buffered IO API
-  - HKDF implementation (crypto)
-  - load calculation API
-  - bandwidth calculation API
-* Major changes in util:
-  - more expressive server (replaces selector)
-  - DNS lookup replaced by async service
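
Illustration (not part of the removed file): the PROBLEM GROUP 2 solution says that absolute and relative time become distinct structs, so the compiler will not coerce between them and conversions go through functions that check for overflow. A minimal C sketch of that idea follows. All names here (demo_time_absolute, demo_time_relative, the millisecond fields, DEMO_TIME_FOREVER, and the saturating behaviour) are assumptions made for the example; the actual GNUnet util API may differ in names, units and semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the GNUnet util API. */
#define DEMO_TIME_FOREVER UINT64_MAX

/* Two distinct struct types: passing one where the other is
   expected is a compile-time error, unlike with plain integers. */
struct demo_time_absolute { uint64_t abs_ms; };
struct demo_time_relative { uint64_t rel_ms; };

/* Add a relative offset to an absolute time.  Assumption: on
   overflow the result saturates at DEMO_TIME_FOREVER. */
struct demo_time_absolute
demo_time_absolute_add (struct demo_time_absolute start,
                        struct demo_time_relative delta)
{
  struct demo_time_absolute ret;

  if (start.abs_ms > DEMO_TIME_FOREVER - delta.rel_ms)
    ret.abs_ms = DEMO_TIME_FOREVER;
  else
    ret.abs_ms = start.abs_ms + delta.rel_ms;
  return ret;
}

/* Time left until 'deadline' as seen from 'now'; zero if the
   deadline has passed, "forever" if the deadline never expires. */
struct demo_time_relative
demo_time_absolute_get_remaining (struct demo_time_absolute now,
                                  struct demo_time_absolute deadline)
{
  struct demo_time_relative ret;

  if (deadline.abs_ms == DEMO_TIME_FOREVER)
    ret.rel_ms = DEMO_TIME_FOREVER;
  else if (deadline.abs_ms <= now.abs_ms)
    ret.rel_ms = 0;
  else
    ret.rel_ms = deadline.abs_ms - now.abs_ms;
  return ret;
}

/* Small usage example for the two helpers above. */
void
demo_time_usage (void)
{
  struct demo_time_absolute now = { 1000 };
  struct demo_time_relative timeout = { 500 };
  struct demo_time_absolute deadline = demo_time_absolute_add (now, timeout);

  assert (1500 == deadline.abs_ms);
  assert (500 == demo_time_absolute_get_remaining (now, deadline).rel_ms);
  /* demo_time_absolute_add (now, deadline) would not compile:
     the second argument must be a struct demo_time_relative. */
}
```

The extra verbosity at call sites is the trade-off the rationale describes: every conversion is explicit, so an absolute value cannot silently be used as a duration.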

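Illustration (not part of the removed file): PROBLEM GROUP 7 mentions an option to canonicalize keywords by keeping the consonants and sorting them alphabetically, and notes that it was not fully implemented. The sketch below shows one possible reading of that rule in C. The function name, the ASCII-only handling, lowercasing, treating 'y' as a consonant and keeping duplicate letters are all assumptions; the file does not specify these details.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compare two bytes for qsort(). */
static int
demo_cmp_char (const void *a, const void *b)
{
  return (int) *(const unsigned char *) a - (int) *(const unsigned char *) b;
}

/* Hypothetical helper: write the canonical form of 'in' to 'out'
   (at most out_size - 1 characters plus the terminator) and return
   the number of characters written.  Assumptions: ASCII letters
   only, lowercased, vowels "aeiou" dropped, duplicates kept. */
size_t
demo_canonicalize_keyword (const char *in, char *out, size_t out_size)
{
  size_t n = 0;

  assert (NULL != in && NULL != out && out_size > 0);
  for (; '\0' != *in && n + 1 < out_size; in++)
  {
    unsigned char c = (unsigned char) *in;

    if (c >= 128 || ! isalpha (c))
      continue;                 /* skip non-ASCII and non-letters */
    c = (unsigned char) tolower (c);
    if (NULL != strchr ("aeiou", c))
      continue;                 /* skip vowels */
    out[n++] = (char) c;
  }
  out[n] = '\0';
  qsort (out, n, 1, &demo_cmp_char);  /* sort the kept consonants */
  return n;
}
```

Under these assumptions "picture" maps to "cprt", and so does any keyword with the same multiset of consonants, which is what would let near-identical spellings match the same search key.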


