
From: gnunet
Subject: [GNUnet-SVN] [taler-schemafuzz] branch master updated: Probability of this text being crappy ? High. It's still text tho
Date: Mon, 13 Aug 2018 23:49:12 +0200

This is an automated email from the git hooks/post-receive script.

erwan-ulrich pushed a commit to branch master
in repository schemafuzz.

The following commit(s) were added to refs/heads/master by this push:
     new 0de8b08  Probability of this text being crappy ? High. It's still text tho
0de8b08 is described below

commit 0de8b08237b3f6caf76371e6a15956839615c1d5
Author: Feideus <address@hidden>
AuthorDate: Mon Aug 13 23:49:07 2018 +0200

    Probability of this text being crappy ? High. It's still text tho
---
 Documentation.tex | 49 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/Documentation.tex b/Documentation.tex
index 418907f..1a6e4cf 100755
--- a/Documentation.tex
+++ b/Documentation.tex
@@ -113,6 +113,8 @@ To do so, the software has a way to transfer the mutation from a child to its parent
 
                                
                        \subsubsection{Mutations}
+                               \paragraph{What is a Mutation}
+                               \paragraph{How to chose what to mutate}
                                \paragraph{Creating malformed data} 
 As the goal of running this tool is to submit unexpected or invalid data to the target software, it is necessary to understand what t
 Fuzzing a complex type such as a timestamp variable has nothing to do with fuzzing a trivial boolean. In practice, a significant part o
@@ -171,27 +173,58 @@ All the mutations that are injected at least once in the course of the execution
 It is made of nodes (mutations), including a root (the first mutation to be processed, on a field selected randomly in the database).
 Each node has a number of children that depends on the ranking of its mutation and the number of potential modifications that it can perform.
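+As an illustration, the mutation tree could be modelled roughly as follows. This is a minimal sketch with made-up names, not the actual SchemaFuzz classes:
+\begin{verbatim}
+import java.util.ArrayList;
+import java.util.List;
+
+// Simplified model of a node in the mutation tree: each node
+// carries its mutation's weight (the analyzer's score), its
+// children, and a parent link so paths can be walked back.
+class MutationNode {
+    final MutationNode parent;    // null for the root
+    final List<MutationNode> children = new ArrayList<>();
+    double weight;                // set from the analyzer's output
+
+    MutationNode(MutationNode parent, double weight) {
+        this.parent = parent;
+        this.weight = weight;
+        if (parent != null) parent.children.add(this);
+    }
+}
+\end{verbatim}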
                                \paragraph{Weight} :
-Weighting the nodes is an important part of the runtime. Each mutation has a 
weight that is equal to the analyzer's output. This value reflects the 
mutation's value. If it had an intresting impact on the target program behavior 
(if it triggered new buggs or uncommon code paths) than this value is high and 
vice-versa. The weight is then used as a mean of determining the upcomming 
modification. The chance that a mutation gets a child is proportionnal to its 
weight.
-This value currently isn't biased by any other parameter, but it this might 
change in the future.                              
+Weighting the nodes is an important part of the runtime. Each mutation has a weight that is equal to the analyzer's output. This value reflects the mutation's value: if it had an interesting impact on the target program's behavior (if it triggered new bugs or uncommon code paths), then this value is high, and vice versa. The weight is then used as a means of determining the upcoming modification. The chance that a mutation gets a child is directly proportional to its weight.
+This value currently isn't biased by any other parameter, but this might change in the future.
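+A minimal sketch of this weight-proportional choice (roulette-wheel selection over the MutationNode sketch above; the helper name is illustrative, not the actual SchemaFuzz API):
+\begin{verbatim}
+import java.util.List;
+import java.util.Random;
+
+// Picks the node to extend next, with probability directly
+// proportional to each node's weight.
+static MutationNode pickNext(List<MutationNode> nodes, Random rng) {
+    double total = 0;
+    for (MutationNode n : nodes) total += n.weight;
+    double r = rng.nextDouble() * total;
+    for (MutationNode n : nodes) {
+        r -= n.weight;
+        if (r <= 0) return n;
+    }
+    return nodes.get(nodes.size() - 1); // guard for rounding error
+}
+\end{verbatim}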
                                \paragraph{Path}
+Since the weighting of the mutations allows going back to previously more interesting mutations, a path-finding mechanism is needed. Concretely, this routine resolves the nodes that separate nodes A and B in the tree. A and B might be child and parent, but they can also belong to completely different branches. This path is then given to the do/undo routine, which processes the modifications back so as to set the database up in the state required for the upcoming mutation. %% diagram here.
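+One possible shape for this resolution (an illustrative sketch reusing the MutationNode above: walk both nodes up to their lowest common ancestor, which splits the path into an undo leg and a redo leg):
+\begin{verbatim}
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+// Chain of ancestors from a node up to the root, inclusive.
+static List<MutationNode> pathToRoot(MutationNode n) {
+    List<MutationNode> path = new ArrayList<>();
+    for (; n != null; n = n.parent) path.add(n);
+    return path;
+}
+
+// Lowest common ancestor of a and b: the point where the undo
+// leg (up from a) ends and the redo leg (down to b) starts.
+static MutationNode commonAncestor(MutationNode a, MutationNode b) {
+    Set<MutationNode> seen = new HashSet<>(pathToRoot(a));
+    for (; b != null; b = b.parent)
+        if (seen.contains(b)) return b;
+    return null; // unreachable if both nodes share one tree
+}
+\end{verbatim}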
                        \subsubsection{The analyzer}
+Analyzing the output of the target program is another critical part of SchemaFuzz. The analyzer parses the stack trace of the target software's execution and tries to measure its interest. The main criterion that defines a mutation's interest is its proximity to previously parsed stack traces: the greater the distance between the new mutation and the old ones, the better the ranking. %% diagram here
                                \paragraph{Stack Trace Parser}
+The stack trace parser is a separate Bash script that processes stack traces generated by the GDB C language debugger and stores all the relevant information (function name, line number, file name) in a Java object. As a secondary job, the parser also generates, for each mutation, a human-readable file that synthesizes the stack trace values as well as additional interesting information useful for other mechanisms (that also require parsing). These additional pieces of information include the [...]
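+On the Java side, extracting the (function name, file name, line number) triplet from one GDB backtrace frame might look roughly like this (a sketch only; the real parsing is done by the Bash script, and the frame layout shown is just the common GDB format):
+\begin{verbatim}
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+// Matches frames such as:
+//   #0  parse_row (n=3) at parser.c:42
+//   #1  0x0040112f in main () at main.c:7
+static final Pattern FRAME = Pattern.compile(
+    "#\\d+\\s+(?:0x[0-9a-fA-F]+ in )?(\\S+) \\(.*\\) at (\\S+):(\\d+)");
+
+// Returns {function, file, line}, or null if the line is no frame.
+static String[] parseFrame(String line) {
+    Matcher m = FRAME.matcher(line);
+    return m.find()
+        ? new String[] { m.group(1), m.group(2), m.group(3) }
+        : null;
+}
+\end{verbatim}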
                                \paragraph{Hashing}
+In order to be used in the clustering algorithm, the stack trace of a mutation 
has to be hashed.
+Hashing is usually defined as follows:
+                               \begin{quotation}
+"A hash value (or simply hash), also called a message digest, is a number 
generated from a string of text. The hash is substantially smaller than the 
text itself, and is generated by a formula in such a way that it is extremely 
unlikely that some other text will produce the same hash value."
+                               \end{quotation}
+                               
+In the present case, we used a different approach. Since proximity between two stack traces is the key to a relevant ranking, it is mandatory to have a hashing function that preserves the proximity of the two strings.
+In that regard, we implemented a version of the Levenshtein distance algorithm.
+This algorithm can roughly be explained as follows:
+                               \begin{quotation}
+"The Levenshtein distance between two words is the minimum number of 
single-character edits (insertions, deletions or substitutions) required to 
change one word into the other."
+                               \end{quotation}                          
+After hashing the file name and the function name into numerical values through the Levenshtein distance, we create a triplet that fully (though not yet fully accurately) represents the stack trace being parsed. This triplet will be used in the clustering method. %% example of hash here.
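+For reference, a textbook dynamic-programming version of that edit distance (a sketch; the in-tree implementation may differ in details):
+\begin{verbatim}
+// Minimum number of single-character insertions, deletions or
+// substitutions needed to turn s into t.
+static int levenshtein(String s, String t) {
+    int[][] d = new int[s.length() + 1][t.length() + 1];
+    for (int i = 0; i <= s.length(); i++) d[i][0] = i;
+    for (int j = 0; j <= t.length(); j++) d[0][j] = j;
+    for (int i = 1; i <= s.length(); i++)
+        for (int j = 1; j <= t.length(); j++) {
+            int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
+            d[i][j] = Math.min(
+                Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
+                d[i - 1][j - 1] + cost);
+        }
+    return d[s.length()][t.length()];
+}
+\end{verbatim}
+For example, levenshtein("parser.c", "parse.c") is 1 (one deletion), so two traces from neighbouring files stay numerically close, which a conventional digest would not guarantee.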
                                \paragraph{The Scoring mechanism}
-                               \paragraph{Clustering Mutations}
+The "score" (or rank) of a mutation is a numerical value that reflects its 
intrest. This value is calculated through a modified version of a clustering 
method that computes an n-uplet                        into a integer depending 
on the sum of the euclidian distances from the n-uplet to the existing 
centroids (groups of mutation's n-uplets that were already processed).
+This value is then set as the mutation's rank and used as a mean of chosing 
the upcomming mutation.
+%% exemple of clustering here          
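+A minimal sketch of that idea (illustrative names; the real method also maintains and updates the centroids, which is omitted here):
+\begin{verbatim}
+import java.util.List;
+
+// Scores a mutation's hashed triplet as the sum of its Euclidean
+// distances to all existing centroids: the farther the new stack
+// trace is from everything seen so far, the higher the score.
+static double score(double[] triplet, List<double[]> centroids) {
+    double sum = 0;
+    for (double[] c : centroids) {
+        double sq = 0;
+        for (int i = 0; i < triplet.length; i++) {
+            double diff = triplet[i] - c[i];
+            sq += diff * diff;
+        }
+        sum += Math.sqrt(sq);
+    }
+    return sum;
+}
+\end{verbatim}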
                \subsection{Known issues}               
-                       \subsubsection{Failing Mutations}
-                       \subsubsection{Foreign Key constraints}
+This tool is still far from being perfect. About one mutation out of 15 fails for invalid reasons.
+                       \subsubsection{Context Coherence}
+A significant number of the failing mutations do so because of the transfer mechanism. As explained in the dedicated section, this mechanism applies more than one change to the database (potentially to the whole database). In specific cases, this property can become problematic.
+More specifically, this happens when the main loop identifies a mutation's child as the upcoming mutation while its parent row has been splashed with the effect of a transfer. In this case, the data embedded in the SchemaFuzz data structure may not match the data actually in the database, and this delta will likely induce a wrongly designed SQL statement that results in an SQL failure (meaning that 0 rows were updated by the statement).
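+The failure mode can be illustrated as follows (hypothetical table, column and variable names, not actual SchemaFuzz code): the cached "old" value no longer matches the row because a transfer already rewrote it, so the UPDATE's WHERE clause matches nothing.
+\begin{verbatim}
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+
+// Returns false when the statement updates 0 rows, i.e. when the
+// in-memory model and the database have diverged.
+static boolean applyMutation(Connection conn, int rowId,
+        int cachedOldValue, int mutatedValue) throws SQLException {
+    String sql = "UPDATE accounts SET balance = ? "
+               + "WHERE id = ? AND balance = ?";
+    try (PreparedStatement st = conn.prepareStatement(sql)) {
+        st.setInt(1, mutatedValue);
+        st.setInt(2, rowId);
+        st.setInt(3, cachedOldValue); // stale after a transfer
+        return st.executeUpdate() > 0;
+    }
+}
+\end{verbatim}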
+                       \subsubsection{Foreign Key constraints}                 
+For a reason that is not yet clear, some of the implied FKCs of the target database can't be properly set to CASCADE mode. This results in an FKC error (the mutation fails, but the program can carry on).
                        \subsubsection{Tests}
+Besides the test suite written by the SchemaSpy team for their own tool (still used in SchemaFuzz for the metadata extraction), the tests for this project are very poor. There are only very few of them, and their utility is debatable. This is due to the main developer's lack of experience in that regard. We are, obviously, very well aware that this is a really severe flaw in this project, and fixing it will be one of the main future improvements.
+This big lack of good maintenance equipment might also explain some of the silent and non-silent bugs that still remain in the code to this day.
            
	\section{Upcoming features and changes}
 This section will provide more insight into the future features that might/may/will be implemented, as well as the changes to the existing code.
 Any suggestion will be greatly appreciated as long as it is relevant and well argued. All the relevant information regarding contributions is detailed in the dedicated section.
        
                \subsection{Code coverage}
-Debate code coverage here.
+We are considering changing, or simply adding, code coverage as a parameter of the clustering method. Not only would this increase the accuracy of the scoring, it would also increase the accuracy of the "type" of each mutation. To this day, this tool does not make a concrete difference, in terms of scoring or of the information generated (reports), between a mutation with a new stack trace in a very common code path and one with a very common stack trace in a very rarely triggered code path.
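+For instance, the n-uplet fed to the clustering method could simply gain one dimension (a hypothetical sketch of the proposal, not current behaviour):
+\begin{verbatim}
+// Current triplet: (file-name hash, function-name hash, line number).
+// Proposed: append a code-coverage measure so that a common stack
+// trace in a rarely covered path still scores as novel.
+static double[] withCoverage(double[] triplet, double coverageRatio) {
+    double[] extended = new double[triplet.length + 1];
+    System.arraycopy(triplet, 0, extended, 0, triplet.length);
+    extended[triplet.length] = coverageRatio;
+    return extended;
+}
+\end{verbatim}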
                \subsection{Centralised anonymous user data}
-Debate computing the best types or mutations and configurations (tree depth 
etc...) as user data for SchemaFuzz
-               
+SchemaFuzz's efficiency is tightly linked to the quality of its heuristics. This term includes the following points, and a sketch of the resulting reweighting follows the list:
+               \begin{itemize}
+               \item{Quality of the possible modifications for a single field}
+               \item{Quality of the possible modifications for each data type}
+               \item{Quantity of possible modifications for a single field}
+               \item{Quantity of supported data types}
+               \end{itemize}
+Knowing this, we are also considering as a future enhancement an anonymous data collection for each execution of this tool, which would be statistically computed to determine the best modifications on average. This would improve the choosing mechanism by balancing the weights according to each modification's average quality. Modifications with higher average quality would see their weight increased (meaning they would get picked more frequently), and vice versa.
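+A sketch of the reweighting this would enable (the names and the blending rule are purely illustrative):
+\begin{verbatim}
+// Biases a modification's base weight by the average quality
+// observed for that kind of modification across collected runs:
+// above-average modifications get boosted, below-average damped.
+static double balancedWeight(double baseWeight, double avgQuality,
+        double globalAvgQuality) {
+    return baseWeight * (avgQuality / globalAvgQuality);
+}
+\end{verbatim}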
        \section{Contributing}
 You can send your ideas at  \\*
                address@hidden

-- 
To stop receiving notification emails like this one, please contact
address@hidden


