From: gnunet
Subject: [GNUnet-SVN] [taler-schemafuzz] branch master updated: Probability of this text being crappy ? High. It's still text tho
Date: Mon, 13 Aug 2018 23:49:12 +0200
This is an automated email from the git hooks/post-receive script.
erwan-ulrich pushed a commit to branch master
in repository schemafuzz.
The following commit(s) were added to refs/heads/master by this push:
new 0de8b08 Probability of this text being crappy ? High. It's still text tho
0de8b08 is described below
commit 0de8b08237b3f6caf76371e6a15956839615c1d5
Author: Feideus <address@hidden>
AuthorDate: Mon Aug 13 23:49:07 2018 +0200
Probability of this text being crappy ? High. It's still text tho
---
Documentation.tex | 49 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 41 insertions(+), 8 deletions(-)
diff --git a/Documentation.tex b/Documentation.tex
index 418907f..1a6e4cf 100755
--- a/Documentation.tex
+++ b/Documentation.tex
@@ -113,6 +113,8 @@ To do so, the software has a way to transfer the mutation
from a child to its pa
\subsubsection{Mutations}
+ \paragraph{What is a Mutation}
+ \paragraph{How to choose what to mutate}
\paragraph{Creating malformed data}
As the goal of running this tool is to submit unexpected or invalid data to
the target software, it is necessary to understand what t
Fuzzing a complex type such as a timestamp variable has nothing to do with
fuzzing a trivial boolean. In practice, a significant part o
@@ -171,27 +173,58 @@ All the mutations that are injected at least once in the
course of the execution
It is made of nodes (mutations) including a root (first mutation to be
processed on a field selected randomly in the database)
Each node has a number of children that depends on the ranking of its mutation
and the number of potential modifications that it can perform.
\paragraph{Weight}:
-Weighting the nodes is an important part of the runtime. Each mutation has a
weight that is equal to the analyzer's output. This value reflects the
mutation's value. If it had an intresting impact on the target program behavior
(if it triggered new buggs or uncommon code paths) than this value is high and
vice-versa. The weight is then used as a mean of determining the upcomming
modification. The chance that a mutation gets a child is proportionnal to its
weight.
-This value currently isn't biased by any other parameter, but it this might
change in the future.
+Weighting the nodes is an important part of the runtime. Each mutation has a
weight that is equal to the analyzer's output. This value reflects the
mutation's value: if it had an interesting impact on the target program's
behavior (if it triggered new bugs or uncommon code paths) then this value is
high, and vice versa. The weight is then used as a means of determining the
upcoming modification. The chance that a mutation gets a child is directly
proportional to its weight.
+This value currently isn't biased by any other parameter, but this might
change in the future.
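The weight-proportional selection described above can be sketched as follows. This is a minimal illustration of the idea, not SchemaFuzz's actual code; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Random;

// Sketch of weight-proportional selection: a mutation's chance of
// receiving a child is proportional to its weight (the analyzer's score).
public class WeightedPick {
    static int pickIndex(List<Double> weights, Random rng) {
        double total = 0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total; // uniform draw in [0, total)
        double cumulative = 0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            if (r < cumulative) return i;    // draw fell into slot i
        }
        return weights.size() - 1;           // guard against rounding error
    }

    public static void main(String[] args) {
        List<Double> weights = List.of(1.0, 3.0, 6.0);
        int[] counts = new int[3];
        Random rng = new Random(42);
        for (int i = 0; i < 10_000; i++) counts[pickIndex(weights, rng)]++;
        // The heaviest mutation should be picked most often.
        System.out.println(counts[2] > counts[1] && counts[1] > counts[0]); // prints "true"
    }
}
```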
\paragraph{Path}
+Since the weighting of the mutations makes it possible to return to
previously visited, more interesting mutations,
+there is a need for a path-finding mechanism. Concretely, this routine
resolves the nodes that separate nodes A and B in the tree. A and B might be
child and parent but can also belong to completely different branches. This
path is then given to the do/undo routine that replays the modifications to
set the database up in the required state for the upcoming mutation. %% diagram
here.
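The path resolution between two arbitrary nodes can be sketched by walking both nodes up to their lowest common ancestor. This is an illustrative assumption about the mechanism, with a hypothetical Node class, not SchemaFuzz's real data structure.

```java
import java.util.*;

// Sketch of the path finder: given mutation nodes A and B in the tree,
// collect the nodes separating them via their lowest common ancestor.
public class PathFinder {
    static class Node {
        final String name;
        final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    // Returns the node names on the path from a to b, inclusive.
    static List<String> path(Node a, Node b) {
        // Record a's ancestor chain (including a itself) with depths.
        Map<Node, Integer> depthOfA = new HashMap<>();
        List<Node> upFromA = new ArrayList<>();
        for (Node n = a; n != null; n = n.parent) {
            depthOfA.put(n, upFromA.size());
            upFromA.add(n);
        }
        // Walk up from b until we hit a's chain: that node is the LCA.
        List<String> upFromB = new ArrayList<>();
        Node n = b;
        while (!depthOfA.containsKey(n)) {
            upFromB.add(n.name);
            n = n.parent;
        }
        // Path = a..LCA (mutations to undo) then LCA..b (mutations to redo).
        List<String> result = new ArrayList<>();
        for (int i = 0; i <= depthOfA.get(n); i++) result.add(upFromA.get(i).name);
        Collections.reverse(upFromB);
        result.addAll(upFromB);
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("root", null);
        Node a1 = new Node("a1", root);
        Node a2 = new Node("a2", a1);
        Node b1 = new Node("b1", root);
        System.out.println(path(a2, b1)); // [a2, a1, root, b1]
    }
}
```

The first half of the returned path lists the mutations to roll back, the second half those to re-apply, which matches the do/undo routine's needs.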
\subsubsection{The analyzer}
+Analyzing the output of the target program is another critical part of
SchemaFuzz. The analyzer parses the stack trace of the target software's
execution to estimate its interest. The main criterion that defines a
mutation's interest is its proximity to previously parsed stack traces. The
greater the distance between the new mutation and the old ones, the better the
ranking. %% diagram here
\paragraph{Stack Trace Parser}
+The stack trace parser is a separate Bash script that processes stack traces
generated by the GDB C language debugger and stores all the relevant
information (function name, line number, file name) into a Java object. As a
secondary job, the parser also generates a human-readable file for each
mutation that summarizes the stack trace values as well as additional
interesting information useful for other mechanisms (that also require
parsing). This additional information includes the [...]
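Extracting the (function, file, line) triple from one GDB frame could look like the sketch below. This is an assumption about the common backtrace shape, not the actual Bash parser; real GDB frames vary (missing source info, stripped symbols), so only the usual "at file:line" form is handled here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse one GDB backtrace frame into the (function, file, line)
// triple the documentation mentions. StackFrame is a hypothetical type.
public class FrameParser {
    record StackFrame(String function, String file, int line) {}

    // Matches e.g. "#1  0x0000555555555189 in compute (x=3) at main.c:12"
    // and "#0  compute (x=3) at main.c:12".
    private static final Pattern FRAME = Pattern.compile(
        "#\\d+\\s+(?:0x[0-9a-fA-F]+ in )?(\\w+)\\s*\\(.*\\)\\s+at\\s+(\\S+):(\\d+)");

    static Optional<StackFrame> parse(String line) {
        Matcher m = FRAME.matcher(line);
        if (!m.matches()) return Optional.empty();
        return Optional.of(new StackFrame(m.group(1), m.group(2),
                                          Integer.parseInt(m.group(3))));
    }

    public static void main(String[] args) {
        String frame = "#1  0x0000555555555189 in compute (x=3) at main.c:12";
        System.out.println(parse(frame).orElseThrow());
        // prints "StackFrame[function=compute, file=main.c, line=12]"
    }
}
```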
\paragraph{Hashing}
+In order to be used in the clustering algorithm, the stack trace of a mutation
has to be hashed.
+Hashing is usually defined as follows:
+ \begin{quotation}
+"A hash value (or simply hash), also called a message digest, is a number
generated from a string of text. The hash is substantially smaller than the
text itself, and is generated by a formula in such a way that it is extremely
unlikely that some other text will produce the same hash value."
+ \end{quotation}
+
+In the present case, we used a different approach. Since proximity between
two stack traces is the key to a relevant ranking, it is mandatory to have a
hashing function that preserves the proximity of the two strings.
+To that end, we implemented a version of the Levenshtein distance
algorithm.
+This algorithm can roughly be explained as follows:
+ \begin{quotation}
+"The Levenshtein distance between two words is the minimum number of
single-character edits (insertions, deletions or substitutions) required to
change one word into the other."
+ \end{quotation}
+After hashing the file name and the function name into numerical values
through the Levenshtein distance, we create a triplet that fully (though not
yet fully accurately) represents the stack trace that is being parsed. This
triplet will be used in the clustering method. %% example of hash here.
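The Levenshtein distance mentioned above can be sketched with the standard dynamic-programming formulation. This is the textbook algorithm, not SchemaFuzz's exact code, and the document does not specify which reference string the names are compared against; the sketch simply shows the pairwise distance that the proximity-preserving hash relies on.

```java
// Standard dynamic-programming Levenshtein distance: similar strings
// yield close values, which is the property the hashing step needs.
public class Levenshtein {
    static int distance(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i; // all deletions
        for (int j = 0; j <= t.length(); j++) d[0][j] = j; // all insertions
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,         // delete from s
                        d[i][j - 1] + 1),        // insert into s
                        d[i - 1][j - 1] + cost); // substitute
            }
        }
        return d[s.length()][t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        // Similar identifiers stay close, dissimilar ones drift apart:
        System.out.println(distance("parseFrame", "parseFrames")); // 1
    }
}
```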
\paragraph{The Scoring mechanism}
- \paragraph{Clustering Mutations}
+The "score" (or rank) of a mutation is a numerical value that reflects its
interest. This value is calculated through a modified version of a clustering
method that maps an n-tuple to an integer depending
on the sum of the Euclidean distances from the n-tuple to the existing
centroids (groups of mutations' n-tuples that were already processed).
+This value is then set as the mutation's rank and used as a means of choosing
the upcoming mutation.
+%% exemple of clustering here
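The distance-to-centroids scoring can be sketched as below. This is a minimal illustration under the assumption that a mutation is represented by its hashed triplet and that centroids are plain points; names are hypothetical.

```java
import java.util.List;

// Sketch of the scoring idea: a mutation's triplet is ranked by the sum
// of its Euclidean distances to the already-known centroids, so traces
// far from everything seen before score high.
public class MutationScore {
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Score = sum of distances from the new triplet to every centroid.
    static double score(double[] triplet, List<double[]> centroids) {
        double total = 0;
        for (double[] c : centroids) total += euclidean(triplet, c);
        return total;
    }

    public static void main(String[] args) {
        List<double[]> centroids = List.of(
                new double[]{0, 0, 0},
                new double[]{3, 4, 0});
        // A triplet near an existing centroid scores low...
        System.out.println(score(new double[]{0, 0, 0}, centroids)); // 5.0
        // ...while a far-away one scores high.
        System.out.println(score(new double[]{30, 40, 0}, centroids) > 5.0); // true
    }
}
```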
\subsection{Known issues}
- \subsubsection{Failing Mutations}
- \subsubsection{Foreign Key constraints}
+This tool is still far from perfect. About one mutation out of 15 will
fail for invalid reasons.
+ \subsubsection{Context Coherence}
+A significant number of the failing mutations do so because of the transfer
mechanism. As described in the dedicated section, this mechanism applies more
than one change to the database (potentially to the whole database). In
specific cases, this property can become problematic.
+More specifically, this happens when the main loop identifies a mutation's
child as the upcoming mutation while its parent row has been affected by a
transfer. In this case, the data embedded in the SchemaFuzz data structure may
not match the data actually in the database; this delta will likely
produce a malformed SQL statement that results in an SQL failure
(meaning that zero rows were updated by the statement).
+ \subsubsection{Foreign Key constraints}
+For a reason that is not yet clear, some of the implied FKCs of the target
database can't be properly set to CASCADE mode. This results in an FKC error
(the mutation fails but the program can carry on).
\subsubsection{Tests}
+Besides the test suite written by the SchemaSpy team for their own tool (still
used in SchemaFuzz for the metadata extraction), the tests for this
project are very poor. There are only very few of them and their utility is
debatable. This is due to the main developer's lack of experience in that
regard. We are well aware that this is a severe flaw in this project, and
fixing it will be one of the main future improvements.
+This lack of good maintenance tooling might also explain some of the
silent and non-silent bugs that still remain in the code to this day.
\section{Upcoming features and changes}
This section will provide more insight into the future features that
might/may/will be implemented, as well as the changes in the existing code.
Any suggestion will be greatly appreciated as long as it is relevant and well
argued. All the relevant information regarding contributions is
detailed in the dedicated section.
\subsection{Code coverage}
-Debate code coverage here.
+We are considering changing, or simply adding, code coverage as a parameter in
the clustering method. Not only would this increase the accuracy of the
scoring, it would also increase the accuracy of the "type" of each mutation. To
this day, this tool makes no concrete difference, in terms of scoring or
report generation, between a mutation with a new stack trace in
a very common code path and a very common stack trace in a very rarely
triggered code path.
\subsection{Centralised anonymous user data}
-Debate computing the best types or mutations and configurations (tree depth
etc...) as user data for SchemaFuzz
-
+SchemaFuzz's efficiency is tightly linked to the quality of its heuristics.
This term covers the following points:
+ \begin{itemize}
+ \item{Quality of the possible modifications for a single field}
+ \item{Quality of the possible modifications for each data type}
+ \item{Quantity of possible modifications for a single field}
+ \item{Quantity of supported data types}
+ \end{itemize}
+With this in mind, we are also considering, as a future enhancement, anonymous
data collection for each execution of this tool, statistically
aggregated to determine the best modifications on average. This would improve
the choosing mechanism by balancing the weights depending on the modification's
average quality. Modifications with higher average quality would see their
weight increased (meaning they would get picked more frequently) and vice
versa.
\section{Contributing}
You can send your ideas at \\*
address@hidden