

From: gnunet
Subject: [GNUnet-SVN] [taler-schemafuzz] branch master updated: modifications on the text and structure
Date: Fri, 31 Aug 2018 12:43:51 +0200

This is an automated email from the git hooks/post-receive script.

erwan-ulrich pushed a commit to branch master
in repository schemafuzz.

The following commit(s) were added to refs/heads/master by this push:
     new 37c2dc7  modifications on the text and structure
37c2dc7 is described below

commit 37c2dc7d534fe0aea2fa1bb2bb611bf856417e23
Author: Feideus <address@hidden>
AuthorDate: Fri Aug 31 12:43:47 2018 +0200

    modifications on the text and structure
---
 docs/Documentation.pdf       | Bin 311105 -> 953766 bytes
 docs/Documentation.tex       | 122 +++++++++++++++++++++++++++++++++++++------
 docs/PersonnalExperience.tex |  17 +++---
 docs/compileDoc.sh           |   4 ++
 docs/sc1.png                 | Bin 0 -> 157581 bytes
 docs/sc2.png                 | Bin 0 -> 289462 bytes
 docs/sc3.png                 | Bin 0 -> 149536 bytes
 docs/sc4.png                 | Bin 0 -> 137324 bytes
 docs/testcite.bib            |   5 ++
 9 files changed, 122 insertions(+), 26 deletions(-)

diff --git a/docs/Documentation.pdf b/docs/Documentation.pdf
index 1dea4e6..f645a17 100644
Binary files a/docs/Documentation.pdf and b/docs/Documentation.pdf differ
diff --git a/docs/Documentation.tex b/docs/Documentation.tex
index 3dddc7e..5746f20 100644
--- a/docs/Documentation.tex
+++ b/docs/Documentation.tex
@@ -4,6 +4,8 @@
 \usepackage{hyperref}
 \usepackage{tikz}
 \usepackage{pifont}
+\usepackage{natbib}
+\bibliographystyle{unsrtnat}
 \graphicspath{{/home/feideu/Work/Gnunet/schemafuzz/docs/}}
 \usepackage{graphicx}
 \usepackage{pdfpages}
@@ -43,7 +45,10 @@ SchemaFuzz's development enrolls in the global dynamic of 
the past decades regar
 It uses the principle of "fuzz testing" or "fuzzing" to help find the weak code paths of one's project. 
                                \begin{quotation}
 Traditional fuzzing is defined as "an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program".
-                               \end{quotation}         
+                               \end{quotation} 
+                               \cite{fuzzing}          
+
+
 
 This quote is well illustrated by the following example:
                                \begin{quotation}
@@ -63,7 +68,7 @@ SchemaFuzz focuses on the first part : modification of the 
content of the databa
 It is interesting to point out that this last point also qualifies SchemaFuzz as a good "database structural flaw detector".
 That is to say, errors typically triggered by poor management of a database (wrong data type usage, incoherence between the database structure and the use of its content, etc.) might also appear clearly during the execution.
                \subsection{Perimeter}
-This tool is based on some of the SchemaSpy tool's source code. More 
precisely, it uses the portion of the code that detect and stores the target 
database's structure.
+This tool reuses some of the SchemaSpy tool's source code. More precisely, it uses the portion of the code that detects and stores the target database's structure.
 The main goal of this project is to build on top of this piece of existing 
code the functionalities required to test the usage of the database content by 
any kind of utility.                 
 The resulting software will generate a group of human readable reports on each 
modification that was performed.                
                \begin{figure} [h!]
@@ -71,9 +76,9 @@ The resulting software will generate a group of human 
readable reports on each m
                \caption{Shows the nature of the code for every distinct 
component. The slice size is a rough estimation.}
                \end{figure}
                \subsection{When to use it}
-SchemaFuzz is a very useful tool for anyone trying secure a piece of software 
that uses database resources. The target software should be GDB compatible and 
the DBMS has to grant access to the target database through credentials passed 
as argument to this tool.
+SchemaFuzz is a very useful tool for anyone trying to secure a piece of software that uses database resources. The target software should be compatible with GDB (the GNU Debugger), and the DBMS (database management system) has to grant access to the target database through credentials passed as arguments to this tool.
 
----It is very strongly advice to use a copy of the target database rather than 
on the production material. Doing so will very likely result in the database 
being corrupted and not usable for any useful mean.
+It is very strongly advised to run this tool against a copy of the target database rather than against production material: fuzzing may leave the database corrupted and unusable.
 
                \clearpage
 
@@ -107,7 +112,7 @@ In order to do that, the user shall provide this set of 
mandatory database relat
                        \end{itemize}
                \subsection{SchemaFuzz Core}            
                        \subsubsection{Constrains}
-The target database often contains constraints on one or several tables. These 
constraints have to be taken into account in the process of fabricating 
mutations as most of the time they restrict the possible values that the 
pointed field can take. This restriction can take the shape of a \underline 
{Not Null} constraint, \underline{Check} constraint, {Foreign key} constraint 
(value has to exist in some other table's field) or \underline{Primary key} 
constraint (no doublets of value allow [...]
+The target database often contains constraints on one or several tables. These constraints have to be taken into account when fabricating mutations, as most of the time they restrict the possible values that the targeted field can take. These restrictions can take the shape of a \underline{Not Null} constraint, a \underline{Check} constraint, a \underline{Foreign key} constraint (the value has to exist in some other table's field) or a \underline{Primary key} constraint (no doublets of value all [...]
 \bigskip
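The constraint handling described above can be sketched in a few lines. The actual SchemaFuzz core is written in Java; the following Python sketch, with hypothetical column representations and helpers, only illustrates the filtering idea:

```python
# Illustrative sketch of constraint-aware mutation filtering.
# The real SchemaFuzz core is Java; all names here are hypothetical.

def satisfies_constraints(value, column):
    """Return True if `value` is admissible for `column` under its constraints."""
    if column.get("not_null") and value is None:
        return False                      # Not Null constraint
    check = column.get("check")           # predicate for a CHECK constraint
    if check is not None and not check(value):
        return False
    fk = column.get("foreign_values")     # values existing in the referenced table
    if fk is not None and value not in fk:
        return False                      # Foreign key constraint
    if column.get("primary_key") and value in column.get("existing_values", set()):
        return False                      # Primary key: no duplicate values
    return True

# Example: an age column with NOT NULL and CHECK (age >= 0)
age_col = {"not_null": True, "check": lambda v: v is not None and v >= 0}
candidates = [None, -1, 0, 120]
admissible = [v for v in candidates if satisfies_constraints(v, age_col)]
```

A mutation generator would call such a filter before emitting an SQL UPDATE, so that only admissible values are ever injected.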
 
 \begin{figure} [h!]
@@ -160,8 +165,9 @@ Three parallel code paths can be triggered from this point.
 A branch is a succession of mutations that share the same database row as their modification target.
 The heuristics determining the next mutation's modification are still very primitive and will be finely adjusted in future versions.
                                     
                                \paragraph{Creating malformed data} 
+                               %% this wording means nothing; reword.
 As the goal of running this tool is to submit unexpected or invalid data to 
the target software it is necessary to understand what t
-Fuzzing a complex type such a timestamps variables has nothing to do with 
fuzzing a trivial boolean. In practice, A significant part o
+Fuzzing a complex type such as a timestamp has nothing to do with fuzzing a trivial boolean. In practice, a significant part o
 and this matter could absolutely be the subject of a more abstract work. We 
focused here on a very simple approach (as a first step).
 After retrieving the current row being fuzzed (may it be a new row or a 
previously fuzzed row), the algorithm explores the different
 The algorithm then builds the possible modification for each of the fields for 
the current row.
@@ -189,7 +195,7 @@ The possible modifications that this tool can produce at 
the moment are : \\ % a
                                        \item Increment/Decrement date by 1 
day/minutes depending on the precision of the date
                                        \item Set date to $00/00/0000$ 
                                \end{itemize}
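The date modifications listed above can be sketched as follows. This is an illustrative Python sketch (the real tool is Java and applies these values through SQL UPDATEs); the sentinel string for the "00/00/0000" case is an assumption:

```python
from datetime import datetime, timedelta

def date_mutations(value, has_time_precision):
    """Candidate mutations for a date/timestamp field, mirroring the list above.
    Illustrative sketch only; the actual tool injects these via SQL."""
    # Step size depends on the precision of the date, as described above.
    step = timedelta(minutes=1) if has_time_precision else timedelta(days=1)
    return [
        value + step,           # increment by one day/minute
        value - step,           # decrement by one day/minute
        "0000-00-00 00:00:00",  # "00/00/0000" sentinel (invalid on purpose)
    ]

muts = date_mutations(datetime(2018, 8, 31, 12, 0), has_time_precision=True)
```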
-Obviously, these "abnormal" values might in fact be totally legit in some 
cases. in that case the analyzer 
+These "abnormal" values might in fact be totally legit in some cases. In that case, the analyzer 
 will rank the mutation rather poorly, making this tree path unlikely to be developed further.
                                \\*
                                \paragraph{SQL handling}
@@ -220,8 +226,8 @@ Each node has a number of children that depends on the 
ranking its mutation and
                                \paragraph{Weight}
 Weighting the nodes is an important part of the runtime. Each mutation has a weight equal to the analyzer's output, which reflects the mutation's value. If it had an interesting impact on the target program's behavior (if it triggered new bugs or uncommon code paths) then this value is high, and vice-versa. The weight is then used as a means of determining the upcoming modification. The chance that a mutation gets a child is directly proportional to its weight.
 This value currently isn't biased by any other parameter, but this might 
change in the future.  
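The weight-proportional child selection described above amounts to a weighted random draw. A minimal Python sketch (the node structure and field names are hypothetical; the real implementation lives in the Java core):

```python
import random

# Shared RNG so repeated calls advance one deterministic stream.
_rng = random.Random(42)

def pick_parent(nodes):
    """Pick the mutation to extend next, with probability proportional
    to its weight, as described above. Sketch only."""
    weights = [n["weight"] for n in nodes]
    return _rng.choices(nodes, weights=weights, k=1)[0]

tree = [{"id": 1, "weight": 0.9}, {"id": 2, "weight": 0.1}]
# Over many draws, node 1 is chosen roughly nine times out of ten.
picks = [pick_parent(tree)["id"] for _ in range(1000)]
```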
-                               \paragraph{Path}
-Since the weighting of the mutation allows to go back to previously more 
interesting mutations, 
+                               \paragraph{Path} %% reword the sentence about "resolve".
+Since the weighting of the mutations allows going back to previously explored, more interesting mutations, 
 there is a need for a path-finding mechanism. Concretely, this routine resolves the nodes that separate nodes A and B in the tree. A and B might be child and parent, but can also belong to completely different branches. This path is then given to the do/undo routine, which replays the modifications to set the database up in the required state for the upcoming mutation. 
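The path resolution between two tree nodes boils down to climbing to their lowest common ancestor. A generic Python sketch of the idea (node layout is hypothetical; the tool's Java implementation may differ):

```python
def path_between(a, b):
    """Resolve the nodes separating A and B in the tree via their lowest
    common ancestor. Nodes are dicts with a `parent` pointer; sketch only."""
    def ancestors(n):
        chain = []
        while n is not None:
            chain.append(n)
            n = n.get("parent")
        return chain

    up = ancestors(a)                       # a .. root
    seen = {id(n): i for i, n in enumerate(up)}
    down, n = [], b
    while id(n) not in seen:                # climb from b until we hit a's chain
        down.append(n)
        n = n["parent"]
    lca_index = seen[id(n)]
    # path: a -> ... -> LCA -> ... -> b
    return up[:lca_index + 1] + list(reversed(down))

root = {"name": "root", "parent": None}
x = {"name": "x", "parent": root}
y = {"name": "y", "parent": root}
names = [n["name"] for n in path_between(x, y)]
```

The do/undo routine would then walk this path, undoing mutations on the way up and re-applying them on the way down.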
 
 \bigskip
@@ -234,7 +240,7 @@ there is a need for a path finder mechanism. Concretely, 
this routines resolves
 
 \bigskip
                        \subsubsection{The analyzer}
-Analyzing the output of the target program is another critical part of 
SchemaFuzz. The analyzer parses in the stack trace of the target software's 
execution to try measuring its interest. The main criteria that defines a 
mutation interest is its proximity to previously parsed stack traces. The more 
distance between the new mutation and the old ones, the better the ranking. 
+Analyzing the output of the target program is another critical part of SchemaFuzz. The analyzer parses the stack trace of the target software's execution to measure how interesting it is. The main criterion that defines a mutation's interest is its proximity to previously parsed stack traces: the greater the distance between the new mutation and the old ones, the better the ranking. %% remove the words that are defined further down.
                                \paragraph{Stack Trace Parser}
 The stack trace parser is a separate Bash script that processes stack traces generated by GDB, the GNU C debugger, and stores all the relevant information (function name, line number, file name) into a Java object. As a secondary job, the parser also generates, for each mutation, a human-readable file that synthesizes the stack trace values as well as additional information useful for other mechanisms (that also require parsing). These additional pieces of information include the p [...]
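The extraction of (function, file, line) triples from a GDB backtrace can be sketched as follows. The actual parser is a separate Bash script; this Python sketch, and the frame format it assumes, are for illustration only:

```python
import re

# A GDB backtrace frame often looks like:
#   "#1  0x0000555555554abc in handle_row (row=0x7ffff) at src/fuzz.c:42"
# The frame format and this regex are assumptions for illustration; the
# real parser is a Bash script processing actual GDB output.
FRAME = re.compile(r"#\d+\s+.*?\bin\s+(\w+)\s*\(.*?\)\s+at\s+(\S+):(\d+)")

def parse_backtrace(text):
    """Extract (function, file, line) triples, most precise frame first."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in FRAME.finditer(text)]

trace = """\
#0  0x00005555 in fail_here (x=3) at src/db.c:17
#1  0x00005556 in handle_row (row=0x1) at src/fuzz.c:42
"""
frames = parse_backtrace(trace)
```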
                                \paragraph{Hashing}
@@ -250,10 +256,7 @@ This algorithm can roughly be explain by the following :
                                \begin{quotation}
 "The Levenshtein distance between two words is the minimum number of 
single-character edits (insertions, deletions or substitutions) required to 
change one word into the other."
                                \end{quotation}                          
-After hashing the file name and the function name into numerical values trough 
Levenshtein distance, we are creating a triplet the fully (but not fully 
accurately yet) represents the stack trace that is being parsed. This triplet 
will be used in the clustering method. 
-
-\clearpage
-
+                               
 \begin{figure} 
 \centering
 \begin{tabular}{ | l | l | l | l | l | l | c | r | }
@@ -266,7 +269,10 @@ After hashing the file name and the function name into 
numerical values trough L
 \caption{Example of the levenshtein distance concept.}
 \end{figure}
 
-The distance for this example is $2\div8\times100$
+The distance for this example is $2\div8\times100 = 25$.
+
+After hashing the file name and the function name into numerical values through the Levenshtein distance, we create a triplet that fully (though not yet fully accurately) represents the stack trace being parsed. This triplet will be used in the clustering method.
+
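As an illustration of the edit distance quoted above, here is the standard dynamic-programming implementation (a generic Python sketch; the tool itself computes this in Java):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions required to change string `a` into string `b`."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[len(b)]

# Classic example: "kitten" -> "sitting" needs 3 edits.
d = levenshtein("kitten", "sitting")
```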
 
                                \paragraph{The Scoring mechanism}
 The "score" (or rank) of a mutation is a numerical value that reflects how interesting the outcome was. Crashes and unexpected behavior raise this value, whereas the absence of a crash tends to lower it. The value is calculated through a modified version of a clustering method that maps an n-tuple to an integer depending on the sum of the Euclidean distances from that n-tuple to the existing centroids (groups of mutation n-tuples that were already processed).
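The centroid-distance ranking can be illustrated with a tiny sketch. This is illustrative Python (the real analyzer belongs to the Java code base), and the centroid values are made up:

```python
import math

def score(triplet, centroids):
    """Sum of Euclidean distances from a mutation's triplet to the existing
    centroids: the farther from everything seen before, the higher the score.
    Simplified sketch of the clustering-based ranking described above."""
    return sum(math.dist(triplet, c) for c in centroids)

centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
near = score((0.0, 0.0, 0.0), centroids)    # close to a known centroid
far = score((50.0, 50.0, 0.0), centroids)   # far from everything: ranks higher
```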
@@ -297,8 +303,91 @@ The following points constitute the main flaw of the 
source code:
                        \clearpage
 
        \section{Results and examples}
-In the process of being written.
+               \subsection{Results on test environment}
+The project has been developed primarily to be run against the GNU Taler database. However, a sample database was used throughout the development in order to evaluate the progress of the tool, as well as for testing it in an environment that would not compromise any real data set.
+This sample database contains all the supported types and emulates the structure of a production database.
+The following figure shows the output format of a standard run. The tree of mutations is displayed in a text format where each block stands for a successful mutation injection and is delimited by a pair of brackets $[]$. Each block is preceded by a visual representation of its depth in the tree, where $--$ indicates one level.
+The information provided in each block follows this ordered structure:
+               \begin{itemize}
+               \item{Mutation ID (ordered)}
+               \item{Numerical representation of the depth in the tree}
+               \item{ID of the mutation the modification is attached to}
+               \item{The value present in the target field BEFORE the modification}
+               \item{The value of the target field AFTER the modification}
+               \end{itemize}
+
+It is noticeable that the algorithm displays the tree in ID order rather than in depth order.
+This allows the user to analyze in what order the mutations were injected.
+
+               \bigskip
+               \begin{figure} [h!]
+                       \includegraphics[width=\textwidth]{sc2.png}
+                       \caption{Example of the output for an execution on the 
development database}
+               \end{figure}
+               \bigskip
+               
+After every successful mutation, the analyzer generates a report that 
summarizes the response of the target program after the modification was 
applied.
+Every report is structured as follow :
+
+If the program did not crash, the report only contains a 0.
+       
+
+If the program crashed:
+               \begin{itemize}
+               \item{"functionNames:" item}
+               \item{List representation of the function stack from the crash (ordered from most precise to most general level)}
+               \item{"filesNames:" item}
+               \item{List representation of the files containing the function calls}
+               \item{"lineNumbers" item}
+               \item{List representation of the line numbers for each function call (the line number of the main function does not appear)}
+               \item{"end:" item}
+               \item{"path:" item}
+               \item{Text representation of the path in the tree from the root. Every line describes a previously processed mutation}
+               \item{"endpath:" item}
+               \end{itemize}
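Reading such a report back is straightforward to sketch. The section markers below follow the structure just listed, but the exact marker strings and this Python parser are assumptions for illustration, not the tool's actual code:

```python
def parse_report(lines):
    """Split a crash report into its labelled sections, following the
    report structure described above. Illustrative sketch only."""
    if lines and lines[0].strip() == "0":
        return {"crashed": False}           # no crash: report is just "0"
    sections, current = {}, None
    for line in lines:
        key = line.strip().rstrip(":")
        if key in ("functionNames", "filesNames", "lineNumbers", "path"):
            current = key                   # start of a labelled section
            sections[current] = []
        elif key in ("end", "endpath"):
            current = None                  # section terminator
        elif current is not None:
            sections[current].append(line.strip())
    return {"crashed": True, **sections}

report = ["functionNames:", "fail_here", "handle_row", "end:",
          "path:", "mutation 1 -> mutation 3", "endpath:"]
parsed = parse_report(report)
```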
+       
+               \bigskip
+               \begin{figure} [h!]
+                       \includegraphics[width=\textwidth]{sc1.png}
+                       \caption{Example of a generated report for an execution 
on the development database }
+               \end{figure}
+               \bigskip
+               
+               \subsection{Results on the GNU Taler database}
+
+The outcome of the first executions of SchemaFuzz against a sample of the GNU Taler database was promising. The tool itself properly fuzzed the target, and the execution ended with a success code on 9 of the 10 attempts.
+       
+               \bigskip
+               \begin{figure} [h!]
+                       \includegraphics[width=\textwidth]{sc3.png}
+                       \caption{Example of the output for an execution on a 
sample of the GNU Taler database}
+               \end{figure}
+               \bigskip
+               
+               \paragraph{Vanishing bugs}              
+Some of the bugs that were encountered during the test executions were not 
triggered when running against the GNU Taler database. After comparing the 
content and structure of both environments, it is likely that this behavior was 
due to the test database's minimalistic content.
+This difference between the outputs when executing the tool on the two 
different environments helped in debugging some of the code's unexplained 
behavior. 
+
+For instance, the tool would crash when the following criteria were met:
+       \begin{itemize}
+       \item{The last mutation scored better than its parent}
+       \item{The last mutation does not have any other modification 
possibilities}
+       \item{In its current state, the tree does not have more than one branch}
+       \end{itemize}
+       
+By running the tool on a denser database, the bug vanished. This allowed us to locate the origin of the issue.
+
+
+               \bigskip
+               \begin{figure} [h!]
+                       \includegraphics[width=\textwidth]{sc4.png}
+                       \caption{Example of a bug fixed by changing the 
environment of execution}
+               \end{figure}
+               \bigskip
+                       
+               
                        \clearpage
        \section{Upcoming features and changes}
 This section provides more insight into the features that may be implemented in the future, as well as the changes planned for the existing code.
@@ -346,6 +435,7 @@ Or directly create a pull request on the official 
repository to edit this docume
 \newpage
 \input{PersonnalExperience.tex}
        
+       \bibliography{testcite}         
                        
 \end{empfile}
 \end{document} 
diff --git a/docs/PersonnalExperience.tex b/docs/PersonnalExperience.tex
index 565e2c1..b07036f 100644
--- a/docs/PersonnalExperience.tex
+++ b/docs/PersonnalExperience.tex
@@ -10,24 +10,23 @@ The SchemaFuzz project has had since its genesis a quiet 
clear view of how the d
 The project had to pass trough different phases of development that are 
detailed in the following time line diagram. %% insert timeline diagram here.
 
 Some of the tasks of the above time line were completed on time, some others 
were delivered late, and some were delayed in the time line because of the 
previous point.
-In the end, the project was lead in a way that is best described by the 
following time line diagram.    %% insert timeline diagram here.
+In the end, the project was led in a way that is best described by the following time line diagram. %% insert timeline diagram here.
 
-Those two diagrams differ on some points. This is one of the major failures 
for the development of this project throughout the course of these 6 months. 
-There are several reasons that explain why this project could have been lead 
in a better way.
-they will be detailed and discussed in the next section. 
+Those two diagrams differ on some points. There are several reasons that explain why this project could have been led in a better way. They will be detailed and discussed in the next section. 
 
-       \subsection{Organizational failures}
-This section has a particular value in this report, it is on the first hand a 
description of why the SchemaFuzz did not meet all of its defined goals.
-Other the other hand, it is a personal reminder of what should be improved in 
my work habits and general organization when leading a project of such a large 
size. 
+       \subsection{General Organization}
+The following organizational points help explain why the SchemaFuzz project did not meet all of its defined goals.
+They also serve as a personal reminder of what should be improved in my work habits and general organization when leading a project of this size. 
        
        \begin{itemize}
        \item{Defining tasks/features as daily/weekly sub goals}
-       \item{Improving multitasking} %% bad title.
+       \item{Improving general project planning} %% bad title.
        \item{Setting up more fluid communication}
        \end{itemize}           
 
        \subsection{Positive outcomes}
 Throughout the development of the project, I have had the chance to acquire many new skills and improve many existing ones. I will give more insight into what this project and, more generally, this internship as a developer for a GNU package, has brought me.
+Apart from the Java language, which I was already familiar with, I also had the chance to get my hands on new technologies (or technologies I never really had the chance to practice in real conditions). 
 
                \subsubsection{Technical aspect}
                
@@ -36,8 +35,6 @@ In many ways, this project has been a real challenge. But the 
main difficulty th
 Even if I was already accustomed to Java programming, I was struck by the complexity and the architecture of a "real" in-production software like SchemaSpy, which I had to look into to get the metadata extraction routine.
 This was my first improvement: code structure. Even if my coding abilities can still be perfected in many ways, I feel that understanding and re-using complex, well-structured code gave me a much better idea of what "good code" really is. Integrating these concepts strengthened my development skills, and I am now much more confident about them.
 
-Apart from the Java language, which I was already familiar with, I also had 
the chance to get my hands of new technologies (or technologies I never really 
had the chance to practice in real conditions). 
-
                        \paragraph{SQL language}
 SchemaFuzz is a database fuzzer. Naturally, a major component of the work for its development was to create and handle SQL requests and responses. In order to do that, I had to document myself for a while, as I was lacking some knowledge on databases in general. After gaining a better understanding of how databases operate theoretically, I had to go into more depth concerning the inner structure of constraints and the way data types are encoded in most DBMS.
 This brings me to my next point regarding the handling of SQL in this project.
diff --git a/docs/compileDoc.sh b/docs/compileDoc.sh
new file mode 100644
index 0000000..6f5a551
--- /dev/null
+++ b/docs/compileDoc.sh
@@ -0,0 +1,4 @@
+latex Documentation.tex
+bibtex Documentation
+latex Documentation.tex
+latex Documentation.tex
diff --git a/docs/sc1.png b/docs/sc1.png
new file mode 100644
index 0000000..b9e93d6
Binary files /dev/null and b/docs/sc1.png differ
diff --git a/docs/sc2.png b/docs/sc2.png
new file mode 100644
index 0000000..ec3ee99
Binary files /dev/null and b/docs/sc2.png differ
diff --git a/docs/sc3.png b/docs/sc3.png
new file mode 100644
index 0000000..12c3e44
Binary files /dev/null and b/docs/sc3.png differ
diff --git a/docs/sc4.png b/docs/sc4.png
new file mode 100644
index 0000000..b179622
Binary files /dev/null and b/docs/sc4.png differ
diff --git a/docs/testcite.bib b/docs/testcite.bib
new file mode 100644
index 0000000..f27dd42
--- /dev/null
+++ b/docs/testcite.bib
@@ -0,0 +1,5 @@
address@hidden,
+author = {Wiki},
+title = {Fuzzing},
+url = "https://en.wikipedia.org/w/index.php?title=Plagiarism&oldid=5139350"
+}



