From: gnunet
Subject: [GNUnet-SVN] [taler-schemafuzz] branch master updated: adapting code for production command line (still need a solution for nested options)
Date: Wed, 29 Aug 2018 16:34:10 +0200

This is an automated email from the git hooks/post-receive script.

erwan-ulrich pushed a commit to branch master
in repository schemafuzz.

The following commit(s) were added to refs/heads/master by this push:
     new a9b8f48  adapting code for production command line (still need a solution for nested options)
a9b8f48 is described below

commit a9b8f4851c0f45eb4e76e407dc4467a005fc2707
Author: Feideus <address@hidden>
AuthorDate: Wed Aug 29 16:34:05 2018 +0200

    adapting code for production command line (still need a solution for nested options)
---
 Documentation.tex                                  | 322 ---------------------
 Documentation.pdf => docs/Documentation.pdf        | Bin 286590 -> 311105 bytes
 docs/Documentation.tex                             | 262 +++++++++--------
 docs/PersonnalExperience.pdf                       | Bin 39544 -> 0 bytes
 docs/PersonnalExperience.tex                       |  78 +++--
 src/main/java/org/schemaspy/DBFuzzer.java          |   2 +-
 .../org/schemaspy/cli/CommandLineArguments.java    |   7 +
 stackTraceCParser.sh                               |  22 +-
 8 files changed, 199 insertions(+), 494 deletions(-)

diff --git a/Documentation.tex b/Documentation.tex
deleted file mode 100644
index 1af7a17..0000000
--- a/Documentation.tex
+++ /dev/null
@@ -1,322 +0,0 @@
-\documentclass{article}
-\usepackage[utf8]{inputenc}
-\usepackage[document]{ragged2e}
-\usepackage{hyperref}
-\usepackage{tikz}
-\usepackage{pifont}
-\graphicspath{{/home/feideu/Work/Gnunet/schemafuzz/docs/}}
-\usepackage{graphicx}
-\usepackage{pdfpages}
-\usepackage{emp}
-\usetikzlibrary{shapes.arrows,chains}
-\usepackage[english]{babel}
-
-\title{Documentation for schemaFuzz}
-\author{Ulrich "Feideus" Erwan}
-
-\begin{document}
-\begin{empfile}
-       
-\maketitle Documentation For SchemaFuzz
-       \section{Summary?}
-               This document actually needs a front page.
-       \section{Introduction}
-       
-SchemaFuzz is a free software command line tool incorporated inside the GNU Taler package, a free software electronic payment system providing anonymity for customers.
-The main goal of this project is to provide an efficient debugging tool that uses a "fuzzing" strategy oriented on databases.
-Where a traditional fuzzer would send malformed input to a program, SchemaFuzz modifies the content of a database to test that program's behavior when stumbling on such unexpected data. \\*
-Obviously, this tool is meant to be used as a means of debugging: the goal is to surface bugs or bring to light the security breaches that the code may contain regarding the retrieval, usage and saving of a database's content.
-As this tool is being developed as a master's thesis project, its current state is far from finished, and many options and optimizations that deserve to be implemented are not yet available.
-These future/missing features will be detailed and discussed in a dedicated section.
-
-       
-       \section{Context and Perimeter}
-SchemaFuzz's development is part of the broader effort of the past decades to make the Internet a more fluid, pleasant and, more importantly, safer space.
-
-It uses the principle of "fuzz testing" or "fuzzing" to help find out which are the weak code paths of one's project.
-                               \begin{quotation}
-Traditional fuzzing is defined as "an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program".
-                               \end{quotation}
-
-This principle is well illustrated by the following example :
-                               \begin{quotation}
-                               Let's consider an integer in a program, which stores the result of a user's choice between 3 options. When the user picks one, the choice will be 0, 1 or 2, which makes three practical cases. But what if we transmit 3, or 255 ? We can, because integers are stored in a static-size variable. If the default switch case hasn't been implemented securely, the program may crash and lead to "classical" security issues: (un)exploitable buffer overflows, DoS, ...
-                               \end{quotation}
-
-Fuzzing comes in several categories, each focusing on a specific type of input.
- 
-UI fuzzing focuses on button sequences and, more generically, any kind of user input during the execution of a program. The above example falls into this category.
-This principle has already been used successfully in existing fuzzing tools such as the well-known "american fuzzy lop" (AFL).
-File format fuzzing generates multiple malformed samples, and opens them sequentially.
-However, SchemaFuzz is a database oriented fuzzer. This means that it focuses on triggering unexpected behavior related to the usage of an external database's content.
-
-This tool is meant to help developers, maintainers and, more generically, anyone whose task makes use of data coming from a database under his influence. A good way to summarize the effect of this tool is to compare it with a "cyber attack simulator".
-This means that the idea behind it is to emulate the damage that an attacker may cause, subtly or not, to a database he illegitimately gained privileges on. This might in theory go from a simple boolean flip (subtle modifications) to removing/adding content to purely and simply destroying or erasing all the content of the database.
-SchemaFuzz focuses on the first part : modification of the content of the database by single small modifications that may or may not overlap. These modifications may be very aggressive or very subtle.
-It is interesting to point out that this last point also qualifies SchemaFuzz as a good "database structural flaw detector".
-That is to say that errors typically triggered by poor management of a database (wrong data type usage, incoherence between database structure and use of the content etc ...) might also appear clearly during the execution.
-               \subsection{Perimeter}
-This tool is based on some of the SchemaSpy tool's source code. More precisely, it uses the portion of the code that detects and stores the target database's structure.
-The main goal of this project is to build, on top of this piece of existing code, the functionalities required to test the usage of the database content by any kind of utility.
-The resulting software will generate a group of human readable reports on each modification that was performed.
-               \begin{figure} [htbp]
-               \centering
-               \includegraphics[scale=1]{codeOriginDiagram.pdf}
-               \caption{Shows the nature of the code for every distinct 
component. The slice size is a rough estimation.}
-               \end{figure}
-               \subsection{When to use it}
-SchemaFuzz is a very useful tool for anyone trying to secure a piece of software that uses database resources. The target software should be GDB compatible and the DBMS has to grant access to the target database through credentials passed as arguments to this tool.
-
-It is very strongly advised to run this tool on a copy of the target database rather than on the production one. Doing otherwise will very likely result in the database being corrupted and unusable for any useful purpose.
-
-       \section{Design}
-               \subsection{Generic explanation}
-SchemaFuzz's implementation is based on some bits of the SchemaSpy project source code.
-The majority of this project is built on top of this already existing code and is organized as follows :
-               \begin{itemize}
-               \item{a mutation/data-set used as a way to store the inputs, outputs and other interesting data from the modification that was performed on the target database}
-               \item{the mutation tree, used to store the mutations coherently}
-               \item{an analyzer that scores the mutations to influence the paths that will be explored afterwards}
-               \end{itemize}
-                
-This organization will be detailed and discussed in the following sections.
-               \subsection{SchemaSpy legacy/metadata extraction}
-SchemaSpy's source code has provided the metadata extraction routine. The only job of this routine is to initialize the connection to the database and retrieve its metadata at the very beginning of the execution (before any actual SchemaFuzz code is run). These metadata include data types, table and table column names, views and foreign/primary key constraints. Having this pool of metadata under the shape of Java objects allows the main program to properly frame what the possibilities are  [...]
-
-Example of a typical metadata set:
-
-
-\begin{figure} [htbp]
-\centering
-\includegraphics[scale=1]{MetaDataExtractionDiagram-1.pdf}
-\caption{Objects returned by the metadata extraction routine.}
-\end{figure}
-
-In order to do that, the user shall provide this set of mandatory database related arguments (a minimal connection sketch follows the list):
-                       \begin{itemize}
-                               \item The driver for the corresponding database RDBMS (only PostgreSQL is supported at the moment)
-                               \item The credentials to be used to access the database.
-                               \item The name of the database (duh)
-                       \end{itemize}
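
A minimal sketch of how these three arguments could translate into a JDBC connection and a pool of metadata objects. The connection URL, credentials and table names below are hypothetical examples, not SchemaFuzz's actual code:

\begin{verbatim}
import java.sql.*;

public class ConnectSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical values for the three mandatory arguments.
        String database = "targetdb";
        String user = "fuzzer";
        String password = "secret";
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/" + database, user, password);

        DatabaseMetaData meta = conn.getMetaData();
        // Tables, columns and their types: the raw material for mutations.
        try (ResultSet cols = meta.getColumns(null, "public", "%", "%")) {
            while (cols.next()) {
                System.out.printf("%s.%s : %s%n",
                        cols.getString("TABLE_NAME"),
                        cols.getString("COLUMN_NAME"),
                        cols.getString("TYPE_NAME"));
            }
        }
        // Foreign key constraints of one (hypothetical) table.
        try (ResultSet fks = meta.getImportedKeys(null, "public", "sometable")) {
            while (fks.next()) {
                System.out.printf("%s.%s -> %s.%s%n",
                        fks.getString("FKTABLE_NAME"),
                        fks.getString("FKCOLUMN_NAME"),
                        fks.getString("PKTABLE_NAME"),
                        fks.getString("PKCOLUMN_NAME"));
            }
        }
        conn.close();
    }
}
\end{verbatim}
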
-               \subsection{SchemaFuzz Core}            
-                       \subsubsection{Constraints}
-The target database often contains constraints on one or several tables. These constraints have to be taken into account in the process of fabricating mutations, as most of the time they restrict the possible values that the pointed field can take. This restriction can take the shape of a \underline{Not Null} constraint, \underline{Check} constraint, \underline{Foreign key} constraint (value has to exist in some other table's field) or \underline{Primary key} constraint (no doublets of value allowe [...]
-
-\begin{figure} 
-\centering
-\includegraphics[scale=1]{ForeignKeyClassDiagram-1.pdf}
-\caption{Class diagram of the ForeignKeyConstraint Java object}
-\end{figure}
-
-The last two are the problematic ones. They imply specific work before applying any mutations to make sure that the value respects all the restrictions. Before doing anything else after the metadata extraction is done, SchemaFuzz performs an update of all the existing constraints on the database to add the CASCADE clause. This allows mutations of the values bound by foreign key constraints to take effect. This update is reverted, taking the constraints back to their initial state, before the progra [...]
-                               \paragraph{Primary key constraints (PKC)} :
-The primary key constraints require an extra DB query that checks for the existence of the value in the column. If the value already exists (the query's result is not empty), the mutation will be dropped before being executed.
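
A minimal sketch of that existence check. Identifiers are hypothetical and the quoting of table/column names is omitted for brevity:

\begin{verbatim}
import java.sql.*;

class PkcCheck {
    // Returns true if the candidate value already exists in the primary key
    // column, in which case the mutation must be dropped.
    static boolean pkValueTaken(Connection conn, String table, String pkColumn,
                                Object candidate) throws SQLException {
        String sql = "SELECT 1 FROM " + table
                + " WHERE " + pkColumn + " = ? LIMIT 1";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setObject(1, candidate);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next(); // non-empty result: value already present
            }
        }
    }
}
\end{verbatim}
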
-                               \paragraph{Foreign key constraints (FKC)} :
-The foreign key constraint is the trickiest one. Its inherent nature bonds two values of different table columns, where the value being referenced is called the father and the referencing field the child. To be precise, in order to change one of the two values, the other has to be changed accordingly in the same statement. SchemaFuzz uses the power of the CASCADE clause to make the change possible. This clause allows the DBMS to automatically change the value of the child if the father has b [...]
-This mechanism makes it possible to change any of the bound values by changing the father's value.
-To do so, the software has a way to transfer the mutation from a child to its parent (called the mutationTransfer).
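
A minimal sketch of what switching one FKC to CASCADE mode amounts to in PostgreSQL. Table and constraint names are made up; PostgreSQL cannot alter a constraint's action in place, so it is dropped and recreated:

\begin{verbatim}
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class CascadeSketch {
    static void makeCascade(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("ALTER TABLE child_table DROP CONSTRAINT child_fk");
            st.executeUpdate("ALTER TABLE child_table"
                    + " ADD CONSTRAINT child_fk FOREIGN KEY (father_id)"
                    + " REFERENCES father_table (id) ON UPDATE CASCADE");
        }
        // Updating father_table.id now propagates to child_table.father_id,
        // which is what the mutation transfer relies on. The inverse
        // statements restore the original constraint at the end of the run.
    }
}
\end{verbatim}
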
-
-                               
-                       \subsubsection{Mutations}
-                               \paragraph{What is a Mutation}
-A mutation is a Java object that bundles all the information used to perform a modification in the database. Every mutation is linked to its parent and inherits some of its parent's data. In the case of a follow-up mutation, the child inherits the database row that was its parent's target. Therefore the initial state (state before the injection of the modification) of its target is exactly the final state (state after injection of the modification) of its parent's target. A mutation i [...]
-It also holds the information concerning the result of the injection in the shape of a data vector. This data vector is then used to perform a clustering computation to determine the "uniqueness" of the mutation. This value is also stored inside the mutation object and is used as the weight of this mutation in the tree.
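
The rough shape of such an object, as inferred from the description above; field names are illustrative, not the real ones in the source:

\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

class Mutation {
    Mutation parent;            // tree link; null for the root mutation
    Object[] initialState;      // target row before the injection
    Object[] finalState;        // target row after the injection
    double[] stackTraceVector;  // data vector built from the parsed stack trace
    double weight;              // "uniqueness" score produced by the analyzer
    List<Mutation> children = new ArrayList<>();
}
\end{verbatim}
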
-
-\begin{figure} 
-\centering
-\includegraphics[scale=1]{MutationClassDiagram-1.pdf}
-\caption{Structure of a Mutation}
-\end{figure}
-
-                               
-                               \paragraph{Choosing pattern}
-For each iteration of the main loop, a modification has to be picked as the next step in the fuzzing process. This is done by considering the current state of the tree.
-Three parallel code paths can be triggered from this point.
-                               \begin{itemize}
-                               \item{Continue on the current branch of the 
tree (triggered if the last mutation scored better than its parent)}
-                               \item{Pick an existing branch in the tree and 
grow it (triggered if the last mutation scored worse than its parent on a 50/50 
chance with the next bullet)}
-                               \item{Start a new branch (triggered if the last 
mutation scored worse than its parent on a 50/50 chance with the previous 
bullet)}
-                               
-\begin{figure} 
-\centering
-\includegraphics[scale=1]{pickingPaternDiagram.pdf}
-\caption{Picking pattern schema}
-\end{figure}                           
-                               
-                               \end{itemize}
-A branch is a succession of mutations that share the same database row as their modification target.
-The heuristics determining the next mutation's modification are still very primitive and will be finely adjusted in future versions.
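
A minimal sketch of the three-way choice described above; the tree bookkeeping is stubbed out and only the branching logic mirrors the text:

\begin{verbatim}
import java.util.Random;

class ChoosingPattern {
    enum Move { CONTINUE_BRANCH, GROW_EXISTING_BRANCH, START_NEW_BRANCH }

    static final Random RNG = new Random();

    static Move nextMove(double lastScore, double parentScore) {
        if (lastScore > parentScore) {
            // The last mutation improved on its parent: keep going.
            return Move.CONTINUE_BRANCH;
        }
        // Otherwise: 50/50 between growing an existing branch and a new one.
        return RNG.nextBoolean() ? Move.GROW_EXISTING_BRANCH
                                 : Move.START_NEW_BRANCH;
    }
}
\end{verbatim}
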
                                     
-                               \paragraph{Creating malformed data} 
-As the goal of running this tool is to submit unexpected or invalid data to the target software it is necessary to understand what t [...]
-Fuzzing a complex type such as a timestamp variable has nothing to do with fuzzing a trivial boolean. In practice, a significant part o [...]
-and this matter could absolutely be the subject of a more abstract work. We focused here on a very simple approach (as a first step).
-After retrieving the current row being fuzzed (may it be a new row or a previously fuzzed row), the algorithm explores the different [...]
-The algorithm then builds the possible modifications for each of the fields of the current row.
-At the moment, the supported types are : % add a list of the supported types.
-More primitive types will be added in the future.
-The possible modifications that this tool can produce at the moment are (a generation sketch for the int case follows this list) : % add complete list of the modifications that CAN be gener [...]
-                               Int Types:
-                               \begin{itemize}
-                                       \item Extreme values ($0 \mapsto \texttt{MAXVALUE}$ etc...)
-                                       \item Random value ($0<\texttt{value}<\texttt{MAXVALUE}$ etc...)
-                                       \item Increment/Decrement the existing value ($332 \mapsto 333$ OR $332 \mapsto 331$)
-                               \end{itemize}
-                               String Types:
-                               \begin{itemize}
-                                       \item Change string to "aaa" ("Mount Everest" $\mapsto$ "aaa")
-                                       \item Increment/Decrement ASCII character at a random position in the string ("Mount Everest" $\mapsto$ "Mount Fverest")
-                               \end{itemize}
-                               Boolean Types:
-                               \begin{itemize}
-                                       \item Swapping the existing value (F $\mapsto$ T OR T $\mapsto$ F)
-                               \end{itemize}
-                               Date Types: (implemented but not fully functional)
-                               \begin{itemize}
-                                       \item Increment/Decrement date by 1 day/minute depending on the precision of the date
-                                       \item Set date to $00/00/0000$
-                               \end{itemize}
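
A minimal sketch of how the int candidates above could be generated. Bounds depend on the SQL type of the column; this is illustrative, not the actual generator:

\begin{verbatim}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class IntModifications {
    static List<Integer> candidates(int current, int maxValue, Random rng) {
        List<Integer> out = new ArrayList<>();
        out.add(0);                       // extreme low
        out.add(maxValue);                // extreme high (e.g. 32767 for int16)
        out.add(rng.nextInt(maxValue));   // random value in 0..maxValue-1
        out.add(current + 1);             // increment existing value
        out.add(current - 1);             // decrement existing value
        return out;
    }
}
\end{verbatim}
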
-Obviously, these "abnormal" values might in fact be totally legit in some cases. In that case the analyzer
-will rank the mutation rather poorly, which will lead to this tree path not being very likely to be developed further.
-                               \\*
-                               \paragraph{SQL handling}
-All the SQL statements are generated within the code. This means that the data concerning the current and future state of the mutations have to be very precise. Otherwise, the SQL statement is very likely to fail. Sadly, since SchemaFuzz only supports PostgreSQL, the implemented syntax follows that of the PostgreSQL DBMS. This is already a very big axis for future improvements and will be detailed in the dedicated section.
-The statement is built to target the row as precisely as possible, meaning that it uses all of the non-fuzzed values from the row to avoid updating other rows accidentally. Only the types that can possibly be fuzzed will be used in the building of the SQL statement. Since this part of the code is very delicate, in the sense that it highly depends on an arbitrarily large pool of variables of various types, it is a good bug provider.
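
A minimal sketch of such a statement builder; illustrative only, since real value quoting and typing need more care than shown here:

\begin{verbatim}
import java.util.Map;
import java.util.StringJoiner;

class StatementBuilder {
    // All non-fuzzed columns go into the WHERE clause so that only the
    // intended row can match. One ? placeholder for the new value, then
    // one per identifying column.
    static String buildUpdate(String table, String fuzzedColumn,
                              Map<String, Object> otherColumns) {
        StringJoiner where = new StringJoiner(" AND ");
        for (String column : otherColumns.keySet()) {
            where.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + fuzzedColumn + " = ? WHERE " + where;
    }
}
\end{verbatim}
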
-                               
-                               \paragraph{Injecting} :
-The injection process sends the built statement to the DBMS so that the modification can be operated. After the execution of the query, depending on the output of the injection (one modification, several modifications, transfer), information is updated so that it matches the database state after the modification. If the modification failed, no trace of this mutation is kept; it is erased and the run goes on like nothing happened.
-                               \paragraph{Special Case (MutationTransfer)} :
-The mutation transfer is a special case of a modification being applied to the database.
-It is triggered when the value that was supposed to be fuzzed is under the influence of a FKC as the child.
-In the case of a FKC (in CASCADE mode), only the father can be changed, which also triggers the same modification on all of his children. The algorithm then "transfers" the modification from the original mutation to its father.
-After injecting the transferred mutation, the child mutation is indeed modified, but the modification "splashed" on some parts of the database that were not meant to be changed.
-Fortunately, this does not impact the life of the algorithm until this mutation is reverted (see next paragraph).
-                               \paragraph{Do/Undo routine} :
-The Do/Undo mechanism is at the center of this software. Its behavior is crucial for the execution and has a strong impact on the coherence of the data nested in the code or inside the target database throughout the runtime.
-This mechanism allows the algorithm to revert a previous mutation or, if necessary, inject it one more time.
-Undoing a mutation applies the exact opposite modification that was originally applied to the database, ending up recovering the same database state as before the mutation was injected.
-Reverting mutations is the key to flawlessly shifting the current position in the mutation tree.
-The case of the transferred mutation is no exception to this. In this case, the mutation applied changes to an unknown number of fields in the database. But the FKC still binds all the children to their father at this point (this is always the case unless this software is not used as intended).
-Changing the father's field value back to its original state will splash the original values back on all the children.
-This mechanism might trigger failing mutations in some cases (usually mutations following a transfer). This issue will be addressed in the known issues section.
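
A minimal sketch of the Do/Undo symmetry, reusing the Mutation shape sketched earlier; names are hypothetical and the real routine lives in the SchemaFuzz core:

\begin{verbatim}
class DoUndo {
    static void doMutation(Mutation m) {
        inject(m.initialState, m.finalState);   // forward: old row -> fuzzed row
    }

    static void undoMutation(Mutation m) {
        inject(m.finalState, m.initialState);   // backward: fuzzed row -> old row
    }

    // Stub standing in for the SQL UPDATE machinery of the previous section.
    static void inject(Object[] from, Object[] to) {
        /* build and execute the UPDATE statement */
    }
}
\end{verbatim}
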
-                       \subsubsection{Tree based data structure}
-All the mutations that are injected at least once in the course of the execution of this software are stored in a tree data structure. Having such a data structure makes parent-children relations between mutations possible. The tree follows the traditional definition of an n-ary algorithmic tree.
-It is made of nodes (mutations), including a root (the first mutation to be processed, on a field selected randomly in the database).
-Each node has a number of children that depends on the ranking of its mutation and the number of potential modifications that it can perform.
-                               \paragraph{Weight} :
-Weighting the nodes is an important part of the runtime. Each mutation has a weight that is equal to the analyzer's output. This value reflects the mutation's value. If it had an interesting impact on the target program's behavior (if it triggered new bugs or uncommon code paths) then this value is high, and vice-versa. The weight is then used as a means of determining the upcoming modification. The chance that a mutation gets a child is directly proportional to its weight.
-This value currently isn't biased by any other parameter, but this might change in the future.
-                               \paragraph{Path}
-Since the weighting of the mutations allows going back to previously more interesting mutations, there is a need for a path finder mechanism. Concretely, this routine resolves the nodes that separate nodes A and B in the tree. A and B might be child and parent but can also belong to completely different branches. This path is then given to the do/undo routine, which processes back the modifications to set the database up in the required state for the upcoming mutation.
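
A minimal sketch of the ancestor walk behind such a path finder, again on the Mutation shape sketched earlier; illustrative only:

\begin{verbatim}
import java.util.HashSet;
import java.util.Set;

class PathFinder {
    // Walk up from A recording its ancestors, then walk up from B until one
    // of them is hit: that node is the common ancestor. The do/undo routine
    // then undoes A's side of the path and redoes B's side.
    static Mutation commonAncestor(Mutation a, Mutation b) {
        Set<Mutation> ancestorsOfA = new HashSet<>();
        for (Mutation m = a; m != null; m = m.parent) {
            ancestorsOfA.add(m);
        }
        for (Mutation m = b; m != null; m = m.parent) {
            if (ancestorsOfA.contains(m)) {
                return m;
            }
        }
        return null; // unreachable for two nodes of the same tree
    }
}
\end{verbatim}
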
-
-\begin{figure} 
-\centering
-\includegraphics[scale=1]{CommonAncestorDiagram.pdf}
-\caption{Resolving the path between two nodes through their common ancestor.}
-\end{figure}
-                       \subsubsection{The analyzer}
-Analyzing the output of the target program is another critical part of SchemaFuzz. The analyzer parses the stack trace of the target software's execution to try to measure its interest. The main criterion that defines a mutation's interest is its proximity to previously parsed stack traces. The more distance between the new mutation and the old ones, the better the ranking.
-                               \paragraph{Stack Trace Parser}
-The stack trace parser is a separate Bash script that processes stack traces generated by the GDB C language debugger and stores all the relevant information (function name, line number, file name) into a Java object. As a secondary job, the parser also generates a human readable file for each mutation that synthesizes the stack trace values as well as additional interesting information useful for other mechanisms (that also require parsing). These additional pieces of information include the [...]
-                               \paragraph{Hashing}
-In order to be used in the clustering algorithm, the stack trace of a mutation 
has to be hashed.
-Hashing is usually defined as follows : 
-                               \begin{quotation}
-"A hash value (or simply hash), also called a message digest, is a number 
generated from a string of text. The hash is substantially smaller than the 
text itself, and is generated by a formula in such a way that it is extremely 
unlikely that some other text will produce the same hash value."
-                               \end{quotation}
-                               
-In the present case, we used a different approach. Since proximity between two stack traces is the key to a relevant ranking, it is mandatory to have a hashing function that preserves the proximity of the two strings.
-In that regard, we implemented a version of the Levenshtein distance algorithm.
-This algorithm can roughly be explained by the following :
-                               \begin{quotation}
-"The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other."
-                               \end{quotation}
-After hashing the file name and the function name into numerical values through the Levenshtein distance, we create a triplet that fully (but not yet fully accurately) represents the stack trace being parsed. This triplet will be used in the clustering method.
-
-\begin{figure} 
-\centering
-\begin{tabular}{ | l | l | l | l | l | l | c | r | }
-  \hline                       
-  E & X & E & M & P & L & E &  \\ \hline
-  \ding{51}  & \ding{51}  & \ding{56}  & \ding{51}  & \ding{51}  & \ding{51}  
& \ding{51} & \ding{56}  \\\hline
-  E & X & A & M & P & L & E & S \\
-  \hline  
-\end{tabular}
-\caption{Example of the Levenshtein distance concept.}
-\end{figure}
-
-The distance for this example is $2/8 \times 100 = 25$.
-
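
A minimal sketch of the classic dynamic-programming form of this distance, matching the quoted definition; SchemaFuzz's own variant may differ in details:

\begin{verbatim}
class Levenshtein {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "EXEMPLE" -> "EXAMPLES": one substitution plus one insertion = 2,
        // hence 2/8 x 100 = 25 when normalized by the longer length.
        System.out.println(distance("EXEMPLE", "EXAMPLES"));
    }
}
\end{verbatim}
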
-                               \paragraph{The Scoring mechanism}
-The "score" (or rank) of a mutation is a numerical value that reflects its interest. This value is calculated through a modified version of a clustering method that computes an n-tuple into an integer depending on the sum of the Euclidean distances from the n-tuple to the existing centroids (groups of mutations' n-tuples that were already processed).
-This value is then set as the mutation's rank and used as a means of choosing the upcoming mutation.
-\begin{figure} 
-  \includegraphics[width=\textwidth]{Scoring.png}
-\end{figure}   
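
A minimal sketch of the distance-to-centroids idea; purely illustrative, the actual clustering method is more involved:

\begin{verbatim}
class Scoring {
    // Sum of Euclidean distances from the mutation's triplet to all existing
    // centroids: the larger the sum, the more "unique" the mutation.
    static double score(double[] triplet, double[][] centroids) {
        double sum = 0.0;
        for (double[] centroid : centroids) {
            double squared = 0.0;
            for (int i = 0; i < triplet.length; i++) {
                double diff = triplet[i] - centroid[i];
                squared += diff * diff;
            }
            sum += Math.sqrt(squared);
        }
        return sum;
    }
}
\end{verbatim}
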
-               \subsection{Known issues}               
-About one mutation out of 15 will fail for invalid reasons.
-                       \subsubsection{Context coherence}
-A significant amount of the failing mutations do so because of the transfer mechanism. As said in the dedicated section, this mechanism applies more than one change to the database (potentially the whole database). In specific cases, this property can become problematic.
-More specifically, when the main loop identifies a mutation's child as the upcoming mutation and its parent row has been splashed with the effect of a transfer, the data embedded in the SchemaFuzz data structure may not match the data that are actually in the database. This delta will likely induce a wrongly designed SQL statement that will result in an SQL failure (meaning that 0 rows were updated by the statement).
-                       \subsubsection{Foreign Key constraints}                 
-For a reason that is not yet clear, some of the implied FKCs of the target database can't be properly set to CASCADE mode. This results in a FKC error (the mutation fails but the program can carry on).
-                       \subsubsection{Tests}
-Besides the test suite written by the SchemaSpy team for their own tool (still present in SchemaFuzz for the metadata extraction), the tests for this project are very poor. There are only very few of them and their utility is debatable. This is due to the main developer's lack of experience in that regard. Obviously, we are very well aware that this is a really severe flaw in this project, and addressing it will be one of the main future improvements.
-This big lack of good test tooling might also explain some of the silent and non-silent bugs that still remain in the code to this day.
-
-                       \subsubsection{Code Quality}
-We are well aware that this tool's source code is of debatable quality. This fact underlies the bugs and unexpected behaviors discussed earlier for some components of this program.
-The following points constitute the main flaws of the source code:
-                       \begin{itemize}
-                       \item Hard to maintain. The code is not optimized either in terms of size or efficiency. Bad coding habits tend to make it rather weak and unstable to context changes.
-                       \item Structure is not intuitive. The main loop of the program lacks a good structure.
-                       \end{itemize}
-
-       \section{Results and examples}
-In the process of being written.
-
-       \section{Upcoming features and changes}
-This section will provide more insight into the future features that might/may/will be implemented, as well as the changes to the existing code.
-Any suggestion will be greatly appreciated as long as it is relevant and well argued. All the relevant information regarding contributions is detailed in the dedicated section.
-
-               \subsection{General Report}
-In its future state, SchemaFuzz will generate a synthesized report concerning the overall execution of the tool (which it does not do right now). This general report will primarily contain the most "interesting" mutations (meaning the mutations with the highest score) for the whole run.
-A more advanced version of this report would also take into account the code coverage rate for each mutation and execute a last clustering round at the end of the execution to generate a "global" score that would represent the global value of each mutation.
-       
-               \subsection{Code coverage}
-We are considering changing, or simply adding, code coverage as a parameter in the clustering method. Not only would this increase the accuracy of the scoring, but it would also increase the accuracy of the "type" of each mutation. To this day, this tool does not make a concrete difference, in terms of scoring or information generation (reports), between a mutation with a new stack trace in a very common code path and a very common stack trace in a very rarely triggered code path.
-
-               \subsection{Data type Pre-analyzing}
-The idea for this feature is to implement some kind of "auto learning" mechanism.
-To be more precise, this routine is meant to perform a statistical analysis on a representative portion of the database's content. This analysis would provide the rest of the program with the common values encountered for each field. More generically, this would allow the software to have a global view of the format of the data that the database holds.
-Such a global understanding of the content format is very interesting for making the modification possibilities more relevant. Indeed, one of the major limitations of SchemaFuzz is its "blindness".
-That is to say that some of the modifications performed in the course of the execution of the program are irrelevant due to the lack of information on what is supposed to be stored in this precise field.
-For instance, a field that only holds numerical values that go from 1 to 1000, even if it has enough bits to encode from -32767 to 32767, would have a very low chance of triggering a crash if this software modifies its value from 10 to 55.
-On the other hand, if the software modifies this very same field from 10 to -12000, then a crash is much more likely to pop up.
-The same principle applies to strings. Suppose a field can encode 10 characters, and the pre-analysis detected that, for this field, most of the values were surnames beginning with the letter "a". Changing this field from "Sylvain" to "Sylvaim" will probably not be very effective. However, changing this same field from "Sylvain" to "NULL" might indeed trigger an unexpected behavior.
-  
-This pre-analysis routine would only be executed once at the start of the execution, right after the metadata extraction. The result of this analysis will be held by a specific object.
-This object's lifespan is equal to the duration of the main loop's execution (so that every mutation can benefit from the analysis data).
-               
-               \subsection{Centralised anonymous user data}
-SchemaFuzz's efficiency is tightly linked to the quality of its heuristics. This term includes the following points:
-               \begin{itemize}
-               \item{Quality of the possible modifications for a single field}
-               \item{Quality of the possible modifications for each data type}
-               \item{Quantity of possible modifications for a single field}
-               \item{Quantity of supported data types}
-               \end{itemize}
-Knowing this, we are also considering, for future enhancements, an anonymous data collection for each execution of this tool that would be statistically computed to determine the best modifications on average. This would improve the choosing mechanism by balancing the weights depending on the modification's average quality. Modifications with a higher average quality would see their weight increased (meaning they would get picked more frequently) and vice versa.
-
-\includepdf[pages=-]{PersonnalExperience.pdf}
-
-       \section{Contributing}
-You can send your ideas to \\*
-               address@hidden
-or directly create a pull request on the official repository to edit this document and/or the code itself.
-       \section{Conclusion}
-\end{empfile}
-\end{document} 
diff --git a/Documentation.pdf b/docs/Documentation.pdf
similarity index 50%
rename from Documentation.pdf
rename to docs/Documentation.pdf
index e25f5aa..1dea4e6 100644
Binary files a/Documentation.pdf and b/docs/Documentation.pdf differ
diff --git a/docs/Documentation.tex b/docs/Documentation.tex
index 23fafa1..3dddc7e 100644
--- a/docs/Documentation.tex
+++ b/docs/Documentation.tex
@@ -1,12 +1,13 @@
 \documentclass{article}
 \usepackage[utf8]{inputenc}
-\usepackage[document]{ragged2e}
+%\usepackage[document]{ragged2e}
 \usepackage{hyperref}
 \usepackage{tikz}
 \usepackage{pifont}
 \graphicspath{{/home/feideu/Work/Gnunet/schemafuzz/docs/}}
 \usepackage{graphicx}
 \usepackage{pdfpages}
+\usepackage{url}
 \usepackage{emp}
 \usetikzlibrary{shapes.arrows,chains}
 \usepackage[english]{babel}
@@ -18,78 +19,84 @@
 \begin{empfile}
        
 \maketitle Documentation For SchemaFuzz
-       \section{Summary?}
-               This document actually needs a front page.
-       \section{Introduction}
+\clearpage
+
+\tableofcontents
+
+
+       \section{Introduction} 
        
 SchemaFuzz is a free software command line tool incorporated inside the GNU Taler package which is a free software electronic payment system providing anonymity for customers.
-The main goal of this project is to provide an effecient debbuging tool that 
uses a "fuzzing" strategy oriented on databases.  
-Where a traditionnal fuzzer would send malformed input to a program, 
SchemaFuzz modifies the content of a database to test that program's behavior 
when stumbling on such unexpected data. \\*
-Obviously, this tool is meant to be used as a mean of debugging as the goal is 
to pop buggs or put into light the security breaches that the code may contain 
regarding the retrieving, usage and saving of a database's content.
-As this tool is being developped as a master's thesis project, its current 
state is far from being finished and there are many options and optimisations 
that deserve to be implemented that are not yet available.
+The main goal of this project is to provide an efficient debugging tool that 
uses a "fuzzing" strategy oriented on databases.  
+Where a traditional fuzzer would send malformed input to a program, SchemaFuzz 
modifies the content of a database to test that program's behavior when 
stumbling on such unexpected data. \\*
+Obviously, this tool is meant to be used as a means of debugging: the goal is to surface bugs or bring to light the security breaches that the code may contain regarding the retrieval, usage and saving of a database's content.
+As this tool is being developed as a master's thesis project, its current 
state is far from being finished and there are many options and optimizations 
that deserve to be implemented that are not yet available.
+
+       \clearpage
 
        
-       \section{Context and Perimeter}
-SchemaFuzz's developpement enrolls in the global dynamic of the past decades 
regarding internet  that sustain great efforts to make it a more fluid, 
pleasant but more importantly a safer space.
+       \section{Context and Perimeter} 
+               \subsection{Context}
+SchemaFuzz's development is part of the broader effort of the past decades to make the Internet a more fluid, pleasant and, more importantly, safer space.
 
 It uses the principle of "fuzz testing" or "fuzzing" to help find out which 
are the weak code paths of one's project. 
                                \begin{quotation}
-Traditionnal fuzzing is defined as "testing an automated software testing 
technique that involves providing invalid, unexpected, or random data as inputs 
to a computer program".
+Traditional fuzzing is defined as "an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program".
                                \end{quotation}         
 
-This illustation is very well illustated by the following example :
+This quote is very well illustrated by the following example :
                                \begin{quotation}
-                               Lets's consider an integer in a program, which 
stores the result of a user's choice between 3 questions. When the user picks 
one, the choice will be 0, 1 or 2. Which makes three practical cases. But what 
if we transmit 3, or 255 ? We can, because integers are stored a static size 
variable. If the default switch case hasn't been implemented securely, the 
program may crash and lead to "classical" security issues: (un)exploitable 
buffer overflows, DoS, ... 
+Let's consider an integer in a program, which stores the result of a user's choice between 3 options. When the user picks one, the choice will be 0, 1 or 2, which makes three practical cases. But what if we transmit 3, or 255 ? We can, because integers are stored in a static-size variable. If the default switch case hasn't been implemented securely, the program may crash and lead to "classical" security issues: (un)exploitable buffer overflows, DoS, ... 
                                \end{quotation}
 
 Fuzzing comes in several categories, each focusing on a specific type of input.
  
-UI fuzzing focuses on button sequences and more genericly any kind of user 
input during the execution of a program. The above exemple falls into this 
category.
+UI fuzzing focuses on button sequences and more generically any kind of user 
input during the execution of a program. The above example falls into this 
category.
 This principle has already been used successfully in existing fuzzing tools such as the well-known "american fuzzy lop" (AFL).
 File format fuzzing generates multiple malformed samples, and opens them 
sequentially.
 However, SchemaFuzz is a database oriented fuzzer. This means that it focuses on triggering unexpected behavior related to the usage of an external database's content.
 
-This tool is meant to help developpers, mainteners and more genericly anyone 
that makes use of database comming from a database under his influence in their 
task. A good way to summerise the effect of this tool is to compare it with an 
"cyber attack simulator".
-This means that the idea behind it is to emulate the damage that an attacker 
may cause subtly or not to a database he unlegitly gained privileges on. This 
might in theory go from a simple boolean flip (subtle modifications) to 
removing/adding content to purely and simply destroying or erasing all the 
content of the database.
+This tool is meant to help developers, maintainers and, more generically, anyone whose task makes use of data coming from a database under his influence. A good way to sum up the effect of this tool is to compare it with a "cyber attack simulator".
+This means that the idea behind it is to emulate the damage that an attacker may cause, subtly or not, to a database he illegally gained privileges on. This might in theory go from a simple boolean flip (subtle modifications) to removing/adding content to purely and simply destroying or erasing all the content of the database.
 SchemaFuzz focuses on the first part : modification of the content of the database by single small modifications that may or may not overlap. These modifications may be very aggressive or very subtle.
-It is intresting to point out that this last point also qualifies SchemaFuzz 
as a good "database structural flaw detector".
-That is to say that errors typically triggered by a poor management of a 
database (wrong data type usage, incoherence beetween database structure and 
use of the content etc ...) might also appear clearly during the execution.   
+It is interesting to point out that this last point also qualifies SchemaFuzz 
as a good "database structural flaw detector".
+That is to say that errors typically triggered by a poor management of a 
database (wrong data type usage, incoherence between database structure and use 
of the content etc ...) might also appear clearly during the execution.   
                \subsection{Perimeter}
 This tool is based on some of the SchemaSpy tool's source code. More precisely, it uses the portion of the code that detects and stores the target database's structure.
-The main goal of this project is to build on top of this piece of existing 
code the functionnalities required to test the usage of the database content by 
any kind of utility.                
+The main goal of this project is to build on top of this piece of existing 
code the functionalities required to test the usage of the database content by 
any kind of utility.                 
 The resulting software will generate a group of human readable reports on each 
modification that was performed.                
-               \begin{figure} [htbp]
-               \centering
-               \includegraphics[scale=1]{codeOriginDiagram.pdf}
+               \begin{figure} [h!]
+               \includegraphics[width=\textwidth]{codeOriginDiagram.pdf}
                \caption{Shows the nature of the code for every distinct 
component. The slice size is a rough estimation.}
                \end{figure}
                \subsection{When to use it}
-SchemaFuzz is a very usefull tool for anyone trying secure a piece of software 
that uses database ressources. The target software should be GDB compatible and 
the DBMS has to grant access to the target database through credentials passed 
as argument to this tool.
+SchemaFuzz is a very useful tool for anyone trying to secure a piece of software that uses database resources. The target software should be GDB compatible and the DBMS has to grant access to the target database through credentials passed as arguments to this tool.
 
----It is very strongly advice to use a copy of the target databas erather than 
on the production material. Doing so will very likely result in the database 
being corrupted and not usable for any usefull mean.
+It is very strongly advised to run this tool on a copy of the target database rather than on the production one. Doing otherwise will very likely result in the database being corrupted and unusable for any useful purpose.
+
+               \clearpage
 
        \section{Design}
                \subsection{Generic explanation}
 SchemaFuzz implementation is based on some bits of the SchemaSpy project 
source code.
-The majority of this project is built on top of this already existing code and 
is organised as follows :
+The majority of this project is built on top of this already existing code and 
is organized as follows :
                \begin{itemize}
-               \item{mutation/data-set used as a way to store the 
imputs,outputs and other intresting data from the modification that was 
performed on the target database}
+               \item{a mutation/data-set used as a way to store the inputs, outputs and other interesting data from the modification that was performed on the target database}
                \item{the mutation Tree, used to store the mutations coherently}
-               \item{an analyser that scores the mutations to influence the 
paths that will be explored afterwards}
+               \item{an analyzer that scores the mutations to influence the 
paths that will be explored afterwards}
                \end{itemize}
                 
-This organisation will be detailled and discussed in the following sections.
-               \subsection{SchemaSpy legacy/metadata extraction}
-SchemaSpy source code has provided the metadata extraction routine. The only 
job of this routine is to initialise the connection to the database and 
retrieve its metadata at the very beginning of the execution (before any actual 
SchemaFuzz code is run). These metadata include data types, table and table 
column names, views and foreign/primary key constraints. Having this pool of 
metadata under the shape of java objects allows the main program to properly 
frame what the possibilities are  [...]
-
-Exemple of typical metadata Set 
+This organization will be detailed and discussed in the following sections.
+               \subsection{SchemaSpy legacy/meta data extraction}
+SchemaSpy source code has provided the meta data extraction routine. The only 
job of this routine is to initialize the connection to the database and 
retrieve its meta data at the very beginning of the execution (before any 
actual SchemaFuzz code is run). These meta data include data types, table and 
table column names, views and foreign/primary key constraints. Having this pool 
of meta data under the shape of Java objects allows the main program to 
properly frame what the possibilities  [...]
 
+\clearpage
 
-\begin{figure} [htbp]
+\begin{figure} [h!]
 \centering
-\includegraphics[scale=1]{MetaDataExtractionDiagram-1.pdf}
-\caption{Objects returned by the metadata extraction routine.}
+\includegraphics[width=\textwidth]{MetaDataExtractionDiagram-1.pdf}
+\caption{Objects returned by the meta data extraction routine.}
 \end{figure}
 
 In order to do that, the user shall provide this set of mandatory database 
related arguments
@@ -100,21 +107,24 @@ In order to do that, the user shall provide this set of 
mandatory database relat
                        \end{itemize}
                \subsection{SchemaFuzz Core}            
                        \subsubsection{Constrains}
-The target database often contains contraints on one or several tables. These 
constraints have to be taken into account in the process of fabricating 
mutations as most of the time they restrict the possible values that the 
pointed field can take. This restriction can take the shape of a \underline 
{Not Null} constraint, \underline{Check} constraint, {Foreign key} constraint 
(value has to exist in some other table's field) or \underline{Primary key} 
constraint (no doublets of value allowe [...]
+The target database often contains constraints on one or several tables. These constraints have to be taken into account in the process of fabricating mutations, as most of the time they restrict the possible values that the pointed field can take. This restriction can take the shape of a \underline{Not Null} constraint, \underline{Check} constraint, \underline{Foreign key} constraint (value has to exist in some other table's field) or \underline{Primary key} constraint (no doublets of value allow [...]
+\bigskip
 
-\begin{figure} 
+\begin{figure} [h!]
 \centering
-\includegraphics[scale=1]{ForeignKeyClassDiagram-1.pdf}
-\caption{Objects returned by the metadata extraction routine.}
+\includegraphics[width=\textwidth]{ForeignKeyClassDiagram-1.pdf}
+\caption{Class diagram of the ForeignKeyConstraint Java object}
 \end{figure}
 
-The last two ones are the problematic ones. They imply specific work before 
applying any mutations to make sure that the value respect all the 
restrictions. before doing anything else after the metadata extraction is done, 
SchemaFuzz performs an update of all the existing constraints on the database 
to add the CASCADE clause. This allows the values bonded by a foreign key 
constraints to take effect. This update reverts to take the constraints back to 
their initial state before the progra [...]
-                               \paragraph{Primary key contraints (PKC)} :
+\bigskip
+
+The last two are the problematic ones. They imply specific work before applying any mutations to make sure that the value respects all the restrictions. Before doing anything else after the meta data extraction is done, SchemaFuzz performs an update of all the existing constraints on the database to add the CASCADE clause. This allows mutations of the values bound by foreign key constraints to take effect. This update is reverted, taking the constraints back to their initial state, before the progr [...]
+                               \paragraph{Primary key constraints (PKC)} :
 The primary key constraints require an extra DB query that checks the 
existence of the value in the column. If the value already exists (the query's 
result is not empty), the mutation will be dropped before being executed.
-                               \paragraph{Foreign key contraints (FKC)} :
-The foreignKey constraint is the trickiest one. Its inherent nature bonds two 
values of different table column where the value being referenced is called the 
father, and the referecing field, the child. To be precise, in order to change 
one of the two values, the other has to be changed accordingly in the same 
statement.SchemaFuzz uses the power of the CASCADE clause to make the change 
possible. This clause allows the DRBMS to automaticly change the value of the 
child if the father has b [...]
+                               \paragraph{Foreign key constraints (FKC)} :
+The foreign key constraint is the trickiest one. Its inherent nature bonds two values of different table columns, where the value being referenced is called the father and the referencing field the child. To be precise, in order to change one of the two values, the other has to be changed accordingly in the same statement. SchemaFuzz uses the power of the CASCADE clause to make the change possible. This clause allows the DBMS to automatically change the value of the child if the father has [...]
 This mechanism makes it possible to change any of the bound values by changing the father's value.
-To do so, the software has a way to tranfert the mutation from a child to its 
parent (called the mutationTransfert).
+To do so, the software has a way to transfer the mutation from a child to its 
parent (called the mutationTransfer).
 
                                
                        \subsubsection{Mutations}
@@ -122,71 +132,75 @@ To do so, the software has a way to tranfert the mutation 
from a child to its pa
 A mutation is a Java object that bundles all the information used to perform a modification in the database. Every mutation is linked to its parent and inherits some of its parent's data. In the case of a follow-up mutation, the child inherits the database row that was its parent's target. Therefore the initial state (state before the injection of the modification) of its target is exactly the final state (state after injection of the modification) of its parent's target. A mutation i [...]
 It also holds the information concerning the result of the injection in the shape of a data vector. This data vector is then used to perform a clustering computation to determine the "uniqueness" of the mutation. This value is also stored inside the mutation object and is used as the weight of this mutation in the tree.
 
-\begin{figure} 
+\clearpage
+
+\begin{figure} [h!]
 \centering
-\includegraphics[scale=1]{MutationClassDiagram-1.pdf}
+\includegraphics[width=\textwidth]{MutationClassDiagram-1.pdf}
 \caption{Structure of a Mutation}
 \end{figure}
 
+\bigskip
                                
-                               \paragraph{Choosing patern}
-For each iteration of the main loop, a modification has to be picked up as the 
next step in the fuzzing proccess. This is done by concidering the current 
state of the tree.
+                               \paragraph{Choosing pattern}
+For each iteration of the main loop, a modification has to be picked up as the 
next step in the fuzzing process. This is done by considering the current state 
of the tree.
 Three parallel code paths can be triggered from this point.
                                \begin{itemize}
                                \item{Continue on the current branch of the 
tree (triggered if the last mutation scored better than its parent)}
                                \item{Pick an existing branch in the tree and 
grow it (triggered if the last mutation scored worse than its parent on a 50/50 
chance with the next bullet)}
                                \item{Start a new branch (triggered if the last 
mutation scored worse than its parent on a 50/50 chance with the previous 
bullet)}
                                
-\begin{figure} 
+\begin{figure}[h!]
 \centering
-\includegraphics[scale=1]{pickingPaternDiagram.pdf}
-\caption{picking Patern schema}
+\includegraphics[width=\textwidth]{pickingPaternDiagram.pdf}
+\caption{picking Pattern schema}
 \end{figure}                           
                                
                                \end{itemize}
 A branch is a succession of mutations that share the same database row as their modification target.
-The heuristics determining the next mutation's modification are still very 
primitive and will be thinly ajusted in futures versions.                       
                                     
+The heuristics determining the next mutation's modification are still very primitive and will be finely adjusted in future versions.
                                \paragraph{Creating malformed data} 
 As the goal of running this tool is to submit unexpected or invalid data to 
the target software it is necessary to understand what t
-Fuzzing a complex type such a timestamp variable has nothing to do with 
fuzzing a trivial boolean. In practice, A significant part o
-and this matter could absolutly be the subject of a more abstract work. We 
focused here on a very simple approach (as a first step).
+Fuzzing a complex type such as a timestamp variable has nothing to do with fuzzing a trivial boolean. In practice, a significant part o [...]
+and this matter could absolutely be the subject of a more abstract work. We focused here on a very simple approach (as a first step).
 After retrieving the current row being fuzzed (may it be a new row or a 
previously fuzzed row), the algorithm explores the different
 The algorithm then builds the possible modification for each of the fields for 
the current row.
 At the moment, the supported types are : % add a list of the supported types.
 More primitive types will be added in the future.
-The possible modifications that this tool can produce at the moment are : % 
add complete list of the modifications that CAN be gener$
+The possible modifications that this tool can produce at the moment are : \\ % 
add complete list of the modifications that CAN be gener$ 
                                Int Types:
                                \begin{itemize}
                
-                                       \item Extreme values (0-32676 (int) 
etc...)
-                                       \item Random value (0<value<32676 (int) 
etc...)
-                                       \item Increment/Decrement the existing 
value (332 -> 333 OR 332 -> 331)
+                                       \item Extreme values ($0 \mapsto \texttt{MAXVALUE}$, etc.)
+                                       \item Random value ($0 < \texttt{value} < \texttt{MAXVALUE}$, etc.)
+                                       \item Increment/Decrement the existing value ($332 \mapsto 333$ or $332 \mapsto 331$)
                                \end{itemize}
                                String Types:
                                \begin{itemize}
-                       
-                                       \item Change string to "aaa" ("Mount 
Everest" -> "aaa")
-                                       \item Increment/Decrement ASCII 
character at a random position in the string ("Mount Everest" -> "Mount 
Fverest")
-                                       Boolean
-                                       \item Swaping the existing value (F -> 
T OR T -> F)
-                                       \end{itemize}
-                                       Date Types : (! IMPLEMENTED BUT NOT 
FULLY FUNCTIONNAL)                                  
-                                       \begin{itemize}
+                                       \item Change string to "aaa" ("Mount 
Everest" $\mapsto$ "aaa")
+                                       \item Increment/Decrement ASCII 
character at a random position in the string ("Mount Everest" $\mapsto$ "Mount 
Fverest")
+                               \end{itemize}
+                                       Boolean Types:
+                               \begin{itemize}                                 
        
+                                       \item Swapping the existing value (F $\mapsto$ T or T $\mapsto$ F)
+                               \end{itemize}
+                                       Date Types: (implemented but not fully 
functional)                      
+                               \begin{itemize}
                                        \item Increment/Decrement the date by one day or one minute, depending on the precision of the date
-                                       \item Set date to 00/00/0000
+                                       \item Set date to $00/00/0000$ 
                                \end{itemize}
 Obviously, these "abnormal" values might in fact be totally legitimate in some cases. In that case, the analyzer
-will rank the mutation rather poorly, which will lead to this tree path not 
being very likely to be developped further more.
+will rank the mutation rather poorly, which makes this tree path unlikely to be developed further.
                                \\*
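+As a rough illustration of the lists above, the per-type candidate generation could look like this (a sketch only; all names are hypothetical, and the real code works on the extracted meta data):
+\begin{verbatim}
+// Sketch: candidate values for one field, by SQL type (hypothetical names).
+List<Object> candidates(String sqlType, Object current) {
+    List<Object> out = new ArrayList<>();
+    Random rng = new Random();
+    switch (sqlType) {
+        case "int4":
+            int v = (Integer) current;
+            out.add(0);                              // extreme value
+            out.add(Integer.MAX_VALUE);              // extreme value
+            out.add(rng.nextInt(Integer.MAX_VALUE)); // random value
+            out.add(v + 1);                          // increment
+            out.add(v - 1);                          // decrement
+            break;
+        case "varchar":
+            String s = (String) current;
+            out.add("aaa");                          // constant string
+            if (!s.isEmpty()) {                      // shift one ASCII char
+                int i = rng.nextInt(s.length());
+                out.add(s.substring(0, i) + (char) (s.charAt(i) + 1)
+                        + s.substring(i + 1));
+            }
+            break;
+        case "bool":
+            out.add(!((Boolean) current));           // swap the boolean
+            break;
+    }
+    return out;
+}
+\end{verbatim}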
-                               \paragraph{Sql handling}
-All the SQL statements are generated within the code. This means that the data 
concerning the current and future state of the mutations have to be very 
precise. Otherwise, the SQL statement is very likely to fail. Sadly, since 
SchemaFuzz only supports postgreSQL, the implemented synthax follow the one of 
postgres
-DBMS. This is already a very big axis for future improvements and will be 
detailled in the dedicated section.
-The statement is built to target the row as precisely as possible, meaning 
that it uses all of the non fuzzed values from the row to avoid updating other 
row accidently. Only the types that can possibly be fuzzed will be used in the 
building of the SQL statement. Since this part of the code is very delicate in 
the sense that it highly depends on an arbitrary large pool of variables from 
various types it is a good bugg provider. 
+                               \paragraph{SQL handling}
+All the SQL statements are generated within the code. This means that the data describing the current and future state of the mutations have to be very precise; otherwise, the SQL statement is very likely to fail. Sadly, since SchemaFuzz only supports PostgreSQL, the implemented syntax follows that of the PostgreSQL DBMS. This is a major axis for future improvement and will be detailed in the dedicated section.
+The statement is built to target the row as precisely as possible, meaning that it uses all of the non-fuzzed values from the row to avoid accidentally updating other rows. Only the types that can possibly be fuzzed are used when building the SQL statement. Since this part of the code is delicate, in the sense that it highly depends on an arbitrarily large pool of variables of various types, it is a frequent source of bugs.
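+A minimal sketch of the statement construction (hypothetical names; the real generator emits PostgreSQL syntax directly from the mutation state):
+\begin{verbatim}
+// Sketch: pin the target row down with every non-fuzzed column value
+// so that exactly one row matches the WHERE clause.
+String buildUpdate(String table, String fuzzedColumn,
+                   List<String> otherColumns) {
+    StringBuilder sql = new StringBuilder("UPDATE " + table
+            + " SET " + fuzzedColumn + " = ? WHERE ");
+    for (int i = 0; i < otherColumns.size(); i++) {
+        if (i > 0) sql.append(" AND ");
+        sql.append(otherColumns.get(i)).append(" = ?");
+    }
+    return sql.toString(); // bind values with a JDBC PreparedStatement
+}
+\end{verbatim}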
                                
 				\paragraph{Injecting}
-The injection process sends the built statement to the DBMS so that the 
modification can be operated. After the execution of the query, depending of 
the output of the injection (one modification, several modifications, tranfer) 
informations are updated so that they can match the database state after the 
modification. If the modification failed, no trace of this mutation is kept, it 
is erased and running goes on like nothing happenned.                           
      
-                               \paragraph{Special Case(MutationTransfert)} :
-The mutation tranfert is a special case of a modification being applied to the 
database.
+The injection process sends the built statement to the DBMS so that the modification can be applied. After the execution of the query, depending on the outcome of the injection (one modification, several modifications, transfer), the internal information is updated to match the database state after the modification. If the modification failed, no trace of this mutation is kept: it is erased and execution carries on as if nothing had happened.
+				\paragraph{Special case (MutationTransfer)}
+The mutation transfer is a special case of a modification being applied to the database.
 It is triggered when the value that was supposed to be fuzzed is the child side of an FKC.
 In the case of an FKC (in CASCADE mode), only the father can be changed, which also triggers the same modification on all of his children. The algorithm then "transfers" the modification from the original mutation to its father.
 After injecting the transferred mutation, the child value is indeed modified, but the modification "splashes" onto parts of the database that were not meant to be changed.
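+A sketch of the transfer decision (all names hypothetical):
+\begin{verbatim}
+// Sketch: an FKC child column cannot be changed directly, so the
+// modification is redirected to the referenced (father) column and
+// CASCADE splashes it back onto all the children.
+if (field.isForeignKeyChild()) {
+    currentMutation.retarget(field.getReferencedTable(),
+                             field.getReferencedColumn());
+}
+\end{verbatim}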
@@ -198,27 +212,31 @@ Undoing a mutation applies the exact opposite 
modification that was originally a
 Reverting mutations is the key to flawlessly shifting the current position in the mutation tree.
 The case of the transferred mutation is no exception to this. In this case, the mutation applied changes to an unknown number of fields in the database. However, the FKC still binds all the children to their father at this point (this is always the case unless this software is not used as intended).
 Changing the father's field value back to its original state will splash the original values back onto all the children.
-This mechanism might trigger failing mutations in some cases (usually 
mutations following a tranfer). This issue will be addressed in the known 
issues section. 
+This mechanism might trigger failing mutations in some cases (usually 
mutations following a transfer). This issue will be addressed in the known 
issues section. 
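+The undo step itself can be sketched as the exact inverse of the forward change (hypothetical names):
+\begin{verbatim}
+// Sketch: undoing applies the opposite modification, reusing the
+// same statement builder as the forward change.
+void undo(Mutation m) {
+    // swap back: new value -> old value
+    applyChange(m.getTable(), m.getColumn(),
+                m.getNewValue(), m.getOldValue());
+}
+\end{verbatim}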
 			\subsubsection{Tree-based data structure}
-All the mutations that are injected at least once in the course of the 
execution of this software will be stored properly in a tree data structure. 
Having such a data structure makes parent-children relations between mutations 
possible. The tree follows the traditionnal definition of the a n-ary 
algorithmic tree.
+All the mutations that are injected at least once in the course of the execution of this software are stored in a tree data structure. Having such a data structure makes parent-child relations between mutations possible. The tree follows the traditional definition of an n-ary tree.
 It is made of nodes (mutations), including a root (the first mutation, processed on a field selected randomly in the database).
 Each node has a number of children that depends on its mutation's ranking and the number of potential modifications that it can perform.
-                               \paragraph{Weight} :
-Weighting the nodes is an important part of the runtime. Each mutation has a 
weight that is equal to the analyzer's output. This value reflects the 
mutation's value. If it had an intresting impact on the target program behavior 
(if it triggered new buggs or uncommon code paths) than this value is high and 
vice-versa. The weight is then used as a mean of determining the upcomming 
modification. The chance that a mutation gets a child is directly proportionnal 
to its weight.
+                               \paragraph{Weight}
+Weighting the nodes is an important part of the runtime. Each mutation has a weight that is equal to the analyzer's output. This value reflects the mutation's worth: if it had an interesting impact on the target program's behavior (if it triggered new bugs or uncommon code paths), this value is high, and vice versa. The weight is then used as a means of determining the upcoming modification: the chance that a mutation gets a child is directly proportional to its weight.
+This value currently isn't biased by any other parameter, but this might change in the future.
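+A sketch of the weight-proportional draw (assuming a \texttt{getWeight} accessor on the existing tree nodes):
+\begin{verbatim}
+// Sketch: pick a node with probability proportional to its weight.
+GenericTreeNode pickWeighted(List<GenericTreeNode> nodes, Random rng) {
+    double total = 0;
+    for (GenericTreeNode n : nodes) total += n.getWeight();
+    double r = rng.nextDouble() * total;
+    for (GenericTreeNode n : nodes) {
+        r -= n.getWeight();
+        if (r <= 0) return n;
+    }
+    return nodes.get(nodes.size() - 1); // numerical fallback
+}
+\end{verbatim}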
                                \paragraph{Path}
-Since the weighting of the mutation allows to go back to previously more 
intresting mutations, 
-there is a need for a path finder mechanism. Concretly, this routines resolves 
the nodes that separate nodes A and B in the tree. A and B might be children 
and parent but can also belong to complitely different branches. This path is 
then given to the do/undo routine that processes back the modifications to set 
the database up in the required state for the upcomming mutation. 
+Since the weighting of the mutations allows going back to previous, more interesting mutations,
+there is a need for a path-finding mechanism. Concretely, this routine resolves the nodes that separate nodes A and B in the tree. A and B might be child and parent, but can also belong to completely different branches. This path is then given to the do/undo routine, which replays the modifications to set the database up in the required state for the upcoming mutation.
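+Since each node already exposes \texttt{pathToRoot()} (used by the main loop), the path resolution can be sketched as follows (a sketch, assuming \texttt{pathToRoot()} returns the node-to-root sequence):
+\begin{verbatim}
+// Sketch: path from a to b through their lowest common ancestor.
+List<GenericTreeNode> pathBetween(GenericTreeNode a, GenericTreeNode b) {
+    List<GenericTreeNode> upA = a.pathToRoot();      // a .. root
+    List<GenericTreeNode> upB = b.pathToRoot();      // b .. root
+    Set<GenericTreeNode> ancestorsOfA = new HashSet<>(upA);
+    GenericTreeNode ancestor = null;
+    for (GenericTreeNode n : upB) {                  // first shared node
+        if (ancestorsOfA.contains(n)) { ancestor = n; break; }
+    }
+    List<GenericTreeNode> path = new ArrayList<>();
+    for (GenericTreeNode n : upA) {                  // descend from a
+        path.add(n);
+        if (n == ancestor) break;
+    }
+    List<GenericTreeNode> down = new ArrayList<>();
+    for (GenericTreeNode n : upB) {                  // climb towards b
+        if (n == ancestor) break;
+        down.add(n);
+    }
+    Collections.reverse(down);
+    path.addAll(down);
+    return path;                                     // a .. ancestor .. b
+}
+\end{verbatim}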
 
-\begin{figure} 
+\bigskip
+
+\begin{figure}[h!] 
 \centering
-\includegraphics[scale=1]{CommonAncestorDiagram.pdf}
-\caption{Objects returned by the metadata extraction routine.}
+\includegraphics[width=\textwidth]{CommonAncestorDiagram.pdf}
+\caption{Common ancestor resolution in the mutation tree.}
 \end{figure}
+
+\bigskip
                        \subsubsection{The analyzer}
-Analyzing the output of the target programm is another critical part of 
SchemaFuzz. The analyzer parses in the stack trace of the target software's 
execution to try measuring its interest. The main criteria that defines a 
mutation intrest is its proximity to previously parsed stack traces. The more 
distance between the new mutation and the old ones, the better the ranking. 
+Analyzing the output of the target program is another critical part of SchemaFuzz. The analyzer parses the stack trace of the target software's execution to try to measure its interest. The main criterion that defines a mutation's interest is its proximity to previously parsed stack traces: the greater the distance between the new mutation and the old ones, the better the ranking.
                                \paragraph{Stack Trace Parser}
-The stack trace parser is a separate Bash script that processes stack traces 
generated by the GDB C language debugger and stores all the relevent 
informations (function's name, line number, file name) into a Java object. The 
parser also generates as a secondary job a human readable file for each 
mutation that synthetises the stack trace values as well as additionnal 
intresting information usefull for other mechanisms (that also require 
parsing). These additionnal informations include the [...]
+The stack trace parser is a separate Bash script that processes stack traces generated by the GDB C-language debugger and stores all the relevant information (function name, line number, file name) into a Java object. As a secondary job, the parser also generates a human-readable file for each mutation that synthesizes the stack trace values as well as additional interesting information useful for other mechanisms (that also require parsing). This additional information includes the p [...]
                                \paragraph{Hashing}
 In order to be used in the clustering algorithm, the stack trace of a mutation 
has to be hashed.
 Hashing is usually defined as follows:
@@ -226,13 +244,15 @@ Hashing is usually defined as follows :
 "A hash value (or simply hash), also called a message digest, is a number 
generated from a string of text. The hash is substantially smaller than the 
text itself, and is generated by a formula in such a way that it is extremely 
unlikely that some other text will produce the same hash value."
                                \end{quotation}
                                
-In the present case, we used a different approach. Since proximity beetween 
two stack traces is the key to a relevant ranking, it is mandatory to have a 
hashing function that preserves the proximity of the two strings. 
+In the present case, we used a different approach. Since proximity between two 
stack traces is the key to a relevant ranking, it is mandatory to have a 
hashing function that preserves the proximity of the two strings. 
 In that regard, we implemented a version of the Levenshtein distance algorithm.
 This algorithm can roughly be explained as follows:
                                \begin{quotation}
 "The Levenshtein distance between two words is the minimum number of 
single-character edits (insertions, deletions or substitutions) required to 
change one word into the other."
                                \end{quotation}                          
-After hashing the file name and the function name into numerical values trough 
Levenshtein distance, we are creating a triplet the fully (but not fully 
accuratly yet) represents the stack trace that is being parsed. This triplet 
will be used in the clustering method. 
+After hashing the file name and the function name into numerical values through the Levenshtein distance, we create a triplet that fully (though not yet fully accurately) represents the stack trace being parsed. This triplet will be used in the clustering method.
+
+\clearpage
 
 \begin{figure} 
 \centering
@@ -243,81 +263,89 @@ After hashing the file name and the function name into 
numerical values trough L
   E & X & A & M & P & L & E & S \\
   \hline  
 \end{tabular}
-\caption{Exemple of the levenshtein distance concept.}
+\caption{Example of the Levenshtein distance concept.}
 \end{figure}
 
-The Distance for this exemple is 2/8x100
+The distance for this example is $2 \div 8 \times 100 = 25$.
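+For reference, the classic dynamic-programming form of the distance (a standalone sketch, not the exact repository code):
+\begin{verbatim}
+// Levenshtein distance; the percentage above is
+// distance / length * 100.
+static int levenshtein(String s, String t) {
+    int[][] d = new int[s.length() + 1][t.length() + 1];
+    for (int i = 0; i <= s.length(); i++) d[i][0] = i;
+    for (int j = 0; j <= t.length(); j++) d[0][j] = j;
+    for (int i = 1; i <= s.length(); i++) {
+        for (int j = 1; j <= t.length(); j++) {
+            int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
+            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
+                                        d[i][j - 1] + 1),  // insertion
+                               d[i - 1][j - 1] + cost);    // substitution
+        }
+    }
+    return d[s.length()][t.length()];
+}
+\end{verbatim}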
 
                                \paragraph{The Scoring mechanism}
-The "score" (or rank) of a mutation is a numerical value that reflects its 
intrest. This value is calculated through a modified version of a clustering 
method that computes an n-uplet                        into a integer depending 
on the sum of the euclidian distances from the n-uplet to the existing 
centroids (groups of mutation's n-uplets that were already processed).
-This value is then set as the mutation's rank and used as a mean of chosing 
the upcomming mutation.
-\begin{figure} 
+The "score" (or rank) of a mutation is a numerical value that reflects how interesting the outcome was: crashes and unexpected behavior tend to raise this value, whereas the absence of a crash tends to lower it. This value is calculated through a modified version of a clustering method that maps an n-tuple to an integer depending on the sum of the Euclidean distances from the n-tuple to the existing centroids (groups of already-processed mutations' n-tuples).
+This value is then set as the mutation's rank and used as a means of choosing the upcoming mutation.
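+The distance computation itself reduces to the following (a sketch; the centroid representation is hypothetical):
+\begin{verbatim}
+// Sketch: score = sum of Euclidean distances from the mutation's
+// triplet to every existing centroid.
+double score(double[] triplet, List<double[]> centroids) {
+    double sum = 0;
+    for (double[] c : centroids) {
+        double sq = 0;
+        for (int i = 0; i < triplet.length; i++) {
+            double diff = triplet[i] - c[i];
+            sq += diff * diff;
+        }
+        sum += Math.sqrt(sq);
+    }
+    return sum;
+}
+\end{verbatim}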
+
+\begin{figure} [h!]
   \includegraphics[width=\textwidth]{Scoring.png}
 \end{figure}   
                \subsection{Known issues}               
-About one mutation out of 15 will fail for unvalid reaseons.
-                       \subsubsection{Context Cohorence}
+About one mutation out of 15 will fail for invalid reasons.
+                       \subsubsection{Context Coherence}
 A significant proportion of the failing mutations do so because of the transfer mechanism. As said in the dedicated section, this mechanism applies more than one change to the database (potentially the whole database). In specific cases, this property can become problematic.
-More specificaly, when the main loop identifies a mutation's child as the 
upcomming mutation and its parent row has been splashed with the effect of a 
transfer. In this case, the data embedded in the schemaFuzz data structure may 
not match the data that are actually in the database, this delta will likely 
induce a wrongly designed SQL statement that will result in a SQL failure 
(meaning that 0 row were updated by the statement).
+More specifically, this happens when the main loop identifies a mutation's child as the upcoming mutation while its parent row has been splashed with the effect of a transfer. In this case, the data embedded in the SchemaFuzz data structure may not match the data actually in the database; this delta will likely produce a wrongly built SQL statement that results in an SQL failure (meaning that zero rows were updated by the statement).
                        \subsubsection{Foreign Key constraints}                 
 For a reason that is not yet clear, some of the involved FKCs of the target database can't be properly set to CASCADE mode. This results in an FKC error (the mutation fails but the program can carry on).
                        \subsubsection{Tests}
-Besides the test suit written by the SchemaSpy team for their own tool (still 
implemented in SchemaFuzz for the meta data extraction), the tests for this 
project are very poor. Their are only very few of them and their utility is 
debatable. This is due to the lack of experience in that regard of the main 
developper. Obviously, we are very well aware of this being a really severe 
flaw in this project and will be one of the main future improvements.
-This big lack of good maintenance equipment might also explain some of the 
silent and non silent buggs that still remain in the code to this day.
+Besides the test suite written by the SchemaSpy team for their own tool (still used in SchemaFuzz for the meta data extraction), the tests for this project are very poor. There are only very few of them and their utility is debatable. This is due to the main developer's lack of experience in that regard. Obviously, we are very well aware that this is a severe flaw in this project, and fixing it will be one of the main future improvements.
+This lack of good maintenance tooling might also explain some of the silent and non-silent bugs that still remain in the code to this day.
 
                        \subsubsection{Code Quality}
-We are well aware that this tool's source code is of debatable quality. This 
fact induces the  buggs and unexpected behaviors discussed earlier on some 
components of this program. 
+We are well aware that this tool's source code is of debatable quality. This fact induces the bugs and unexpected behaviors discussed earlier in some components of this program.
 The following points constitute the main flaws of the source code:
                        \begin{itemize}
-                       \item Hard to maintain. The code is not optimised 
either in term of size or                     efficency. Bad coding habits tend 
to make it rather weak and unstable to context changes.
+			\item Hard to maintain. The code is not optimized either in terms of size or efficiency. Bad coding habits tend to make it rather fragile and unstable under context changes.
 			\item Structure is not intuitive. The main loop of the program lacks a good structure.
                        \end{itemize}
+                       
+                       \clearpage
 
-       \section{Results and exemples}
-In the proccess of being written.
+       \section{Results and examples}
+In the process of being written.
 
-       \section{Upcomming features and changes}
+                       \clearpage
+       \section{Upcoming features and changes}
 This section will provide more insight into the future features that may or will be implemented, as well as the changes in the existing code.
-Any sugestion will be greatly appriciated as long as it is relevent and well 
argumented. All the relevent information regarding the contributions are 
detailled in the so called section.
+Any suggestion will be greatly appreciated as long as it is relevant. All the relevant information regarding contributions is detailed in the dedicated section.
 
                \subsection{General Report}
-In its future state, SchemaFuzz will generate a synthesized report concerning 
the overall execution of the tool (which it does not do right now). This 
general report will primarely contain the most "intresting" mutations (meaning 
the mutations with the highest score mark) for the whole run.
+In its future state, SchemaFuzz will generate a synthesized report concerning the overall execution of the tool (which it does not do right now). This general report will primarily contain the most "interesting" mutations (meaning the mutations with the highest score) for the whole run.
 A more advanced version of this report would also take into account the code coverage rate for each mutation and execute a last clustering round at the end of the execution to generate a "global" score that would represent the overall value of each mutation.
        
                \subsection{Code coverage}
-We are considering changing or simply adding code coverage in the clustering 
method as a parameters.Not only would this increase the accuracy of the scoring 
but also increase the accuracy of the "type" of each mutation. To this day, 
this tool does not make a concrete difference in terms of scoring or 
information generating (reports) beetween a mutation with a new stack trace in 
a very common code path and a very common stack trace in a very rarely 
triggered code path.
+We are considering adding code coverage to the clustering method as a parameter. Not only would this increase the accuracy of the scoring, it would also increase the accuracy of the "type" of each mutation. To this day, this tool does not make a concrete difference, in terms of scoring or information generation (reports), between a mutation with a new stack trace in a very common code path and a very common stack trace in a very rarely triggered code path.
 
 		\subsection{Data type pre-analysis}
 The idea for this feature is to implement some kind of "auto-learning" mechanism.
-To be more precise, this routine is meant to performed a statistical analysis 
on a representative portion database's content. This analysis would provide the 
rest of the program the commun values encountered for each field. More 
genericly, this would allow the software to have a global view over the format 
of the data that the database holds.
-Such global understanding of the content format is very intresting to make the 
modifications possibilites nore relevent. Indeed, one of the major limitation 
of SchemaFuzz is its "blindness".
-That is to say that some of the modifications performed in the course the 
execution of the program are irrelevent due to the lack of information on what 
is supposed to be stored in this precise field.
+To be more precise, this routine is meant to perform a statistical analysis on a representative portion of the database's content. This analysis would provide the rest of the program with the most common values encountered for each field. More generally, this would allow the software to have a global view of the format of the data that the database holds.
+Such a global understanding of the content format is very interesting for making the possible modifications more relevant. Indeed, one of the major limitations of SchemaFuzz is its "blindness".
+That is to say, some of the modifications performed in the course of the execution of the program are irrelevant due to the lack of information on what is supposed to be stored in a precise field.
 For instance, a field that only holds numerical values from 1 to 1000, even if it has enough bits to encode from -32767 to 32767, would have a very low chance of triggering a crash if this software modifies its value from 10 to 55.
 On the other hand, if the software modifies this very same field from 10 to -12000, then a crash is much more likely to pop up.
 The same principle applies to strings. Suppose a field can encode 10 characters.
-the pre analysis, detected that, for this field, most of the value were 
surnames beginning with the letter "a". Changing this field from "Sylvain" to 
"Sylvaim" will probably not be very effective. However, changing this same 
field from "Sylvain" to "NULL" might indeed triggered an unexpected behavior. 
+Suppose also that the pre-analysis detected that, for this field, most of the values were surnames beginning with the letter "a". Changing this field from "Sylvain" to "Sylvaim" will probably not be very effective. However, changing this same field from "Sylvain" to "NULL" might indeed trigger an unexpected behavior.
   
-This pre-analysis routine would only be executed once at the start of the 
execution, right after the meta data exctraction. The result of this analysis 
will be held by a specific object. 
+This pre-analysis routine would only be executed once at the start of the 
execution, right after the meta data extraction. The result of this analysis 
will be held by a specific object. 
 This object's lifespan is equal to the duration of the main loop's execution (so that every mutation can benefit from the analysis data).
                
-               \subsection{Centralised anonymous user data}
-SchemaFuzz's efficiency in thightly linked to the quality of its heuristics. 
this term includes the following points 
+               \subsection{Centralized anonymous user data}
+SchemaFuzz's efficiency is tightly linked to the quality of its heuristics. This term includes the following points:
                \begin{itemize}
                \item{Quality of the possible modifications for a single field}
                \item{Quality of the possible modifications for each data type}
                \item{Quantity of possible modifications for a single field}
                \item{Quantity of supported data types}
                \end{itemize}
-Knowing this, we are also concidering for futures enhancements an anonymous 
data collection  for each execution of this tool that will be statisticly 
computed to determine the best modification in average. This would improve the 
choosing mechanism by balancing the weights  depending on the modifcation's 
average quality. Modifications with higher average quality would see their 
weight increased (meaning they would get picked more frequently) and vice 
versa.                   
+Knowing this, we are also considering, as a future enhancement, an anonymous data collection for each execution of this tool, which would be statistically processed to determine the best modifications on average. This would improve the choosing mechanism by balancing the weights depending on each modification's average quality. Modifications with higher average quality would see their weight increased (meaning they would get picked more frequently) and vice versa.
+
 
-\includepdf[pages=-]{PersonnalExperience.pdf}
 
        \section{Contributing}
 You can send your ideas to \\*
 		address@hidden
 or directly create a pull request on the official repository to edit this document and/or the code itself.
+
+
+\appendix      
+\newpage
+\input{PersonnalExperience.tex}
        
-       
-       
+                       
 \end{empfile}
 \end{document} 
diff --git a/docs/PersonnalExperience.pdf b/docs/PersonnalExperience.pdf
deleted file mode 100644
index 10461b3..0000000
Binary files a/docs/PersonnalExperience.pdf and /dev/null differ
diff --git a/docs/PersonnalExperience.tex b/docs/PersonnalExperience.tex
index 01f8492..565e2c1 100644
--- a/docs/PersonnalExperience.tex
+++ b/docs/PersonnalExperience.tex
@@ -1,33 +1,24 @@
-\documentclass{article}
-\usepackage[english]{babel}
 
-\title{Documentation for schemaFuzz}
-\author{Ulrich "Feideus" Erwan}
-
-\begin{document}
-
-
-
-\section{Internship organisation}
+\section{Internship organization}
        \subsection{Introduction}
 
-This section is meant to be added to the University version of this 
documentation. It will be written as Erwan Ulrich and will focus on the 
different aspects of the organisation of the project. The folllowing text will 
also be written with a more personnal and more critical point of view as a mean 
of self analysement.
+This section is meant to be added to the University version of this documentation. It is written in the first person, as Erwan Ulrich, and focuses on the different aspects of the organization of the project. The following text is also written from a more personal and more critical point of view, as a means of self-analysis.
 
        \subsection{Calendars}
        
 The SchemaFuzz project has had since its genesis a quite clear view of how the development should evolve. The desired features were discussed, and the big picture was designed to fit the time that the main developer had for his work in this position.
-The project had to pass trough different phases of development that are 
detailed in the following timeline diagram. %% insert timeline diagram here.
+The project had to pass through different phases of development, which are detailed in the following timeline diagram. %% insert timeline diagram here.
 
-Some of the tasks of the above timeline were completed on time, some others 
were delivered late, and some were delayed in the timeline because of the 
previous point.
-In the end, the project was lead in a way that is best described by the 
following timeline diagram.    %% insert timeline diagram here.
+Some of the tasks of the above timeline were completed on time, some others were delivered late, and some were pushed back in the timeline because of the previous point.
+In the end, the project was led in a way that is best described by the following timeline diagram. %% insert timeline diagram here.
 
 Those two diagrams differ on some points. This is one of the major failures of the development of this project throughout the course of these 6 months.
 There are several reasons that explain why this project could have been led in a better way.
-they will be detailled and discussed in the next section. 
+They will be detailed and discussed in the next section.
 
-       \subsection{Organisationnal failures}
+       \subsection{Organizational failures}
 This section has a particular value in this report: on the one hand, it is a description of why SchemaFuzz did not meet all of its defined goals.
-Other the other hand, it is a personnal reminder of what should be improved in 
my work habbits and general organisation when leading a project of such a large 
size. 
+On the other hand, it is a personal reminder of what should be improved in my work habits and general organization when leading a project of such a large size.
        
        \begin{itemize}
        \item{Defining tasks/features as daily/weekly sub goals}
@@ -36,64 +27,65 @@ Other the other hand, it is a personnal reminder of what 
should be improved in m
        \end{itemize}           
 
        \subsection{Positive outcomes}
-Throughout the development of the project, I have had the chance to acquire 
many new capacities and improve many of my own skills. I will give more 
insights on what this project and, more genericly, what this intership as a 
developer for a GNU package, has brought me.
+Throughout the development of the project, I have had the chance to acquire many new abilities and improve many of my own skills. I will give more insight into what this project and, more generally, what this internship as a developer for a GNU package, has brought me.
 
                \subsubsection{Technical aspect}
                
                \paragraph{Java language}
 In many ways, this project has been a real challenge. But the main difficulty that I encountered was the technical challenge that arose when the project started. Indeed, it was my first time conducting a project of the size of SchemaFuzz. The size of the project and the fact that I was the only one developing the tool implied that every aspect of the project, independently of the language used for each module, had to be imagined and implemented with my own two hands.
-Even if I was already accostumed to Java programming, I got struck by the 
complexity and the architecture of a "real" in-production software like 
SchemaSpy which I had to look into to get the metadata extraction routine.
-This was my first improvement. Code structure. Even if my coding capacitites 
can still be perfected in many ways, I feel like understanding/re-using complex 
and well structured code gave me a much better idea of what "good code" really 
is. Integreting these concepts enpowered my development skills and I am now 
much more confident about it.
+Even if I was already accustomed to Java programming, I was struck by the complexity and the architecture of a "real" in-production software like SchemaSpy, which I had to look into to get the meta data extraction routine.
+This was my first improvement: code structure. Even if my coding abilities can still be perfected in many ways, I feel like understanding and re-using complex, well-structured code gave me a much better idea of what "good code" really is. Integrating these concepts strengthened my development skills, and I am now much more confident about them.
 
-Apart from the Java language, which I was already familiar with, I also had 
the chance to get my hands of new technologies (or technologies I never really 
had the chance to pratice in real conditions). 
+Apart from the Java language, which I was already familiar with, I also had the chance to get my hands on new technologies (or technologies I had never really had the chance to practice in real conditions).
 
                        \paragraph{SQL language}
-SchemaFuzz is a database fuzzer. Naturally, A major component of the work for 
its development was to create and handle SQL requests and responses. In order 
to do that, I had to document myself for a while as I was lacking some 
knowledge on databases in general. After gaining a better understanding of how 
databases operate theoraticly, I had to go into more depth concerning the inner 
structure of constraints and the way datatypes are encoded for most DMBS.
+SchemaFuzz is a database fuzzer. Naturally, a major component of the work for its development was to create and handle SQL requests and responses. In order to do that, I had to read up on the subject for a while, as I was lacking some knowledge on databases in general. After gaining a better understanding of how databases operate in theory, I had to go into more depth concerning the inner structure of constraints and the way data types are encoded by most DBMSes.
 This brings me to my next point regarding the handling of SQL in this project.
 
                        \paragraph{DBMS(PostgreSQL)} 
-SchemaFuzz's first and formost import goal is to help in the debugging and 
maintenance of the GNU Taler payement system. GNU Taler databases are managed 
by the PostgreSQL DBMS. Therefore, the natural choice of technology for SQL 
management in this project was obvious.
-Not having ever worked with PostgreSQL before, I had to adapt my habbits when 
dealing with the DBMS itself.
-By doing so, and stumbling on error messages I had never seen before, I had 
the chance to get into more depth in the structure of DBMSes in general. In 
particular, I had to get my hands on the inner PostgreSQL tables in order to 
understand how different databases were managed within the same environnement.
+SchemaFuzz's first and foremost goal is to help in the debugging and maintenance of the GNU Taler payment system. GNU Taler databases are managed by the PostgreSQL DBMS. Therefore, the natural choice of technology for SQL management in this project was obvious.
+Never having worked with PostgreSQL before, I had to adapt my habits when dealing with the DBMS itself.
+By doing so, and stumbling on error messages I had never seen before, I had the chance to dig deeper into the structure of DBMSes in general. In particular, I had to get my hands on the inner PostgreSQL tables in order to understand how different databases were managed within the same environment.
 
                        \paragraph{Shell/Bash Scripting} 
-As a part of the development of the analyzer for SchemaFuzz, I have had the 
chance to build up several bash scripts. This excercice was to me a true 
pleasure as well as very instructive.
-Spending some time on writting parsing script had me look into how parsing is 
usually implemented for such jobs.
-Having this experience with me, I now better understand how each and every 
componenent of a same project connects to each other. 
-Even though I was aware of the power of scripting in general, I have now come 
to understand how much of a crucial skill it is to understand and be able to 
write scripts when working in a Linux environement.
-In the big picture, I feel like I have earned a precious asset by practicing 
scripting on a technical level. This also gave me the chance to develop my own 
script in the frame of personnal use in my own environnement. Going through 
more conceptual and theoratical documents on what scripting really is and how 
it should be used.
+As a part of the development of the analyzer for SchemaFuzz, I have had the chance to build several Bash scripts. This exercise was to me a true pleasure as well as very instructive.
+Spending some time on writing parsing scripts led me to look into how parsing is usually implemented for such jobs.
+With this experience behind me, I now better understand how each and every component of a project connects to the others.
+Even though I was aware of the power of scripting in general, I have now come to understand how crucial a skill it is to understand and be able to write scripts when working in a Linux environment.
+In the big picture, I feel like I have earned a precious asset by practicing scripting on a technical level. This also gave me the chance to develop my own scripts for personal use in my own environment, while going through more conceptual and theoretical documents on what scripting really is and how it should be used.
                        
 			\paragraph{LaTeX}
-By writting this documentation, I had to learn how to create and process 
properly presented and properly styled scientific documents. In this process, I 
have first learned and then practiced LateX as well as the very handy Tikz and 
metaUML packages used for graphical representations.
-Creating and implementing (in this case) graphics I did not concider to be a 
real coding challenge, but sone of them proved me terribly wrong. Spending time 
on finding the right synthax for what I wanted to show strenghend my project 
management skills and conforted me in the belief that presentation and creation 
of a project are two sides if the same coin and that both should be treated 
with the same amount of seriousness.                       
+By writing this documentation, I had to learn how to create and process properly presented and properly styled scientific documents. In this process, I first learned and then practiced LaTeX, as well as the very handy TikZ and MetaUML packages used for graphical representations.
+I did not consider creating and implementing graphics to be a real coding challenge, but some of them proved me terribly wrong. Spending time on finding the right syntax for what I wanted to show strengthened my project management skills and comforted me in the belief that the presentation and the creation of a project are two sides of the same coin, and that both should be treated with the same amount of seriousness.
        
 
                \subsubsection{Human aspect}
                
                        \paragraph{Languages}
-The development of my project was conducted in german-speaking environnement, 
which is a language I am not very familiar with. 
-This lead to having any kind of communication both regarding the project and 
other subjects in english. This participated in my improvement in both oral and 
written english (this document is also an excellent training for written 
content) as well as my overall comprehension.
-Apart from the pure linguistic point of view, discussing complex topics in 
english gave me the keys to expressing ideas and concept in a more concise and 
clearer way.                         
+The development of my project was conducted in a German-speaking environment, and German is a language I am not very familiar with.
+This led to all communication, both regarding the project and other subjects, being in English. This contributed to my improvement in both oral and written English (this document is also excellent training for written content) as well as my overall comprehension.
+Apart from the purely linguistic point of view, discussing complex topics in English gave me the keys to expressing ideas and concepts in a more concise and clearer way.
 
                        \paragraph{Political maturity}
-Disclaimer. With this paragraph, I am not pushing forward any idea in 
particular, all I intend to do is explain with more detail and insights on how 
rich the environnement was during this intership.                 
+Disclaimer: with this paragraph, I am not pushing forward any idea in particular; all I intend to do is explain in more detail, and give insight into, how rich the environment was during this internship.
                        
 Surprisingly, I have had the chance to meet many people who held various political points of view regarding computer science and technology. On these subjects, it was a truly enriching process to debate things such as morality, ethics or freedom.
-Some other topics that are further away from science were brought up such as 
veganism, green energies, or anarchism.
+Some other topics that are further away from science were also brought up, such as veganism, green energies, or anarchism.
 I hold very dear the moments I spent discussing and confronting my own ideas, because I feel like this has allowed me to gain maturity in my political positions.
+       
+       \clearpage
 
-       \section{Conclusion}
+       \subsection{Conclusion}
    
 The development of SchemaFuzz and my work for GNU Taler were spread over a period of 6 months.
-Within this timelapse, I have discovered the fields of research and real 
software development.
-This discovery has been very beneficial to me in the sens that it gave me the 
chance to acquire experience both on the theoratical and technical sides as 
well as mastering some new technologies and new aspects in the field of 
computer science in general.
+Within this time span, I have discovered the fields of research and real software development.
+This discovery has been very beneficial to me in the sense that it gave me the chance to acquire experience on both the theoretical and technical sides, as well as to master some new technologies and new aspects of the field of computer science in general.
 
 My work for GNU Taler was primarily to imagine, conceptualize and develop a database-oriented fuzzing tool.
 First, I focused on bringing the software from the "general idea" shape that was given to me by my internship supervisor to a concrete and structured project. In the process of creation, I started by defining which precise features were critical and with which technologies they would be implemented.
 
 The main task of SchemaFuzz is to inject malformed data into a specific 
database in order to trigger crashes or unexpected behavior from the program 
that uses the content of this database.
-By working on this project for the past 6 months, I have brought it to a point 
where it fulfills its main task. I have uses a sample database contain content 
with a wide variaty in terms of data types to test the project all along the 
course of the development. However, the application is meant to evolve to a 
more advanced state. Such a big project requires much more time than what I had 
to be fully operationnal.
+By working on this project for the past 6 months, I have brought it to a point where it fulfills its main task. I have used a sample database containing content with a wide variety of data types to test the project all along the course of the development. However, the application is meant to evolve to a more advanced state. Such a big project requires much more time than what I had for it to be fully operational.
 
-Finally, I am convinced that the realisation of this project was a truly 
rewarding experience on all academical, technical and human aspects. All the 
knowledge acquired as GNU developer strenghened the concepts I had learned in 
my academical courses. Moreover, this internship is an excellent social 
experience thanks to the amount of contact with very bright professors, PhD 
students and other interns.     
+Finally, I am convinced that the realization of this project was a truly rewarding experience on the academic, technical and human levels. All the knowledge acquired as a GNU developer strengthened the concepts I had learned in my academic courses. Moreover, this internship was an excellent social experience thanks to the amount of contact with very bright professors, PhD students and other interns.
    
-\end{document}
\ No newline at end of file
diff --git a/src/main/java/org/schemaspy/DBFuzzer.java 
b/src/main/java/org/schemaspy/DBFuzzer.java
index f9b5c86..26839ec 100755
--- a/src/main/java/org/schemaspy/DBFuzzer.java
+++ b/src/main/java/org/schemaspy/DBFuzzer.java
@@ -211,7 +211,7 @@ public class DBFuzzer
                     //currentMutation.propagateWeight(); //update parents 
weight according to this node new weight
 
                     LOGGER.info("Target is : " + 
analyzer.getCommandLineArguments().getTarget());
-                    ProcessBuilder ep = new ProcessBuilder("/bin/bash", 
"./stackTraceCParser.sh", analyzer.getCommandLineArguments().getTarget(), 
Integer.toString(currentMutation.getId()));
+                    ProcessBuilder ep = new ProcessBuilder("/bin/bash", 
"./stackTraceCParser.sh", analyzer.getCommandLineArguments().getTarget(), 
Integer.toString(currentMutation.getId()), 
analyzer.getCommandLineArguments().getNestedArguments());
                     ArrayList<GenericTreeNode> pathToRoot = 
currentMutation.pathToRoot();
                     Collections.reverse(pathToRoot);
                     for(int i=0; i< pathToRoot.size();i++)
diff --git a/src/main/java/org/schemaspy/cli/CommandLineArguments.java 
b/src/main/java/org/schemaspy/cli/CommandLineArguments.java
index a4c401d..8b93170 100755
--- a/src/main/java/org/schemaspy/cli/CommandLineArguments.java
+++ b/src/main/java/org/schemaspy/cli/CommandLineArguments.java
@@ -175,6 +175,12 @@ public class CommandLineArguments {
 
     @Parameter(
             names = {
+                    "-na", "--nestedArguments",
+            })
+    private String nestedArguments;
+
+    @Parameter(
+            names = {
                     "-port", "--port", "port",
                     "schemaspy.port"
             }
@@ -241,4 +247,5 @@ public class CommandLineArguments {
 
     public String getReport() {return report; }
 
+    public String getNestedArguments() { return nestedArguments; }
 }
diff --git a/stackTraceCParser.sh b/stackTraceCParser.sh
index 7099b0d..f7f8fb6 100755
--- a/stackTraceCParser.sh
+++ b/stackTraceCParser.sh
@@ -1,22 +1,22 @@
 #!/bin/bash
 
-isBinaryInDir=`ls | grep $1`;
-echo $isBinaryInDir
+#isBinaryInDir=`ls | grep $1`;
+#echo $isBinaryInDir
 
-if [[ -n "$isBinaryInDir" ]]
-then
-    echo "chosen binary is : "$1;
-else
-    echo "couldnt find the binary in the current folder";
-    exit 0;
-fi
+#if [[ -n "$isBinaryInDir" ]]
+#then
+#    echo "chosen binary is : "$1;
+#else
+#    echo "couldnt find the binary in the current folder";
+#    exit 0;
+#fi
 
 binaryWithoutExtention=`echo $1 | cut -d '.' -f1`
 echo "saving result in : "$binaryWithoutExtention;
 
 ulimit -c 9999
 echo "--------------------------"
-./$1
+./$1 -c $3 ## -c SHOULD BE IN THE JAVA CODE AS A NESTED ARGUMENT. This is for 
manual testing
 echo "--------------------------"
 
 checkCoreGen=`ls | grep core`;
@@ -30,7 +30,7 @@ else
     exit 0;
 fi
 
-echo bt | gdb test_c_crash.exe errorReports/core | sed -e 's/(gdb) //' | grep 
\# | uniq > errorReports/stackTrace_$binaryWithoutExtention
+echo bt | gdb $1 errorReports/core | sed -e 's/(gdb) //' | grep \# | uniq > 
errorReports/stackTrace_$binaryWithoutExtention
 tmp=`cat errorReports/stackTrace_$binaryWithoutExtention | sed 's/.* in //' | 
cut -d "(" -f1`
 
 echo "function names : "$tmp

-- 
To stop receiving notification emails like this one, please contact
address@hidden


