phpgroupware-developers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Phpgroupware-developers] SQL Style Guide available


From: Michael Dean
Subject: Re: [Phpgroupware-developers] SQL Style Guide available
Date: 07 May 2003 07:32:12 -0500

On Wed, 2003-05-07 at 03:52, Kai Hofmann wrote:
> 2. Naming tables and columns
> 
> Tables and columns should always be named in
> 
> - lower case
> - english language
> 
> and a name should not use more than 64 characters.

The name of a table should not exceed 26 characters.  And the shorter
the better.  Really, it shouldn't exceed a certain length based on a
number of factors.  They generally should be kept to less than 15
characters if at all possible.

> 
> Each table name should describe the tables content in one or two words - 
> use the plural form.

Plural form is a matter of taste and makes the table names longer.

> Each column of a table must start with a shortened version of the table name
> with 
> a maximum of 6 characters. For example use

This should only apply to fields that will be used a foreign keys in
other tables.  For example, if you have a project that is referenced, it
is best to use

Source: phpgw_project.project_id
Ref: phpgw_task.project_id

You can always tell from what table a field came from.

> Please be careful to use each label only in one table.
> So if you have an invoices table as well as an inventory table use
> inv for the invoices and invt for the inventory table.

Should use table name minus phpgw_, adding _id for the key.

> for long fieldnames like description you can use short ones like "descr".

Yep

> Try using the same vocabulary in all tables.
> This nameing scheme will also shorten your select statemtens when using
> multiple 
> tables, because the column names are more
> unique and so you only need to use the table name for the primary/foreign
> key 
> where condition! 

It's still good practice to qualify the tables in your statement so they
are easier to comprehend.  Especially if someone is taking over
maintenance of an app they have never seen before.

> 3. Using datatypes
> 
> Datatypes for the columns should be used wisely and with care.
> Try to use the datatype that fits best and avoid the usage of blob
> datatypes, 
> because they are NOT portable
> between different dbms. A varchar has a maximum length of 255 characters.
> When using 
> DATETIME (prefered over DATE and TIME because DATETIME is supported by more
> dbms) you 
> should read/write to this datatype in ISO8601 format [4]
> "yyyy-mm-ddThh:mm:ss".

Schema proc and the abstraction classes should be responsible for this,
not the developer.  Most of it is already handled.

> Foreign keys for relations must always have the same datatype as the primary
> of the 
> referenced table.

Some servers won't allow you to create foreign keys if they aren't the
same or you don't have a proper index.

> 4. Defining primary keys
> 
> For primary keys always use an integer datatype (for performance reasons).
> Only use another type when this is a real good alternative like for a
> languages table 
> when using the iso 2 character code as primary key (like in chapter 3).
> Primary keys should always be named as "id" if possible:
> 
> cal_id
> event_id
> msg_id
> 
> except for good reasons like in the language example from chapter 3.

Yep - should be tablename_id minus phpgw_ prefix.

> A special trick that is very useful sometime are self referencing tables
> (for example nested categories).
> In such a case you must rename the foreign key because it can not have the
> same name as the primary key.
> 
> categories
> ----------
> cat_id     INT
> cat_name   VARCHAR(32)
> cat_parent INT         (reference to cat_id)

cat_parent_id would be better in this case.

> 6. Defining indexes
> 
> Indexes will speed up the processing of select statements when they are set
> correctly.
> In database systems without real "foreign keys/relations" (like MySQL 3.x)
> you should create 
> an index for each key that references another tables.
> Otherwise MySQL will do a cost intensive full table scan each time.
> You also should create indexes for often used select statements.

You need an index on those foreign key fields regardless of fk support. 
The schema definition is not explicitly defined, it is done through
abstraction.  You should note that MySQL does have fk support with the
INNODB table format, but that has not been abstracted yet.

> 7. 3rd normal form (3NF)
> 
> For a good database design you should always use the 3rd normal form. This
> will avoid double data 
> within your database and has
> many more advantages. For more about database design and the 3rd normal form
> please read [2], [3].
> 
> Interesting online resources can be found under [5].

This should not say always, but something like "generally" or "in most
cases".

> 8. Database access from the application
> 
> Within your application you should NOT try to do the databases job - for
> example 
> don't associate the data from two tables with each other within your
> application - 
> the database is always faster and better in doing this job.

Again, this depends on several factors.  Every case is different and
there are exceptions.

> Try to move your database access code into extra classes or as a minimum
> move it 
> to an extra method.
> This will allow better maintanance later in the lifecycle of your software.

This should already be handled by the db classes for the app.  It's part
of the phpGW architecture already.

> Write a SQL Statement into one line and don't split it into several lines. 
> This will allow searching via grep.
> Use dynamic parts (variables) only within the WHERE clause.

This isn't always desired.  Some statements are simply too long to put
on one line.  To boot, a lot of statements are build conditionally.

> Write the SQL-Keywords in uppercase - all other things in lowercase.

This is according to taste.  The only case that really matters is table
names and string values.  MySQL is case sensitive on case-senstive
filesystems.  Some RDBMS' use case-sensitive string comparison.

> 9. How to write select statements with multiple tables
> 
> There are several ways how to join tables via a select statement. 
> Use the following form:
> 
> SELECT * FROM events,languages,calendar WHERE event.lang_code =
> languages.lang_code AND event.cal_id = calendar.cal_id
> SELECT * FROM events LEFT JOIN languages LEFT JOIN calendar WHERE
> event.lang_code = languages.lang_code AND event.cal_id = calendar.cal_id

select * should be avoided if possible.  Only return the columns you
need.

> That's because this code is more portable between database systems and 
> the dbms will convert other forms into this one.
> Also this form is shorter, more readable, performs better (because the dbms 
> must not transform it) and is less error prone.
> 
> Please do NOT use the following version, because it is not so readable (only
> when using brackets) 
> and more error prone, because multiple joins are only readable when
> splitting them into several lines. 
> Expanding an SQL-statement is not as simple as the above form.
> 
> SELECT * FROM events JOIN languages ON (event.lang_code =
> languages.lang_code) JOIN calendar ON event.cal_id = calendar.cal_id

This form looks fine to me.  If you need to split it into multiple lines
to read it, do so.

> 10. Connections to other sources like files, ldap and imap
> 
> Instead of using blobs you should save large data as files and reference
> these files 
> by filename and/or relative path from a varchar column.

You should not dictate how application files are handled from a sql
guide.  BUT, BLOBs should definitely be discouraged from use.  They
generally are more difficult to deal with than the filesystem and aren't
really that great on performance.  Give your SQL server a break ;-)

> Referencing emails via IMAP etc. would work via the unique message id.

I prefer URI for abstracting item locations.  But, that's just me.

> For LDAP you can use the DN (distinguished name) as long as it is not larger
> than 
> 255 characters (varchar limit).

This depends on the application and should be abstracted anyway.

> Otherwise it might be better to work with UIDs or add your own unique
> identifier 
> to each object for referencing it from the database.

Well, you shouldn't have to make your own unique IDs if an object
already has one from another system.  Again, a URI would be good here,
but it depends on the application.

Mike





reply via email to

[Prev in Thread] Current Thread [Next in Thread]