Re: [lp-ca-on] Libre Planet Ontario has started a tax project

On 31 May 2016 at 20:13, Greg Knittl <address@hidden> wrote:

> Hi Chris,
>
> Yes, I've realized for a long time that it's not exactly a spreadsheet application
> but it's only last week that I realized it's only the topological sort component of
> a spreadsheet (tsort on the command line) that it needs. See
> http://lists.libreplanet.org/archive/html/libreplanet-ca-on/2016-05/index.html
> specifically
> http://lists.libreplanet.org/archive/html/libreplanet-ca-on/2016-05/msg00048.html
> That's an enormous simplification but there are still a lot of design issues to deal with.
> Including the fact that the output order does not match the calculation order...
>
> It would be great if you could join the mailing list

Using tsort is a really interesting thought; it somewhat presupposes a particular shape for the implementation of the solution.

When I started poking at taxes, I fairly quickly jumped to Prolog as a way of grappling with them. Prolog has two or three pretty crucial conveniences for the purposes of analysis:

a) It regards problems as involving databases of facts. We're not talking about SQL here; we're talking about sets of statements of facts.

For instance, a nice, compact fact for tax purposes might be the following...

t4(456789012, 2016, 1, 55000, 485, 375, 17000).

This is a (quite permissible!) encoding of someone's T4 slip, indicating that taxpayer 456789012, in 2016, had, as their first T4, a slip showing employment income of $55000, CPP of $485, EIC of $375, and tax deducted of $17000.

t5(456789012, 2016, 1, 150, 0, 0).

That's an encoding of a T5 slip, indicating $150 of interest, and no deductions.

(Probably this oversimplifies, but it's a nice starting point.)
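As a minimal sketch, assuming SWI-Prolog, here are the two slips above loaded as facts and summed per taxpayer and year. The helper names (t4_employment_income/3, t4_tax_deducted/3, t5_interest/3) are hypothetical; only the t4/t5 argument order comes from the message.

```prolog
:- use_module(library(aggregate)).

% The two slips from above:
%   t4(Taxpayer, Year, SlipNo, Income, CPP, EIC, TaxDeducted)
%   t5(Taxpayer, Year, SlipNo, Interest, _, _)
t4(456789012, 2016, 1, 55000, 485, 375, 17000).
t5(456789012, 2016, 1, 150, 0, 0).

% Hypothetical helpers: total one box over every slip a taxpayer has for a year.
% aggregate_all(sum(X), ...) yields 0 when no slip matches.
t4_employment_income(P, Y, Total) :-
    aggregate_all(sum(I), t4(P, Y, _Slip, I, _Cpp, _Eic, _Tax), Total).

t4_tax_deducted(P, Y, Total) :-
    aggregate_all(sum(T), t4(P, Y, _Slip, _Income, _Cpp, _Eic, T), Total).

t5_interest(P, Y, Total) :-
    aggregate_all(sum(I), t5(P, Y, _Slip, I, _, _), Total).
```

A second T4 is just one more t4/7 fact with slip number 2; the helpers pick it up with no other change, e.g. `?- t4_employment_income(456789012, 2016, X).`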

There are plenty of "internal" facts that would be useful to collect into the database, such as tax rates, currency exchange rates, deductions, and so on.

b) You can then define rules indicating how to infer things from those facts.

For instance, a T1 tax return defines "Total Income" as the sum of values from various other lines of the return.

So I wound up defining a rule for that thus:

rawline(TAXPAYER, YEAR, 'T1', 150, TOTAL_INCOME) :-
    sumoflines(TAXPAYER, YEAR, 'T1',
               [101, 104, 113, 114, 115, 119, 120, 121, 122, 126, 127, 128, 129, 130, 135, 137, 139, 141, 143, 147],
               TOTAL_INCOME).

The rawline() predicate is somewhat opaque, alas, but it should be decently clear that TOTAL_INCOME, line 150 (look at your tax return and you'll easily find it), is determined as the sum of various other lines from the tax return.
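The message doesn't show sumoflines/5, so here is one hypothetical way to define it, assuming SWI-Prolog and assuming a line with no rawline/5 answer counts as a blank (0). The two input lines assume the T4 income lands on line 101 and the T5 interest on line 121.

```prolog
:- use_module(library(apply)).

% Input lines for the taxpayer from the slip examples (assumed line mapping).
rawline(456789012, 2016, 'T1', 101, 55000).
rawline(456789012, 2016, 'T1', 121, 150).

% The rule from the message: line 150 is the sum of the listed lines.
rawline(TAXPAYER, YEAR, 'T1', 150, TOTAL_INCOME) :-
    sumoflines(TAXPAYER, YEAR, 'T1',
               [101, 104, 113, 114, 115, 119, 120, 121, 122, 126,
                127, 128, 129, 130, 135, 137, 139, 141, 143, 147],
               TOTAL_INCOME).

% HYPOTHETICAL sumoflines/5: fold over the line numbers, adding each
% line's value; a line with no rawline/5 answer contributes nothing.
sumoflines(Taxpayer, Year, Form, Lines, Sum) :-
    foldl(add_line(Taxpayer, Year, Form), Lines, 0, Sum).

add_line(Taxpayer, Year, Form, Line, Acc0, Acc) :-
    (   rawline(Taxpayer, Year, Form, Line, Value)
    ->  Acc is Acc0 + Value
    ;   Acc = Acc0
    ).
```

`?- rawline(456789012, 2016, 'T1', 150, X).` gives the $55000 of employment income plus the $150 of interest.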

A large portion of the details of tax computations look like this, or are not TOO much more complex.

The computations done tend to fall into four categories:

a) Adding together values from other places;

b) Multiplying a value by a fraction;

c) Subtracting values, perhaps clamping the result at 0 so that it never goes negative;

d) Selecting one of several values based on some other value

(e.g., if your income is between $0 and $27000, you pay at one rate, and tax rates increase for higher income levels).
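A sketch of the four categories as small SWI-Prolog predicates. The predicate names are hypothetical, and so are the two rates in rate_for/2: only the $27000 threshold comes from the text, and real brackets have more tiers.

```prolog
:- use_module(library(lists)).

% a) Adding together values from other places.
add_values(Values, Sum) :-
    sum_list(Values, Sum).

% b) Multiplying a value by a fraction Num/Den.
times_fraction(Value, Num, Den, Result) :-
    Result is Value * Num / Den.

% c) Subtracting, clamped at 0 so the result never goes negative.
subtract_floor0(A, B, Result) :-
    Result is max(0, A - B).

% d) Selecting one of several values based on another value.
%    HYPOTHETICAL rates (percent); only the 27000 threshold is from the text.
rate_for(Income, Rate) :-
    (   Income =< 27000
    ->  Rate = 10
    ;   Rate = 20
    ).
```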

The more that these rules can be structured as data, the better, as that makes them amenable to capture, storage, and organization.
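One way to push rules further toward data, sketched under assumptions: each computed line is described by an expression term (line_rule/3), and a small interpreter evaluates it. All predicate and operator names here are hypothetical, the line 150 rule is shortened, and the `demo` line is invented purely to exercise the other operators. Dependencies are resolved by recursion as each line is asked for, so no explicit sort is needed.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% Inputs: line_input(Taxpayer, Year, Form, Line, Amount), from the slip examples.
line_input(456789012, 2016, 'T1', 101, 55000).
line_input(456789012, 2016, 'T1', 121, 150).

% Rules as data: line_rule(Form, Line, Expression). Shortened line 150 rule.
line_rule('T1', 150, sum([line(101), line(104), line(121)])).
% HYPOTHETICAL line 'demo' (not a real T1 line): half of line 150,
% minus 20000, clamped at 0.
line_rule('T1', demo, minus0(times(line(150), 1, 2), const(20000))).

% line_value/5: an input fact if there is one, else the line's rule, else 0.
line_value(P, Y, Form, Line, V) :-
    (   line_input(P, Y, Form, Line, V0)
    ->  V = V0
    ;   line_rule(Form, Line, Expr)
    ->  eval_expr(P, Y, Form, Expr, V)
    ;   V = 0
    ).

% eval_expr/5: a tiny interpreter for the rule expressions.
eval_expr(P, Y, Form, line(L), V) :-
    line_value(P, Y, Form, L, V).
eval_expr(_, _, _, const(C), C).
eval_expr(P, Y, Form, sum(Es), V) :-
    maplist(eval_expr(P, Y, Form), Es, Vs),
    sum_list(Vs, V).
eval_expr(P, Y, Form, times(E, Num, Den), V) :-
    eval_expr(P, Y, Form, E, V0),
    V is V0 * Num / Den.
eval_expr(P, Y, Form, minus0(E1, E2), V) :-
    eval_expr(P, Y, Form, E1, V1),
    eval_expr(P, Y, Form, E2, V2),
    V is max(0, V1 - V2).
```

Because the rules are plain terms, they can also be listed or stored, e.g. `?- line_rule('T1', Line, Expr).`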

c) Prolog then provides mechanisms to run "queries" that determine answers from the facts plus whatever rules are needed to derive the results. No worries about tsort; Prolog uses backtracking to derive answers as they are needed.

My calculator implements the forms that I found useful; a more universal solution would presumably need a more comprehensive set.

While Prolog isn't exactly well-known, there are decent implementations on Linux, and it does a fine job of generating some tax reports.

Prolog has proven a more than satisfactory tool for the initial analysis; it does tax calculations quite nicely, and I have NOT been distracted by much in the way of extraneous concerns.

Further, initial attention for any more extensive project should go to the same two concerns:

a) How to represent the "facts" that come as input

b) How to represent the rules that indicate how the facts interact to give you a tax return.

It seems to me that much of the work that would follow, in creating a more sophisticated environment that is comparatively easy to use, would involve doubling down on a) and b).

For instance, the labelling of tax forms would indicate a bunch more facts.

- What is the label of that line on the tax return?

- Where does it go on the page?

- What section of the Income Tax Act underlies that fact or rule?

How to compute results is one thing (and yes, a topological sort would indeed be pretty good for that, in the absence of something like Prolog's backtracking system).
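For comparison, a topological sort is easy to express in Prolog too. This is a sketch assuming SWI-Prolog: a depth-first, post-order walk over depends/2 facts. The predicate names are hypothetical and the dependency edges are illustrative (loosely following 2016 T1 line numbers), not a complete return. There is no cycle detection: nodes are marked visited on entry, so a cycle terminates but is not reported.

```prolog
:- use_module(library(lists)).

% depends(Line, Prerequisite): Line cannot be computed until Prerequisite is.
% Illustrative edges only.
depends(150, 101).
depends(150, 121).
depends(234, 150).
depends(234, 233).
depends(236, 234).
depends(236, 235).

% topo_order(+Roots, -Order): every prerequisite precedes the lines needing it.
topo_order(Roots, Order) :-
    visit_all(Roots, [], _, [], Reversed),
    reverse(Reversed, Order).

% visit_all(+Nodes, +Visited0, -Visited, +Acc0, -Acc)
visit_all([], Visited, Visited, Acc, Acc).
visit_all([Node|Nodes], Visited0, Visited, Acc0, Acc) :-
    visit(Node, Visited0, Visited1, Acc0, Acc1),
    visit_all(Nodes, Visited1, Visited, Acc1, Acc).

% Already seen: nothing to do.
visit(Node, Visited, Visited, Acc, Acc) :-
    memberchk(Node, Visited),
    !.
% Otherwise visit prerequisites first, then record Node (post-order).
visit(Node, Visited0, Visited, Acc0, [Node|Acc]) :-
    findall(P, depends(Node, P), Prereqs),
    visit_all(Prereqs, [Node|Visited0], Visited, Acc0, Acc).
```

`?- topo_order([236], Order).` yields a valid calculation order; as Greg points out, the order in which lines are printed is a separate question.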

How to render results on screen/paper/file is another thing.

Those things would be well-served by having a series of facts about each line of information that is expected to be on the tax return.
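A sketch, assuming SWI-Prolog, of per-line facts driving rendering: labels and page positions are just more facts, and output follows page order rather than calculation order. All predicate names are hypothetical, the page/row positions are invented, and the amounts are taken from the slip examples rather than computed here.

```prolog
:- use_module(library(lists)).

% line_label(Form, Line, Label)
line_label('T1', 101, 'Employment income').
line_label('T1', 121, 'Interest and other investment income').
line_label('T1', 150, 'Total income').

% HYPOTHETICAL layout facts: line_position(Form, Line, Page, Row).
line_position('T1', 150, 2, 3).
line_position('T1', 101, 2, 1).
line_position('T1', 121, 2, 2).

% Amounts, as rules like the line 150 rule would produce them.
amount(456789012, 2016, 'T1', 101, 55000).
amount(456789012, 2016, 'T1', 121, 150).
amount(456789012, 2016, 'T1', 150, 55150).

% render_line/5: one line of output as a string.
render_line(P, Y, Form, Line, Text) :-
    line_label(Form, Line, Label),
    amount(P, Y, Form, Line, Amount),
    format(string(Text), "~w  ~w  ~w", [Line, Label, Amount]).

% rendered_form/4: all lines of a form, sorted by page and row.
rendered_form(P, Y, Form, Texts) :-
    findall(Page-Row-Line, line_position(Form, Line, Page, Row), Keyed),
    msort(Keyed, Sorted),
    findall(T, ( member(_-_-L, Sorted),
                 render_line(P, Y, Form, L, T) ), Texts).
```

A fact such as the relevant Income Tax Act section for each line would slot in the same way, as one more fact keyed on form and line.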

A thing that the Prolog-based representation handles nicely, and that might be difficult with other representations, is the notion that you might have facts for a set of taxpayers, perhaps for a series of years. Prolog is quite happy having a database that contains facts for several taxpayers for several years, as long as the facts are represented suitably.
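A sketch of that in SWI-Prolog: T4 facts for several taxpayers and years in one database, with one query covering every taxpayer/year pair. Only the first fact comes from the message; the other three are hypothetical, as are the predicate names.

```prolog
:- use_module(library(aggregate)).
:- use_module(library(lists)).

% t4(Taxpayer, Year, SlipNo, Income, CPP, EIC, TaxDeducted)
t4(456789012, 2016, 1, 55000, 485, 375, 17000).
t4(456789012, 2016, 2, 8000, 150, 120, 1200).    % HYPOTHETICAL
t4(456789012, 2015, 1, 52000, 470, 360, 16000).  % HYPOTHETICAL
t4(123456789, 2016, 1, 40000, 350, 280, 9000).   % HYPOTHETICAL

% taxpayer_year(?P, ?Y): each distinct taxpayer/year pair with a T4, sorted.
taxpayer_year(P, Y) :-
    setof(P0-Y0, Slip^Inc^Cpp^Eic^Tax^t4(P0, Y0, Slip, Inc, Cpp, Eic, Tax), Pairs),
    member(P-Y, Pairs).

% employment_income(?P, ?Y, -Total): T4 income summed per taxpayer and year.
employment_income(P, Y, Total) :-
    taxpayer_year(P, Y),
    aggregate_all(sum(I), t4(P, Y, _, I, _, _, _), Total).
```

`?- employment_income(P, 2016, T).` enumerates both taxpayers for 2016 on backtracking; leaving Y unbound as well walks every year.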

Doing it any other way would be enough work that I haven't put any effort into moving beyond Prolog.

When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

From:	Christopher Browne
Subject:	Re: [lp-ca-on] Libre Planet Ontario has started a tax project
Date:	Thu, 2 Jun 2016 18:25:26 -0400