# NAME


DBIx::PreQL - less awful SQL generation through templates


# SYNOPSIS


    use DBIx::PreQL   ();




    my $q = <<END_SQL;
        *   SELECT
        &       count(*),                   !total!
        *       name,
    END_SQL








    my $q = <<END_SQL;
        *   SELECT
        &       count(*),                   !total!
        D       name,
        D       height,
        *   FROM tbl_monkey
        *   WHERE
        &       AND barrel_id = ?barrel_id?
        &       AND name ILIKE '%' || ?monkey_name? || '%'
        &       AND color ?=monkey_color?
        &       AND ARRAY[type] <@ ?\@types? -- "IN"
        &   ORDER BY name                   !~total!
    END_SQL


    my $total = undef;
    my $rows = $db->select_all( DBIx::PreQL->build_query(
        query  => $q,
        data   => {
            barrel_id       => 32,
            monkey_color    => \'NULL',
            total           => $total,
            types           => [ 'ape', 'chimp' ],
        },
        wanted => ! $total ? ['D'] : [],    # Want 'D'ata if not $total
    ) );


    # Runs the following with @params = ( 32, ['ape','chimp'] ):
    #   SELECT
    #       name,
    #       height
    #   FROM tbl_monkey
    #   WHERE
    #           barrel_id = ?
    #       AND color IS NULL
    #       AND ARRAY[type] <@ ? -- "IN"
    #   ORDER BY name


# DESCRIPTION

This module generates queries based on a query template, a hash of related
data, and possibly a list of wanted tags or a function that determines which
tags are wanted.


## SQL Template


This templating system adds only a handful of concepts to standard SQL text.


The query template is processed on a line-by-line basis.


Each line consists of a __tag__ followed by the __body__ of the line.  That
is, each line is composed of: optional whitespace, a tag composed of one
or more non-whitespace characters, a separator of one or more whitespace
characters, and the remainder of the line is the embedded SQL (the __body__).


Each line __body__ consists of plain SQL text and perhaps some
__named place-holders__ and/or __dependency markers__.


### Body Text


- Named place-holders


A __named place-holder__ is a name bracketed by question marks, like
`?key_name?`.  Use named place-holders where you would normally use
place-holders (`?`) in SQL (with DBI).


When a named place-holder is included in a query, it gets replaced by just
a question mark (`?`, a regular DBI place-holder) and the named value gets
pushed onto the list of query parameters.


There are also special forms of place-holders (`?=key_name?`, `?!key_name?`,
`?@key_name?`, and `?"key_name?`) which we will describe later.


- Dependency markers


A __dependency marker__ is a name bracketed by exclamation points, like
`!key_name!`.  Use dependency markers to indicate that a given line
should only be included in the generated query if the named value is
defined in the data hash.


When the template is processed, dependency markers are just removed (and
nothing is added to the list of query parameters).


If a line has multiple dependency markers, then you can request that the
line be included if _any_ of them are defined or only if _all_ of them
are defined (in the data hash).  See the Tags section for how this is done.


You can also use `!~key_name!` to negate the dependency.  Lines marked with
a negated dependency are only included if the named key is _undefined_ (or
missing) from the data hash.


### Tags


Each tag indicates how to decide whether to include the tagged line of SQL in
the generated query.


Tags must be one or more non-whitespace characters.


There are several pre-defined tags that don't require the use of a 'wanted'
list / function:


- `*`


A tag of `*` (asterisk) means _always include this line_.  The skeleton of
your query will be lines starting with `*`.


- `#`


A tag of `#` (pound sign) means _never include this line_.  Yep, they are
just comments.


- `&`


A tag of `&` (ampersand) means _include this line if we have data for
ALL named place-holders and ALL dependency markers_.  These lines are
the work-horse lines that will handle most of the dynamic query assembly.


You can also use a custom tag that _starts_ with an ampersand (`&`), like
`'&TOT'`.  For such tags, first we check that we have data for all named
place-holders and all dependency markers.  If not, then the line is simply
excluded.  Otherwise, the `&` is stripped and the remainder of the tag
is treated as a custom tag and will be checked against your 'wanted' list /
function.


The tag is `&` to match Perl's `&&` operator since a line like


    & LIMIT ?limit? !~total! !paged!


only gets included if


        defined $data->{limit}
    &&  ! defined $data->{total}
    &&  defined $data->{paged}


- `|`


A tag of `|` (vertical bar) means _include this line if we have data for
ANY dependency markers as well as for ALL named place-holders_.  These
lines will be less common.


Similar to `&`, you can use a custom tag that _starts_ with a vertical
bar (`|`), like `'|SUM'`.  For such tags, first we check that we have data
for ANY dependency markers and for ALL named place-holders.  If not, then the
line is simply excluded.  Otherwise, the `|` is stripped and the remainder
of the tag is treated as a custom tag and will be checked against your
'wanted' list / function.


The tag is a `|` because of how it treats dependency markers.  A line with
two dependency markers, say `!tot!` and `!~sum!`, only gets included if


    defined $data->{tot}  ||  ! defined $data->{sum}


and the `|` tag was chosen to match the `||` operator in that expression.
(Named place-holders are treated the same as for the `&` tag since having
an undefined or missing value for a place-holder would just be fatal.)


Any other block of characters is a __custom tag__.  Whether to include a line
marked by a custom tag or not is determined by a 'wanted' list / function.
If you use custom tags, then _you must supply a 'wanted' list / function_
(described later).


To catch accidental omission of a tag, tags that are common SQL keywords
(like 'FROM') or that end with a comma are fatally rejected (unless you
specify a `'known_tags'` list).


For the same reason, an empty SQL line is fatally rejected unless you used
the `'#'` tag or also omitted the tag.  So there is no way to include a
blank line in the generated SQL because we want to catch cases like:


    query => [
        '*  SELECT',
        '       *',     # Oops, left off the tag on this line
        '*    , CASE ... END AS ...',
        '*  FROM ...',
        ...
    ],


You can use `"* --"` to include a nearly-blank (SQL comment) line in the
generated SQL.


### Advanced template features


- SELECT trailing-comma clean-up


When including a line of SQL that begins with the _word_ `FROM` (case
insensitive, ignoring white-space), we remove the last character of the
previous (included) line, if and only if it is a comma (`,`).


So, please put a comma after the last value in your SELECT list (if you can
follow it by a line that starts with `FROM`) in order to simplify editing
of the template.


No special provisions are made for handling trailing commas anywhere else
in SQL.


- WHERE clause leading-AND clean-up


To greatly simplify the very common case of building a `WHERE` clause from
a subset of several optional conditional expressions that should all be
separated by `AND`, we can also remove an `AND` that appears immediately
after a `WHERE`.


Specifically, if we include a line of SQL that ends with the word `WHERE`
(case insensitive, ignoring white-space) and the next (included) line of SQL
begins with the word `AND` (case insensitive, ignoring white-space), then we
will replace the `AND` with spaces.


Alternatively, you can replace your `'WHERE'` with `'WHERE TRUE'`.  You
should certainly do this if there is a chance that sometimes _all_ of the
conditional expressions will be omitted.  For example,


    *   SELECT some, stuff
    *   WHERE
    &       AND foo = ?foo?
    &       AND bar = ?bar?
    *   ORDER BY some


will result in an SQL syntax error if neither 'foo' nor 'bar' keys are
present (and defined) in the data hash.  But this can and should be prevented
by instead writing:


    *   SELECT some, stuff
    *   WHERE TRUE
    &       AND foo = ?foo?
    &       AND bar = ?bar?
    *   ORDER BY some


- `?=key_name?`


A special form of named place-holder includes an equals sign (`=`) before
the key name.  This place-holder does special handling for `NULL` values.
To specify a `NULL` value, use `\'NULL'` as the associated value (a SCALAR
reference to the string `'NULL'`).


So, a template line like:


    &   AND affil_parent ?=parent?


will become (if the 'parent' key is defined) either:


    AND affil_parent = ?


or


    AND affil_parent IS NULL


The second case (where the place-holder is replaced by `'IS NULL'`) happens
if `$data-`{parent}> is `\'NULL'` (ignoring case and external whitespace).
For this case, the list of query parameters is not added to.


The first case (where the place-holder is replaced by `'= ?'`) happens
if `$data-`{parent}> is not a reference (but is defined).  For this case,
`$data-`{parent}> (or just 'parent') is pushed onto the list of query
parameters.


- `?!key_name?`


A similar special form of named place-holder includes an exclamation point
(`!`) before the key name.  This place-holder similarly supports `\'NULL'`
as the associated value.  But this place-holder represents "distinct from"
(the opposite meaning compared to `?=key_name?`).  Note that actually using
"is distinct from" or "is not distinct from" in your template is discouraged
for these cases as the Postgres query optimizer can be hampered by such.


So, a template line like:


    &   AND affil_parent ?!parent?


will become (if the 'parent' key is defined) either:


    AND affil_parent <> ?


or


    AND affil_parent IS NOT NULL


The second case (where the place-holder is replaced by `'IS NOT NULL'`)
happens if `$data-`{parent}> is `\'NULL'` (ignoring case and external
whitespace).  For this case, the list of query parameters is not added to.


The first case (where the place-holder is replaced by `'<> ?'`)
happens if `$data-`{parent}> is not a reference (but is defined).  For
this case, `$data-`{parent}> (or just 'parent') is pushed onto the list
of query parameters.


- `?@key_name?`


A place-holder with an at sign (like `?@key_name?`) requires that the
associated value be an ARRAY reference but otherwise behaves identically
to a plain, named placed-holder.  DBD::Pg will treat the array reference
as a Postgres array value.


There are a few gotchas with using Postgres array values and `?@key_name?`
so let's give an example of typical usage.  First, let's show the typical
case that one would end up replacing with a use of `?@key_name?`:


    push @where, 'account_id IN (' . join(',',('?')x@sub_accts) . ')';
    push @param, @sub_accts;
    # ...
        join( ' AND ', @where )
    # ...


which ends up generating SQL that includes something like:


    ... AND account_id IN (?,?,?,?,?) AND ...


where the number of `'?'`s matches the number of elements in `@sub_accts`.


The SQL we want to generate to use a Postgres array value instead would be:


    ... AND ARRAY[account_id] <@ ? AND ...


`ARRAY[account_id]` makes a Postgres array value containing a single value
(the value of account\_id).  `<@` means "is contained in".  And the `?`
is a plain DBI place-holder.


So the new code would look something like:


    $rows = $db->select_all( DBIx::PreQL->build_query( {
        data => {
            sub_accts => \@sub_accts,
            # ...
        },
        query => [
            # ...
            "&  AND ARRAY[account_id] <@ ?\@sub_accts?",
            # ...
        ],
        # ...
    } ) );


Note how I had to put a backslash (`\`) in front of the at sign in
`?\@sub_accts?` because my SQL template string was enclosed in double
quotes.  If I hadn't done that, the query would not have worked.  Luckily,
build\_query() would almost certainly have complained because that line had
a `'&'` tag but no named place-holders.


Note that a reference to an _empty_ array would mean that


    ARRAY[account_id] <@ ?\@empty?


would never be true (it would generate an SQL syntax error using the old
method).  Sometimes that is what is wanted.  Often, an empty array means that
the condition should just be ignored.  You can accomplish that easily as
follows:


        data => {
            sub_accts => @sub_accts ? \@sub_accts : undef,
            # ...
        },
        query => [
            # ...
            "&  AND ARRAY[account_id] <@ ?\@sub_accts?",
            # ...
        ],


- `?"key_name?`


For the rare, complicated case, you can put a double quote before the key
name in a named place-holder.  This place-holder will be replaced by the
string value associated with that key name in the data hash.  So the string
value should be a snippet of valid SQL.


For example:


    &   AND affil_parent IS ?"parent? NULL


could be combined with:


    data => {
        parent => $has_parent ? 'NOT' : '',
        # ...
    },


to only find items with a non-NULL affil\_parent if `$has_parent` is true and
vice versa.


### Special data values


The values in the `'data'` hash are usually expected to be strings (or maybe
numeric values).  But some other types of values are also handled by this
templating system.


- `undef()`


The named key being present but with an undefined value associated with it
causes the templating system to act the same as if the key were not present.


- `\$sql`


If the associated value for a named place-holder is a reference to a scalar,
then the referenced scalar is expected to contain a valid snippet of literal
SQL (similar to how AT::SQL and other helpers treat such SCALAR refs).  The
named place-holder will be replaced with the literal SQL (not with `'?'`)
and the list of query parameters will not be added to.


For `?=key_name?`, `'= '` will also be inserted just prior to the literal
SQL snippet, unless the snippet is equivalent to `NULL`, in which case it
will be preceded with `'IS '` instead.  While for `?!key_name?`, `'<` '>
will precede the snippet, except for `NULL` which will be preceded by
`'IS NOT '`.


For `?@key_name?`, a reference to a scalar is a fatal error.


For dependency markers, the value being a reference does not (currently)
matter.


- `\'NULL'`


A reference to a string of `'NULL'` (ignoring case and external whitespace)
is treated differently from a reference to some other snippet of SQL only for
the `?=key_name?` and `?!key_name?` place-holders (as documented elsewhere).


- `\@list`


DBD::Pg can use ARRAY references to represent a Postgres array value,
including in a query parameter.  So it is good to allow an array reference
as a data value.  However, it is quite hard to imagine a spot in an SQL
template where a Postgres array and a non-array value would both be
equally valid.


So we require you to declare whether or not you expect the place-holder to
take an ARRAY reference.  `?@key_name?` requires an array reference.  Other
place-holders treat an array reference as a fatal error.


- Stringifier


If a data value is a reference to a blessed object that overloads
stringification, then no special behavior is triggered.  The object may
be pushed onto the list of query parameters where it will likely later
be stringified.


In such a case, the blessed object being a reference to a SCALAR or to an
ARRAY will be ignored.  So, for example, a blessed reference to an ARRAY
that overloads stringification is a fatal error for a `?@key_name?`
place-holder.


- Other references


Other types of references are treated as fatal errors by named place-holders.
Dependency markers currently treat any kind of reference the same as a
non-reference.  But these behaviors should not be relied upon.


Future versions of this module may add additional special treatments for
different types of references, including changing how dependency markers
treat reference values.


### 'wanted' list / function


The `'wanted'` argument can be a reference to an array containing just the
custom tags whose lines should be included in the generated query.


For truly complicated cases, the `'wanted'` argument can be a CODE reference
that is called for each custom tag.  The associated line(s) will be included
in the generated query if and only if the sub returns a true value for that
tag.


A `'wanted'` function takes two arguments: a tag and the `'data'` hash-ref.


For example, here is one that includes lines for tags that are the same as any
(lower-case) `'data'` hash key having a defined value:


    wanted => sub {
        my( $tag, $data ) = @_;
        return defined $data->{ lc $tag };
    },


Here is a more complex example.  Not that this is a great example of
a case where a `'wanted'` _function_ is preferred over a simpler
`'wanted'` _list_.  But it _is_ a good example of a relatively sane
`'wanted'` function that also avoids the example being extremely complex.


    my( $query, @params ) = DBIx::PreQL->build_query(
        data    => \%data,
        query   => [
           "*   SELECT",
           "C       count(*),",
           "D       m.name,",
           "D       m.height,",
           "*   FROM tbl_monkey AS m",
           "T   JOIN tbl_tree AS t USING( monkey_id )",
           "*   WHERE",
           "&T      AND t.height >= ?min_height?",
           "&T      AND t.bark = ?bark?",
           "*       AND barrel_id ?=barrel_id?",
           "*       AND m.name ILIKE '%' || ?monkey_name? || '%'",
           "*       AND m.color ?!skip_color?",
           "D   ORDER BY name",
        ],
        known_tags => [qw< C D T >],
        wanted  => sub {
            my( $tag, $data ) = @_;
            return defined $data->{total}
                if  'C' eq $tag;            # 'C'ount
            return ! defined $data->{total}
                if  'D' eq $tag;            # 'D'ata (not 'count')
            return 0 < grep defined $data->{$_}, 'min_height', 'bark';
                if  'T' eq $tag;            # 'T'rees
            die "Unknown tag ($tag)";
        },
    );


But note that this equivalent version is simpler in several respects:


    my( $query, @params ) = DBIx::PreQL->build_query(
        data        => \%data,
        query       => [
           "*   SELECT",
           "&       count(*),                       !total!",
           "D       m.name,",
           "D       m.height,",
           "*   FROM tbl_monkey AS m",
           "|   JOIN tbl_tree AS t USING( monkey_id ) !bark! !min_height!",
           "*   WHERE",
           "&       AND t.height >= ?min_height?",
           "&       AND t.bark = ?bark?",
           "*       AND barrel_id ?=barrel_id?",
           "*       AND m.name ILIKE '%' || ?monkey_name? || '%'",
           "*       AND m.color ?!skip_color?",
           "&   ORDER BY name                       !~total!",
        ],
        wanted      => defined $total ? [] : ['D'],
        known_tags  => ['D'],
    );


This is also the our only example that makes use of the `'|'` tag.


## Caution


Conditional generation of SQL is a problem with many bad solutions and no
really good ones.  This library attempts to offer a solid less-bad solution
that keeps SQL near the surface and favors simplicity and readability over
enforced correctness.  If you want correctness, you will have to provide it
yourself.


Don't get too fancy with your 'wanted' subroutines.


Pay attention to your `AND`s, `OR`s, and other joining words / characters.


# EXPORTS


NONE, but please include the empty list when you `use` the module:


    use DBIx::PreQL ();


so the fact that nothing is being imported is obvious to the person reading
that code.


# SUBROUTINES


## build\_query()


build\_query is called as a class method with arguments in name/value pairs:


    ( $query, @params ) = DBIx::PreQL->build_query(
        query       => $template_string,
        data        => \%data,
        wanted      => \@wanted_tags,
        known_tags  => \@tag_list,
        keep_keys   => $boolean,
    );


Most of the prior documentation covers the details of using this method, the
only functionality provided by this module.


### Arguments:


- query


Required.  A string of several (tagged) lines that is your query template.
You can also pass a reference to an array of lines.


- data


Virtually required.  A reference to a hash whose keys can be referenced in
your query template and whose values can end up in the resulting parameter
list (or even incorporated into the generated SQL).


Technically, you don't have to provide a `'data'` argument.  But not
providing one rather severely restricts your templating options and so
is a rather unlikely scenario.


- wanted


This argument is required if you use any custom tags.  This argument is
ignored otherwise.


Either 1) a reference to an array containing the list of (custom) tags whose
lines should be included in the generated SQL.


Or 2) a reference to a subroutine that will be passed a `$tag` and the
`'data'` hash-ref and that returns a true value if lines having that `$tag`
should be included (and returns a false value if such lines should be
excluded).


- known\_tags


Optional.  A reference to an array of custom tag strings.  Used to catch
typos when composing custom tags.  Finding a custom tag not in this list is
a fatal error.  Having a tag in this list that is never found is a warning.


- keep\_keys


Optional.  A boolean value.  Unlikely to be used.


When true, place-holder names (the _keys_ to the `'data'` hash-ref) are
what get pushed onto the query parameter list.  When false (the default),
what gets pushed onto the query parameter list are the _values_ from the
`'data'` hash-ref.


Returns a string which is the generated SQL query and a list of query
parameters to be used with that query.