Better tracking of prepared statements & other enhancements #223

Merged
merged 24 commits into from
Jan 23, 2021

Contributor

@alyst alyst commented Dec 30, 2020

  • The bigger part of the PR fixes problems related to issue #211 ("DB lock is not released").
    The PR addresses it by introducing an internal prepared statement object, _Stmt, that is managed by DB, whereas Stmt is just a reference to _Stmt (_Stmt keeps the count of those references).
    The public API is not affected by this change. As before, prepared statements should be finalized when Stmt object goes out of scope.
    But with this PR closing the DB connection automatically closes all its prepared statements.

    Also there's (non-exported) new method SQLite.finalize_statements!(db) that explicitly closes all prepared statements without disconnecting from the DB.
    That is handy when DB schema needs to be updated (e.g. dropping the table), and some statement(s) hold the schema lock. An attempt to use Stmt object
    after its prepared statement was finalized will throw SQLiteException.

    The "non-prepared" statements (execute(db, sql)) now close their internally created prepared statements immediately upon finishing without waiting for GC to close the stale
    prepared statement.

  • The PR also adds support for Bool Julia type: before it was imported into SQLite3 as blob, now it is properly imported as INT. Still need to update the tests, though.

  • It also adds support for passing the statement parameters to bind!()/execute() as keyword arguments. Note: it adds keyword-argument support to DBInterface.execute too, but the proper place for that would be the DBInterface.jl package itself. I just don't have a precise idea whether such a change in DBInterface would require updating other DB interface packages as well.

  • Cleanups in load!():

    • drop nm argument of some internal overloads: it is just confusing and error prone to have both escaped and non-escaped table names as arguments
    • tableinfo() fix/cleanup: now it returns nothing if the table doesn't exist, and the output of columntable() if it does; there is no need for a special struct with somewhat cryptic fields
    • escape column names
  • Exporting SQLiteException and throwing it wherever it makes sense, instead of @assert and error().

The PR should be ready for the review. I still need to add a few tests for the new features (closing statements upon DB disconnect, Bool columns etc), but I would be waiting for the feedback on how to proceed further. Clearly this PR addresses several different things. I tried to make the commits atomic, so I can split the PR, if necessary -- but maintaining several independent PRs adds a bit of overhead.


codecov bot commented Dec 30, 2020

Codecov Report

Merging #223 (418b7d2) into master (40794b1) will increase coverage by 3.06%.
The diff coverage is 96.32%.

@@            Coverage Diff             @@
##           master     #223      +/-   ##
==========================================
+ Coverage   79.45%   82.52%   +3.06%     
==========================================
  Files           5        5              
  Lines         555      578      +23     
==========================================
+ Hits          441      477      +36     
+ Misses        114      101      -13     
Impacted Files   Coverage Δ
src/SQLite.jl    96.31% <95.65%> (+7.18%) ⬆️
src/tables.jl    99.04% <97.72%> (-0.06%) ⬇️
src/consts.jl    88.88% <0.00%> (-2.03%) ⬇️
src/api.jl       40.49% <0.00%> (+0.33%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a826caa...418b7d2. Read the comment docs.

@alyst alyst changed the title [RFC/WIP] Better tTrack Staging [RFC/WIP] Better tracking of prepared statements & other enhancements Dec 30, 2020
rather than error().
Also show the name of the duplicate column.
@alyst alyst force-pushed the staging branch 3 times, most recently from f70ed20 to 8e36dd6 Compare January 2, 2021 20:21
@alyst alyst changed the title from "[RFC/WIP] Better tracking of prepared statements & other enhancements" to "[RFC] Better tracking of prepared statements & other enhancements" Jan 2, 2021
Contributor Author

alyst commented Jan 4, 2021

@quinnj Could you please have a look?

Member

@quinnj quinnj left a comment

Left a few questions; overall looks good! Thanks!

sqlitetype(::Type{T}) where {T<:AbstractString} = "TEXT NOT NULL"
sqlitetype(::Type{T}) where {T<:Union{Missing, AbstractString}} = "TEXT"
# conversion from Julia to SQLite3 types
sqlitetype_(::Type{<:Integer}) = "INT"
Member

Why change the name here to sqlitetype_? It doesn't seem like it would conflict if left as is.

Contributor Author

I would very much prefer to avoid introducing the sqlitetype_() method too. The problem is that sqlitetype() has to return "T NOT NULL" for non-missing types and "T" for missing ones, which means the pattern sqlitetype(::Type{Union{T, Missing}}) = f(sqlitetype(T)) will not work here. Of all the solutions to this problem, the one I implemented looked like the most straightforward. But I can change it if you have better ideas.
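The constraint discussed in this thread (a bare type name for columns that allow missing values, and the same name with NOT NULL otherwise) can be sketched in a few lines of plain Julia. This is only an illustration under assumptions: basetype and columntype are hypothetical names, the type mapping is reduced to four cases, and it is not the code that ended up in the package.

```julia
# Hypothetical helpers for illustration; not SQLite.jl's actual definitions.
# Map a non-missing Julia type to a bare SQLite type name.
basetype(::Type{<:Integer}) = "INT"          # Bool <: Integer, so Bool maps to INT
basetype(::Type{<:AbstractFloat}) = "REAL"
basetype(::Type{<:AbstractString}) = "TEXT"
basetype(::Type) = "BLOB"                    # fallback for everything else

# Strip Missing first, then append NOT NULL only if the type excludes missing.
function columntype(::Type{T}) where {T}
    base = basetype(nonmissingtype(T))
    return Missing <: T ? base : string(base, " NOT NULL")
end
```

Because the Missing check happens in one place, every base type (including the BLOB fallback) gets its NOT NULL variant without a separate method per type.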

src/SQLite.jl Outdated
execute(stmt::Stmt, params::Union{NamedTuple, AbstractDict, AbstractVector, Tuple}) =
execute(stmt.db, _stmt(stmt), params)

execute(stmt::Union{Stmt, _Stmt}; kwargs...) = execute(stmt, kwargs.data)
Member

It'd be nice to document this new form of execute + tests

Contributor Author

Some very basic tests are already there (just mimicking what is already there for the named tuple version). I will add docs for SQLite.execute().
The problem is, as I state in the PR description, that the actual API to use for row-returning statements should be DBInterface.execute(), which would have to be changed in DBInterface.jl. Should I add a keyword-argument version of execute there as well?

Contributor Author

In the updated PR I've updated the docs for SQLite.execute, clarifying that the params can also be passed as keyword arguments, and I've added the corresponding tests.
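The keyword-argument form discussed in this thread is a thin forwarding layer over the named-tuple form. A minimal sketch of that pattern in plain Julia follows; named_params is a hypothetical stand-in (it only builds a dictionary of ":name" placeholders) and does not touch SQLite at all.

```julia
# Hypothetical helper for illustration; it mimics the forwarding pattern only.
# NamedTuple form: turn each field into a ":name" => value entry.
named_params(params::NamedTuple) =
    Dict{String, Any}(string(':', k) => v for (k, v) in pairs(params))

# Keyword form: values(kwargs) is the NamedTuple backing the keyword
# arguments, so this method simply forwards to the one above.
named_params(; kwargs...) = named_params(values(kwargs))
```

With this shape, named_params(a = 1, b = "x") and named_params((a = 1, b = "x")) take the same code path, which is why the keyword version needs little more than tests mirroring the named-tuple ones.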

src/tables.jl Outdated
end
return Query(stmt, Ref(status), header, types, Dict(x=>i for (i, x) in enumerate(header)))
end

DBInterface.execute(stmt::Stmt; kwargs...) = DBInterface.execute(stmt, kwargs.data)
DBInterface.execute(db::DB, sql::AbstractString; kwargs...) = DBInterface.execute(DBInterface.prepare(db, sql), kwargs.data) # FIXME should be in DBInterface
Member

Yeah, let's not pirate here; we can do a PR to DBInterface.jl

Contributor Author

@alyst alyst Jan 13, 2021

Done. Now the package requires DBInterface.jl v2.3.0.

src/tables.jl Outdated
end
# table info for load!():
# returns NamedTuple with columns information,
# or nothing if table does not exist
function tableinfo(db::DB, name::AbstractString)
Member

Why the changes here? Seems breaking for not much benefit.

Contributor Author

I thought tableinfo() was a rather recent change, and the TableInfo structure was not exported, so I changed it into something that looked more generally useful to me.
Before, it returned a TableInfo struct which actually contained just a vector of table column names. Now it returns a named tuple with all the information that SQLite provides, including column types etc. So the updated tableinfo() could be used to actually get the table information.

Also, the old version was not working, because it was checking whether the result of columntable(...) is a NamedTuple as an indication that the table exists. However, columntable() always returns a NamedTuple; in the case of a missing table it just has only the .q element.

end

checkdupnames(names) = length(unique(map(x->lowercase(String(x)), names))) == length(names) || error("duplicate case-insensitive column names detected; sqlite doesn't allow duplicate column names and treats them case insensitive")
# case-insensitive check for duplicate column names
function checkdupnames(names::Union{AbstractVector, Tuple})
Member

Is there a functional change here?

Contributor Author

It still throws in the same situations. However, now it throws SQLiteException instead of generic ErrorException.
Plus the message indicates the duplicated column, which is quite useful :)
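A case-insensitive duplicate check that also reports the offending name can be sketched as below. The names finddup and check_names are hypothetical, and ArgumentError stands in for SQLiteException so that the snippet runs without the package; the real checkdupnames may differ in detail.

```julia
# Hypothetical sketch; ArgumentError stands in for SQLiteException.
# Return the first name that repeats an earlier one (ignoring case), or nothing.
function finddup(names)
    seen = Set{String}()
    for name in names
        key = lowercase(String(name))
        key in seen && return name
        push!(seen, key)
    end
    return nothing
end

# Throw with the duplicate named in the message; return true otherwise.
function check_names(names)
    dup = finddup(names)
    dup === nothing ||
        throw(ArgumentError("duplicate case-insensitive column name: $dup"))
    return true
end
```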

Contributor Author

alyst commented Jan 11, 2021

I've moved the DBInterface.jl changes to JuliaDatabases/DBInterface.jl#30.
I will wait until that PR is resolved (I hope it will get merged :) ) and the new version of DBInterface.jl is tagged to update this PR, as otherwise it will fail the tests.

using types defined in DBInterface in 2.3.0, so the required version
is updated
assert should be used only as internal logic checks
We don't need the special TableInfo structure, in particular if
the table does not exist. The current version
uses execute(f, db, sql) call to avoid leaving behind prepared
statements.

Fixes the table existence check (Query always returns NamedTuple).
this is confusing and error-prone
so that the statement is closed immediately upon completion
update the docstring and add (very) basic tests
Contributor Author

alyst commented Jan 13, 2021

I've updated the PR against DBInterface.jl 2.3.0. With the tighter control of prepared statement lifetimes (esp. with execute(f, db, sql)) I noticed that the prepared statement handle is not actually unique and can be reused by later statements. So in the old version of the PR it could have happened that two independent Stmt objects had the same handle (one for the already closed statement, and one for the new one), and GCing the former would have inadvertently closed the new prepared statement. It's fixed now: _Stmt objects are referenced by a unique id rather than by the SQLite3 handle.
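The handle-reuse problem and the id-based fix described in this comment can be modelled with a toy registry. Everything below is a sketch under assumptions: ToyDB, ToyStmt, prepare!, finalize_stmt!, finalize_all!, is_open and handle_of are hypothetical names, and a Symbol stands in for the native statement handle. It is not the package's implementation.

```julia
# Toy model with hypothetical names; a Symbol stands in for a native handle.
mutable struct ToyDB
    stmts::Dict{Int, Symbol}   # unique id => handle of a live statement
    nextid::Int                # last id handed out; ids are never reused
    ToyDB() = new(Dict{Int, Symbol}(), 0)
end

# The user-facing object only remembers the database and the id.
struct ToyStmt
    db::ToyDB
    id::Int
end

function prepare!(db::ToyDB, handle::Symbol)
    db.nextid += 1
    db.stmts[db.nextid] = handle
    return ToyStmt(db, db.nextid)
end

is_open(s::ToyStmt) = haskey(s.db.stmts, s.id)

# Finalizing by id is idempotent: a second call finds nothing to remove,
# even if the same handle value now belongs to a newer statement.
finalize_stmt!(s::ToyStmt) = (delete!(s.db.stmts, s.id); nothing)

# Close every statement at once, e.g. before a schema change or on disconnect.
finalize_all!(db::ToyDB) = (empty!(db.stmts); nothing)

# Any use of a finalized statement is caught before touching the handle.
function handle_of(s::ToyStmt)
    is_open(s) || error("statement $(s.id) is already finalized")
    return s.db.stmts[s.id]
end
```

In this model, finalizing an old statement a second time after its handle value has been handed to a new statement leaves the new one untouched, because the lookup goes through the id rather than the handle.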

src/SQLite.jl Outdated
refcount::Int # by how many Stmt objects referenced

function _Stmt(handle::StmtHandle)
stmt = new(handle, Dict{Int, Any}(), 0)
Member

why doesn't the refcount start at 1 here?

Member

Ok, I think I see how it's used later on; the refcount is how many Stmts reference a _Stmt.

Hmmm, having reviewed the whole PR, the refcount stuff feels a little fragile/fiddly to me; can you restate the benefits of going with this approach of tracking _Stmts apart from Stmt and the refcounting? I just want to make sure I understand what we gain/benefit for all this work, since it just seems like a lot of effort to just ensure we close things?

Contributor Author

can you restate the benefits of going with this approach of tracking _Stmts apart from Stmt and the refcounting?

The idea is to confine the prepared statement handles within the DB object and establish a 1-to-1 correspondence between the internal _Stmt object and the prepared statement handle. E.g. even if the user manages to create two Stmt objects that reference the same statement (e.g. by copying Stmt), there will be no double-close situation, because the handle will be closed only once (from the _Stmt object). Also, when the DB is closed, Stmt will be "aware" that the handle is no longer valid. I'm not an expert in SQLite3, so I don't know how bad it is to call its API methods with a bogus handle, but I assume it's better if we can intercept an error before the call.

And yes, it was also handy to have all prepared statements automatically tracked in one place, so that they could all be closed at once.
One can implement that on the user side, but it's much more efficient/less error-prone to do it in the package.

Contributor Author

As for refcounting itself, I'm not 100% sure about it either.
It's part of the solution to keep track of the Stmt scope when copying is possible.
If copying is forbidden (at least copy throws), it is not needed.
In that case double-close also should not happen, but there still needs
to be a mechanism of correctly closing statements upon DB close.

@alyst alyst closed this Jan 17, 2021
@alyst alyst reopened this Jan 17, 2021
Contributor Author

alyst commented Jan 18, 2021

@quinnj could it be that CI actions are disabled due to 60 days of inactivity?
I've removed refcounting, so now Stmt just holds the id to _Stmt, and no Stmt objects can share the same id. I think it is as simple as it could be.

Member

quinnj commented Jan 23, 2021

Yeah, this looks pretty good now; would you mind copy/pasting this; there's one section like:

- run: |
          julia --project=docs -e '
            using Documenter: doctest
            using JSON3
            doctest(JSON3)'

that we can just remove. That should set up GitHub Actions to run on this PR.

- fixes JuliaDatabases#211
- closes all prepared statements upon close!(db)
- adds SQLite.finalize_statements!(db) call
- execute(db, sql): close the internal prepared statement immediately,
  don't wait for GC
so Union{T, Missing} is handled automatically.
This also fixes "BLOB NOT NULL" case.
we don't need to create DB just to raise an exception
ci.yml taken from JSON3.jl
@alyst alyst changed the title from "[RFC] Better tracking of prepared statements & other enhancements" to "Better tracking of prepared statements & other enhancements" Jan 23, 2021
Contributor Author

alyst commented Jan 23, 2021

Yeah, this looks pretty good now; would you mind copy/pasting this

@quinnj That worked, thank you! So now everything seems in place.

@quinnj quinnj merged commit dab1455 into JuliaDatabases:master Jan 23, 2021