Bulk Insert With Grails

Importing data from a text or CSV file into a database with Grails seems like it should be a simple thing. As it turns out, however, some approaches perform dramatically better than others.

Simple GORM

Let’s assume that we’ve created a Grails domain class called MyDomain to represent the table we’re importing into, and a parseLine() method to parse one line of file data (CSV or whatever) into values. You might try something like this:

void loadTable(File file) {
file.eachLine() { line ->
MyDomain dom = new MyDomain(parseLine(line))
dom.save()
}
}

Unfortunately, you would find this code to run very slowly and you might even run out of memory. This is because Grails is caching at two different levels, causing locking and a potential memory leak. Read more about it:

So it’s better to try a different approach. There are many possibilities.

Call an External Tool

If your database platform offers a command line bulk import tool, you could simply call and execute it outside of Grails.

String command = """
db2 LOAD CLIENT FROM '/home/me/data.txt' OF ASC 
METHOD L
(1 9, 10 10, 11 19, 20 20, 21 26, 27 35, 36 71, 72 107, 108 127, 128 129, 
130 134, 135 138, 139 141, 142 144, 145 148, 149 149, 150 150, 151 155)
NULL INDICATORS (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
INSERT INTO MY_SCHEMA.MY_TABLE
STATISTICS NO
ALLOW NO ACCESS;
"""
command.execute()

But you may not have such a command line tool, or it may not be available to you in all environments. And it’s probably not a good idea to give your web server the authority to run database commands from the shell.

Groovy SQL

Another approach might be to bypass Grails’ domain classes and the underlying Hibernate persistence layer and instead use Groovy Sql.

void loadTable(File file) {
def db = grailsApplication.config.dataSource
def sql = Sql.newInstance(db.url, db.user, db.password, db.driver)
file.eachLine() { line ->
sql.execute( "insert into my_table (field0, field1, field2, ...) values (?, ?, ?, ...)".toString(), parseLine(line))
}
}

Or for better performance, in Groovy 1.8.1+ (Grails 2.0.1+), you can batch the inserts:

void loadTable(File file) {
def db = grailsApplication.config.dataSource
def sql = Sql.newInstance(db.url, db.user, db.password, db.driver)
sql.withBatch(20, "insert into my_table (field0, field1, field2, ...) values (:val0, :val1, :val2, ...)".toString() { preparedStatement ->
file.eachLine() { line ->
def fields = parseLine(line)
preparedStatement.addBatch(val0:fields[0], val1:fields[1], val2:fields[2], ...)
}
}
}

Hibernate StatelessSession

If you prefer to stick with GORM, you can. You’ll just need to compensate for Grails’ and Hibernate’s caching mechanisms, by pausing to clear the caches after every 20 records or so, as described in Burt Beckwith’s article on validation caching and Hibernate’s documentation on batch processing. Or instead of having to worry about clearing the Hibernate cache, you can open a stateless database session, which does not cache, for the duration of your import.

void loadTable(File file) {
StatelessSession session = sessionFactory.openStatelessSession()
Transaction tx = session.beginTransaction()
file.eachLine() {line ->
MyDomain dom = new MyDomain(parseLine(line))
session.insert(dom)
}
tx.commit()
session.close()
}

I have not benchmarked any of these approaches to see which performs the best. You will probably have your own reasons to prefer one over the other.

Advertisements

#grails, #groovy, #hibernate, #insert, #sql