Mixing techniques causes a memory leak

A co-worker solved an interesting problem this week.

A number of developers have been contributing to a Groovy script that exports a large amount of data from one of our databases. Although the conversion rules for the data can be complex, at its core, the script is pretty simple. Using groovy.sql.Sql, the script does a SELECT to get a number of insurance policies. Then it iterates through the result set and does a second SELECT to get some additional information about each policy. Here’s the pseudocode:


sql = Sql.newInstance(connectionProperties)
String SELECT_POLICIES = "SELECT * FROM POLICIES WHERE STATUS = 'ACTIVE'"
sql.eachRow(SELECT_POLICIES) { polRow ->
String SELECT_DETAILS = "SELECT FIELDS FROM TABLE1, TABLE2, ETC. WHERE POLICY_NO = ${polRow.policy_no}"
sql.eachRow(SELECT_DETAILS) { detailRow ->
writeToFile(formatValues(detailRow))
}
}

Until the latest sprint, the script had been working fine. However, it recently began to cause OutOfMemory errors after writing about 60,000 rows, and none of the developers could figure out why. The main difference that we saw was that a new developer had modified one of the SQL statements, and instead of using string substitution to modify the inner query, he had used parameter markers — which is a technique that, really, we all should be adopting for a number of good reasons.


sql = Sql.newInstance(connectionProperties)
String SELECT_POLICIES = "SELECT * FROM POLICIES WHERE STATUS = 'ACTIVE'"
sql.eachRow(SELECT_POLICIES) { polRow ->
String SELECT_DETAILS = "SELECT FIELDS FROM TABLE1, TABLE2, ETC. WHERE POLICY_NO = ${polRow.policy_no} AND SOME_PROP = ?"
sql.eachRow(SELECT_DETAILS, [polRow.some_prop]) { detailRow ->
writeToFile(formatValues(detailRow))
}
}

My co-worker solved the puzzle for us. With the addition of the parameter in the statement, the Groovy Sql class started compiling the inner query into a prepared statement. Prepared statements are generally faster because they are compiled and then cached and re-used, so you don’t incur the overhead of having to recompile on each use. However, with the substitution of a different policy number in the string on each iteration of the loop, making each string unique, on each iteration of the loop the Sql processor was preparing and caching a new statement. When the cache filled with about 60,000 instances, we ran out of heap memory.

The solution, of course, was to get rid of all the string substitution in our SQL statements and use parameters instead. The benefit of forcing all the queries to use prepared statements was that the script now runs about 30% faster. For a process that used to take 4.5 hours, that’s a significant improvement.

Lesson #1: Use prepared statements. Not only are they more efficient, but if you’re working with a web application, they are resistant to SQL injection.
Lesson #2: Don’t mix techniques. Practice code review and pair programming to encourage all of your team to develop shared best practices.

Advertisements

#groovy, #groovy-sql, #memory-leak