Mocking SQLAlchemy
When writing test cases for a library I was developing for Netforecast Inc., I found a need to write simple and easy unit tests for SQLAlchemy. Fortunately, there was a library called alchemy-mock that could help solve these problems. However, it lacked certain features I desired and was no longer maintained. As such, I created my own fork of the project, adding documentation, adding more features, and modernizing the codebase. For a full list of changes, see the changelog. Additionally, the documentation has more details and examples should you want more information about this library.
I will include some basic examples here to illustrate the use cases of this library. Feel free to check out the GitHub repository for the source code.
Real-World Examples
In these real-world examples, I will describe hypothetical or real scenarios in which I have used this library to mock SQLAlchemy in order to test my code efficiently. I will also cover several alternatives for testing and explain why this library may be the better fit in each scenario.
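One such alternative is hand-rolling a stub with the standard library's unittest.mock. The minimal sketch below (the Data1 model and average_val function are hypothetical, invented for illustration) shows why this gets tedious as queries grow: every query(...).all() chain has to be wired up by hand, one return_value at a time.

```python
from unittest import mock

# Hypothetical model and function under test, for illustration only.
class Data1:
    pass

def average_val(session):
    rows = session.query(Data1).all()
    return sum(r.val for r in rows) / len(rows)

# Hand-rolled stub: each step of the call chain is wired manually.
session = mock.MagicMock()
session.query.return_value.all.return_value = [
    mock.Mock(val=2),
    mock.Mock(val=4),
]

assert average_val(session) == 3.0
```

With several queries, filters, and tables, this wiring multiplies quickly, which is the pain point the library discussed here addresses.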
Data Stubbing
My main use case for this library came into play when working with a codebase that had entry points (runnable scripts) requiring session objects. These scripts use the session objects to integrate a combination of SQL tables to perform data analysis or other techniques. While each individual data-analysis technique was tested separately through unit tests, I wanted to test the integration of these components. One solution is to use transactions so that your database is never modified. However, this method requires access to the real SQL server and is unlikely to provide stable, consistent data; tests should be rerunnable and produce the same output every time. Another solution would be to set up a test database. However, this is time-consuming: both setting it up and running tests against it take quite a while. Additionally, some local machines struggle to run a SQL server locally, so it is not the best solution. Finally, I ran into the original version of this library, created by Miroslav Shubernetskiy. I found that it combined the abilities I needed to test scripts that take a session object as a parameter. By creating a mocked-up session, I was able to effectively test functions that combine many different SQL tables for data analysis. Since there were additional features I wanted, I created my own version of the library to use in my own projects.
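The transaction-based alternative mentioned above can be sketched with the standard library's sqlite3 module standing in for a real SQL server (the table and values are invented for illustration): the test performs its writes inside a transaction and then rolls back, so the database is never permanently modified.

```python
import sqlite3

# In-memory database standing in for a real SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (pk1 INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO data1 VALUES (1, 10)")
conn.commit()

# Test-style write inside a transaction...
conn.execute("INSERT INTO data1 VALUES (2, 20)")
# ...then roll back so the table is left exactly as it was.
conn.rollback()

rows = conn.execute("SELECT * FROM data1").fetchall()
assert rows == [(1, 10)]
```

Against a shared production server the same pattern works, but it still depends on whatever data happens to be in the tables at the time, which is why it does not yield repeatable tests.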
Now, let us take a look at some example code for this scenario. First, consider the function we want to test. Please note that the code below was created to support the scenario above and therefore is not runnable, but merely illustrative of what this library can do.
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

# assume similar classes for Data2 and Data3
class Data1(Base):
    __tablename__ = 'some_table'
    pk1 = Column(Integer, primary_key=True)
    data_val1 = Column(Integer)
    data_val2 = Column(Integer)
    data_val3 = Column(Integer)

    def __init__(self, pk1, val1, val2, val3):
        self.pk1 = pk1
        self.data_val1 = val1
        self.data_val2 = val2
        self.data_val3 = val3
class CombinedAnalysis(Base):
    __tablename__ = 'analysis_table'
    pk1 = Column(Integer, primary_key=True)
    analysis_val1 = Column(Integer)
    analysis_val2 = Column(Integer)
    analysis_val3 = Column(Integer)

    def __init__(self, pk1, val1, val2, val3):
        self.pk1 = pk1
        self.analysis_val1 = val1
        self.analysis_val2 = val2
        self.analysis_val3 = val3

    def __eq__(self, other):
        if not isinstance(other, CombinedAnalysis):
            return NotImplemented
        return (
            self.analysis_val1 == other.analysis_val1
            and self.analysis_val2 == other.analysis_val2
            and self.analysis_val3 == other.analysis_val3
        )
def complex_data_analysis(cfg, session):
    # collects some data up to some point
    dataset1 = session.query(Data1).filter(Data1.utc_time < cfg["final_time"])
    dataset2 = session.query(Data2).filter(Data2.utc_time < cfg["final_time"])
    dataset3 = session.query(Data3).filter(Data3.utc_time < cfg["final_time"])
    # performs some analysis
    analysis12 = analysis(dataset1, dataset2)
    analysis13 = analysis(dataset1, dataset3)
    analysis23 = analysis(dataset2, dataset3)
    # combine the data analysis (returns CombinedAnalysis objects)
    combined_analysis = integrate_analysis(analysis12, analysis13, analysis23)
    # assume the combined analyses are stored in some SQL table
    session.add_all(combined_analysis)
    session.commit()
When using the mock-alchemy package, a test can now exercise this complex_data_analysis function despite it issuing multiple SQL queries and combining their results. Here is an example of how this might look. Assume the file detailed above is called data_analysis.
import datetime
import mock
import pytest
from mock_alchemy.mocking import UnifiedAlchemyMagicMock
from data_analysis import complex_data_analysis, Data1, Data2, Data3, CombinedAnalysis
def test_data_analysis():
    stop_time = datetime.datetime.utcnow()
    cfg = {
        "final_time": stop_time
    }
    data1_values = [
        Data1(1, some, data, values),
        Data1(2, some, data, values),
        Data1(3, some, data, values),
    ]
    data2_values = [
        Data2(1, some, data, values),
        Data2(2, some, data, values),
        Data2(3, some, data, values),
    ]
    data3_values = [
        Data3(1, some, data, values),
        Data3(2, some, data, values),
        Data3(3, some, data, values),
    ]
    session = UnifiedAlchemyMagicMock(data=[
        (
            [mock.call.query(Data1),
             mock.call.filter(Data1.utc_time < stop_time)],
            data1_values
        ),
        (
            [mock.call.query(Data2),
             mock.call.filter(Data2.utc_time < stop_time)],
            data2_values
        ),
        (
            [mock.call.query(Data3),
             mock.call.filter(Data3.utc_time < stop_time)],
            data3_values
        ),
    ])
    complex_data_analysis(cfg, session)
    expected_analysis = [
        CombinedAnalysis(1, some, analysis, values),
        CombinedAnalysis(2, some, analysis, values),
        CombinedAnalysis(3, some, analysis, values),
    ]
    combined_analysis = session.query(CombinedAnalysis).all()
    assert sorted(combined_analysis, key=lambda x: x.pk1) == sorted(expected_analysis, key=lambda x: x.pk1)
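The final assertion works because CombinedAnalysis defines __eq__: sorting both lists by primary key and comparing element-wise gives an order-insensitive check of the results. The same pattern in isolation, using a minimal hypothetical class rather than an ORM model:

```python
class Result:
    """Minimal stand-in for an ORM model with value-based equality."""
    def __init__(self, pk, val):
        self.pk = pk
        self.val = val

    def __eq__(self, other):
        if not isinstance(other, Result):
            return NotImplemented
        return self.val == other.val

actual = [Result(2, "b"), Result(1, "a")]
expected = [Result(1, "a"), Result(2, "b")]

# Order-insensitive comparison: sort by primary key, then compare.
assert sorted(actual, key=lambda r: r.pk) == sorted(expected, key=lambda r: r.pk)
```

Without a value-based __eq__, the comparison would fall back to identity and the assertion would fail even when the data matches.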