Integration Testing Base Classes¶
Base testing classes¶
- There are two main testing classes in Sparkly:
  - SparklyTest:
    - Instantiates the Sparkly context specified in the context attribute.
    - The context is available via self.hc.
  - SparklyGlobalContextTest:
    - Reuses a single SparklyContext across all tests for a performance boost.
Example:
from sparkly import SparklyContext
from sparkly.test import SparklyTest


class MyTestCase(SparklyTest):
    context = SparklyContext

    def test(self):
        df = self.hc.read_ext.by_url(...)
        self.assertDataFrameEqual(
            df, [('test_data', 1)], ['name', 'number']
        )

    ...


class MyTestWithReusableContext(SparklyGlobalContextTest):
    context = SparklyContext

    def test(self):
        df = self.hc.read_ext.by_url(...)

    ...
Fixtures¶
Fixture is a term borrowed from the Django testing framework. A fixture loads data into a database before test execution.
- There are a couple of databases supported in Sparkly:
- Mysql (requires: PyMySql)
- Elastic
- Cassandra (requires: cassandra-driver)
Example:
from sparkly.test import MysqlFixture, SparklyTest


class MyTestCase(SparklyTest):
    ...
    fixtures = [
        MysqlFixture('mysql.host',
                     'user',
                     'password',
                     '/path/to/setup_data.sql',
                     '/path/to/remove_data.sql')
    ]
    ...
- class sparkly.testing.CassandraFixture(host, setup_file, teardown_file)[source]¶
  Fixture to load data into Cassandra.
Notes
- Depends on cassandra-driver.
Examples
>>> class MyTestCase(SparklyTest):
...     fixtures = [
...         CassandraFixture(
...             'cassandra.host',
...             absolute_path(__file__, 'resources', 'setup.cql'),
...             absolute_path(__file__, 'resources', 'teardown.cql'),
...         )
...     ]
...
>>> class MyTestCase(SparklyTest):
...     data = CassandraFixture(
...         'cassandra.host',
...         absolute_path(__file__, 'resources', 'setup.cql'),
...         absolute_path(__file__, 'resources', 'teardown.cql'),
...     )
...     def setUp(self):
...         self.data.setup_data()
...     def tearDown(self):
...         self.data.teardown_data()
...
>>> def test():
...     fixture = CassandraFixture(...)
...     with fixture:
...         test_stuff()
...
- class sparkly.testing.ElasticFixture(host, es_index, es_type, mapping=None, data=None, port=None)[source]¶
  Fixture for Elastic integration tests.
Notes
- Data upload uses the bulk API.
Examples
>>> class MyTestCase(SparklyTest):
...     fixtures = [
...         ElasticFixture(
...             'elastic.host',
...             'es_index',
...             'es_type',
...             '/path/to/mapping.json',
...             '/path/to/data.json',
...         )
...     ]
...
- class sparkly.testing.Fixture[source]¶
  Base class for fixtures.
  Fixture is a term borrowed from Django tests: it is data loaded into a database for integration testing.
- class sparkly.testing.KafkaFixture(host, port=9092, topic=None, key_serializer=None, value_serializer=None, data=None)[source]¶
  Fixture for Kafka integration tests.
Notes
- Depends on the kafka-python library.
- The JSON data file should contain an array of dicts: [{'key': ..., 'value': ...}]
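For illustration, such a data file can be produced as follows; the record payloads and the temporary path are made up for this sketch and are not mandated by Sparkly:

```python
import json
import tempfile

# Each record is a dict with 'key' and 'value' entries; the payload
# shapes used here are illustrative only.
records = [
    {'key': {'user_id': 1}, 'value': {'event': 'signup'}},
    {'key': {'user_id': 2}, 'value': {'event': 'login'}},
]

# Write the records to a JSON file whose path can then be passed
# to the fixture's `data` argument.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(records, f)
    data_path = f.name
```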
Examples
>>> class MyTestCase(SparklyTest):
...     fixtures = [
...         KafkaFixture(
...             'kafka.host', 'topic',
...             key_serializer=..., value_serializer=...,
...             data='/path/to/data.json',
...         )
...     ]
- class sparkly.testing.MysqlFixture(host, user, password=None, data=None, teardown=None)[source]¶
  Fixture for MySQL integration tests.
Notes
- Depends on the PyMySql library.
Examples
>>> class MyTestCase(SparklyTest):
...     fixtures = [
...         MysqlFixture('mysql.host', 'user', 'password', '/path/to/data.sql')
...     ]
...     def test(self):
...         pass
...
- class sparkly.testing.SparklyGlobalContextTest(methodName='runTest')[source]¶
  Base test case that keeps a single instance of the given context class across all tests.
Integration tests are slow, especially when you have to start and stop the Spark context for each test case. This class lets you reuse one Spark context across multiple test cases.
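The reuse idea can be sketched with plain unittest; ExpensiveContext below is a hypothetical stand-in for a costly-to-start context such as a Spark session, not part of Sparkly:

```python
import unittest


class ExpensiveContext:
    """Hypothetical stand-in for a context that is slow to start."""
    instances_started = 0

    def __init__(self):
        ExpensiveContext.instances_started += 1


class GlobalContextTest(unittest.TestCase):
    _context = None  # shared across all test methods in the class

    @classmethod
    def setUpClass(cls):
        # Start the context once for the whole class instead of
        # once per test method.
        if cls._context is None:
            cls._context = ExpensiveContext()

    def test_one(self):
        self.assertIsNotNone(self._context)

    def test_two(self):
        self.assertIsNotNone(self._context)
```

Both test methods see the same context instance, so the expensive startup cost is paid only once.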
- class sparkly.testing.SparklyTest(methodName='runTest')[source]¶
  Base test case for Spark script tests.
  Initializes and shuts down the context specified in the context attribute.
Example
>>> class MyTestCase(SparklyTest):
...     def test(self):
...         self.assertDataFrameEqual(
...             self.hc.sql('SELECT 1 as one').collect(),
...             [{'one': 1}],
...         )
- assertDataFrameEqual(actual_df, expected_data, fields=None, ordered=False)[source]¶
  Ensure that the DataFrame has the right data inside.
  Parameters:
  - actual_df (pyspark.sql.DataFrame | list[pyspark.sql.Row]) – DataFrame to test data in.
  - expected_data (list[dict]) – Expected dataframe rows defined as dicts.
  - fields (list[str]) – Compare only certain fields.
  - ordered (bool) – Does the order of rows matter?
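The comparison semantics can be sketched in plain Python; dataframe_equal below is a simplified illustration of how these parameters interact, not Sparkly's actual implementation:

```python
def dataframe_equal(actual_rows, expected_data, fields=None, ordered=False):
    """Compare rows (anything dict-convertible) against expected dicts.

    fields, when given, restricts the comparison to those columns;
    ordered=False treats the rows as an unordered collection.
    """
    rows = [dict(row) for row in actual_rows]
    expected = [dict(row) for row in expected_data]
    if fields is not None:
        # Project both sides down to the requested columns.
        rows = [{f: row[f] for f in fields} for row in rows]
        expected = [{f: row[f] for f in fields} for row in expected]
    if not ordered:
        # Sort both sides canonically so row order is ignored.
        def key(row):
            return sorted(row.items())
        rows = sorted(rows, key=key)
        expected = sorted(expected, key=key)
    return rows == expected
```

With ordered=False (the default) two frames holding the same rows in a different order compare equal, while ordered=True makes row order significant.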
- context¶
  alias of SparklyContext