23 Apr 2021 · Software Engineering

    Generating Fake Data for Python Unit Tests with Faker

    Introduction

    When writing unit tests, you might come across a situation where you need to generate test data or use some dummy data in your tests. If you already have some data somewhere in a database, one solution you could employ is to generate a dump of that data and use that in your tests (i.e. fixtures).

    However, you could also use a package like faker to generate fake data for you very easily when you need to. This tutorial will help you learn how to do so in your unit tests.

    Prerequisites

    For this tutorial, it is expected that you have Python 3.6 and Faker 0.7.11 installed.

    Basic Examples in the Command Line

    Let’s see how this works first by trying out a few things in the shell.

    Before we start, go ahead and create a virtual environment and activate it:

    $ python3 -m venv faker
    $ source faker/bin/activate

    Once in the environment, install faker.

    $ pip install faker

    After that, enter the Python REPL by typing the command python in your terminal.

    Once in the Python REPL, start by importing Faker from faker:

    Python 3.6.0 (default, Jan  4 2017, 15:38:35)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from faker import Faker
    >>>

    Then, we are going to use the Faker class to create a myFactory object whose methods we will use to generate whatever fake data we need.

    >>> myFactory = Faker()

    Let’s generate some fake text:

    >>> myFactory.text()
    'Commodi quidem ipsam occaecati. Porro veritatis numquam nisi corrupti.'

    As you can see, some random text was generated. Yours will almost certainly look different.

    Let us try a few more examples:

    >>> myFactory.words()
    ['libero', 'commodi', 'deleniti']
    >>> myFactory.name()
    'Joshua Wheeler'
    >>> myFactory.month()
    '04'
    >>> myFactory.sentence()
    'Iure expedita eaque at odit soluta repudiandae nam.'
    >>> myFactory.state()
    'Michigan'
    >>> myFactory.random_number()
    2950548

    You can see how simple the Faker library is to use. Once you have created a factory object, you can call any of the provider methods defined on it. Keep in mind that the output is random, so what you see on your end will almost certainly differ from our example.

    If you would like to try out some more methods, you can see a list of the methods you can call on your myFactory object using dir.

    >>> dir(myFactory)

    You can also find more things to play with in the official docs.

    Integrating Faker with an Actual Unit Test

    Let’s now use what we have learnt in an actual test.

    If you are still in the Python REPL, exit by hitting CTRL+D. Do not deactivate the virtual environment we created in the previous section, since Faker is installed in it and we will keep using it going forward.

    Now, create two files, example.py and test.py, in a folder of your choice. Our code will live in the example file and our tests in the test file.

    Look at this code sample:

    # example.py
    
    class User:
        def __init__(self, first_name, last_name, job, address):
            self.first_name = first_name
            self.last_name = last_name
            self.job = job
            self.address = address
    
        @property
        def user_name(self):
            return self.first_name + ' ' + self.last_name
    
        @property
        def user_job(self):
            return self.user_name + " is a " + self.job
    
        @property
        def user_address(self):
            return self.user_name + " lives at " + self.address

    This code defines a User class whose constructor sets the first_name, last_name, job and address attributes when an object is created.

    It also defines three read-only properties, user_name, user_job and user_address, which build display strings from those attributes.

    In our test cases, we can easily use Faker to generate all the required data when creating test user objects.

    # test.py
    
    import unittest
    
    from faker import Faker
    
    from example import User
    
    class TestUser(unittest.TestCase):
        def setUp(self):
            self.fake = Faker()
            self.user = User(
                first_name=self.fake.first_name(),
                last_name=self.fake.last_name(),
                job=self.fake.job(),
                address=self.fake.address(),
            )
    
        def test_user_creation(self):
            self.assertIsInstance(self.user, User)
    
        def test_user_name(self):
            expected_username = self.user.first_name + " " + self.user.last_name
            self.assertEqual(expected_username, self.user.user_name)

    You can see that we are creating a new User object in the setUp method. The unittest framework calls setUp before each test method runs, so we can be sure that a fresh user is available in each test case.

    The user object is populated with values directly generated by Faker. We do not need to worry about coming up with data to create user objects. Faker automatically does that for us.

    We can then go ahead and make assertions on our User object, without worrying about the data generated at all.

    You can run the example test case with this command:

    $ python -m unittest

    At the moment, we have two test cases, one testing that the user object created is actually an instance of the User class and one testing that the user object’s username was constructed properly. Try adding a few more assertions.

    Localization

    Faker can also return localized fake data through locale-specific providers. Built-in locales include English (United States), Japanese, Italian, and Russian, to name a few.

    Let’s change our locale to Russian (ru_RU) so that we can generate Russian names:

    # example.py
    
    from faker import Factory
    
    myGenerator = Factory.create('ru_RU')
    
    print(myGenerator.name())

    In this case, running this code gives us the following output:

    $ python example.py
    Мельникова Прасковья Андреевна

    Providers

    Providers are just classes which define the methods we call on Faker objects to generate fake data. In the localization example above, the name method we called on the myGenerator object is defined in one of Faker’s built-in providers. The default providers are listed in the official docs.

    Let’s create our own provider to test this out.

    # example.py
    
    from faker import Faker
    from faker.providers import BaseProvider
    
    fake = Faker()
    
    # Our custom provider inherits from the BaseProvider
    class TravelProvider(BaseProvider):
        def destination(self):
            destinations = ['NY', 'CO', 'CA', 'TX', 'RI']
    
            # Pick a destination using Faker's own random source,
            # so that seeding the generator also affects this provider
            return self.random_element(destinations)
    
    # Add the TravelProvider to our faker object
    fake.add_provider(TravelProvider)
    
    # We can now use the destination method:
    print(fake.destination())

    To define a provider, you need to create a class that inherits from BaseProvider. That class can then define as many methods as you want. Our TravelProvider example only has one method, but more can be added.

    Once your provider is ready, add it to your Faker instance like we have done here:

    fake.add_provider(TravelProvider)

    Here is what happens when we run the above example:

    $ python example.py
    CA

    Of course, your output might differ. Try running the script a few more times to see what happens.

    Seeds

    Sometimes, you may want to generate the same fake data output every time your code is run. In that case, you need to seed the fake generator.

    You can use any integer as a seed.

    Example:

    # example.py
    
    from faker import Faker
    
    myGenerator = Faker()
    
    myGenerator.random.seed(5467)
    
    for i in range(10):
        print(myGenerator.name())

    Running this code twice generates the same 10 random names:

    $ python example.py
    Denise Reed
    Megan Douglas
    Philip Obrien
    William Howell
    Michael Williamson
    Cheryl Jackson
    Janet Bruce
    Colton Martin
    David Melton
    Paula Ingram
    $ python example.py
    Denise Reed
    Megan Douglas
    Philip Obrien
    William Howell
    Michael Williamson
    Cheryl Jackson
    Janet Bruce
    Colton Martin
    David Melton
    Paula Ingram

    If you want to change the output to a different set of random output, you can change the seed given to the generator.

    Using Faker on Semaphore

    To use Faker on Semaphore, make sure that your project has a requirements.txt file which has faker listed as a dependency. If you used pip to install Faker, you can easily generate the requirements.txt file by running the command pip freeze > requirements.txt. This will output a list of all the dependencies installed in your virtualenv and their respective version numbers into a requirements.txt file.

    After pushing your code to a Git repository, you can add the project to Semaphore, and then configure your build settings to install Faker and any other dependencies by running pip install -r requirements.txt. That command simply tells Semaphore to read the requirements.txt file and install whatever dependencies it lists into the test environment.

    After that, you can run your tests with python -m unittest discover.

    Conclusion

    In this tutorial, you have learnt how to use Faker’s built-in providers to generate fake data for your tests, how to use the localized providers to change your locale, and even how to write your own providers.

    We also covered how to seed the generator to generate a particular fake data set every time your code is run.

    Lastly, we covered how to use Semaphore’s platform for Continuous Integration.

    Feel free to leave any comments or questions you might have in the comment section below.

    Written by: Amos Omondi