23 Apr 2021 · Software Engineering

    Getting Started with Behavior Testing in Python with Behave

    13 min read
    Contents

    Introduction

    Behavior testing simply means that we should test how an application behaves in certain situations. Often the behavior is given to us developers by our customers. They describe the functionality of an application, and we write code to meet their specifications. Behavioral tests are a tool to formalize their requirements into tests. This leads naturally to behavior-driven development (BDD).

    After completing this tutorial, you should be able to:

    • Explain the benefits of behavior testing
    • Explain the “given”, “when”, and “then” phases of Behave
    • Write basic behavioral tests using Behave
    • Write parameterized behavioral tests using Behave

    Prerequisites

    Before starting, make sure you have the following installed:

    Setting Up Your Environment

    This tutorial will walk you through writing tests for and coding a feature of a Twenty-One (or “Blackjack”) game. Specifically, we’ll be testing the logic for the dealer. To get started, create a root directory where your code will go, and then create the following directories and blank files:

    .
    ├── features
    │   ├── dealer.feature
    │   └── steps
    │       └── steps.py
    └── twentyone.py

    Here’s a brief explanation of the files:

    • dealer.feature: The written out tests for the dealer feature.
    • steps.py: The code that runs the tests in dealer.feature.
    • twentyone.py: The implementation code for the dealer feature.

    Writing Your First Test

    Although behavioral tests do not require test-driven development, the two methodologies go hand-in-hand. We’ll approach this problem from a test-driven perspective, so instead of jumping to code, we’ll start with the tests.

    Writing the Scenario

    Open dealer.feature and add the following first line:

    Feature: The dealer for the game of 21
    

    This line describes the feature. In a large application, you would have many features. Next, we’ll add a test. The first test will be simple — when the round starts, the dealer should deal itself two cards. The word Behave uses to define a test is “Scenario”, so go ahead and add the following line:

    Scenario: Deal initial cards
    

    Before we write more, we need to understand the three phases of a basic Behave test: “Given”, “When”, and “Then”. “Given” initializes a state, “When” describes an action, and “Then” states the expected outcome. For this test, our state is a new dealer object, the action is the round starting, and the expected outcome is that the dealer has two cards. Here’s how this is translated into a Behave test:

    Scenario: Deal initial cards
      Given a dealer
      When the round starts
      Then the dealer gives itself two cards
    

    Notice that the three phases read like a normal English sentence. You should strive for this when writing behavioral tests because they are easily readable by anyone working in the code base.

    Now to see how Behave works, simply open a terminal in the root directory of your code and run the following command:

    behave

    You should see this output:

    Feature: The dealer for the game of 21 # features/dealer.feature:1
    
      Scenario: Deal initial cards             # features/dealer.feature:3
        Given a dealer                         # None
        When the round starts                  # None
        Then the dealer gives itself two cards # None
    
    
    Failing scenarios:
      features/dealer.feature:3  Deal initial cards
    
    0 features passed, 1 failed, 0 skipped
    0 scenarios passed, 1 failed, 0 skipped
    0 steps passed, 0 failed, 0 skipped, 3 undefined
    Took 0m0.000s
    
    You can implement step definitions for undefined steps with these snippets:
    [ The rest of output removed for brevity ]

    The key part here is that we have one failing scenario (and therefore a failing feature) that we need to fix. Below that, Behave suggests how to implement steps. You can think of a step as a task for Behave to execute. Each phase (“given”, “when”, and “then”) are all implemented as steps.

    Writing the Steps

    The steps that Behave runs are written in Python and they are the link between the descriptive tests in .feature files and the actual application code. Go ahead and open steps.py and add the following imports:

    from behave import *
    from twentyone import *

    Behave steps use annotations that match the names of the phases. This is the first step as described in the scenario:

    @given('a dealer')
    def step_impl(context):
        context.dealer = Dealer()

    It’s important to notice that the text inside of the annotation matches the scenario text exactly. If it doesn’t match, the test cannot run.

    The context object is passed from step to step, and it is where we can store information to be used by other steps. Since this step is a “given”, we need to initialize our state. We do that by creating a Dealer object, and attaching that object to the context. If you run behave again, you’ll see the test fails, but now for a different reason: We haven’t defined the Dealer class yet! Again, we have a failing test that is “driving” us to do work.

    Now we will open twentyone.py and create a Dealer class:

    class Dealer():
        pass

    Run behave once again to verify that we fixed the last error we saw, but that the scenario still fails because the “when” and “then” steps are not implemented. From here on, the tutorial will not explicitly state when you should run behave. But remember, the cycle is to write a test, see that it fails, and then write code to make the test pass.

    Here are the next steps to add to steps.py:

    @when('the round starts')
    def step_impl(context):
        context.dealer.new_round()
    
    
    @then('the dealer gives itself two cards')
    def step_impl(context):
        assert (len(context.dealer.hand) == 2)

    Again, the annotation text matches the text in the scenario exactly. In the “when” step, we have access to the dealer created in “given” and we can now call a method on that object. Finally, in the “then” step, we still have access to the dealer, and we assert that the dealer has two cards in its hand.

    We defined two new pieces of code that need to be implemented: new_round() and hand. Switch back to twentyone.py and add the following to the Dealer class:

    class Dealer():
        def __init__(self):
            self.hand = []
    
        def new_round(self):
            self.hand = [_next_card(), _next_card()]

    The _next_card() function will be defined as a top-level function of the module, along with a definition of the cards. At the top of the file, add the following:

    import random
    
    _cards = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
    
    
    def _next_card():
        return random.choice(_cards)

    Remember that random is not secure and should not be used in a real implementation of this game, but for this tutorial it will be fine.

    If you run behave now, you should see that the test passes:

    Feature: The dealer for the game of 21 # features/dealer.feature:1
    
      Scenario: Deal initial cards             # features/dealer.feature:3
        Given a dealer                         # features/steps/steps.py:5 0.000s
        When the round starts                  # features/steps/steps.py:9 0.000s
        Then the dealer gives itself two cards # features/steps/steps.py:14 0.000s
    
    1 feature passed, 0 failed, 0 skipped
    1 scenario passed, 0 failed, 0 skipped
    3 steps passed, 0 failed, 0 skipped, 0 undefined
    Took 0m0.000s

    Writing Tableized Tests

    Often when writing tests we want to test the same behavior against many different parameters and check the results. Behave makes this easier to do by providing tools to create a tableized test instead of writing out each test separately. The next game logic to test is that the dealer knows the point value of its hand. Here is a test that checks several scenarios:

    Scenario Outline: Get hand total
      Given a <hand>
      When the dealer sums the cards
      Then the <total> is correct
    
      Examples: Hands
      | hand          | total |
      | 5,7           | 12    |
      | 5,Q           | 15    |
      | Q,Q,A         | 21    |
      | Q,A           | 21    |
      | A,A,A         | 13    |
    

    You should recognize the familiar “given, when, then” pattern, but there’s a lot of differences in this test. First, it is called a “Scenario Outline”. Next, it uses parameters in angle brackets that correspond to the headers of the table. Finally, there’s a table of inputs (“hand”) and outputs (“total”).

    The steps will be similar to what we’ve seen before, but we’ll now get to use the parameterized steps feature of Behave.

    Here’s how to implement the new “given” step:

    @given('a {hand}')
    def step_impl(context, hand):
        context.dealer = Dealer()
        context.dealer.hand = hand.split(',')

    The angle brackets in the dealer.feature file are replaced with braces, and the hand parameter becomes an object that is passed to the step, along with the context.

    Just like before, we create a new Dealer object, but this time we manually set the dealer’s cards instead of generating them randomly. Since the hand parameter is a simple string, we split the parameter to get a list.

    Next, add the remaining steps:

    @when('the dealer sums the cards')
    def step_impl(context):
        context.dealer_total = context.dealer.get_hand_total()
    
    @then('the {total:d} is correct')
    def step_impl(context, total):
        assert (context.dealer_total == total)

    The “when” step is nothing new, and the “then” step should look familiar. If you’re wondering about the “:d” after the total parameter, that is a shortcut to tell Behave to treat the parameter as an integer. It saves us from manually casting with the int() function. Here’s a complete list of patterns that Behave accepts and if you need advanced parsing, you can define your own pattern.

    There’s many different approaches to summing values of cards, but here’s one solution to find the total of the dealer’s hand. Create this as a top-level function in the twentyone.py module:

    def _hand_total(hand):
        values = [None, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 10]
        value_map = {k: v for k, v in zip(_cards, values)}
    
        total = sum([value_map[card] for card in hand if card != 'A'])
        ace_count = hand.count('A')
    
        for i in range(ace_count, -1, -1):
            if i == 0:
                total = total + ace_count
            elif total + (i * 11) + (ace_count - i) <= 21:
                total = total + (i * 11) + ace_count - i
                break
    
        return total

    In short, the function maps the card character strings to point values, and sums the values. However, aces have to be handled separately because they can value 1 or 11 points.

    We also need to give the dealer the ability to total its cards. Add this function to the Dealer class:

    def get_hand_total(self):
        return _hand_total(self.hand)

    If you run behave now, you’ll see that each example in the table runs as its own scenario. This saves a lot of space in the features file, but still gives us rigorous tests that pass or fail individually.

    We’ll add one more tableized test, this time to test that the dealer plays by the rules. Traditionally, the dealer must play “hit” until he or she has 17 or more points. Add this scenario outline to test that behavior:

    Scenario Outline: Dealer plays by the rules
      Given a hand <total>
       when the dealer determines a play
       then the <play> is correct
    
      Examples: Hands
      | total  | play   |
      | 10     | hit    |
      | 15     | hit    |
      | 16     | hit    |
      | 17     | stand  |
      | 18     | stand  |
      | 19     | stand  |
      | 20     | stand  |
      | 21     | stand  |
      | 22     | stand  |
    

    Before we add the next steps, it’s important to understand that when using parameters, the order matters. Parameterized steps should be ordered from most restrictive to least restrictive. If you do not do this, the correct step may not be matched by Behave. To make this easier, group your steps by type. Here is the new given step, ordered properly:

    @given('a dealer')
    def step_impl(context):
        context.dealer = Dealer()
    
    ## NEW STEP
    @given('a hand {total:d}')
    def step_impl(context, total):
        context.dealer = Dealer()
        context.total = total
    
    
    @given('a {hand}')
    def step_impl(context, hand):
        context.dealer = Dealer()
        context.dealer.hand = hand.split(',')

    The typed parameter {total:d} is more restrictive than the untyped {hand}, so it must come earlier in the file.

    The new “when” step is not parameterized and can be placed anywhere, but, for readability, should be grouped with the other when steps:

    @when('the dealer determines a play')
    def step_impl(context):
        context.dealer_play = context.dealer.determine_play(context.total)

    Notice that this test expects a determine_play() method, which we can add to the Dealer class:

    def determine_play(self, total):
        if total < 17:
            return 'hit'
        else:
            return 'stand'

    Last, the “then” step is parameterized so it needs to also be ordered properly:

    @then('the dealer gives itself two cards')
    def step_impl(context):
        assert (len(context.dealer.hand) == 2)
    
    
    @then('the {total:d} is correct')
    def step_impl(context, total):
        assert (context.dealer_total == total)
    
    ## NEW STEP
    @then('the {play} is correct')
    def step_impl(context, play):
        assert (context.dealer_play == play)

    Putting Everything Together

    We’re going to add one final test that will tie together all of the code we’ve just written. We’ve proven to ourselves with tests that the dealer can deal itself cards, determine its hand total, and make a play separately, but there’s no code to tie this together. Since we are emphasizing test-driven development, let’s add a test for this behavior.

    Scenario: A Dealer can always play
      Given a dealer
      When the round starts
      Then the dealer chooses a play
    

    We already wrote steps for the “given” and “when” statements, but we need to add a step for “the dealer chooses a play.” Add this new step, and be sure to order it properly:

    @then('the dealer gives itself two cards')
    def step_impl(context):
        assert (len(context.dealer.hand) == 2)
    
    #NEW STEP
    @then('the dealer chooses a play')
    def step_impl(context):
        assert (context.dealer.make_play() in ['stand', 'hit'])
    
    
    @then('the {total:d} is correct')
    def step_impl(context, total):
        assert (context.dealer_total == total)

    This test relies on a new method make_play() that you should now add to the Dealer class:

    def make_play(self):
        return self.determine_play(self.get_hand_total())

    This method isn’t critical, but makes it easier to use the Dealer class.

    If you’ve done everything correctly, running behave should display all of the tests and give a summary similar to this:

    1 feature passed, 0 failed, 0 skipped
    16 scenarios passed, 0 failed, 0 skipped
    48 steps passed, 0 failed, 0 skipped, 0 undefined
    Took 0m0.007s

    Conclusion

    This tutorial walked you through setting up a new project with the Behave library and using test-driven development to build the code based off of behavioral tests.

    If you would like to get experience writing more tests with this project, try implementing a Player class and player.feature that plays with some basic strategy.

    To learn more about BDD and why you might want to adopt it, check out our article on Behavior-Driven Development.

    3 thoughts on “Getting Started with Behavior Testing in Python with Behave

    1. Hi Phillip,

      Thank you so much for this tutorial. It is very helpful, and easy to follow. I want to also be able to produce pretty test reports like Cucumber Java. Is there a plugin, a library, or a framework … anything equivalent in Python?

      Thanks

      1. I think you can use BehaveX, which allows you to execute tests in parallel, and comes with great reports (including an HTML report): https://github.com/hrcorval/behavex

        As BehaveX is implemented over the Behave framework, with this wrapper you should be able to replace the “behave” executable by “behavex”, and run the tests in parallel using the following commands:

        behavex -t @TAG –parallel-processes 4 –parallel-scheme scenario

        behavex -t @TAG –parallel-processes 3 –parallel-scheme feature

        Please, take a look at the documentation for more details. It is important to mention that not all Behave command line arguments are yet supported (but the most important ones)

        Hope it helps 🙂

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Avatar
    Writen by:
    Phillip Johnson is a software developer based in Columbus, Ohio. He blogs at Let's Talk Data and tweets @phillipcjohnson.