1. Introduction

This post belongs to a new series of posts where I intend to face the challenge of learning a new programming language, such as Python, by doing. There are tons of material about Python tutorials for every level of expertise. It has been quite a while now that platforms such as codewars, topcoders and similar have been around. The nice thing that I like about these platforms is that the trainee is faced with many tasks or challenges of increasing difficulty. The same task is often proposed for multiple languages (Python, Javascript, C++ and many others). Learn by doing is gaining popularity in recent years as it helps the trainee to figure out how and where he/she can get stacked. Solutions of other members are accessible only when an admissible solution has been successfully submitted to the automatic testing process.

So why should I write a blog post to show how to solve such kinds of tasks and how could it be useful to any extend to you? In line with the philosophical view of the above-mentioned platforms, I encourage you to read each single example description, step out of the blog post, try to work it out by yourself (without surfing Stack-overflow too much!) and coming back to the post to go through the proposed solution (something I tend to provide more than one) and the related comments. I have indeed struggled sometimes to fully capture other members’ solutions with no comments or details of the logic behind the code.

In this post we develop a Python code from scratch to determine the travelled distance of a bike rider in a given time span, if the annual target keeps increasing. It continues the previous post’s topic.

2. Description

James is a bike rider who challenges himself to cover a very long distance along a river path every year, as part of a life ritual. Some days he rides more than others depending on the weather, his energy and the road conditions. He challenges himself to increase the annual distance by a fixed amount of distance and calculate the overall travel length over the next 10 years. He has spent a year marking down his daily progress.

The monthly data are grouped into four seasonal lists.

The total distance of all the trips over a year will be used for estimating the number he might cover in N=10 years, as:

$$ D_{tot} = \sum_{i=0}^{N-1} \big(D_{year} + i\cdot \theta \big) $$

where $\theta$ is the fixed annual increment from second year on.

Input data structure is a list of the four seasons, each containing covered distance of every month within that season.

We need to make sure our solution takes into account all of the nesting within the input array.

3. Input data

We use the test data set (testSet) from the previous example.

print('Test set length is {}, while each example length is {}'.format(len(testSet), len(testSet[0])))
Test set length is 10, while each example length is 4

4. Reference function

This section is meant to define the procedure to calculate the output in the most basic Pythonic way, to be readable and 100% correct.

We take the first example, we use the powerful Python syntax to unpack a N-long list into N distinct variables, we concatenate all these lists into one (year), sum all the elements of the year list and save it to distance. We then sum this term over Nyear=10 years by adding the annual increment and print it out. Since the annual increment is with respect to the previous year, an iterator yy is introduced as shown in the problem definition. The sum ranges from 0 to Nyear-1, due to no increment in the first year, as does the range operator.

Concatenating lists is as simple as using the + operator.

example = testSet[0]
winter, spring, summer, autumn = example
year = winter + spring + summer + autumn
totDistance, distance = 0, 0
for month in year:
    distance += month
distIncrement = 100
Nyear = 10
for yy in range(Nyear):
    totDistance += distance + yy*distIncrement
print('Total distance travelled in the first example is {} km.'.format(totDistance))
Total distance travelled in the first example is 114810 km.

We embed this code into the reference function refFun(). The output function variable has to be declared within the return statement, while an additional argument is fed to the function to declare the annual distance increment.

def refFun(example, Nyear = 10, distIncrement=100):
    winter, spring, summer, autumn = example
    year = winter + spring + summer + autumn
    totDistance, distance = 0, 0
    for month in year:
        distance += month
    for yy in range(Nyear):
        totDistance += distance + yy*distIncrement
    return totDistance

Python lets you define the default value of any function argument, treating it as a parameter. It means that the function can be executed without feeding every argument, such as in this case, where the default value (100) is used:

print('Total distance travelled in the first example is {} km, when the annual distance increment is {}.'\
      .format(refFun(example), distIncrement))
Total distance travelled in the first example is 114810 km, when the annual distance increment is 100.

or it can be executed with a specific value, such as in the next case, where the default value is replaced with 0:

distIncrement = 0
print('Total distance travelled in the first example is {} km, when the annual distance increment is {}.'\
      .format(refFun(example, distIncrement), distIncrement))
Total distance travelled in the first example is 110310 km, when the annual distance increment is 0.

5. Output data

We need to repeat the process defined in the above function to the get the output for the entire testSet. We do not change the default value of the annual distance increment.

testResults = [refFun(example) for example in testSet]
example, result = testSet[0], testResults[0]

6. Function development

Since the input is a nested list, we can use two nested for-loops to process the overall distance. The solution can be written to be more compact and, in a long-term perspective, easier to maintain. We use the sum operator, which returns the sum of the numerical elements of a list. The nested list can be flattened into a standard list. The top-down order of original for-loop becomes a left-right process in the list comprehension. The constant term of annual travel distance is stored in distConst, while incremental annual distance term cumulated over Nyear=10 years is distIncrement.

def fun11(example, Nyear = 10, distIncrement=100):
    distConst = sum(month for season in example for month in season)
    distIncrement = sum(yy*distIncrement for yy in range(Nyear))
    return distConst*10 + distIncrement
print('Total distance, calculated with method 1, is {} km (ground-truth is {}).'.format(fun11(example), result))
Total distance, calculated with method 1, is 114810 km (ground-truth is 114810).

A further step into the single-line-of-code benchmark is to introduce the formula of a partial sum of the first $n$ natural numbers, which is a triangular number:

$$ \sum_{k=0}^n k = \frac{n\cdot(n+1)}{2} $$

In our case, we need to sum the first Nyear-1 years, so the final expression is:

$$ \sum_{k=0}^{Nyear-1} k = \frac{(Nyear-1)\cdot(Nyear-1+1)}{2} $$

$$ = \frac{(Nyear-1)\cdot Nyear}{2} $$

See this page for further details. The overall distance is:

$$ D_{tot} = Nyear\cdot D_{year} + \frac{Nyear\cdot(Nyear-1)}{2}\cdot D_{incr} $$

$$ D_{tot} = Nyear\cdot\big(D_{year} + \frac{Nyear-1}{2}\cdot D_{incr}\big) $$

def fun12(example, Nyear = 10, distIncrement=100):
    return Nyear*(sum(month for season in example for month in season) + (Nyear-1)/2*distIncrement)
print('Total distance, calculated with method 2, is {} km (ground-truth is {}).'.format(fun12(example), result))
Total distance, calculated with method 2, is 114810.0 km (ground-truth is 114810).

7. Function assessment

In this last section of the second task, we compare the performance of every developed function with the benchmark results. The final score is the percentage of successful tests obtained with that function.

I recall a list-comprehension method to get the score, embedded into an in-place scoring function using the powerful lambda operator.

This scoring function takes a function as input and returns its score over the test set.

scoring = lambda fun: sum(1 for example, result in zip(testSet, testResults) if fun(example) == result)/len(testSet)*100

We finally place the 2 developed functions into a list and prints the score associated to each of them. The enumerate iterator gives the 0-based index and the corresponding element of the input list.

funs = [fun11, fun12]
for kk, fun in enumerate(funs):
    print('Final score for function {} is {}%.'.format(str(kk+1), scoring(fun)))
Final score for function 1 is 100.0%.
Final score for function 2 is 100.0%.

We finally compare the performance of each function in terms of computational time. We use one of the built-in magic commands available in the Jupyter notebook, timeit, which estimates the execution time of a Python statement or expression. One (%timeit) or two (%%timeit) percent signs are required to evaluate one-line statement or one cell, respectively.

The first attempt is about $7\mu s$ slower than the reference function, while the second one is $10\mu s$ faster.

%%timeit
for example in testSet:
    fun11(example)
36.7 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for example in testSet:
    fun12(example)
18.8 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for example in testSet:
    refFun(example)
29.6 µs ± 478 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

If the number of years increases up to Nyear=50, the first function and the reference function timing match.

%%timeit
for example in testSet:
    fun11(example, 50)
76.8 µs ± 3.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for example in testSet:
    refFun(example, 50)
78 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

If the number of years increases up to Nyear=100, the first function becomes faster than the reference function by $ 9\mu s$.

%%timeit
for example in testSet:
    fun11(example, 100)
134 µs ± 3.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for example in testSet:
    refFun(example, 100)
143 µs ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In general, these two implementations lack performance and are not recommended to use in practise, since their timing is affected by the Nyear parameter. It does not hold for the second attempt, fun12, whose average computational time remains close to the initial value ($22\mu s$ vs $18\mu s$).

%%timeit
for example in testSet:
    fun12(example, 100)
22.3 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)