Weights and Biases at Digital Turbine

The Data Science research team at Digital Turbine has been working on a new product whose core component is an image classification deep learning (DL) model.


When we jumped into this task, our first thought was “What should we use as our experiment management tool?”


For the proof-of-concept (POC) phase, we thought it would be great to try out some tools we had never used before. This would give us real value in deciding how to move forward in terms of development.


We wanted a tool that would meet the following needs:

  • Visualize experiment results in real time
  • Allow convenient comparison of experiment runs
  • Provide an automated method for hyperparameter tuning
  • Connect easily to our code base and our development framework

Based on the above needs, we chose Weights & Biases (W&B).


Let’s take a look at what this tool offers and how it can help us reach our goals.

The following excerpt is taken from the W&B documentation:


Weights & Biases is the machine learning platform for developers to build better models faster. Use W&B’s lightweight, interoperable tools to quickly track experiments, version and iterate on datasets, evaluate model performance, reproduce models, visualize results and spot regressions, and share findings with colleagues.


Tracking experiments, comparing runs, visualizing results, and sharing findings are exactly the things we wanted to achieve.


For any Data Science use case, there are three main parts that we should take into account while developing:

  1. Data
  2. Model
  3. Code


For our use case, the data is loaded into PyTorch Data Loaders, the model is built on PyTorch Lightning, and our code is based on these two technologies, alongside some pure Python. Luckily, W&B integrates with all of these frameworks.


Within our POC with this tool, we wanted to make sure that:

  • The whole team is aligned on how we use the tool
  • Specifically for our use case, we try many combinations of our model, to improve accuracy and strip away complexity that may not be required


This obviously includes hyperparameter tuning, as well as some basic runs.

W&B Basic Runs

Here, the goal was to start the process and track a number of “basic” runs that would give us a sense of how our model performs.


Using W&B, we managed to do so quickly and easily, and most importantly – we were able to reach a decision on how to proceed to a “deeper” phase of hyperparameter tuning.


Below, you can see a graph that demonstrates some runs of different models, tracking the specific metric we are most interested in observing (validation GAP – Global Average Precision):

The integration is pretty straightforward, and includes two main parts:

  • Defining a Project – a “project” within W&B contains several model runs, and is eventually intended to hold all of the information about the model development phase
  • Defining a Run – once we have a project, we can define a run, which will include the “infrastructure” of what we plan to track:
    • Parameters during epochs – the information about the current run, specifically which parameters this model ran with
    • Run name (e.g. model=ResNet__date=2022-03-29_15_30_00) – a unique name to help you analyze that particular run
    • Metrics during epochs – perhaps the most important monitoring part. It includes the metrics you want to visualize – e.g. accuracy, loss, various evaluation functions, you name it. To add a metric, just invoke a log method based on the W&B logger
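For instance, run names in the format above can be generated with a tiny helper (the function name is ours, purely illustrative):

```python
from datetime import datetime

def make_run_name(model_name: str, ts: datetime) -> str:
    """Build a unique, sortable run name,
    e.g. model=ResNet__date=2022-03-29_15_30_00."""
    return f"model={model_name}__date={ts.strftime('%Y-%m-%d_%H_%M_%S')}"

print(make_run_name("ResNet", datetime(2022, 3, 29, 15, 30, 0)))
# → model=ResNet__date=2022-03-29_15_30_00
```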

Hyperparameter Usage

W&B has a component called “Sweeps”.  In short, a “sweep” is an automated hyperparameter tuning process in which a developer can easily dictate the tuning strategy, with great search-space capabilities, mainly:

  • Grid search (to check all possible combinations of a given model)
  • Random search (to add some “randomization” salt on top of the parameter search)
  • Bayesian search – a statistical method that uses Bayes’ theorem to minimize or maximize a given function
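A sweep is declared as a small configuration; the sketch below is hypothetical (the metric and parameter names are placeholders), but it shows where the search strategy plugs in:

```python
# Hypothetical sweep configuration; metric/parameter names are placeholders.
sweep_config = {
    "method": "bayes",  # or "grid" / "random" – the three strategies above
    "metric": {"name": "val_gap", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
}

# Handing it to W&B then looks like this (commented out, as it needs a login):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="image-classification-poc")
# wandb.agent(sweep_id, function=train)  # `train` is your training function
```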


In our case, our hyperparameters focus on the engineering side of the table, e.g.:

  • Embedding size of the “features” model – In our use case, the embedding size is the size of the vector that represents a given image and a given class. Our goal is to make it as small as possible, while maintaining a high level of accuracy, alongside some other model evaluation metrics. Keeping it small means less memory to save/load, and thus faster performance at inference time
  • Precision – The main goal of this parameter is performance, and it is relevant for PyTorch Lightning. Lightning supports double precision (64), full precision (32), or half precision (16) training for storing the parameter values.
    Half precision, or mixed precision, is the combined use of 32- and 16-bit floating points to reduce the memory footprint during model training
  • “Backbone” of a given model – Backbone is a term used in DL models/papers to refer to the feature extractor network.
    These feature extractor networks compute features from the input image.
  • We acknowledge that the lower the GPU memory footprint, the more we can squeeze out of our resources:
    • The faster a given model can be applied to a batch of data samples
    • Alternatively, instead of squeezing in more data in parallel, a smaller GPU can be used, or more GPU-bound work can be submitted in parallel

However, comparing GPU memory usage alone is not enough to understand the performance of the model – we must also track the actual runtime of each model application (during training or validation), as some models may consume the fewest GPU resources yet have the slowest overall computation flow.

  • Of course, there are other relevant system metrics that we leave for the engineering team to focus on and optimize, e.g. CPU core usage, disk utilization, and even network traffic

For this reason, and since we had the time and resources for the experiment, we wanted to make sure we covered most of the runs, so below you can find a snapshot of one of our “Sweeps”.


Using this graph, we can easily understand which model pushes us towards better accuracy, and reach a data-driven decision.



To wrap up, Weights and Biases allowed us to perform a very fast and successful POC in our project.  We are positive that using it in the long term will have a great impact on the speed and ease of DL model development within Digital Turbine!


This post was co-written by Dani Kogan and Daniel Hen.
