Containerized Selenium Browser Automation Architecture With Azure Functions .NET 3.1 | Pt.1: Setup and Design

Overview

In this series, I’ll explore setting up an Azure Functions application that connects to a containerized Selenium server and interacts with its shell, all in a serverless manner while following good OOP practices.

The main purpose of this article is to present a clean architecture, so I only skim over the setup process. I’m assuming basic knowledge of:

  • Azure Functions
  • .NET Core 3.1
  • Selenium
  • Docker

Download Selenium Image

If you don’t have Docker set up yet, follow this Get Started guide first.

Now, we need to download the image from this GitHub repo. Follow the Quick Start guide, but use the Chrome image and not Firefox (for the purpose of this demo). In essence, all we have to do is run the following command in PowerShell or Command Prompt on our machine:

docker run -d -p 4444:4444 --shm-size="2g" selenium/standalone-chrome

Once the image finishes downloading, you can confirm the list of images available locally on your machine by running a command:

docker image ls

You can also check currently running containers by using a command:

docker ps

Since we used the docker run command, the container should get started after the download is complete. If docker ps shows a running selenium container, visiting http://localhost:4444/ should now show you a nice little dashboard with one Chrome instance running. Notice we currently have 0 sessions on that node and that the container runs on Linux.

Create a New Functions App or Pull Boilerplate Code From GitHub

Create a new Azure Functions app via Visual Studio.

Choose HTTP Trigger, change Function Authorization Level to Anonymous and leave the Storage Emulator default value.

Alternatively, you can download the ready boilerplate code from my GitHub Repository

Functions App Architecture Overview

When writing any application, I always think of extensibility. While Selenium might be a testing framework, I believe we can create some interesting, more complex browser automation apps when using the right design.

The structure can be visualized in the following way:

Design Visualization

This might look a little confusing at first, so let me explain what those labels mean. We can distinguish 5 main elements of the application:

  1. Function class – Only uses an IBrowser instance (the Chrome class in our case), which encapsulates all the functionality.
  2. Browser – All operations on the browser happen here. It receives an injected INavigator.
  3. Navigator – A simple interface responsible for all navigation between pages and between On-Page Actions.
  4. On-Page Action Service – Encapsulates concrete tasks on the pages. Each On-Page Action will have one or more injected pages and some other services if needed.
  5. Page – Represents a single web page. The page class usually has “By” properties, which allow searching for elements by their selectors and interacting with them.

Example: Chrome (browser class) has a method called GetScheduledReports(). The browser class method uses INavigator’s methods to navigate to the Scheduled Reports page and once on it, it calls DownloadScheduledReportsToLocalFilesystem() method that belongs to the ReportDownloader (on-page action service). The on-page action class method calls the page class methods to download all the needed, scheduled reports – ScheduledEmailReportsPage (page class) contains code to find the Download buttons by their selectors and click them.

Configure Application Startup and Register DI

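Create a Startup class and paste the code below. It registers a RemoteWebDriver in the DI container, connected to the Selenium container at the URL read from configuration: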

using DockerFunctionsBoilerplate;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;
using System;

[assembly: FunctionsStartup(typeof(Startup))]
namespace DockerFunctionsBoilerplate
{
	public class Startup : FunctionsStartup
	{
		public override void Configure(IFunctionsHostBuilder builder)
		{
			// register dependency injection of RemoteWebDriver class:
			builder.Services.AddScoped(serviceProvider =>
			{
				// get IConfiguration instance:
				var config = serviceProvider
                      .GetRequiredService<IConfiguration>();

				// fetch container url from local.settings.json:
				var containerUri = new Uri(config["containerUrl"]);

				// you can register all your chrome.exe
				// startup options with the below class:
				var opts = new ChromeOptions();
				
				return new RemoteWebDriver(containerUri, opts);
			});
		}
	}
}

You’ll need to install the following packages:

Install-Package Microsoft.Azure.Functions.Extensions
Install-Package Selenium.WebDriver
Install-Package Microsoft.Extensions.Configuration.Abstractions -Version 3.1.20

In local.settings.json, under the Values node, add:

"containerUrl":  "http://localhost:4444/wd/hub"

Business Rules Layer

If you’re following the new project creation, add a new project and call it YourProjectName.BRL.

Let’s add some folders to it in an organized manner. I suggest the following:

YourProjectName.BRL
 ┣ 📂Browsers
 ┃ ┣ 📂Abstractions
 ┃ ┗ 📂Implementations
 ┣ 📂Extensions
 ┣ 📂OnPageActions
 ┃ ┣ 📂Abstractions
 ┃ ┗ 📂Implementations
 ┣ 📂Models
 ┣ 📂Pages
 ┃ ┣ 📂Abstractions
 ┃ ┗ 📂Implementations
 ┣ 📂Services
 ┃ ┣ 📂Abstractions
 ┃ ┗ 📂Implementations
 ┗ 📜YourProjectName.BRL.csproj

IBrowser – A High-Level Task

The purpose of IBrowser is to perform a concrete, high-level task. This could be, for example, GetSampleData(). Under the hood, to get that data, it will have to first navigate to the login page, fill in the form, log in, navigate to the sample data page, select some filters, and finally click the Download button. IBrowser doesn’t care about all that detail, though – its dependencies will take care of it.

Let’s create a simple example using one of the cool examples provided here.

Let’s pretend we have to log in to a website and download a text file from it.

As I mentioned before, IBrowser is all about a high-level task, so the interface can have a method named GetMavenProjectTextFile(). Create a new file called IBrowser in YourProjectName.BRL.Browsers.Abstractions:

using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Abstractions
{
	public interface IBrowser
	{
		Task GetMavenProjectTextFile();
	}
}

We’ll implement this interface next. In YourProjectName.BRL.Browsers.Implementations add a new file called Chrome.cs and make it implement IBrowser interface. Also, create an empty constructor – we’ll be injecting some services there soon.

So, what do we need to do in order to log in and download the file we want for this example?

  • Navigate to the login page
  • Log in
  • Navigate to the file download page
  • Download file

Your code should look something like this:

using DockerFunctionsBoilerplate.BRL.Browsers.Abstractions;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Implementations
{
	public class Chrome : IBrowser
	{
		public Chrome()
		{
			// services will be injected here soon
		}

		public Task GetMavenProjectTextFile()
		{
			// Navigate to the login page > INavigator task
			// Log in > IFileDownloaderOnPageAction task
			// Navigate to the file download page > INavigator task
			// Download file > IFileDownloaderOnPageAction task

			// nothing to await yet, so return a completed task:
			return Task.CompletedTask;
		}
	}
}

From the above, we can already see we’ll need 2 new interfaces:

  • INavigator
  • IFileDownloaderOnPageAction

Note: we should probably keep Log In Page and File Downloader as separate On-Page Actions, but for the sake of simplicity, we’ll just keep them in the same class

Let’s leave the Browser class for a moment and focus on what INavigator should be for us.

INavigator – Take Me Somewhere Nice!

The RemoteWebDriver class exposes an INavigation interface (through its Navigate() method), but I’d like to extend it to perform some asynchronous logging and take screenshots of the current page. In other words, provide auditing capabilities. From my experience, running automated browser sessions on live websites can sometimes surprise you, so it’s good to have some sort of execution history and visuals handy.

Add a new interface called INavigator in your Browsers/Abstractions folder and paste the following:

using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Abstractions
{
	public interface INavigator
	{
		Task GoToAsync(string url);
		Task GoBackAsync();
		Task GoForwardAsync();
		Task RefreshAsync();
	}
}

The interface itself looks kind of plain and closely resembles the INavigation interface from the package, except for the asynchronous manner of execution.

However, the implementation (should be in Browsers/Implementations folder) of the above would look something like this:

using DockerFunctionsBoilerplate.BRL.Browsers.Abstractions;
using Microsoft.Extensions.Configuration;
using OpenQA.Selenium.Remote;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Implementations
{
	public class Navigator : INavigator
	{
		private readonly RemoteWebDriver _driver;
		private readonly IConfiguration _config;

		public Navigator(RemoteWebDriver driver,
						 IConfiguration config)
		{
			_driver = driver;
			_config = config;
		}


		public async Task GoToAsync(string url)
		{
			_driver.Navigate().GoToUrl(url);
			await HandleAudit();
		}

		public async Task GoBackAsync()
		{
			_driver.Navigate().Back();
			await HandleAudit();
		}

		public async Task GoForwardAsync()
		{
			_driver.Navigate().Forward();
			await HandleAudit();
		}

		public async Task RefreshAsync()
		{
			_driver.Navigate().Refresh();
			await HandleAudit();
		}

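		private Task HandleAudit()
		{
			// audit only when the flag parses successfully AND is set to true:
			if (bool.TryParse(_config["isAuditActive"], out var isAuditActive)
				&& isAuditActive)
			{
				// take a screenshot > IAuditService ?
				// log message async > IAuditService ?
			}

			// nothing to await until the audit service is plugged in:
			return Task.CompletedTask;
		}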
	}
}

As you can see, we let the user control if they get the audit logs or just plain INavigation methods executed. The HandleAudit method determines if we should be logging the outputs based on the isAuditActive config variable. We have to add it in our local.settings.json file too (within the Values node, where values are kept as strings):

"isAuditActive":  "true"

There’s no logic for auditing here, as it’s not the purpose of this tutorial, but it’s available under IAuditService interface in the boilerplate code here.
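
For orientation only, an audit service could look something like the sketch below. This is not the boilerplate’s implementation: the RecordAsync method, the FileAuditService class and the choice of the temp folder for screenshots are all assumptions of mine, made to show how a screenshot and a log entry might be produced from the injected driver.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using OpenQA.Selenium.Remote;

// Hypothetical contract; the boilerplate's IAuditService may look different.
public interface IAuditService
{
	Task RecordAsync(string message);
}

// Hypothetical implementation: saves a screenshot to the temp folder and logs its path.
public class FileAuditService : IAuditService
{
	private readonly RemoteWebDriver _driver;
	private readonly ILogger<FileAuditService> _log;

	public FileAuditService(RemoteWebDriver driver, ILogger<FileAuditService> log)
	{
		_driver = driver;
		_log = log;
	}

	public async Task RecordAsync(string message)
	{
		// capture the current page as an image:
		var screenshot = _driver.GetScreenshot();

		// build a timestamped file name (the location is an assumption):
		var path = Path.Combine(
			Path.GetTempPath(),
			$"audit-{DateTime.UtcNow:yyyyMMdd-HHmmssfff}.png");

		await File.WriteAllBytesAsync(path, screenshot.AsByteArray);

		_log.LogInformation(
			"{Message} | url: {Url} | screenshot: {Path}",
			message, _driver.Url, path);
	}
}
```

With something like this registered as a scoped service, HandleAudit could await RecordAsync inside its if block.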

We can now register INavigator dependency injection in Startup class like so (remember to add BRL project reference and adequate using directives):

builder.Services.AddScoped<INavigator, Navigator>();

Finally, we can inject INavigator into the Chrome class. To get the URLs to navigate to, we’ll use IConfiguration again. We can fill in some blanks in the Chrome class since we have our INavigator service. The Chrome class should look like this now:

using DockerFunctionsBoilerplate.BRL.Browsers.Abstractions;
using Microsoft.Extensions.Configuration;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Implementations
{
	public class Chrome : IBrowser
	{
		private readonly IConfiguration _config;
		private readonly INavigator _nav;
		public Chrome(IConfiguration config, INavigator nav)
		{
			_config = config;
			_nav = nav;
		}

		public async Task GetMavenProjectTextFile()
		{
			// Navigate to the login page > INavigator task
			var loginPageUrl = _config["loginPageUrl"];
			await _nav.GoToAsync(loginPageUrl);

			// Log in > IFileDownloaderOnPageAction task
			// TODO

			// Navigate to the file download page > INavigator task
			var fileDownloadPageUrl = _config["fileDownloadPageUrl"];
			await _nav.GoToAsync(fileDownloadPageUrl);

			// Download file > IFileDownloaderOnPageAction task
			// TODO
		}
	}
}

We’re able to move around the internet, and (in theory) provide some auditing, but we can’t interact with any elements yet. Let’s look at that next.

On-Page Actions – Where Things Happen

While INavigator can take you to a specific URL and perform actions like Forward and Back, On-Page Actions are designed to control the flow of the user’s actions. Let me give you some concrete examples.

  • When you open your browser and want to navigate to Google, you use your address bar and type google.com to navigate to it. Think of INavigator as something like your navigation bar up top.
  • Once you’re at Google, you leave the address bar alone and start typing your search term and once you’re done, you press enter. This takes you to the SERP page, on which you then click the result that interests you. Together, those actions (typing the search query, pressing enter to display the results and clicking on the chosen result) make up an On-Page Action. That On-Page Action class would contain 3 pages – GooglePage, SERPPage and ChosenResultPage.

The actual class would look more or less like this (in this example, we want to fetch HTML from the first result page for a given keyword):

    public class FindResultOnGoogleOnPageAction
    {
		private readonly IGooglePage _googlePage;
		private readonly IGoogleSERPPage _SERPPage;
		private readonly IChosenResultPage _chosenResultPage;
		public FindResultOnGoogleOnPageAction(IGooglePage googlePage,
									    IGoogleSERPPage SERPPage,
										IChosenResultPage chosenResultPage)
		{
			_googlePage = googlePage;
			_SERPPage = SERPPage;
			_chosenResultPage = chosenResultPage;
		}

		public string GetHTMLForBestRankingResultOnGoogleForGivenKeyword
             (string keyword)
		{
			_googlePage.PerformSearch(keyword);
			_SERPPage.ClickFirstPositionResult();

			// return the HTML instead of discarding it
			// (assumes GetHTML() returns a string):
			return _chosenResultPage.GetHTML();
		}
    }

OK, I hope I have shed some light on the concept of On-Page Actions, so we can focus on the service that we need: IFileDownloaderOnPageAction.

Add a new interface called IFileDownloaderOnPageAction in your OnPageActions/Abstractions folder and paste the following:

using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.OnPageActions.Abstractions
{
	public interface IFileDownloaderOnPageAction
	{
		Task SignIn();
		Task DownloadTheFile(string fileName);
	}
}

As we established in the implementation of IBrowser, we need the service to perform two actions – sign in and download the file. We can give it some extensibility and let the user tell the automation which file they want.

Add a new file in OnPageActions/Implementations folder called FileDownloaderOnPageAction and paste the following:

using DockerFunctionsBoilerplate.BRL.OnPageActions.Abstractions;
using DockerFunctionsBoilerplate.BRL.Pages.Abstractions;
// namespace assumed from the Services/Abstractions folder:
using DockerFunctionsBoilerplate.BRL.Services.Abstractions;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.OnPageActions.Implementations
{
	public class FileDownloaderOnPageAction : IFileDownloaderOnPageAction
	{
		private readonly ILoginPage _loginPage;
		private readonly IFileDownloadPage _fileDownloadPage;
		private readonly ILocalFileStorageManager _cache;

		public FileDownloaderOnPageAction(ILoginPage loginPage,
							IFileDownloadPage fileDownloadPage,
							ILocalFileStorageManager cache)
		{
			_loginPage = loginPage;
			_fileDownloadPage = fileDownloadPage;
			_cache = cache;
		}

		public Task SignIn()
		{
			// the page call is synchronous, so return a completed task:
			_loginPage.LogIn();
			return Task.CompletedTask;
		}

		public async Task DownloadTheFile(string fileName)
		{
			// click the link; Chrome saves the file inside the container:
			_fileDownloadPage.DownloadFile(fileName);

			// let the storage service pick the file up by its name
			// and save it under the same name:
			await _cache.SaveFileLocallyAs(fileName, fileName);
		}
	}
}

So, what happens here? There are a few interfaces here we don’t yet have:

  • ILoginPage – it’s going to be a single page and any actions on that page
  • IFileDownloadPage – same as above
  • ILocalFileStorageManager – I’ve added that to showcase that On-Page Actions can also use other services to help with things that happen outside of our Docker container and the browser. For example, IFileDownloadPage is only a page and its actions. It can click a Download button on the page, but that’s about it. The file, however, lands in a known download folder in the Docker container, so once we know it’s downloaded, ILocalFileStorageManager can take care of accessing that file and saving it anywhere we want (e.g. an Azure File Share or a Blob Container).
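
The article doesn’t define ILocalFileStorageManager, so here is a minimal sketch of what it might look like. Only the SaveFileLocallyAs method name and its two arguments come from the code above. Everything else is hypothetical: the parameter names, the LocalFileStorageManager class, the downloadsMountPath and savedFilesPath config keys, and the assumption that the container’s download folder is shared with the Functions host as a mounted volume.

```csharp
using DockerFunctionsBoilerplate.BRL.Services.Abstractions;
using Microsoft.Extensions.Configuration;
using System.IO;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Services.Abstractions
{
	// Hypothetical contract, inferred from the call in FileDownloaderOnPageAction.
	public interface ILocalFileStorageManager
	{
		Task SaveFileLocallyAs(string sourceFile, string destFileName);
	}
}

namespace DockerFunctionsBoilerplate.BRL.Services.Implementations
{
	// Hypothetical implementation: copies a downloaded file from a folder
	// shared with the container into a target folder on the host.
	public class LocalFileStorageManager : ILocalFileStorageManager
	{
		private readonly string _downloadsMount;
		private readonly string _targetDir;

		public LocalFileStorageManager(IConfiguration config)
		{
			// both keys are assumptions, not part of the original article:
			_downloadsMount = config["downloadsMountPath"];
			_targetDir = config["savedFilesPath"];
		}

		public async Task SaveFileLocallyAs(string sourceFile, string destFileName)
		{
			// accept either a bare file name or a path inside the container:
			var source = Path.Combine(_downloadsMount, Path.GetFileName(sourceFile));

			Directory.CreateDirectory(_targetDir);
			var dest = Path.Combine(_targetDir, destFileName);

			using (var input = File.OpenRead(source))
			using (var output = File.Create(dest))
			{
				await input.CopyToAsync(output);
			}
		}
	}
}
```

Whatever implementation you choose, it has to be registered in the DI container (for example with AddScoped) before FileDownloaderOnPageAction can be resolved.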

We can now register the OnPageAction service in our DI container in the Startup class:

builder
    .Services
    .AddScoped<IFileDownloaderOnPageAction, FileDownloaderOnPageAction>();

Let’s fill the gaps in our Chrome class. Its final state should look like this:

using DockerFunctionsBoilerplate.BRL.Browsers.Abstractions;
using DockerFunctionsBoilerplate.BRL.OnPageActions.Abstractions;
using Microsoft.Extensions.Configuration;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate.BRL.Browsers.Implementations
{
	public class Chrome : IBrowser
	{
		private readonly IConfiguration _config;
		private readonly INavigator _nav;
		private readonly IFileDownloaderOnPageAction _fileDownloaderOPA;

		public Chrome(IConfiguration config, INavigator nav, IFileDownloaderOnPageAction fileDownloaderOPA)
		{
			_config = config;
			_nav = nav;
			_fileDownloaderOPA = fileDownloaderOPA;
		}

		public async Task GetMavenProjectTextFile()
		{
			// Navigate to the login page > INavigator task
			var loginPageUrl = _config["loginPageUrl"];
			await _nav.GoToAsync(loginPageUrl);

			// Log in > IFileDownloaderOnPageAction task
			await _fileDownloaderOPA.SignIn();

			// Navigate to the file download page > INavigator task
			var fileDownloadPageUrl = _config["fileDownloadPageUrl"];
			await _nav.GoToAsync(fileDownloadPageUrl);

			// Download file > IFileDownloaderOnPageAction task
			var fileName = "maven-project.txt";
			await _fileDownloaderOPA.DownloadTheFile(fileName);
		}
	}
}

We also need to add the links to local.settings.json:
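
The listing for this step appears to be missing. Judging by the keys the Chrome class reads from configuration, the entries would look roughly like the fragment below. The login URL is the one quoted alongside the demo credentials further down; the download URL is an assumption about the same demo site, so verify it before relying on it.

```json
// loginPageUrl: taken from the demo site used for the credentials
// fileDownloadPageUrl: ASSUMED path on the same site, not confirmed by the article
"loginPageUrl": "http://the-internet.herokuapp.com/login",
"fileDownloadPageUrl": "http://the-internet.herokuapp.com/download"
```

As with the other settings, these go under the Values node.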

So, we now have all the high-level instructions and it’s time to get onto the Pages themselves! Let’s do that next.

Page – An Isolated Container For a Single Webpage

Remember the Google search example I gave earlier? Let’s focus on the IGooglePage interface. From the example, we can gather it needs a method called PerformSearch which takes a string keyword as a parameter. So what would the implementation of IGooglePage look like? Consider the below:

	public class GooglePage : IGooglePage
	{
		private By SearchBoxXPath { get; } = By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input");

		private readonly RemoteWebDriver _driver;
		
		public GooglePage(RemoteWebDriver driver)
		{
			_driver = driver;
		}


		public void PerformSearch(string keyword)
		{
			// find search box element:
			var searchBox = _driver.FindElement(SearchBoxXPath);

			// focus on the box and clear the input if any:
			searchBox.Click();
			searchBox.Clear();

			// input keyword:
			searchBox.SendKeys(keyword);

			// press enter:
			searchBox.SendKeys(Keys.Enter);
		}
	}

Pages follow a few rules:

  • A page class represents an isolated, single webpage and any actions we perform on that webpage
  • Each page should have an injected instance of RemoteWebDriver available for page manipulation
  • All the By locators or other element-search string literals should either be stored at the top of the class with easy access and meaningful naming (like SearchBoxXPath above), or, preferably, in the app config. Those paths tend to change from time to time, and storing them in config instead of hardcoding them allows you to make changes without re-deployment
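
To illustrate the last rule, here’s a minimal sketch of a page that reads its selector from configuration instead of hardcoding it. The ConfigurableGooglePage class name and the googleSearchBoxXPath config key are hypothetical, made up for this example.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;

public class ConfigurableGooglePage
{
	private readonly RemoteWebDriver _driver;
	private readonly By _searchBox;

	public ConfigurableGooglePage(RemoteWebDriver driver, IConfiguration config)
	{
		_driver = driver;

		// "googleSearchBoxXPath" is a hypothetical key; fail fast if it's missing:
		var xpath = config["googleSearchBoxXPath"]
			?? throw new InvalidOperationException(
				"googleSearchBoxXPath is not configured.");

		_searchBox = By.XPath(xpath);
	}

	public void PerformSearch(string keyword)
	{
		// locate the search box using the configured XPath:
		var searchBox = _driver.FindElement(_searchBox);

		searchBox.Clear();
		searchBox.SendKeys(keyword);
		searchBox.SendKeys(Keys.Enter);
	}
}
```

When the page markup changes, only the config value needs updating, and the class itself stays untouched.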

Note: If you don’t know how to find an element within a DOM, here’s an example of how to find the XPath of an element

I hope you get the idea behind the pages now. Let’s implement the pages we need for our little project.

Add two new interfaces called ILoginPage and IFileDownloadPage in your Pages/Abstractions folder and paste the following:

// ILoginPage:
namespace DockerFunctionsBoilerplate.BRL.Pages.Abstractions
{
	public interface ILoginPage
	{
		void LogIn();
	}
}


// IFileDownloadPage:
namespace DockerFunctionsBoilerplate.BRL.Pages.Abstractions
{
	public interface IFileDownloadPage
	{
		void DownloadFile(string fileName);
	}
}

Next, let’s implement the ILoginPage interface in Pages/Implementations folder:

using DockerFunctionsBoilerplate.BRL.Pages.Abstractions;
using Microsoft.Extensions.Configuration;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;

namespace DockerFunctionsBoilerplate.BRL.Pages.Implementations
{
	public class LoginPage : ILoginPage
	{
		public By LoginTextBoxXPath { get; } = By.XPath("/html/body/div[2]/div/div/form/div[1]/div/input");
		public By PasswordTextBoxXPath { get; } = By.XPath("/html/body/div[2]/div/div/form/div[2]/div/input");
		public By LoginButtonXPath { get; } = By.XPath("/html/body/div[2]/div/div/form/button");


		private readonly RemoteWebDriver _driver;
		private readonly IConfiguration _config;
		public LoginPage(RemoteWebDriver driver, IConfiguration config)
		{
			_driver = driver;
			_config = config;
		}

		public void LogIn()
		{
			// find login box and input login:
			var loginBox = _driver.FindElement(LoginTextBoxXPath);
			loginBox.SendKeys(_config["login"]);

			// find password box and input password:
			var passwordBox = _driver.FindElement(PasswordTextBoxXPath);
			passwordBox.SendKeys(_config["password"]);

			// find auth form submission button and click it:
			var btn = _driver.FindElement(LoginButtonXPath);
			btn.Click();
		}
	}
}

We’re using login and password fields from app config (not a great idea in production to store your passwords this way – always use a vault!), so we have to add those to local.settings.json:

// those values are provided on the page we'll be logging in to:
// http://the-internet.herokuapp.com/login
"login": "tomsmith",
"password": "SuperSecretPassword!"

We also need the IFileDownloadPage implementation:


using DockerFunctionsBoilerplate.BRL.Pages.Abstractions;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;

namespace DockerFunctionsBoilerplate.BRL.Pages.Implementations
{
	public class FileDownloadPage : IFileDownloadPage
	{
		private readonly RemoteWebDriver _driver;

		public FileDownloadPage(RemoteWebDriver driver)
		{
			_driver = driver;
		}

		public void DownloadFile(string fileName)
		{
			// find the file download link by its name
			// (By.LinkText works on both Selenium 3 and Selenium 4):
			var file = _driver.FindElement(By.LinkText(fileName));

			// download file:
			file.Click();
		}
	}
}

Here, we don’t need any class properties to hold locators. We’re passing the fileName directly as a parameter to the DownloadFile method, and in this case we can find the element by its link text because the file name just happens to be the link text too.

We have to make sure all our services are registered within our DI container. The entirety of the Configure method in the Startup class should look like this:

public override void Configure(IFunctionsHostBuilder builder)
{
	// register dependency injection of RemoteWebDriver class:
	builder.Services.AddScoped(serviceProvider =>
	{
		var config = serviceProvider
				.GetRequiredService<IConfiguration>();
		var containerUri = new Uri(config["containerUrl"]);
		var opts = new ChromeOptions();

		return new RemoteWebDriver(containerUri, opts);
	});

	// browsers: 
	builder.Services.AddScoped<IBrowser, Chrome>();
	builder.Services.AddScoped<INavigator, Navigator>();

	// on-page actions:
	builder
		.Services
		.AddScoped
			<IFileDownloaderOnPageAction, FileDownloaderOnPageAction>();

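	// pages:
	builder.Services.AddScoped<ILoginPage, LoginPage>();
	builder.Services.AddScoped<IFileDownloadPage, FileDownloadPage>();

	// services: FileDownloaderOnPageAction depends on ILocalFileStorageManager,
	// so it must be registered too (LocalFileStorageManager stands for
	// your own implementation of that interface):
	builder.Services.AddScoped<ILocalFileStorageManager, LocalFileStorageManager>();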
}

Test Our Code

Exciting! We can now test our code.

Remember that Azure Function you created? You should have a function called Run in Function1.cs. Let’s change it a little. Remove the static keywords and inject the IBrowser interface. Then, call the GetMavenProjectTextFile() method:

using DockerFunctionsBoilerplate.BRL.Browsers.Abstractions;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using System.Threading.Tasks;

namespace DockerFunctionsBoilerplate
{
	public class Function1
	{
		private readonly IBrowser _browser;

		public Function1(IBrowser browser)
		{
			_browser = browser;
		}

		[FunctionName("Function1")]
		public async Task<IActionResult> Run(
			[HttpTrigger(AuthorizationLevel.Anonymous,
            "get", "post", Route = null)] HttpRequest req,
			ILogger log)
		{
			await _browser.GetMavenProjectTextFile();
			return new OkResult();
		}
	}
}

Done! All that’s left to do is run your Docker container and Azure Functions app, and paste the address of the function in your browser to make a GET request (or use Fiddler/Postman/any other tool). My default address is:

http://localhost:7071/api/Function1

Conclusion

I have shown you an example of a clean architecture for the Selenium Browser Automation application. I’ve introduced some container concepts to help categorize and abstract your app elements, namely:

  • Browser
  • Navigator
  • On-Page Action
  • Page

This solution can be deployed as a serverless Azure Functions app connecting to a Selenium container running in Azure Container Instances (with the image served from a Container Registry), which lets it scale with the size of your application.

Hopefully, you found this article useful. If I made any mistakes, or you have better ideas for the code structure presented above, feel free to comment, or get in touch.

Pawel Flajszer