Introduction
In this exercise, we’ll scrape the daily trending topics from Google Trends using Selenium in C#. We’ll walk through the WebCrawlingHelper class, which simplifies navigating web pages, finding elements, and extracting data.
Prerequisites
Before we begin, ensure you have the following:
- Visual Studio or any C# IDE.
- .NET SDK installed.
- The Selenium WebDriver and ChromeDriver packages. You can add them from the NuGet Package Manager Console:
Install-Package Selenium.WebDriver
Install-Package Selenium.WebDriver.ChromeDriver
Step 1: Setting Up the WebCrawlingHelper Class
Create the Helper Class
Create a new C# class file named WebCrawlingHelper.cs and copy the following code into it:
using System.Collections.Generic;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace PRA.Util
{
    public class WebCrawlingHelper
    {
        private IWebDriver driver;
        // Initialized here so that adding elements before GoToUrl does not throw a NullReferenceException
        private List<WebElementModel> elements = new List<WebElementModel>();

        public WebCrawlingHelper()
        {
            var options = new ChromeOptions()
            {
                // Verbatim string (@"...") so the backslashes are not read as escape sequences.
                // Adjust the path to match your Chrome installation.
                BinaryLocation = @"C:\Program Files\Google\Chrome\Application\chrome.exe"
            };
            options.AddArguments(new List<string>() {
                "headless",
                "disable-gpu"
            });
            driver = new ChromeDriver(options);
        }

        // Navigate to a URL and reset the stored elements
        public void GoToUrl(string _url)
        {
            this.driver.Navigate().GoToUrl(_url);
            this.elements = new List<WebElementModel>();
        }

        // Add a single web element (FindElement throws NoSuchElementException if nothing matches)
        public void AddElement(By _by, string _elementName)
        {
            WebElementModel tempElement = new WebElementModel();
            tempElement.ElementName = _elementName;
            tempElement.Element = this.driver.FindElement(_by);
            this.elements.Add(tempElement);
        }

        // Add multiple web elements; each is stored as "<_elementName>_<index>"
        public void AddElements(By _by, string _elementName)
        {
            var tempElements = this.driver.FindElements(_by);
            int i = 0;
            foreach (IWebElement element in tempElements)
            {
                WebElementModel tempElement = new WebElementModel();
                tempElement.ElementName = _elementName + "_" + i.ToString();
                tempElement.Element = element;
                this.elements.Add(tempElement);
                i++;
            }
        }

        // Count elements matching a locator
        public int CountNoOfElements(By _by)
        {
            return this.driver.FindElements(_by).Count;
        }

        // Send keys to a stored input element (does nothing if the name is not found)
        public void ElementSendKeys(string _key, string _elementName)
        {
            IWebElement input = this.elements.FirstOrDefault(x => x.ElementName == _elementName)?.Element;
            input?.SendKeys(_key);
        }

        // Get all stored elements
        public List<WebElementModel> GetElements()
        {
            return this.elements;
        }

        // Click on a stored web element (does nothing if the name is not found)
        public void Click(string _elementName)
        {
            IWebElement button = this.elements.FirstOrDefault(x => x.ElementName == _elementName)?.Element;
            button?.Click();
            System.Threading.Thread.Sleep(5000); // Fixed 5-second wait for the page to load
        }

        // Quit the driver and close the browser
        public void Quit()
        {
            this.driver.Quit();
        }
    }

    public class WebElementModel
    {
        public string ElementName { get; set; }
        public IWebElement Element { get; set; }
    }
}
Explanation of the Code
- Driver Initialization: The ChromeDriver is initialized with options to run in headless mode (without a UI).
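- Element Management: Methods are provided to add single or multiple elements, count elements, send keys, and click elements. Click pauses for a fixed five seconds afterwards to give the page time to load.
- Data Storage: Elements are stored in a list under a name so they can be looked up later. AddElements appends an index to the name you pass in, so the matches are stored as name_0, name_1, and so on.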
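The fixed Thread.Sleep in Click always waits the full five seconds, even when the page is ready sooner, and it may still be too short on a slow connection. One possible alternative is an explicit wait. The sketch below is not part of the original helper: WaitExtensions and WaitForAny are hypothetical names, and it assumes the Selenium.Support NuGet package is installed, since that is where WebDriverWait lives.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI; // Assumption: provided by the Selenium.Support NuGet package

namespace PRA.Util
{
    // Hypothetical helper, not part of the original WebCrawlingHelper
    public static class WaitExtensions
    {
        // Polls until at least one element matches the locator.
        // Returns true as soon as a match appears, or false if the timeout elapses first.
        public static bool WaitForAny(this IWebDriver driver, By by, int timeoutSeconds = 10)
        {
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(timeoutSeconds));
            try
            {
                return wait.Until(d => d.FindElements(by).Count > 0);
            }
            catch (WebDriverTimeoutException)
            {
                // Until throws when the condition never became true within the timeout
                return false;
            }
        }
    }
}
```

Because the driver field is private, a call such as this.driver.WaitForAny(_by) would have to be made from inside WebCrawlingHelper, for example at the start of AddElements or in place of the Thread.Sleep call.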
Step 2: Using the WebCrawlingHelper
Create a Main Method
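Now, let’s create a method to demonstrate how to use the WebCrawlingHelper. Add the following method to your program. The file that contains it needs using directives for System, System.Linq, OpenQA.Selenium, and PRA.Util: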
public static void WebCrawling_POC()
{
    WebCrawlingHelper crawler = new WebCrawlingHelper();

    // Navigate to Google Trends
    crawler.GoToUrl("https://trends.google.com/trending?geo=US&hl=zh-HK&hours=24");

    // Count the number of rows in the trends table
    int noOfRows = crawler.CountNoOfElements(By.XPath("//*[@id='trend-table']/div[1]/table/tbody[2]/tr"));

    // Store all related elements (XPath row indices start at 1)
    for (int i = 0; i < noOfRows; i++)
    {
        string iplus1 = (i + 1).ToString();
        string istring = i.ToString();
        crawler.AddElements(By.XPath($"//*[@id='trend-table']/div[1]/table/tbody[2]/tr[{iplus1}]/td[2]/div[1]"), "topic_" + istring);
        crawler.AddElements(By.XPath($"//*[@id='trend-table']/div[1]/table/tbody[2]/tr[{iplus1}]/td[3]/div/div[1]"), "rate_" + istring);
    }

    // Print all related elements.
    // AddElements appends "_<index>" to each name, so the first match for row j
    // is stored as "topic_{j}_0" and "rate_{j}_0".
    for (int j = 0; j < noOfRows; j++)
    {
        var topic = crawler.GetElements().FirstOrDefault(x => x.ElementName == $"topic_{j}_0");
        var rate = crawler.GetElements().FirstOrDefault(x => x.ElementName == $"rate_{j}_0");
        if (topic != null && rate != null)
        {
            Console.WriteLine($"Topic: {topic.Element.GetAttribute("innerText")}, Rate: {rate.Element.GetAttribute("innerText")}");
        }
    }

    // Clean up
    crawler.Quit();
}
Explanation of the Usage Code
- Navigate to URL: The GoToUrl method is called to navigate to Google Trends.
- Count Elements: The number of rows in the trends table is counted.
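- Store Elements: For each row, the topic and rate cells are stored with the AddElements method, which appends an index to the name (the first match for row 0 is stored as topic_0_0 and rate_0_0).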
- Print Data: The results are printed to the console.
- Quit Driver: Finally, the Quit method is called to close the browser.
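The same steps can also be sketched without the helper, which may make the flow easier to follow. The version below is only an illustration under a few assumptions: TrendsRowSketch is a hypothetical name, the XPath expressions are the ones used above and depend on the current Google Trends page structure, and it reads the visible Text of each cell rather than the innerText attribute. It finds each row once and then searches inside that row with a relative XPath, and the using statement disposes the driver even if an exception is thrown partway through.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Hypothetical class name; a sketch, not the article's helper
public static class TrendsRowSketch
{
    public static void Run()
    {
        var options = new ChromeOptions();
        options.AddArguments("headless", "disable-gpu");

        // IWebDriver is IDisposable, so the browser is closed when the block exits
        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl("https://trends.google.com/trending?geo=US&hl=zh-HK&hours=24");

            // One lookup for all rows of the trends table
            var rows = driver.FindElements(By.XPath("//*[@id='trend-table']/div[1]/table/tbody[2]/tr"));

            foreach (IWebElement row in rows)
            {
                // "./" makes the XPath relative to the current row
                var topic = row.FindElements(By.XPath("./td[2]/div[1]"));
                var rate = row.FindElements(By.XPath("./td[3]/div/div[1]"));

                // Skip rows where either cell is missing
                if (topic.Count > 0 && rate.Count > 0)
                {
                    Console.WriteLine($"Topic: {topic[0].Text}, Rate: {rate[0].Text}");
                }
            }
        }
    }
}
```

Searching within each row avoids building a separate absolute XPath per row index, at the cost of bypassing the named element list that the helper maintains.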
Step 3: Running Your Code
To run your code:
- Ensure all necessary packages are installed.
- Call the WebCrawling_POC method from your Main method.
- Compile and run your application.
Conclusion
You have successfully created a web crawling utility using Selenium in C#. You can enhance this utility further by adding more features such as error handling, logging, and scraping additional data.
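As one small example of the error handling mentioned above: FindElement throws a NoSuchElementException when nothing matches, while FindElements returns an empty collection. A lookup that returns null instead of throwing could be sketched as follows. SafeFinder and FindOrNull are hypothetical names, not part of the helper or of Selenium.

```csharp
using OpenQA.Selenium;

// Hypothetical helper; a sketch of one way to avoid NoSuchElementException
public static class SafeFinder
{
    // Returns the first element matching the locator, or null when there is no match.
    // ISearchContext is implemented by both IWebDriver and IWebElement,
    // so this works on the whole page or inside a single element.
    public static IWebElement FindOrNull(ISearchContext context, By by)
    {
        var matches = context.FindElements(by);
        return matches.Count > 0 ? matches[0] : null;
    }
}
```

Inside WebCrawlingHelper, AddElement could use SafeFinder.FindOrNull(this.driver, _by) and skip the element when the result is null.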
Feel free to experiment with different websites and expand the functionality as needed! Happy coding!
Remarks
You can also locate elements with:
- By.Id(_id)
- By.TagName(_tagName)
- By.CssSelector(_css)
You can use your browser's inspect tools to find the XPath you need: right-click the element in the inspector and choose Copy XPath.
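To illustrate these locator types, the sketch below builds one locator of each kind for the trends table used earlier. LocatorExamples is a hypothetical class name, and the id trend-table is taken from the XPath in this article, so it may change if the page is redesigned. The CSS selector is intended to match the same rows as the XPath //*[@id='trend-table']/div[1]/table/tbody[2]/tr.

```csharp
using OpenQA.Selenium;

// Hypothetical class; shows alternative locators for the same page
public static class LocatorExamples
{
    // The table container, located by its id attribute
    public static readonly By TableById = By.Id("trend-table");

    // Every <tr> on the page, not only those in the trends table
    public static readonly By RowsByTag = By.TagName("tr");

    // CSS counterpart of the row XPath: div[1] and tbody[2] become :nth-of-type(1) and :nth-of-type(2)
    public static readonly By RowsByCss =
        By.CssSelector("#trend-table > div:nth-of-type(1) > table > tbody:nth-of-type(2) > tr");
}
```

Any of these can be passed wherever the helper expects a By, for example crawler.CountNoOfElements(LocatorExamples.RowsByCss).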