Imagine this: you’re running an HR firm tasked with staffing top-level positions. Now, picture spending several hours manually sorting through candidate data just to find a handful of potential rock stars.
Sounds exhausting, right?
That’s exactly what our client was facing before implementing Bitcot’s RPA-based LinkedIn Scraping Solutions.
Before our client came to us, their process of scraping LinkedIn and Loxo profiles, validating them, and calculating years of experience was nothing short of a time sink.
It took them up to 30 hours to process data for just 200 profiles. That’s 1 to 2 full days spent on data crunching alone!
We did more than just save them time – we changed their workflow.
In this article, we'll explore how automating these time sinks not only saved time but also transformed the way they hire.
Bitcot’s RPA-Based LinkedIn Scraping Solutions
Our client, an HR company, provides staffing for senior-level positions like managers, CEOs, and VPs. They were previously using a portal (loxo.co) to scrape candidate information. The portal allowed them to search and filter profiles, screen candidates, and export the data as a CSV file.
They would then format the exported data to include key parameters, such as total years of experience, current location, and the number of years it took candidates to reach managerial or director-level positions.
In addition, they used a formula to identify “rock star” candidates, those who reached a managerial position in a relatively short time, say in 10 years, compared to others who took 15-20 years. This helped them identify candidates with impressive career trajectories.
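The "rock star" rule described above can be sketched as a simple threshold check. This is a minimal illustration, not the client's actual formula: the 10-year cutoff comes from the "say in 10 years" example, and the `Candidate` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff, taken from the article's "say in 10 years" example.
ROCK_STAR_MAX_YEARS = 10.0


@dataclass
class Candidate:
    name: str
    # Years from career start to first managerial role; None if never reached.
    years_to_manager: Optional[float]


def is_rock_star(candidate: Candidate, max_years: float = ROCK_STAR_MAX_YEARS) -> bool:
    """Return True if the candidate reached a managerial role within max_years."""
    if candidate.years_to_manager is None:
        return False
    return candidate.years_to_manager <= max_years


candidates = [
    Candidate("A. Example", 8.5),
    Candidate("B. Example", 17.0),
    Candidate("C. Example", None),
]
rock_stars = [c.name for c in candidates if is_rock_star(c)]
print(rock_stars)
```

Keeping the cutoff as a parameter makes it easy to tune the rule per role or seniority level.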
So the initial project for us involved automatically exporting the data from Loxo, formatting it in a specific way, calculating key parameters, and inserting them into the Excel sheet for easier analysis.
In our efforts to optimize data extraction and processing, we implemented a two-phase approach. The first phase focused on building custom automated flows to handle repetitive tasks, while the second phase introduced Scrapin.io to enhance our capabilities further.
First LinkedIn Scraping Solution Involving Loxo
In the first solution, the focus was on building basic automated workflows using custom scripts, APIs, and web scraping techniques to handle repetitive tasks and streamline data extraction from LOXO and LinkedIn.
We developed a series of automated desktop flows designed to streamline the extraction of profile experiences and company details and the conversion of PDF documents into Excel. These flows make use of APIs and web scraping techniques to gather accurate and up-to-date information from LOXO and LinkedIn.
Our solution comprised six distinct automated flows. Each flow was designed with a systematic process to ensure consistency, accuracy, and efficiency.
1. Exporting Profile Data From Loxo
The first flow involves downloading the candidate profile data from Loxo.com, allowing the client to skip the manual process of collecting and organizing the data.
2. Converting PDF Into Excel
- Upload PDF File: The system uploads the PDF file to a designated location. This step is crucial as it provides the input data that the workflow will process.
- Convert PDF File to Excel: The RPA bot converts the PDF into a structured Excel file.
- Save Excel File in Phase 3 Folder: Once the Excel file is generated, it is automatically saved in a predefined folder called the “Phase 3” folder. This helps organize output files and ensures accessibility for subsequent workflows.
- Save Excel Path in a Text File: The path to the saved Excel file is stored in a text file. This step enables easy reference or retrieval by other processes or team members without needing to search for the file manually.
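The last step above, handing the Excel path to later flows through a text file, might look like the following sketch. The file names (`profiles.xlsx`, `latest_excel_path.txt`) and helper functions are hypothetical, and a temporary directory stands in for the real "Phase 3" folder.

```python
import tempfile
from pathlib import Path


def save_output_path(excel_path: Path, pointer_file: Path) -> None:
    """Write the Excel file's path to a small text file for later flows."""
    pointer_file.write_text(str(excel_path), encoding="utf-8")


def load_output_path(pointer_file: Path) -> Path:
    """Read the path back, as a downstream flow would."""
    return Path(pointer_file.read_text(encoding="utf-8").strip())


with tempfile.TemporaryDirectory() as tmp:
    phase3 = Path(tmp) / "Phase 3"          # stand-in for the real folder
    phase3.mkdir()
    excel_path = phase3 / "profiles.xlsx"   # hypothetical output file name
    pointer = Path(tmp) / "latest_excel_path.txt"

    save_output_path(excel_path, pointer)
    recovered = load_output_path(pointer)
    print(recovered)
```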
3. Calculating and Preparing the Dashboard Sheet
The third flow calculates key metrics and prepares a dashboard for easy identification of relevant candidate information.
- Open Excel Template: The system opens a pre-designed Excel template to act as the starting point for storing developer data. This template ensures consistency in the input data format for subsequent processing.
- Open Developer Excel File: The workflow retrieves the Excel file containing developer information, generated in a previous flow. This file serves as the input dataset for the current process.
- Loop Through Each Developer: The RPA bot iterates through the list of developers in the Excel file, processing each profile one by one to ensure comprehensive data coverage.
- Get Profile Information from LOXO: For each developer, the bot uses the LOXO API and the provided profile URL to fetch profile details.
- Profile Existence Check
- If Profile Exists: The bot proceeds to retrieve additional details such as job history using the LOXO API.
- If Profile Does Not Exist: The profile URL is saved in a separate Excel sheet for follow-up or manual review.
- Job Data Validation
- If Jobs Are Found:
- Calculate Overall Experience: Determines the total number of years of professional experience.
- Calculate Years to Reach Specific Position: Evaluates how long it took for the developer to achieve their current role.
- Calculate Time in Current Role: Determines how many years the developer has been in their current position.
- If No Jobs Are Found: No calculations are performed, and the bot skips to the next developer in the list.
- Save Experiences and Basic Details in Dashboard Excel: The calculated experience metrics and profile details are stored in the Dashboard Excel file. This consolidated file is useful for analytics and decision-making.
- Save Excel File in Phase 3 Folder: The processed Excel file is automatically saved in the designated “Phase 3” folder. This ensures easy access and proper organization for subsequent workflows.
- Run LinkedIn Web-Scrape Flow: The workflow concludes by triggering the LinkedIn web-scraping process. This step supplements the data gathered from LOXO and provides additional validation for the developer profiles.
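The three calculations in this flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the production logic: the `Job` type is hypothetical, overall experience is measured from the earliest job start to a reference date, the "current role" is taken to be the job with the latest start date, and the target position is found by a simple keyword match on the title.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Job:
    title: str
    start: date
    end: Optional[date]  # None means the role is ongoing


def years_between(start: date, end: date) -> float:
    """Approximate the span in years, rounded to one decimal place."""
    return round((end - start).days / 365.25, 1)


def experience_metrics(jobs: List[Job], target_keyword: str, as_of: date) -> Optional[dict]:
    """Compute the three dashboard metrics, or None when no jobs are found."""
    if not jobs:
        return None  # mirrors the "If No Jobs Are Found" branch: skip
    ordered = sorted(jobs, key=lambda j: j.start)
    career_start = ordered[0].start
    current = ordered[-1]  # assumption: latest-starting job is the current role
    first_target = next(
        (j for j in ordered if target_keyword.lower() in j.title.lower()), None
    )
    return {
        "overall_years": years_between(career_start, as_of),
        "years_to_target": (
            years_between(career_start, first_target.start) if first_target else None
        ),
        "years_in_current_role": years_between(current.start, current.end or as_of),
    }


jobs = [
    Job("Analyst", date(2005, 1, 1), date(2010, 1, 1)),
    Job("Senior Analyst", date(2010, 1, 1), date(2014, 1, 1)),
    Job("Manager", date(2014, 1, 1), None),
]
metrics = experience_metrics(jobs, "manager", as_of=date(2024, 1, 1))
print(metrics)
```

Passing `as_of` explicitly, rather than calling `date.today()` inside the function, keeps the calculation reproducible when a batch is re-run later.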
4. Checking and Updating University and Other Information
The fourth flow validates and updates university and other experience data to ensure the profiles are accurate.
Many candidates have military or university experience, and the client wanted to exclude that from the total experience. For example, if someone served in the military for four years and has 20 years of total experience, we need to reduce the experience by the number of years spent in the military.
These adjustments need to be clearly marked, so that when someone reviews the profile, they can easily see which years of experience were adjusted or excluded. This helps to ensure the accuracy of the final data.
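A minimal sketch of this adjustment, assuming each period has already been tagged with a category. The `Period` type, the category labels, and the note format are hypothetical; the point is that the excluded years are both subtracted and recorded so a reviewer can see what changed.

```python
from dataclasses import dataclass
from typing import List

# Assumed labels for periods that should not count toward experience.
EXCLUDED_CATEGORIES = {"military", "university"}


@dataclass
class Period:
    label: str      # e.g. employer or institution name
    category: str   # hypothetical tag: "professional", "military", "university"
    years: float


def adjusted_experience(periods: List[Period]) -> dict:
    """Subtract excluded periods and keep a note describing each adjustment."""
    excluded = [p for p in periods if p.category in EXCLUDED_CATEGORIES]
    total = sum(p.years for p in periods)
    removed = sum(p.years for p in excluded)
    return {
        "total_years": total,
        "adjusted_years": total - removed,
        "adjustment_note": "; ".join(
            f"excluded {p.years:g} yrs ({p.label})" for p in excluded
        ),
    }


periods = [
    Period("US Army", "military", 4),
    Period("Acme Corp", "professional", 16),
]
result = adjusted_experience(periods)
print(result)
```

With the article's example of 20 total years including four in the military, the adjusted figure comes out at 16, and the note column records why.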
5. Directly Extracting Data from LinkedIn
Loxo had some limitations, such as not having the most up-to-date LinkedIn profiles. Additionally, some profile information was missing from Loxo, which provided only the candidate’s name and LinkedIn profile URL.
In these cases, we have another flow where we open the LinkedIn profile link, scrape additional information like the career history, and calculate the necessary details before inserting them into the spreadsheet.
- Open Created Excel from Previous Flow: The Excel file generated from the previous workflow is opened. This file contains a list of profiles requiring further data enrichment.
- Filter Profiles: The workflow identifies profiles where information is missing in LOXO and prepares them for the scraping process.
- Open LinkedIn Web Page: The LinkedIn webpage is automatically opened to initiate the profile search process.
- Loop Through Each Profile: The system loops through each profile listed in the filtered Excel sheet to begin individual searches.
- Search for Profile: The workflow searches LinkedIn for each profile listed in the Excel sheet to retrieve relevant data.
- Check Profile Availability:
- If the profile exists: Proceed to check if the profile has any listed professional experience.
- If the profile does not exist: Skip to the next profile.
- Analyze Profile Data (if the profile has experience):
- Calculate Overall Experience of Profile: The total professional experience is computed based on the profile’s job history.
- Calculate Years to Reach a Specific Position: The time taken by the individual to achieve a specific role is calculated.
- Calculate Time Spent in Positions: The duration spent in each position is extracted for detailed insights.
- Save Experiences and Basic Details in Dashboard Excel: The gathered experience data and key details from each profile are saved in the Excel dashboard for reporting and analysis.
- Update Loxo ID Column: The Loxo ID column in the Excel sheet is updated with the status indicating that the profile was successfully scraped from LinkedIn.
- Save Excel Dashboard: The processed Excel file is saved in the designated “Phase 3” folder, ensuring it is organized and easily accessible for subsequent workflows.
- Trigger Extract Company Details Flow: The workflow concludes by triggering the “Extract Company Details” flow to gather additional company-related data associated with the scraped profiles.
6. Extract Company Details Flow
- Open Companies Excel: The automation opens the designated Excel file containing company names, sourced from previous workflows, ensuring all required data is centralized for processing.
- Open LinkedIn Web Page: LinkedIn is launched within the browser to serve as the primary source for extracting company-related details.
- Iterate Through Company List: The workflow loops through each company name in the Excel file, ensuring comprehensive processing for all listed entries.
- Search for Company: The company name is searched on LinkedIn to retrieve relevant information.
- Check Company Existence:
- If the company exists, the workflow continues to extract its details.
- If not, the workflow logs the missing company data for further review.
- Extract Company Details:
- Company Name
- Website URL
- Headquarters Location
- Company Size/Strength
The extracted data is saved in the corresponding columns of the same Excel file.
- Update Profile Dashboard: The extracted company details are used to update another Excel file referred to as the “Profile Dashboard” when the company name in the dashboard is matched with the current company being processed.
- Send Email with Profile Dashboard: The updated Profile Dashboard is attached to an email and sent to relevant stakeholders.
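The "Update Profile Dashboard" step, matching on company name, could be sketched like this. Rows are plain dictionaries standing in for Excel rows, the column names are hypothetical, and the case- and whitespace-insensitive match is an assumption about how names are compared.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so minor differences still match."""
    return " ".join(name.lower().split())


def update_dashboard(dashboard_rows: list, company_rows: list) -> int:
    """Copy company details into dashboard rows whose company name matches.

    Returns the number of dashboard rows that were updated.
    """
    by_name = {normalize(c["company"]): c for c in company_rows}
    updated = 0
    for row in dashboard_rows:
        details = by_name.get(normalize(row["current_company"]))
        if details is None:
            continue  # no extracted details for this company
        row["website"] = details["website"]
        row["headquarters"] = details["headquarters"]
        row["company_size"] = details["company_size"]
        updated += 1
    return updated


companies = [
    {"company": "Acme Corp", "website": "https://acme.example",
     "headquarters": "Austin, TX", "company_size": "201-500"},
]
dashboard = [
    {"name": "A. Example", "current_company": "  acme  corp "},
    {"name": "B. Example", "current_company": "Globex"},
]
count = update_dashboard(dashboard, companies)
print(count, dashboard[0]["headquarters"])
```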
Second LinkedIn Scraping Solution Involving Scrapin.io
In addition to Loxo.com, the team discovered Scrapin.io, a platform that allows us to easily extract LinkedIn profile data. It functions similarly to Loxo, but Loxo has outdated data and does not provide APIs for all the necessary details.
Scrapin.io, on the other hand, is much more reliable and provides better functionality for extracting the necessary details. Given its improved capabilities, the team decided to create a new flow specifically for Scrapin.io.
This flow was designed to extract LinkedIn profile data from Scrapin.io, offering a more reliable and current data source compared to Loxo.
Both Loxo.com and Scrapin.io were utilized simultaneously, with their respective flows running in parallel. The Scrapin.io flow was completely separate from the Loxo-based flow but was integrated into the same system.
Ultimately, the client chose to use Loxo.com as their primary platform because they had other processes integrated into Loxo that they didn’t want to change. However, they incorporated Scrapin.io into their workflow to validate and verify the profiles retrieved from Loxo, ensuring data accuracy and minimizing errors.
Scrapin.io provided the client with robust web scraping capabilities, enabling them to efficiently extract detailed and accurate data from various online sources, further automating and optimizing our processes.
Incorporating Scrapin.io into our process involved the following steps:
Data Scraping Setup
The first step in the integration was configuring Scrapin.io to focus on specific web pages and data fields critical to our analysis.
This involved setting up the tool to target LinkedIn profiles and extract key details such as names, job titles, and other professional information.
Custom scripts were developed within Scrapin.io to handle dynamic website elements like asynchronous loading and ensure the extracted data was accurate and relevant.
Data Processing and Storage
The data extracted using Scrapin.io underwent processing and formatting to align with the structures of our current datasets.
After transformation, this data was automatically updated in the Profile Dashboard and other essential Excel sheets, ensuring all systems had access to consistent, comprehensive, and up-to-date information.
Overcoming Challenges with Loxo and Scrapin.io
Implementing both solutions involved tackling multiple challenges. Below is a detailed breakdown of the issues encountered and the solutions implemented.
Loxo Data Export Issues
One of the primary challenges arose from Loxo’s limited support for exporting data directly into Excel.
When LinkedIn URLs exceeded the expected length, they were split across multiple cells during export. This fragmentation caused issues for the RPA bot, which struggled to process and correlate data split across multiple rows or columns.
To address this, we developed a separate flow that exports the data as a PDF, then converts it into an Excel file.
Scraping LinkedIn Profile Data
Scraping LinkedIn profiles for candidate information posed another significant challenge. LinkedIn’s advanced login and activity monitoring mechanisms detected automated scripts, often leading to account blocks or temporary restrictions. This made continuous data scraping impractical.
We refined our automation process to emulate human behavior. Delays were programmed into the RPA flow to introduce natural pauses between actions such as scrolling, clicking on sections, and interacting with elements like profile links or contact buttons.
By incorporating these delays and adjusting the pace of actions, the automation appeared less robotic, effectively bypassing LinkedIn’s detection systems.
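The pacing idea can be illustrated with a small helper that waits a randomized interval between actions. This is only a sketch: the delay bounds are assumed values, and the `sleep` and `rng` parameters are hypothetical hooks added so the helper can be exercised without actually waiting.

```python
import random
import time


def human_pause(min_s: float = 1.5, max_s: float = 4.0, sleep=time.sleep, rng=random) -> float:
    """Wait a random interval between min_s and max_s seconds and return it."""
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Demonstration with a stand-in for time.sleep, so nothing actually waits.
recorded = []
delays = [human_pause(0.5, 1.0, sleep=recorded.append) for _ in range(3)]
print(delays)
```

In an RPA flow, a call like this would sit between steps such as scrolling, opening a section, and moving to the next profile.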
Complexity in Experience Calculation
Calculating a candidate’s total professional experience and role-specific timelines was tedious, especially when excluding irrelevant periods like education or military service.
To accurately compute career timelines without manual intervention, we incorporated advanced calculation logic into the automation process. This logic filtered out non-professional periods and computed overall and role-specific experiences dynamically. The results were then mapped to structured fields within the Excel output for easy review.
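One detail such calculation logic typically has to handle is overlapping roles, which would be double-counted if durations were simply summed. The article does not say how overlaps were treated, so the interval-merging approach below is an assumption, shown only as one reasonable way to do it.

```python
from datetime import date


def merge_intervals(intervals):
    """Merge overlapping (start, end) date pairs into non-overlapping spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it if this one ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_years(intervals) -> float:
    """Total experience in years, counting overlapping time only once."""
    days = sum((end - start).days for start, end in merge_intervals(intervals))
    return round(days / 365.25, 1)


roles = [
    (date(2010, 1, 1), date(2015, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),  # overlaps the first role by a year
    (date(2019, 1, 1), date(2020, 1, 1)),  # starts after a one-year gap
]
print(total_years(roles))
```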
Dynamic Web Scraping
Extracting data fields like company size, headquarters, and website links required navigating dynamically changing web pages. These pages often had varying structures depending on the company or candidate profile.
To scrape variable data fields from dynamic and complex web layouts, we developed an intelligent scraping mechanism capable of adapting to web page layouts. The scraper used element recognition to identify the required fields dynamically, minimizing the chances of failure due to unexpected changes in page structure.
Handling Dynamic Content
One of the primary challenges in the second solution was extracting data from websites with dynamic content that loaded asynchronously or changed based on user interactions.
To overcome this, we customized Scrapin.io’s scraping scripts to handle such complexities effectively. These adjustments ensured the tool captured accurate and up-to-date data, even from pages with dynamic loading elements.
Data Integration and Consistency
Integrating the data from Scrapin.io with other sources while avoiding discrepancies posed another challenge.
We addressed this by implementing robust data validation and transformation steps. This process standardized the data extracted from Scrapin.io, allowing seamless integration with our existing datasets and maintaining consistency across all systems.
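As a rough illustration of such a validation and transformation step, the sketch below cleans one incoming record and reports problems instead of silently writing bad rows. The field names are hypothetical and do not reflect Scrapin.io's actual response format.

```python
# Hypothetical field names; the real export format may differ.
REQUIRED_FIELDS = ("full_name", "linkedin_url", "job_title")


def normalize_record(raw: dict):
    """Return (clean_record, errors) for one scraped profile record."""
    # Trim whitespace and turn missing or None values into empty strings.
    clean = {k: str(raw.get(k) or "").strip() for k in REQUIRED_FIELDS}
    errors = [f"missing {k}" for k, v in clean.items() if not v]

    url = clean["linkedin_url"].rstrip("/")
    if url and "linkedin.com/in/" not in url:
        errors.append("linkedin_url is not a profile URL")
    clean["linkedin_url"] = url
    return clean, errors


good, good_errors = normalize_record({
    "full_name": "  Jane Doe ",
    "linkedin_url": "https://www.linkedin.com/in/jane-doe/",
    "job_title": "VP Sales",
})
bad, bad_errors = normalize_record({
    "full_name": "John",
    "linkedin_url": "",
    "job_title": None,
})
print(good, good_errors)
print(bad_errors)
```

Records with a non-empty error list can be routed to a review sheet, much as missing profiles are handled in the Loxo flow.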
The Client’s Major Impact and Success With RPA
Before adopting the RPA solution, the client did the entire process manually, and it was very slow. For example, verifying the profiles and calculating years of experience for 200 candidates took them 24–30 hours (1–2 whole days). This involved:
- Scraping data for candidate profiles,
- Validating the profiles,
- Calculating the number of years it took for candidates to reach specific positions, and
- Building a complete dashboard sheet with all the relevant information.
By integrating RPA, the process has been significantly accelerated. Now, the same task that once took over a day can be completed in just 2 hours. This speed improvement is crucial for their ability to:
- Screen profiles quickly: The automated scraping and validation processes speed up profile analysis, reducing the time spent on each candidate.
- Scrape and validate data more efficiently: Automated processes ensure data is extracted and validated much faster, improving the accuracy and consistency of the information.
- Generate actionable insights in a fraction of the time: With faster data processing, the client can quickly extract key insights, such as identifying candidates with the most impressive career trajectories.
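To put the figures above in perspective, here is the arithmetic for a 200-profile batch, using the 24 to 30 hours quoted for the manual process and the 2 hours quoted for the automated one.

```python
PROFILES = 200
MANUAL_HOURS = (24, 30)   # manual range quoted in the article
AUTOMATED_HOURS = 2       # automated figure quoted in the article

# How many times faster the automated run is, for each end of the range.
speedup = tuple(h / AUTOMATED_HOURS for h in MANUAL_HOURS)

# Average minutes spent per profile before and after automation.
manual_min_per_profile = tuple(h * 60 / PROFILES for h in MANUAL_HOURS)
automated_min_per_profile = AUTOMATED_HOURS * 60 / PROFILES

print(speedup)                    # (12.0, 15.0)
print(manual_min_per_profile)     # (7.2, 9.0)
print(automated_min_per_profile)  # 0.6
```

That works out to roughly a 12x to 15x speedup, or well under a minute per profile on average instead of seven to nine minutes.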
Beyond speeding up processes, the RPA solution brought several additional benefits to the client:
Time Savings and Efficiency
The automation of profile data export, formatting, and key parameter calculations saved the client considerable time that was previously spent on manual tasks. What once took hours now takes minutes. This allowed the client to focus more on high-level decision-making rather than tedious data manipulation.
Improved Data Accuracy
The solution eliminated the errors that often occurred in the manual formatting process. By automatically exporting and structuring the data, we ensured the accuracy of key candidate parameters, such as years of experience, career trajectory, and location. This improved the quality of insights the client gained from their data.
Faster Decision-Making
With critical candidate data being automatically formatted and organized, the client was able to quickly identify top-tier candidates, such as those who reached managerial positions in shorter timeframes. This accelerated their ability to make informed hiring decisions, particularly for senior-level roles.
Scalability and Future Growth
The two-phase approach – starting with automated flows and later introducing advanced tools – has given our client a scalable solution. The system can easily handle larger datasets as their business grows, ensuring they are well-equipped for future demands.
Enhanced Competitive Edge
The automation of LinkedIn and Loxo scraping, combined with the ability to directly extract and process up-to-date candidate information, has given our client a competitive advantage in staffing for executive-level positions. They are now able to identify and secure top talent faster than their competitors.
Final Thoughts
Looking back at what our client has achieved with RPA, it’s clear that the benefits go far beyond just time savings. It’s a complete transformation of how they operate at every level.
Tasks that used to take days are now accomplished in hours, unlocking new possibilities for scaling, decision-making, and delivering more value to their clients.
They’ve gained the ability to assess profiles with precision, speed up applicant processing, and even identify top candidates more accurately. They’re no longer simply managing data – they’re leveraging it to drive better outcomes.
The question is, how long are you going to remain stuck with outdated, manual processes when you can get things done faster, smarter, and more effectively?
Manual effort isn’t the actual bottleneck – inefficiency is. If you’re looking to redefine your workflows and make sharper decisions, we can help.
Reach out and explore Bitcot’s RPA solutions. Let’s make your HR processes faster, enhance your candidate screening, and give you the competitive edge you need to lead. Your next big step starts here.