Tutorials
Step-by-step guides for common web parsing use cases.
Getting Started
Your First Extraction
What You'll Learn
- Setting up the Uneralon SDK
- Making your first extraction request
- Understanding selectors
- Handling responses
Duration: 15 minutes
```python
import uneralon

client = uneralon.Client(api_key="YOUR_API_KEY")

# Extract product data
result = client.extract(
    url="https://example.com/product/123",
    selectors={
        "name": "h1.product-title",
        "price": ".price-amount",
        "description": ".product-description",
        "image": "img.main-image::attr(src)"
    }
)

print(f"Product: {result.data['name']}")
print(f"Price: {result.data['price']}")
```
E-commerce Tutorials
Price Monitoring System
What You'll Build
- Track competitor prices across multiple sites
- Store price history in a database
- Send alerts when prices change
- Generate price comparison reports
Technologies
- Python + Uneralon SDK
- PostgreSQL for storage
- Slack for notifications
```python
import uneralon
from datetime import datetime

client = uneralon.Client(api_key="YOUR_API_KEY")

competitors = [
    {"name": "Store A", "url": "https://store-a.com/product/123"},
    {"name": "Store B", "url": "https://store-b.com/item/456"},
]

for competitor in competitors:
    result = client.extract(
        url=competitor["url"],
        selectors={
            "price": ".price",
            "stock": ".availability"
        },
        render_js=True  # Enable for dynamic sites
    )
    print(f"{competitor['name']}: ${result.data['price']}")
```
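The loop above only prints current prices; to send alerts when prices change, compare each new reading against the last stored value. A minimal sketch in plain Python — here the `price_history` dict stands in for the PostgreSQL storage and the alert print for the Slack notification, both hypothetical simplifications:

```python
def parse_price(raw):
    """Turn a scraped string like '$1,299.00' into a float."""
    return float(raw.replace("$", "").replace(",", ""))

def check_price_change(history, store, raw_price, threshold=0.01):
    """Return (old, new) if the price moved more than `threshold` (as a
    fraction), else None. `history` maps store name -> last known price
    and is updated in place."""
    new_price = parse_price(raw_price)
    old_price = history.get(store)
    history[store] = new_price
    if old_price is not None and abs(new_price - old_price) / old_price > threshold:
        return old_price, new_price
    return None

price_history = {}
check_price_change(price_history, "Store A", "$99.00")  # first reading, no alert
change = check_price_change(price_history, "Store A", "$89.00")
if change:
    print(f"ALERT Store A: {change[0]} -> {change[1]}")  # ALERT Store A: 99.0 -> 89.0
```

In a real deployment the history lookup and update would be a database read/write, but the comparison logic is the same.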
Product Catalog Extraction
What You'll Build
- Extract entire product catalogs
- Handle pagination automatically
- Normalize data across sources
- Export to CSV/JSON
```python
def extract_catalog(base_url, page_selector, product_selectors):
    products = []
    page = 1
    while True:
        result = client.extract(
            url=f"{base_url}?page={page}",
            selectors=product_selectors,
            render_js=True
        )
        if not result.data["products"]:
            break
        products.extend(result.data["products"])
        page += 1
    return products
```
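Once `extract_catalog` returns a list of product dicts, exporting to CSV or JSON needs only the standard library. A sketch, assuming each product is a flat dict (the field names below are illustrative):

```python
import csv
import json

def export_products(products, csv_path, json_path):
    """Write a list of flat product dicts to both CSV and JSON."""
    if not products:
        return
    # Union of keys across products, so rows with missing fields still export
    fieldnames = sorted({key for p in products for key in p})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(products)  # missing keys become empty cells
    with open(json_path, "w") as f:
        json.dump(products, f, indent=2)

export_products(
    [{"name": "Widget", "price": "9.99"},
     {"name": "Gadget", "price": "19.99", "sku": "G-1"}],
    "catalog.csv",
    "catalog.json",
)
```

Taking the union of keys matters when sources expose different fields; `csv.DictWriter` fills absent keys with an empty string by default.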
Lead Generation Tutorials
Company Data Extraction
What You'll Build
- Extract company information from directories
- Discover contact emails and phones
- Build prospect lists
- Enrich existing data
```python
# Extract company data from a directory listing
result = client.extract(
    url="https://directory.example.com/companies",
    selectors={
        "companies": {
            "selector": ".company-card",
            "type": "list",
            "fields": {
                "name": ".company-name",
                "website": "a.website::attr(href)",
                "email": "a[href^='mailto:']::attr(href)",
                "phone": ".phone-number",
                "address": ".address",
                "industry": ".industry-tag"
            }
        }
    }
)

for company in result.data["companies"]:
    print(f"{company['name']}: {company['email']}")
```
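The `email` field above comes back as a raw `mailto:` href, so it usually needs a small cleanup step before landing in a prospect list. A sketch of that normalization (pure Python, independent of the SDK):

```python
def clean_email(href):
    """Extract a bare address from a 'mailto:' href, dropping any query string."""
    if not href:
        return None
    if href.lower().startswith("mailto:"):
        href = href[len("mailto:"):]
    return href.split("?")[0].strip().lower() or None

print(clean_email("mailto:Sales@Example.com?subject=Hi"))  # sales@example.com
print(clean_email(""))                                     # None
```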
LinkedIn Company Scraping
What You'll Build
- Extract public company profiles
- Get employee counts and growth
- Monitor job postings
- Track company updates
Note: Always respect LinkedIn's Terms of Service and rate limits.
Competitive Intelligence
Review Aggregation
What You'll Build
- Collect reviews from multiple platforms
- Analyze sentiment patterns
- Track rating changes over time
- Compare against competitors
```python
review_sources = [
    {"platform": "Google", "url": "https://..."},
    {"platform": "Yelp", "url": "https://..."},
    {"platform": "Trustpilot", "url": "https://..."},
]

all_reviews = []
for source in review_sources:
    result = client.extract(
        url=source["url"],
        selectors={
            "reviews": {
                "selector": ".review",
                "type": "list",
                "fields": {
                    "rating": ".rating::attr(data-rating)",
                    "text": ".review-text",
                    "date": ".review-date",
                    "author": ".reviewer-name"
                }
            }
        }
    )
    for review in result.data["reviews"]:
        review["platform"] = source["platform"]
        all_reviews.append(review)
```
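With every review tagged by platform, comparing platforms and tracking rating changes reduces to a small aggregation step. A sketch over a list shaped like `all_reviews` above (ratings are assumed to arrive as strings from the `data-rating` attribute):

```python
from collections import defaultdict

def average_ratings(reviews):
    """Mean rating per platform; rating values may arrive as strings."""
    totals = defaultdict(lambda: [0.0, 0])
    for review in reviews:
        total = totals[review["platform"]]
        total[0] += float(review["rating"])
        total[1] += 1
    return {platform: round(s / n, 2) for platform, (s, n) in totals.items()}

sample = [
    {"platform": "Google", "rating": "5"},
    {"platform": "Google", "rating": "4"},
    {"platform": "Yelp", "rating": "3.5"},
]
print(average_ratings(sample))  # {'Google': 4.5, 'Yelp': 3.5}
```

Running this on snapshots taken over time gives the rating-change series the tutorial describes.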
Market Research
What You'll Build
- Track competitor product launches
- Monitor pricing strategies
- Analyze feature comparisons
- Generate market reports
AI Training Data
Text Corpus Collection
What You'll Build
- Collect large-scale text data
- Clean and normalize content
- Handle multiple languages
- Export in ML-ready formats
```python
# Batch extract articles for training data
urls = ["https://blog.example.com/post/1", "https://blog.example.com/post/2", ...]

results = client.batch_extract(
    urls=urls,
    selectors={
        "title": "h1",
        "content": "article.content",
        "category": ".category-tag",
        "date": "time::attr(datetime)"
    },
    concurrency=50
)

# Export as JSONL for ML pipelines
import json

with open("training_data.jsonl", "w") as f:
    for result in results:
        f.write(json.dumps(result.data) + "\n")
```
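Before training, the raw `content` strings usually need cleaning: stripping leftover HTML, collapsing whitespace, and dropping duplicates. A minimal sketch — note the regex tag-stripping is a rough heuristic, not a substitute for a real HTML parser:

```python
import re

def clean_text(text):
    """Strip HTML tags (crudely) and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(records, key="content"):
    """Drop records whose cleaned text was already seen."""
    seen, out = set(), []
    for record in records:
        cleaned = clean_text(record[key])
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append({**record, key: cleaned})
    return out

records = [
    {"title": "A", "content": "<p>Hello   world</p>"},
    {"title": "B", "content": "Hello world"},  # duplicate after cleaning
]
print(deduplicate(records))  # keeps only the first record
```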
Image Dataset Building
What You'll Build
- Extract images with metadata
- Handle pagination and lazy loading
- Download and organize images
- Generate annotation files
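A common pattern for organizing downloaded images is to name files by a hash of their source URL, which makes the download step idempotent and the dataset easy to dedupe. A sketch of the naming logic only (the download call itself is omitted, and the label/URL values are illustrative):

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

def image_filename(url, label):
    """Deterministic '<label>/<hash><ext>' path for an image URL."""
    ext = PurePosixPath(urlparse(url).path).suffix or ".jpg"  # fallback extension
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return f"{label}/{digest}{ext}"

print(image_filename("https://cdn.example.com/cats/1.png", "cat"))
```

Because the name depends only on the URL, re-running the pipeline skips images you already have, and the `label/` prefix gives a folder layout most annotation tools accept.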
Advanced Techniques
Handling Anti-Bot Protection
```python
# Sites with strong protection
result = client.extract(
    url="https://protected-site.com",
    selectors=selectors,
    proxy_type="residential",   # Use residential proxies
    country="US",               # Geo-target requests
    render_js=True,             # Full browser rendering
    wait_for=".content-loaded"  # Wait for element
)
```
Dynamic Content Extraction
```python
# Single-page applications (SPAs)
result = client.extract(
    url="https://spa-site.com/products",
    selectors=selectors,
    render_js=True,
    wait_for=".product-list",       # Wait for content
    scroll_to_bottom=True,          # Load lazy content
    execute_js="window.loadMore()"  # Custom JS execution
)
```
Webhook Integration
```python
# Async extraction with webhooks
client.extract_async(
    url="https://example.com",
    selectors=selectors,
    webhook_url="https://your-server.com/webhook",
    webhook_headers={"Authorization": "Bearer your-token"}
)

# Your webhook receives the result when ready
```
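On the receiving side, the endpoint should verify the shared token before trusting the payload. A framework-agnostic sketch of that check as a plain function — the header name and token match the `webhook_headers` configured above; the payload shape is an assumption:

```python
import hmac
import json

EXPECTED_TOKEN = "your-token"  # must match the token sent in webhook_headers

def handle_webhook(headers, body):
    """Return the parsed payload if authorized, else None."""
    auth = headers.get("Authorization", "")
    # compare_digest avoids leaking token length/prefix via timing
    if not hmac.compare_digest(auth, f"Bearer {EXPECTED_TOKEN}"):
        return None
    return json.loads(body)

payload = handle_webhook(
    {"Authorization": "Bearer your-token"},
    '{"url": "https://example.com", "data": {"name": "Widget"}}',
)
print(payload["data"]["name"])  # Widget
```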
Best Practices
Rate Limiting
- Respect website rate limits
- Use appropriate delays between requests
- Implement exponential backoff
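Exponential backoff means doubling the wait after each failed attempt, with a cap and a little random jitter so many clients don't retry in lockstep. A sketch of the delay schedule:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.1):
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter * delay)

for attempt in range(5):
    print(f"attempt {attempt}: wait ~{backoff_delay(attempt):.1f}s")
```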
Error Handling
- Always handle extraction failures
- Implement retry logic
- Log errors for debugging
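A retry wrapper ties those three points together: catch the failure, retry a bounded number of times with backoff, and log what happened. A generic sketch — any extraction call can be passed in as `fn`:

```python
import logging
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("extraction")

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(); on failure, log, back off, and retry; re-raise when exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stand-in for client.extract that fails twice, then succeeds
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```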
Data Validation
- Validate extracted data
- Handle missing fields gracefully
- Clean and normalize data
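Validation here means checking each extracted record before it enters your pipeline: required fields present, values parseable, obvious junk normalized. A sketch whose field names match the product example at the top of this page:

```python
def validate_product(record, required=("name", "price")):
    """Return a cleaned copy of the record, or None if it fails validation."""
    cleaned = {}
    for field in required:
        value = (record.get(field) or "").strip()
        if not value:
            return None  # missing required field
        cleaned[field] = value
    # Normalize price to a float; reject records with unparseable prices
    try:
        cleaned["price"] = float(cleaned["price"].replace("$", "").replace(",", ""))
    except ValueError:
        return None
    return cleaned

print(validate_product({"name": " Widget ", "price": "$1,299.00"}))
print(validate_product({"name": "Widget"}))  # None (price missing)
```

Returning None (rather than raising) lets a batch job skip bad records while counting them for later debugging.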
Compliance
- Respect robots.txt
- Follow website terms of service
- Handle personal data responsibly
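Python's standard library can check robots.txt rules before you queue a URL. A sketch using `urllib.robotparser` — the rules are inlined here for illustration; in practice you would point the parser at `https://site/robots.txt` via `set_url()` and `read()`:

```python
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/products"))   # True
print(parser.can_fetch("*", "https://example.com/private/x"))  # False
print(parser.crawl_delay("*"))                                 # 5
```

The `Crawl-delay` value also gives you a site-specific floor for the delay between requests mentioned under Rate Limiting.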
Need help with a specific use case? Contact support or join our Discord.