Tutorials

Step-by-step guides for common web parsing use cases.

Getting Started

Your First Extraction

What You'll Learn

  • Setting up the Uneralon SDK
  • Making your first extraction request
  • Understanding selectors
  • Handling responses

Duration: 15 minutes

import uneralon

client = uneralon.Client(api_key="YOUR_API_KEY")

# Extract product data
result = client.extract(
    url="https://example.com/product/123",
    selectors={
        "name": "h1.product-title",
        "price": ".price-amount",
        "description": ".product-description",
        "image": "img.main-image::attr(src)"
    }
)

print(f"Product: {result.data['name']}")
print(f"Price: {result.data['price']}")
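The "Handling responses" step deserves a closer look: selectors that match nothing will leave gaps in the extracted data. A minimal sketch of reading fields defensively, using a plain dict to stand in for `result.data` (the assumption here is that unmatched selectors come back as `None` or empty strings):

```python
def read_field(data, key, default="n/a"):
    """Return a cleaned field value, falling back when the selector matched nothing."""
    value = data.get(key)
    if value is None or (isinstance(value, str) and not value.strip()):
        return default
    return value.strip() if isinstance(value, str) else value

# A response where one selector found nothing
data = {"name": "  Widget Pro  ", "price": "$19.99", "description": None}
print(read_field(data, "name"))         # cleaned name
print(read_field(data, "description"))  # falls back to "n/a"
```

Centralizing this in one helper keeps the fallback policy consistent across every extraction in your project.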

E-commerce Tutorials

Price Monitoring System

What You'll Build

  • Track competitor prices across multiple sites
  • Store price history in a database
  • Send alerts when prices change
  • Generate price comparison reports

Technologies

  • Python + Uneralon SDK
  • PostgreSQL for storage
  • Slack for notifications

import uneralon
from datetime import datetime

client = uneralon.Client(api_key="YOUR_API_KEY")

competitors = [
    {"name": "Store A", "url": "https://store-a.com/product/123"},
    {"name": "Store B", "url": "https://store-b.com/item/456"},
]

for competitor in competitors:
    result = client.extract(
        url=competitor["url"],
        selectors={
            "price": ".price",
            "stock": ".availability"
        },
        render_js=True  # Enable for dynamic sites
    )

    print(f"{competitor['name']}: ${result.data['price']}")
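Before prices can be stored in PostgreSQL or compared over time, the scraped strings need normalizing. A sketch of turning a raw price string into a row ready for a `price_history` table (the price format and table columns are illustrative assumptions, not part of the SDK):

```python
import re
from datetime import datetime, timezone

def parse_price(raw):
    """Convert a scraped price string like '$1,299.99' into a float."""
    match = re.search(r"[\d,]+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"No price found in {raw!r}")
    return float(match.group().replace(",", ""))

def price_record(store, url, raw_price):
    """Build a row suitable for inserting into a price_history table."""
    return {
        "store": store,
        "url": url,
        "price": parse_price(raw_price),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

row = price_record("Store A", "https://store-a.com/product/123", "$1,299.99")
print(row["price"])  # 1299.99
```

Storing the parsed float alongside a timestamp makes the later steps (change alerts, comparison reports) simple queries over the history table.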

Product Catalog Extraction

What You'll Build

  • Extract entire product catalogs
  • Handle pagination automatically
  • Normalize data across sources
  • Export to CSV/JSON

def extract_catalog(base_url, page_selector, product_selectors):
    products = []
    page = 1

    while True:
        result = client.extract(
            url=f"{base_url}?page={page}",
            selectors=product_selectors,
            render_js=True
        )

        if not result.data["products"]:
            break

        products.extend(result.data["products"])
        page += 1

    return products
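The export step mentioned above can be sketched with the standard library alone. This normalizes the product dicts to a fixed column set and writes both CSV and JSON (the field names are illustrative; match them to your selectors):

```python
import csv
import json

def export_products(products, csv_path, json_path):
    """Write normalized product rows to both CSV and JSON."""
    fields = ["name", "price", "url"]
    rows = [{f: p.get(f, "") for f in fields} for p in products]

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w") as f:
        json.dump(rows, f, indent=2)
    return rows

rows = export_products(
    [{"name": "Widget", "price": "9.99", "url": "https://example.com/w"}],
    "catalog.csv",
    "catalog.json",
)
```

Fixing the column set up front is what makes data from different sources line up in a single file.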

Lead Generation Tutorials

Company Data Extraction

What You'll Build

  • Extract company information from directories
  • Discover contact emails and phones
  • Build prospect lists
  • Enrich existing data

# Extract company data from a directory listing
result = client.extract(
    url="https://directory.example.com/companies",
    selectors={
        "companies": {
            "selector": ".company-card",
            "type": "list",
            "fields": {
                "name": ".company-name",
                "website": "a.website::attr(href)",
                "email": "a[href^='mailto:']::attr(href)",
                "phone": ".phone-number",
                "address": ".address",
                "industry": ".industry-tag"
            }
        }
    }
)

for company in result.data["companies"]:
    print(f"{company['name']}: {company['email']}")

LinkedIn Company Scraping

What You'll Build

  • Extract public company profiles
  • Get employee counts and growth
  • Monitor job postings
  • Track company updates

Note: Always respect LinkedIn's Terms of Service and rate limits.

Competitive Intelligence

Review Aggregation

What You'll Build

  • Collect reviews from multiple platforms
  • Analyze sentiment patterns
  • Track rating changes over time
  • Compare against competitors

review_sources = [
    {"platform": "Google", "url": "https://..."},
    {"platform": "Yelp", "url": "https://..."},
    {"platform": "Trustpilot", "url": "https://..."},
]

all_reviews = []

for source in review_sources:
    result = client.extract(
        url=source["url"],
        selectors={
            "reviews": {
                "selector": ".review",
                "type": "list",
                "fields": {
                    "rating": ".rating::attr(data-rating)",
                    "text": ".review-text",
                    "date": ".review-date",
                    "author": ".reviewer-name"
                }
            }
        }
    )

    for review in result.data["reviews"]:
        review["platform"] = source["platform"]
        all_reviews.append(review)

Market Research

What You'll Build

  • Track competitor product launches
  • Monitor pricing strategies
  • Analyze feature comparisons
  • Generate market reports
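Tracking competitor product launches usually comes down to diffing catalog snapshots taken on different days. A minimal sketch, assuming each snapshot is a list of product dicts keyed by URL (the snapshot shape is an assumption, not an SDK format):

```python
def detect_launches(previous, current):
    """Return products present in the current snapshot but not the previous one."""
    seen = {p["url"] for p in previous}
    return [p for p in current if p["url"] not in seen]

old_snapshot = [{"url": "/p/1", "name": "Alpha"}]
new_snapshot = [{"url": "/p/1", "name": "Alpha"}, {"url": "/p/2", "name": "Beta"}]
print([p["name"] for p in detect_launches(old_snapshot, new_snapshot)])  # ['Beta']
```

The same diff, run in the other direction, surfaces discontinued products.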

AI Training Data

Text Corpus Collection

What You'll Build

  • Collect large-scale text data
  • Clean and normalize content
  • Handle multiple languages
  • Export in ML-ready formats

# Batch extract articles for training data
urls = ["https://blog.example.com/post/1", "https://blog.example.com/post/2", ...]

results = client.batch_extract(
    urls=urls,
    selectors={
        "title": "h1",
        "content": "article.content",
        "category": ".category-tag",
        "date": "time::attr(datetime)"
    },
    concurrency=50
)

# Export as JSONL for ML pipelines
import json

with open("training_data.jsonl", "w") as f:
    for result in results:
        f.write(json.dumps(result.data) + "\n")

Image Dataset Building

What You'll Build

  • Extract images with metadata
  • Handle pagination and lazy loading
  • Download and organize images
  • Generate annotation files
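The annotation-file step can be sketched without any SDK calls: given extracted image metadata (here assumed to be dicts with `src` and `alt` keys, matching a selector like `img::attr(src)`), build a simple JSON manifest:

```python
import json

def build_annotations(images):
    """Turn extracted image metadata into a simple annotation manifest."""
    return [
        {
            "id": i,
            "file_name": img["src"].rsplit("/", 1)[-1],
            "source_url": img["src"],
            "caption": img.get("alt", ""),
        }
        for i, img in enumerate(images)
    ]

images = [{"src": "https://example.com/img/cat.jpg", "alt": "a cat"}]
print(json.dumps(build_annotations(images), indent=2))
```

Keeping the source URL in each entry lets you re-download or audit any image later without re-crawling the listing pages.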

Advanced Techniques

Handling Anti-Bot Protection

# Sites with strong protection
result = client.extract(
    url="https://protected-site.com",
    selectors=selectors,
    proxy_type="residential",   # Use residential proxies
    country="US",               # Geo-target requests
    render_js=True,             # Full browser rendering
    wait_for=".content-loaded"  # Wait for element
)

Dynamic Content Extraction

# Single-page applications (SPAs)
result = client.extract(
    url="https://spa-site.com/products",
    selectors=selectors,
    render_js=True,
    wait_for=".product-list",        # Wait for content
    scroll_to_bottom=True,           # Load lazy content
    execute_js="window.loadMore()"   # Custom JS execution
)

Webhook Integration

# Async extraction with webhooks
client.extract_async(
    url="https://example.com",
    selectors=selectors,
    webhook_url="https://your-server.com/webhook",
    webhook_headers={"Authorization": "Bearer your-token"}
)

# Your webhook receives the result when ready

Best Practices

Rate Limiting

  • Respect website rate limits
  • Use appropriate delays between requests
  • Implement exponential backoff
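The backoff pattern above can be sketched as a small wrapper around any callable; the delay doubles per attempt, with jitter so parallel workers don't retry in lockstep (the wrapper itself is illustrative, not part of the SDK):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)

# Usage: with_backoff(lambda: client.extract(url=url, selectors=selectors))
```

Injecting `sleep` as a parameter makes the wrapper trivial to unit-test without real delays.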

Error Handling

  • Always handle extraction failures
  • Implement retry logic
  • Log errors for debugging
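One common shape for these practices is a loop that collects failures instead of letting one bad URL abort the whole run, logging each error as it happens. A sketch where `extract_fn` stands in for any extraction call:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("extraction")

def extract_all(urls, extract_fn):
    """Extract each URL, logging failures and returning successes and errors separately."""
    results, failures = [], []
    for url in urls:
        try:
            results.append(extract_fn(url))
        except Exception as exc:
            log.error("extraction failed for %s: %s", url, exc)
            failures.append((url, str(exc)))
    return results, failures

def fake_extract(url):  # Stand-in for client.extract, for illustration
    if "b." in url:
        raise ValueError("blocked")
    return {"url": url}

ok, bad = extract_all(["https://a.example", "https://b.example"], fake_extract)
```

Returning the failure list makes a second targeted retry pass easy to add later.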

Data Validation

  • Validate extracted data
  • Handle missing fields gracefully
  • Clean and normalize data
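A minimal validation sketch for the points above: whitespace is stripped, empty strings become `None`, and missing required fields are reported rather than raising (the field names are illustrative):

```python
def validate_product(record, required=("name", "price")):
    """Return (cleaned_record, problems); missing fields become None instead of crashing."""
    problems = []
    cleaned = {}
    for field in required:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip() or None  # Treat empty strings as missing
        if value is None:
            problems.append(f"missing {field}")
        cleaned[field] = value
    return cleaned, problems

cleaned, problems = validate_product({"name": "  Widget ", "price": ""})
print(cleaned)   # {'name': 'Widget', 'price': None}
print(problems)  # ['missing price']
```

Collecting problems per record, rather than raising on the first one, lets you quantify data quality across a whole extraction run.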

Compliance

  • Respect robots.txt
  • Follow website terms of service
  • Handle personal data responsibly

Need help with a specific use case? Contact support or join our Discord.