Tutorials
Step-by-step guides for common web parsing use cases.
Getting Started
Your First Extraction
What You'll Learn
- Setting up the Uneralon SDK
- Making your first extraction request
- Understanding selectors
- Handling responses
Duration: 15 minutes
```python
import uneralon

client = uneralon.Client(api_key="YOUR_API_KEY")

# Extract product data
result = client.extract(
    url="https://example.com/product/123",
    selectors={
        "name": "h1.product-title",
        "price": ".price-amount",
        "description": ".product-description",
        "image": "img.main-image::attr(src)"
    }
)

print(f"Product: {result.data['name']}")
print(f"Price: {result.data['price']}")
```
E-commerce Tutorials
Price Monitoring System
What You'll Build
- Track competitor prices across multiple sites
- Store price history in a database
- Send alerts when prices change
- Generate price comparison reports
Technologies
- Python + Uneralon SDK
- PostgreSQL for storage
- Slack for notifications
```python
import uneralon
from datetime import datetime

client = uneralon.Client(api_key="YOUR_API_KEY")

competitors = [
    {"name": "Store A", "url": "https://store-a.com/product/123"},
    {"name": "Store B", "url": "https://store-b.com/item/456"},
]

for competitor in competitors:
    result = client.extract(
        url=competitor["url"],
        selectors={
            "price": ".price",
            "stock": ".availability"
        },
        render_js=True  # Enable for dynamic sites
    )
    print(f"{competitor['name']}: ${result.data['price']}")
```
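The loop above only prints current prices; to send alerts when prices change, compare each new reading against the last stored value. A minimal sketch in plain Python — here the `price_history` dict stands in for the PostgreSQL storage and the alert print for the Slack notification, both hypothetical simplifications:

```python
def parse_price(raw):
    """Turn a scraped string like '$1,299.00' into a float."""
    return float(raw.replace("$", "").replace(",", ""))

def check_price_change(history, store, raw_price, threshold=0.01):
    """Return (old, new) if the price moved more than `threshold` (as a
    fraction), else None. `history` maps store name -> last known price
    and is updated in place."""
    new_price = parse_price(raw_price)
    old_price = history.get(store)
    history[store] = new_price
    if old_price is not None and abs(new_price - old_price) / old_price > threshold:
        return old_price, new_price
    return None

price_history = {}
check_price_change(price_history, "Store A", "$99.00")  # first reading, no alert
change = check_price_change(price_history, "Store A", "$89.00")
if change:
    print(f"ALERT Store A: {change[0]} -> {change[1]}")  # ALERT Store A: 99.0 -> 89.0
```

In a real deployment the history lookup and update would be a database read/write, but the comparison logic is the same.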
Product Catalog Extraction
What You'll Build
- Extract entire product catalogs
- Handle pagination automatically
- Normalize data across sources
- Export to CSV/JSON
```python
def extract_catalog(base_url, page_selector, product_selectors):
    products = []
    page = 1
    while True:
        result = client.extract(
            url=f"{base_url}?page={page}",
            selectors=product_selectors,
            render_js=True
        )
        if not result.data["products"]:
            break
        products.extend(result.data["products"])
        page += 1
    return products
```
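Once `extract_catalog` returns a list of product dicts, exporting to CSV or JSON needs only the standard library. A sketch, assuming each product is a flat dict (the field names below are illustrative):

```python
import csv
import json

def export_products(products, csv_path, json_path):
    """Write a list of flat product dicts to both CSV and JSON."""
    if not products:
        return
    # Union of keys across products, so rows with missing fields still export
    fieldnames = sorted({key for p in products for key in p})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(products)  # missing keys become empty cells
    with open(json_path, "w") as f:
        json.dump(products, f, indent=2)

export_products(
    [{"name": "Widget", "price": "9.99"},
     {"name": "Gadget", "price": "19.99", "sku": "G-1"}],
    "catalog.csv",
    "catalog.json",
)
```

Taking the union of keys matters when sources expose different fields; `csv.DictWriter` fills absent keys with an empty string by default.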
Lead Generation Tutorials
Company Data Extraction
What You'll Build
- Extract company information from directories
- Discover contact emails and phones
- Build prospect lists
- Enrich existing data
```python
# Extract company data from a directory listing
result = client.extract(
    url="https://directory.example.com/companies",
    selectors={
        "companies": {
            "selector": ".company-card",
            "type": "list",
            "fields": {
                "name": ".company-name",
                "website": "a.website::attr(href)",
                "email": "a[href^='mailto:']::attr(href)",
                "phone": ".phone-number",
                "address": ".address",
                "industry": ".industry-tag"
            }
        }
    }
)

for company in result.data["companies"]:
    print(f"{company['name']}: {company['email']}")
```
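The `email` field above comes back as a raw `mailto:` href, so it usually needs a small cleanup step before landing in a prospect list. A sketch of that normalization (pure Python, independent of the SDK):

```python
def clean_email(href):
    """Extract a bare address from a 'mailto:' href, dropping any query string."""
    if not href:
        return None
    if href.lower().startswith("mailto:"):
        href = href[len("mailto:"):]
    return href.split("?")[0].strip().lower() or None

print(clean_email("mailto:Sales@Example.com?subject=Hi"))  # sales@example.com
print(clean_email(""))                                     # None
```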
LinkedIn Company Scraping
What You'll Build
- Extract public company profiles
- Get employee counts and growth
- Monitor job postings
- Track company updates
Note: Always respect LinkedIn's Terms of Service and rate limits.
Competitive Intelligence
Review Aggregation
What You'll Build
- Collect reviews from multiple platforms
- Analyze sentiment patterns
- Track rating changes over time
- Compare against competitors
```python
review_sources = [
    {"platform": "Google", "url": "https://..."},
    {"platform": "Yelp", "url": "https://..."},
    {"platform": "Trustpilot", "url": "https://..."},
]

all_reviews = []
for source in review_sources:
    result = client.extract(
        url=source["url"],
        selectors={
            "reviews": {
                "selector": ".review",
                "type": "list",
                "fields": {
                    "rating": ".rating::attr(data-rating)",
                    "text": ".review-text",
                    "date": ".review-date",
                    "author": ".reviewer-name"
                }
            }
        }
    )
    for review in result.data["reviews"]:
        review["platform"] = source["platform"]
        all_reviews.append(review)
```
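With every review tagged by platform, comparing platforms and tracking rating changes reduces to a small aggregation step. A sketch over a list shaped like `all_reviews` above (ratings are assumed to arrive as strings from the `data-rating` attribute):

```python
from collections import defaultdict

def average_ratings(reviews):
    """Mean rating per platform; rating values may arrive as strings."""
    totals = defaultdict(lambda: [0.0, 0])
    for review in reviews:
        total = totals[review["platform"]]
        total[0] += float(review["rating"])
        total[1] += 1
    return {platform: round(s / n, 2) for platform, (s, n) in totals.items()}

sample = [
    {"platform": "Google", "rating": "5"},
    {"platform": "Google", "rating": "4"},
    {"platform": "Yelp", "rating": "3.5"},
]
print(average_ratings(sample))  # {'Google': 4.5, 'Yelp': 3.5}
```

Running this on snapshots taken over time gives the rating-change series the tutorial describes.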
Market Research
What You'll Build
- Track competitor product launches
- Monitor pricing strategies
- Analyze feature comparisons
- Generate market reports
AI Training Data
Text Corpus Collection
What You'll Build
- Collect large-scale text data
- Clean and normalize content
- Handle multiple languages
- Export in ML-ready formats
```python
# Batch extract articles for training data
urls = ["https://blog.example.com/post/1", "https://blog.example.com/post/2", ...]

results = client.batch_extract(
    urls=urls,
    selectors={
        "title": "h1",
        "content": "article.content",
        "category": ".category-tag",
        "date": "time::attr(datetime)"
    },
    concurrency=50
)

# Export as JSONL for ML pipelines
import json

with open("training_data.jsonl", "w") as f:
    for result in results:
        f.write(json.dumps(result.data) + "\n")
```
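Before training, the raw `content` strings usually need cleaning: stripping leftover HTML, collapsing whitespace, and dropping duplicates. A minimal sketch — note the regex tag-stripping is a rough heuristic, not a substitute for a real HTML parser:

```python
import re

def clean_text(text):
    """Strip HTML tags (crudely) and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(records, key="content"):
    """Drop records whose cleaned text was already seen."""
    seen, out = set(), []
    for record in records:
        cleaned = clean_text(record[key])
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append({**record, key: cleaned})
    return out

records = [
    {"title": "A", "content": "<p>Hello   world</p>"},
    {"title": "B", "content": "Hello world"},  # duplicate after cleaning
]
print(deduplicate(records))  # keeps only the first record
```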
Image Dataset Building
What You'll Build
- Extract images with metadata
- Handle pagination and lazy loading
- Download and organize images
- Generate annotation files
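A common pattern for organizing downloaded images is to name files by a hash of their source URL, which makes the download step idempotent and the dataset easy to dedupe. A sketch of the naming logic only (the download call itself is omitted, and the label/URL values are illustrative):

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

def image_filename(url, label):
    """Deterministic '<label>/<hash><ext>' path for an image URL."""
    ext = PurePosixPath(urlparse(url).path).suffix or ".jpg"  # fallback extension
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return f"{label}/{digest}{ext}"

print(image_filename("https://cdn.example.com/cats/1.png", "cat"))
```

Because the name depends only on the URL, re-running the pipeline skips images you already have, and the `label/` prefix gives a folder layout most annotation tools accept.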
Advanced Techniques
Handling Anti-Bot Protection
```python
# Sites with strong protection
result = client.extract(
    url="https://protected-site.com",
    selectors=selectors,
    proxy_type="residential",   # Use residential proxies
    country="US",               # Geo-target requests
    render_js=True,             # Full browser rendering
    wait_for=".content-loaded"  # Wait for element
)
```
Dynamic Content Extraction
```python
# Single-page applications (SPAs)
result = client.extract(
    url="https://spa-site.com/products",
    selectors=selectors,
    render_js=True,
    wait_for=".product-list",       # Wait for content
    scroll_to_bottom=True,          # Load lazy content
    execute_js="window.loadMore()"  # Custom JS execution
)
```
Webhook Integration
```python
# Async extraction with webhooks
client.extract_async(
    url="https://example.com",
    selectors=selectors,
    webhook_url="https://your-server.com/webhook",
    webhook_headers={"Authorization": "Bearer your-token"}
)

# Your webhook receives the result when ready
```
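On the receiving side, the endpoint should verify the shared token before trusting the payload. A framework-agnostic sketch of that check as a plain function — the header name and token match the `webhook_headers` configured above; the payload shape is an assumption:

```python
import hmac
import json

EXPECTED_TOKEN = "your-token"  # must match the token sent in webhook_headers

def handle_webhook(headers, body):
    """Return the parsed payload if authorized, else None."""
    auth = headers.get("Authorization", "")
    # compare_digest avoids leaking token length/prefix via timing
    if not hmac.compare_digest(auth, f"Bearer {EXPECTED_TOKEN}"):
        return None
    return json.loads(body)

payload = handle_webhook(
    {"Authorization": "Bearer your-token"},
    '{"url": "https://example.com", "data": {"name": "Widget"}}',
)
print(payload["data"]["name"])  # Widget
```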
Best Practices
Rate Limiting
- Respect website rate limits
- Use appropriate delays between requests
- Implement exponential backoff
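Exponential backoff means doubling the wait after each failed attempt, with a cap and a little random jitter so many clients don't retry in lockstep. A sketch of the delay schedule:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.1):
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter * delay)

for attempt in range(5):
    print(f"attempt {attempt}: wait ~{backoff_delay(attempt):.1f}s")
```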
Error Handling
- Always handle extraction failures
- Implement retry logic
- Log errors for debugging
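A retry wrapper ties those three points together: catch the failure, retry a bounded number of times with backoff, and log what happened. A generic sketch — any extraction call can be passed in as `fn`:

```python
import logging
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("extraction")

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(); on failure, log, back off, and retry; re-raise when exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stand-in for client.extract that fails twice, then succeeds
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```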
Data Validation
- Validate extracted data
- Handle missing fields gracefully
- Clean and normalize data
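Validation here means checking each extracted record before it enters your pipeline: required fields present, values parseable, obvious junk normalized. A sketch whose field names match the product example at the top of this page:

```python
def validate_product(record, required=("name", "price")):
    """Return a cleaned copy of the record, or None if it fails validation."""
    cleaned = {}
    for field in required:
        value = (record.get(field) or "").strip()
        if not value:
            return None  # missing required field
        cleaned[field] = value
    # Normalize price to a float; reject records with unparseable prices
    try:
        cleaned["price"] = float(cleaned["price"].replace("$", "").replace(",", ""))
    except ValueError:
        return None
    return cleaned

print(validate_product({"name": " Widget ", "price": "$1,299.00"}))
print(validate_product({"name": "Widget"}))  # None (price missing)
```

Returning None (rather than raising) lets a batch job skip bad records while counting them for later debugging.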
Compliance
- Respect robots.txt
- Follow website terms of service
- Handle personal data responsibly
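Python's standard library can check robots.txt rules before you queue a URL. A sketch using `urllib.robotparser` — the rules are inlined here for illustration; in practice you would point the parser at `https://site/robots.txt` via `set_url()` and `read()`:

```python
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/products"))   # True
print(parser.can_fetch("*", "https://example.com/private/x"))  # False
print(parser.crawl_delay("*"))                                 # 5
```

The `Crawl-delay` value also gives you a site-specific floor for the delay between requests mentioned under Rate Limiting.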
Need help with a specific use case? Contact support or join our Discord.