WP Content Crawler
Get content from almost any site to your WordPress blog, automatically!
WHAT IT CAN BE USED FOR
- Create a personal site which collects news, posts, etc. from your favorite sites to see them in one place
- Use it with WooCommerce to collect products from shopping sites
- Keep track of competitors
- You can imagine anything. The internet is full of content.
HOW IT WORKS
It’s all about CSS selectors, and you can learn how to use them in minutes by watching the introduction tutorial. The plugin’s Visual Inspector tool also helps you find CSS selectors easily: just click on the elements in the target site. Here is the gist of it:
WHAT WP CONTENT CRAWLER CAN DO
Here are some of the features of WP Content Crawler. To learn about all of them, please see the features table below.
SEE IT IN ACTION, LEARN IN MINUTES
- WP Content Crawler introduction video (English)
- WP Content Crawler introduction video (Turkish)
- Quick Start Guide
- Using CSS Selectors in WP Content Crawler
- HTML & CSS Selectors
- Using short codes to place any data anywhere in the post
- Save images as WooCommerce product gallery
- Save arcade games
Save every post detail
Title, excerpt, content, tags, categories, slug, date, custom meta, taxonomies, meta keywords, meta description, featured image, post images, status… Everything.
Just click on an element to find its CSS selector. There is no need to leave your admin panel anymore.
Crawl (scrape, grab, save) posts
After the settings are configured, the plugin finds the URLs of the posts and crawls them automatically in the background.
Recrawl (update) posts
Recrawl posts automatically to keep them up to date. You can limit how many times a post can be updated, set the update interval, and ignore old posts.
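The three recrawl limits work together as filters. Here is a minimal Python sketch of one plausible way to combine them. The function name, its parameters, and the order of the checks are assumptions for illustration, not the plugin's actual code:

```python
from datetime import datetime, timedelta


def should_recrawl(update_count: int, max_updates: int,
                   last_update: datetime, interval_minutes: int,
                   post_date: datetime, max_age_days: int,
                   now: datetime) -> bool:
    """Decide whether a saved post is due for a recrawl (hypothetical rules)."""
    # Respect the limit on how many times a post can be updated.
    if update_count >= max_updates:
        return False
    # Ignore old posts: skip anything published more than max_age_days ago.
    if now - post_date > timedelta(days=max_age_days):
        return False
    # Recrawl only once the update interval has elapsed since the last update.
    return now - last_update >= timedelta(minutes=interval_minutes)


now = datetime(2020, 1, 10, 12, 0)
print(should_recrawl(1, 3, now - timedelta(hours=2), 60,
                     now - timedelta(days=2), 30, now))  # True
```

A post that has already reached its update limit, was updated less than an interval ago, or is older than the age limit is skipped.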
You want to delete old crawled posts? The plugin can delete them automatically.
You can set how many times the URL collection and post crawling events should run for a site each time they are triggered. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes.
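Before raising these numbers, it helps to estimate the daily load. The sketch below does the arithmetic in Python. It assumes that each post crawling run saves one post, and the helper name is hypothetical:

```python
def runs_per_day(runs_per_event: int, interval_minutes: int) -> int:
    """Number of runs per day for an event fired every interval_minutes."""
    events_per_day = (24 * 60) // interval_minutes
    return runs_per_event * events_per_day


# "Save 3 posts every minute": 3 runs per event, one event per minute.
print(runs_per_day(3, 1))  # 4320
# "Run URL collection 5 times every 2 minutes".
print(runs_per_day(5, 2))  # 3600
```

These are upper bounds. WP-Cron fires when the site receives visits, so a quiet site may run its events less often than scheduled.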
The target category does not exist on your site? No problem. The plugin can create the target categories for you. Just define the CSS selectors that find the category names. They can even be created as subcategories.
Save slugs (permalink)
You can define the permalinks of the posts: take them from the target site, enter custom text, or even create slug templates using short codes.
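A slug has to be URL-safe, whatever text it starts from. Below is a minimal Python sketch of a generic slugify step. It is a common approach for illustration, not the plugin's own code (WordPress has its own sanitizing functions):

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn a title into a URL-safe slug (generic approach, not the plugin's code)."""
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    text = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-")
    return text.lower()


print(slugify("Café Review: 10 Best Drinks!"))  # cafe-review-10-best-drinks
```

Characters with no ASCII decomposition are simply dropped: `slugify("Işık")` returns `"isk"` because the Turkish "ı" disappears. A real implementation would need a transliteration table for such languages.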
Custom post meta
Save anything as custom post meta. You can use a CSS selector or just type the value.
Prepare post content, title, excerpt, list item and gallery item templates using short codes.
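A template is text with short codes that get replaced by the crawled values. The following Python sketch shows the substitution step. The `[name]` syntax, the short code names, and the `fill_template` helper are hypothetical and do not reflect the plugin's real short codes:

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace [name] short codes with values; unknown short codes stay as-is."""
    def replace(match: re.Match) -> str:
        # Fall back to the original text, e.g. "[unknown]", when no value exists.
        return values.get(match.group(1), match.group(0))

    # Short code names are assumed to be lowercase letters, digits, and hyphens.
    return re.sub(r"\[([a-z0-9-]+)\]", replace, template)


print(fill_template("<h2>[title]</h2>[content]",
                    {"title": "Hello", "content": "<p>Body</p>"}))
# <h2>Hello</h2><p>Body</p>
```

The substitution runs in a single pass, so any brackets inside an inserted value are left alone instead of being expanded again.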
No worries. You can save paginated posts as well.
List type posts
Some sites create posts with a list inside. You can extract the list from the post, create a template to apply to each list item, and even reverse the list.
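Applying an item template and optionally reversing the list can be sketched in a few lines of Python. Here `str.format` placeholders such as `{title}` stand in for the plugin's short codes, and `render_list` is a hypothetical helper:

```python
def render_list(items: list, item_template: str, reverse: bool = False) -> str:
    """Apply item_template to every list item, optionally in reverse order."""
    ordered = list(reversed(items)) if reverse else items
    # Each item is a dict of extracted fields; extra keys are ignored by format().
    return "".join(item_template.format(**item) for item in ordered)


items = [{"title": "A"}, {"title": "B"}, {"title": "C"}]
print(render_list(items, "<li>{title}</li>", reverse=True))
# <li>C</li><li>B</li><li>A</li>
```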
Remove unnecessary elements
Just write an element’s CSS selector and it is removed.
Automatically insert category URLs
Target site has hundreds of categories? Piece of cake. Just write the CSS selector and the plugin will insert them for you.
Set post type. It can be a post, a page, a product, or any other post type available in your WordPress installation.
You can remove links from the post. Just check the checkbox and the links are gone. That easy.
You can set a password for the posts to show them only to the users who have the password.
You can add notes to remind yourself of things about the site: CSS selectors, a TODO list, anything.
Test everything on the fly
Test post crawling, URL collection, CSS selectors, regular expressions, find and replace options and proxies on the fly.
Using the tools, you can save posts manually by their URLs, recrawl posts by their IDs, or delete already-saved URLs.
Custom general settings for each site
You can provide custom general settings for each site that override the global general settings, making them suitable for that site.
You can directly publish the saved posts or keep them as draft to check them before publishing.
Save all images in post content
Saving all images in the content of the post is as easy as checking a single checkbox.
Save images as gallery
You can save the images in the target page as a gallery and provide a template for each image so that it suits the gallery library you use on the frontend. You can also save the images as a WooCommerce gallery by checking a single checkbox.
Any data as short code
Get anything from the target page as a short code and use the short codes in the plugin’s templates to place any data anywhere you want.
Use a proxy or proxies to get content from the sites to which your IP does not have access.
Crawl as many posts as you want
You can set how many times post crawling or URL collection CRON events should run. Just be careful and consider your server’s capacity.
Set CSS selectors whose values should not be empty for category and post pages.
Advanced HTML manipulations
Find-replace in response HTML, find and replace in element attributes, exchange element attributes, remove element attributes, manipulate HTML of an element, remove HTML elements…
Use the artificial intelligence of Google Cloud Translation API, Microsoft Translator Text API, Yandex Translate API or Amazon Translate API to automatically translate the posts. You can see their pricing pages to learn more.
Use spinning to automatically rewrite the contents of crawled posts to improve search engine optimization. The plugin currently supports the Spin Rewriter API and the Turkce Spin API, which are paid services. You can visit their websites to learn the pricing details.
Duplicate post check
Check for duplicate posts by URL, post title, and/or post content. If you are using WooCommerce, products whose SKU already exists are considered duplicates and are not added to your site.
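The duplicate check is essentially a lookup of each enabled key against posts that were already saved. This Python sketch hashes normalized titles and contents and always checks a SKU when one is present. The class, its fields, and the normalization rule are assumptions for illustration, not the plugin's implementation:

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash text after collapsing whitespace and case, so trivial edits still match."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateChecker:
    """Remembers saved posts and flags new ones that match on an enabled check."""

    def __init__(self, checks=("url", "title", "content")):
        self.checks = set(checks)
        self.seen = {"url": set(), "title": set(), "content": set(), "sku": set()}

    def _keys(self, url, title, content, sku=None):
        keys = {"url": url, "title": fingerprint(title), "content": fingerprint(content)}
        if sku:
            # A product SKU is always checked when one is given.
            keys["sku"] = sku
        return keys

    def is_duplicate(self, url, title, content, sku=None) -> bool:
        keys = self._keys(url, title, content, sku)
        return any(keys[name] in self.seen[name]
                   for name in keys if name in self.checks or name == "sku")

    def add(self, url, title, content, sku=None) -> None:
        for name, key in self._keys(url, title, content, sku).items():
            self.seen[name].add(key)


checker = DuplicateChecker(checks=("url", "title"))
checker.add("https://example.com/a", "Hello  World", "Body", sku="SKU-1")
print(checker.is_duplicate("https://example.com/b", "hello world", "Other"))  # True
```

In this example the title matches after normalization. Content is not compared because that check was left out of `checks`.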
You can add minutes to or remove minutes from the post date. This way, you can schedule post publishing.
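Shifting the post date is a plain date offset. Here is a minimal Python sketch with a hypothetical helper name:

```python
from datetime import datetime, timedelta


def shift_post_date(date: datetime, minutes: int) -> datetime:
    """Add (positive) or remove (negative) minutes from a post date."""
    return date + timedelta(minutes=minutes)


crawled = datetime(2020, 1, 1, 10, 0)
print(shift_post_date(crawled, 90))   # 2020-01-01 11:30:00
print(shift_post_date(crawled, -30))  # 2020-01-01 09:30:00
```

In WordPress, a published post whose date lies in the future gets the scheduled ("future") status and goes live when that time arrives. A positive offset therefore schedules the post.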
Save WooCommerce products
Save price, inventory, shipping, attributes, and advanced options. You can save the product as a simple or an external product.
You are in control! Define many options for the values found by a CSS selector. The options include find-replace, calculation, template, and JSON parsing settings. You can easily import/export the options defined in the options boxes as well.
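Chaining options turns a raw value into the final one step by step. The Python sketch below reads a price from JSON, multiplies it, and formats it with a template. The dot-path syntax, the helper names, and the fixed order of the steps are assumptions, not the plugin's options format:

```python
import json


def read_json_path(raw: str, path: str):
    """Follow a dot-separated path such as 'offers.0.price' through parsed JSON."""
    value = json.loads(raw)
    for key in path.split("."):
        # Numeric keys index into lists; other keys look up object fields.
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


def apply_options(raw: str, json_path: str, multiplier: float, template: str) -> str:
    price = float(read_json_path(raw, json_path))  # JSON parsing
    price = price * multiplier                     # calculation
    return template.format(value=price)            # template


raw = '{"offers": [{"price": "10.00"}]}'
print(apply_options(raw, "offers.0.price", 1.2, "Price: {value:.2f} USD"))
# Price: 12.00 USD
```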
Handle files like a pro
Rename, copy, and move saved files easily.
Handle iframes and scripts like a pro
You can turn iframe and script HTML elements into short codes by just checking a checkbox. The short code will show iframes and scripts from the allowed source domains defined by you.
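Deciding whether an iframe or script comes from an allowed source domain boils down to comparing the URL's host with the allowlist. This Python sketch accepts exact matches and subdomains. The helper name and the subdomain rule are assumptions about how such a check could work:

```python
from urllib.parse import urlparse


def is_allowed_source(src: str, allowed_domains: set) -> bool:
    """Allow a source URL if its host is an allowed domain or a subdomain of one."""
    # urlparse also handles protocol-relative URLs such as "//player.vimeo.com/...".
    host = (urlparse(src).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


print(is_allowed_source("https://www.youtube.com/embed/abc", {"youtube.com"}))  # True
print(is_allowed_source("https://evil-youtube.com/x", {"youtube.com"}))         # False
```

Matching on the parsed host rather than on the raw string keeps look-alike hosts such as `evil-youtube.com` out.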
With the quick save button, you can save the settings much more quickly. No need to wait for the page to reload.
Define regular expressions in find-replace options to find-replace anything.
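Find-replace rules are applied one after another, and each rule is either plain text or a regular expression. Here is a minimal Python sketch. The `(find, replace, is_regex)` rule shape is an assumption for illustration:

```python
import re


def find_replace(text: str, rules: list) -> str:
    """Apply find-replace rules in order; each rule is (find, replace, is_regex)."""
    for find, replace, is_regex in rules:
        if is_regex:
            # Capture groups can be reused in the replacement as \1, \2, ...
            text = re.sub(find, replace, text)
        else:
            text = text.replace(find, replace)
    return text


rules = [(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", True), ("Admin", "Editor", False)]
print(find_replace("Posted on 2020-01-15 by Admin", rules))
# Posted on 15.01.2020 by Editor
```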
Handle character encoding problems
You can convert the encoding by checking a single checkbox.
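An encoding problem usually means the target site sends bytes in a legacy charset that are then read as UTF-8. The fix is to decode with the right charset and re-encode. The Python sketch below uses Windows-1254 (Turkish) as an example source encoding. The helper name is hypothetical:

```python
def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes served in a legacy charset (e.g. cp1254) as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


raw = "Türkçe içerik".encode("cp1254")  # what a Windows-1254 site sends
# Reading these bytes as UTF-8 yields U+FFFD replacement characters instead of ü and ç.
print(raw.decode("utf-8", errors="replace"))
print(convert_to_utf8(raw, "cp1254").decode("utf-8"))  # Türkçe içerik
```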
Navigate between settings easily
The navigation stays fixed at the top of the page. No more getting lost among the settings.
Manual crawling tool
With the manual crawling tool, you can save multiple posts by entering their URLs.
Add URLs to the database
In addition to collecting URLs automatically, the plugin lets you add URLs to the database yourself. The specified URLs are then crawled automatically according to your scheduling options.
Enable/disable automatic crawling for a specific site
You can enable or disable automatic crawling for each site individually.
You can import and export site settings easily. Just copy and paste the code created by the plugin.
Add unlimited sites and activate as many of them as you want.
See what’s going on in the background.
Get updates from your admin panel
Just go to your updates page in your admin panel.
Use the most secure PHP
The plugin supports the latest versions of PHP.
Use the most modern browsers
The plugin supports Chrome, Firefox, Safari, Opera, and Edge.
Interactive guides show you, step by step, how to configure the settings to achieve certain things, like living documentation. You can even start them from a specific step.
Quick guides right next to the settings
Each setting in the plugin has a quick guide that will help you understand what each setting does.
Watch video tutorials to easily learn how to use the plugin.
Ready to translate
You can translate the plugin into your own language using Poedit.
Requirements: PHP >= 7.2, json, mbstring, curl, dom, WP-Cron. These are already available on most hosts. See the documentation for more information.
Tested with WP versions: 5.3, 5.2, 5.1, 5.0, 4.9
Tested with WooCommerce versions: 3.9, 3.8, 3.7, 3.6, 3.5
Shortcomings: For more information, please see "Can I get content from X site?"
WHY WP CONTENT CRAWLER
Problems with crawling a website
- Not an easy task, requires advanced programming skills
- Every website is different and needs tailored crawling implementation
- Pages and their source codes need to be investigated intensively to come up with a crawling plan
- Knowing how to save certain information in a specific place in WordPress requires knowledge about the internal structure of WordPress and how WordPress works
- If certain information should be saved into a specific field defined by a third-party plugin, one should modify the crawling implementation after researching for hours about how to save that information
- One should know about how HTML works and how to extract certain parts from HTML code
- One should handle all possible inconsistencies that might be in the source codes of websites to provide a robust solution that will keep working
Our vision and mission
We believe that robust, reliable, and automated crawling capabilities should be available to anyone. We want to democratize this field by giving these capabilities to everyone, not just developers. To make it accessible to anyone, we keep the plugin low-cost and easy to use.
How we solve these problems
We have been developing WP Content Crawler for almost 4 years, so we have come across almost every what-if. By working with our customers and listening to their needs, we provide robust and reliable solutions to these problems.
Sometimes you might not feel like reading the documentation. You can start the interactive guides showing you step-by-step how you can do certain things any time and from any step you want.
One of the most distinctive features of WP Content Crawler is the ability to test almost any configuration. This way, you will not run into any surprises after you enable automatic crawling.
We make them as flexible as possible to make them fit your needs.