
Configuration

Log format

We support any log format. By default, it is configured to parse the default Apache log format, which is also used by many other web servers (such as nginx). However, the format is customizable (see below).

We accept compressed logs (gzip, bzip2, rar, zip) and uncompressed logs.

You can upload the whole log or just the bot traffic. We recommend the whole log; otherwise you will miss a lot of useful metrics: page speed, inactive pages, traffic, search referrals...

Before uploading your first log, or after any configuration change, you should use the logtester to check that your log format is correctly parsed.

Custom log format

The log format can be customized on the website configuration page. Specify your web server type and a log format directive.

Apache: Specify the LogFormat directive from your Apache configuration file. Don't forget to unescape the quotes: \" should become ".

Below is an example of a valid Apache log format:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
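
For reference, a log line produced by this format (the classic Apache combined log format) looks like this, with illustrative values:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"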

IIS custom log format: IIS must be configured to use the W3C extended log file format. The log format is specified by the #Fields directive found in the header of each log file.

Below is an example of a valid IIS log format:

date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent) cs(Referrer)
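
For reference, a log line matching this #Fields directive could look like the following (illustrative values; note that IIS replaces spaces with + inside fields such as the user agent):

2015-02-04 13:55:36 203.0.113.7 - 192.168.1.10 80 GET /default.htm id=12 200 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) http://www.example.com/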

Nginx and other web servers: You should be able to represent other web servers' log formats using the Apache LogFormat directives.
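
For example, nginx's built-in combined format:

log_format combined '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

can be expressed with the Apache example shown above, assuming the literal dash nginx writes is mapped to %l (which Apache also logs as a dash in practice):

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"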

Required fields

The following fields must be provided:

  • IP: used to detect bots and cloakers; you can disable this check on the configuration page. The IP can be partially obfuscated (1.2.3.x).
  • User-Agent
  • HTTP Method
  • HTTP Status
  • Request date
  • Requested URL

Page speed

If you want to enable the page speed feature of the log analyzer, you must enable the logging of request execution time on your server.

  • Apache: Use the %D variable in the LogFormat directive of your server.
  • nginx: First, add the $request_time variable to the log_format directive on your servers. Then, translate this $request_time into the Apache log format using the %d variable in your custom log format configuration on the log analyzer (see the example after this list).
  • IIS: Add the time-taken field to the log format directive of your server and to your log analyzer configuration.
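
For example, with nginx you could append the request time to the combined format shown earlier (a sketch; adapt it to your own format):

log_format timed '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time';

and end the matching custom log format on the log analyzer with %d:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i" %d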

Log submission

The log submission is done in two stages: upload and parsing.

Log upload

You can upload your logs using either the website's form uploader or the API (the best option for recurring uploads).

Your uploaded logs don't have to be sorted. You can upload them in any order, so multiple servers and a log rotation process won't be a problem.

However, there are some restrictions on log dates. You can't upload a log:

  • Too far in the past (today - 30 days by default)
  • In the future (today + 3 days)
  • Before your currently parsed date. More details about date restrictions are given in the parsing section below.

Also, you currently can't upload multiple logs in parallel; we may remove this limitation later.

If you uploaded a wrong log file by mistake, you can delete it on the uploads page. Only logs which have not been submitted to the parser can be deleted.

Log parsing

The role of the parsing job is to split and sort your logs by day. It is handled by our service, so you don't have to worry about it.

Once you have uploaded your logs, you can trigger the parsing via the website by clicking the "Parse uploaded logs" button on the uploads page, or via the API.

Once the parsing is done, each day will be analyzed, except the last one. (That's why you see nothing when you upload only one day of logs.)

This last day becomes your new currently parsed day: you can only upload logs dated on or after this day, and logs before this date will be ignored. For example, if you upload logs covering March 1 to March 5, March 1 to 4 are analyzed and March 5 becomes your currently parsed day; your next upload must only contain lines dated March 5 or later.

The currently parsed day is always specified on the uploads page.

You can reset your website if you want to re-upload your logs from the beginning.

After the parsing step, your logs are deleted from the server (releasing your storable quota) and you will shortly see the results of the analysis on your dashboard.

However, if you don't trigger the parsing, the logs are just stored on the server and nothing happens. You must trigger log parsing after each log upload session. Also, you can't store an unlimited amount of logs on our server without triggering a parsing: websites and plans are subject to rate limits.

Errors

Sometimes, the log parser may encounter errors while parsing your logs. You can see these errors on the uploads page, in the uploaded logs tab.

It is normal to have some errors in your logs; lines with errors are simply skipped and don't prevent the logs from being processed.

However, logs with an abnormal amount of errors are highlighted in red; click the details button to see the list of errors.

Here is a list of the most common errors:

Date before currently parsed day: You tried to upload a log file which contains dates before your currently parsed day. Your currently parsed day is the most recent day extracted from your previous parsing session. You can only upload days equal to or after this day; read the log submission documentation thoroughly to understand this process.
Date too old: You can't upload a log too far in the past (30 days from now by default). You can contact the support if you need to upload old log archives.
Date in future: Likewise, you can't upload a log dated in the future (more than 3 days from now).
Max hit per day exceeded: Some plans are limited to a maximum number of hits per day; once you reach this number for a specific day, lines of that day are skipped and marked with this error.
Invalid IP: Most of the time, this error happens because of an invalid log format (see the next error, invalid field).

If you are sure your log format is correct, an IP can be invalid for several reasons: a reverse DNS name instead of an IP, an IPv6 address (not supported yet), or an obfuscated IP.

Also, if you use a reverse proxy or don't provide the real client IP in your logs, the main bots (Google, Bing) won't be detected. You can disable IP validation on the configuration page.

Invalid field: The field can be user-agent, URI, referrer... If you see a lot of errors like this, it's very likely your log format directive is incompatible with the logs you provided. You should use the logtester to debug this kind of problem.

Uploaders

The easiest way to upload logs is to use the form located on the uploads page. However, this form is limited to files up to 2 GB in size. For bigger files and recurring uploads, you have to use the API.
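
As a minimal sketch of a recurring upload script in Python, assuming a hypothetical endpoint URL, API key, and authentication scheme (check the API documentation for the real ones):

import requests  # third-party HTTP library

API_KEY = "your-api-key"  # hypothetical credential
UPLOAD_URL = "https://example.com/api/logs"  # hypothetical endpoint

def upload_log(path):
    """Upload a single (ideally compressed) log file."""
    with open(path, "rb") as log_file:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": "Bearer " + API_KEY},  # hypothetical auth
            files={"file": (path, log_file)},
        )
    response.raise_for_status()  # fail loudly on HTTP errors

upload_log("access.log.gz")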

Sitemap & crawl

One of the most useful features is the ability to compare the bot crawl to a sitemap.

Since the sitemap is an exhaustive list of the pages of your website, we use it to compute the following (see the sketch after this list):

  • Uncrawled pages: the list of pages present in your sitemap which are not crawled by the bots.
  • Overcrawled pages: the list of pages crawled by the bots but not present in your sitemap.
  • Crawled pages: the list of pages present in your sitemap and crawled by the bots.
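
These three reports boil down to simple set operations; here is a minimal sketch in Python with illustrative URLs (not the analyzer's actual implementation):

sitemap_urls = {"/", "/a.html", "/b.html"}  # pages declared in the sitemap
bot_urls = {"/", "/a.html", "/old.html"}    # pages requested by the bots

uncrawled = sitemap_urls - bot_urls    # in the sitemap, never crawled
overcrawled = bot_urls - sitemap_urls  # crawled, but not in the sitemap
crawled = sitemap_urls & bot_urls      # in the sitemap and crawled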

By default, we try to auto-discover the sitemap of your website by parsing robots.txt and fetching sitemaps on a regular basis.

However, you can specify an alternative sitemap URL on the configuration page.

Crawl compare

A sitemap is not limited to an XML file. We also accept a simple text file with one URL per line (and so does Googlebot!).
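
For example, a valid text sitemap is just a plain list of URLs (illustrative values):

http://example.com/
http://example.com/category/c-12.html
http://example.com/2015/02/04/title.html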

So if you use an on-site tool or a crawler, you can dump the URLs crawled by your tool and specify the remote URL of this dump in the custom sitemap configuration option.

This way, you can compare the crawl of your tool against the crawl of Googlebot.

Tags

Tags allow you to categorize your site URLs and reports.

Tag modifications will be applied on the next log analysis. You can use the logtester to check tag correctness instantly, so you don't have to wait for the next analysis to see if your tags are correct.

Tag order matters: a URL can only be tagged once. If the URL matches multiple tags, the first one is used; that's why you can re-order your tags. For example, if the tag ^/blog/ is listed before the tag ::num::, the URL /blog/2015/post.html gets the first tag even though it matches both.

TagEx syntax

The TagEx syntax is like regular expressions, but simpler and more restricted. We look forward to improving the TagEx syntax, but for now it's enough to match most URL formats.

A TagEx applies to the path and query parts of the URL, without the hostname part. For example, in http://example.com/matching?area=1, only /matching?area=1 is matched.

  • ^ : URL begins with
  • $ : URL ends with (be careful: when using this one, never forget the query string).
  • ::num:: : a number (RegEx equivalent: [0-9]+)
  • ::alphanum:: : an alphanumeric string (RegEx equivalent: [a-zA-Z0-9]+)
  • ::alpha:: : an alphabetic string (RegEx equivalent: [a-zA-Z]+)
  • ::alphalower:: : a string of lowercase characters (RegEx equivalent: [a-z]+)
  • ::alphaupper:: : a string of uppercase characters (RegEx equivalent: [A-Z]+)

TagEx examples

  • ^/20::num::/

    MATCHES /2015/02/04/title.html
    DOES NOT MATCH /page/2015/02/04/title.html
  • /20::num::/

    MATCHES /2015/02/04/title.html
    MATCHES /page/2015/02/04/title.html
  • ^/category/c-::num::.html

    MATCHES /category/c-12123.html?query=1
  • ?id=::num::$

    MATCHES /page?id=123
    DOES NOT MATCH /?id=123&page=1
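
To make the equivalences above concrete, here is a minimal sketch in Python (not the analyzer's actual implementation) that expands a TagEx into its documented RegEx equivalent and checks the examples:

import re

# Each TagEx token and its documented RegEx equivalent.
TOKENS = {
    "::num::": "[0-9]+",
    "::alphanum::": "[a-zA-Z0-9]+",
    "::alpha::": "[a-zA-Z]+",
    "::alphalower::": "[a-z]+",
    "::alphaupper::": "[A-Z]+",
}

def tagex_to_regex(tagex):
    """Translate a TagEx into a plain regular expression."""
    starts = tagex.startswith("^")
    ends = tagex.endswith("$")
    body = tagex.strip("^$")
    pattern = re.escape(body)  # treat ?, ., etc. as literals
    for token, regex in TOKENS.items():
        pattern = pattern.replace(re.escape(token), regex)
    return ("^" if starts else "") + pattern + ("$" if ends else "")

# The examples above, checked against the generated regexes.
assert re.search(tagex_to_regex("^/20::num::/"), "/2015/02/04/title.html")
assert not re.search(tagex_to_regex("^/20::num::/"), "/page/2015/02/04/title.html")
assert re.search(tagex_to_regex("?id=::num::$"), "/page?id=123")
assert not re.search(tagex_to_regex("?id=::num::$"), "/?id=123&page=1")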

Website reset

A website reset will erase all the data of your website. Only the website configuration and shared accesses will be kept.

Website reset is the only way to remove a tracked bot.

A website reset may also be useful if you mess up your log uploads: it will reset your oldest uploadable day.

Rate limit

Each website is subject to rate limits. If you are too restricted by a rate limit, upgrade your plan or contact the support. The main rate limits are defined on the plans & pricing page. You can also check your rate limit status on the uploads page.

  • URLs

    Maximum number of different URLs we will store for your website. Once you reach this limit (either in your sitemap or in your logs), additional URLs will be ignored.

  • Tracked bots

    Maximum number of bots/crawlers you can track on your panel.

  • Date limit

    You can't upload logs older than 30 days by default. If you need to upload older logs, contact the support (paid service).

The following limits only apply to free DEMO plans.

  • Daily hits

    Maximum number of log lines we will accept for a single day. When parsing a log, once this limit is reached for a day, the remaining lines of that same day are skipped.

  • Daily quota

    The total size of compressed files you can upload per day. Only the compressed size counts toward this quota. For example, if you upload a 20 MB gzipped log which is 100 MB uncompressed, only 20 MB of your quota is consumed. Once your daily quota is totally depleted, further uploads consume extra quota. The daily quota is reset every day. There is also a limit on the number of files you can upload per day.

  • Extra quota

    Total size of compressed files you can upload once your daily quota is totally depleted. This is mainly used to upload log archives when you have just created a new website. Extra quota is not reset, but you can buy more by contacting the support.

  • Storable quota

    Total size of compressed files you can store on the server without triggering a log parsing. There is also a limit on the number of files you can store without parsing.