We support any log format. By default, it is configured to parse the default Apache log format, which is also used by a lot of other web servers (like nginx). However, the format is customizable (see below).
We accept compressed logs (gzip, bzip2, rar, zip) and uncompressed logs.
You can upload the whole log or just the bot traffic. We recommend the whole log; otherwise, you will be missing a lot of useful metrics: page speed, inactive pages, traffic, search referrals...
Before uploading your first log, or after any configuration change, you should use the log tester to check that your log format is correctly parsed.
The log format can be customized on the website configuration page. Specify your web server type and a log format directive.
Apache: Specify the LogFormat directive of your Apache configuration file. Don't forget to unescape the quotes: \" should become ".
Below is an example of a valid Apache log format:
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
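For reference, a line produced by this format looks like the sample below. The Python snippet is purely illustrative: it only shows which directive maps to which field, it is not how our parser is implemented.

```python
import re

# One named group per directive of the combined log format above:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('127.0.0.1 - - [04/Feb/2015:10:15:32 +0000] "GET /index.html HTTP/1.1" '
        '200 1043 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')

m = COMBINED.match(line)
print(m.group("request"), m.group("status"), m.group("user_agent"))
```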
IIS custom log format: IIS must be configured to use the W3C extended log file format. The log format is specified by the #Fields directive that you can find in each log file header.
Below is an example of a valid IIS log format:
date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent) cs(Referrer)
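To illustrate how the #Fields directive drives the parsing, here is a rough Python sketch (an illustration only, not our actual parser) that maps each W3C extended log line to the field names declared in the header:

```python
def parse_w3c_extended(lines):
    """Yield one dict per log entry, keyed by the names from the #Fields directive."""
    fields = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]              # field names declared in the header
        elif line.startswith("#") or not line.strip():
            continue                               # other directives and blank lines
        else:
            yield dict(zip(fields, line.split()))  # W3C extended logs are space-delimited
```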
Nginx and other web servers: You should be able to represent other web servers' log formats using the Apache log format directives.
The following fields must be provided:
If you want to enable the page speed feature of the log analyzer, you must enable the logging of the request execution time on your server.
Apache: Use the %D variable in the LogFormat directive of your server.
Nginx: Use the $request_time variable in your log_format directive on your servers. Then, you can translate this $request_time in Apache log format using the %D variable in your custom log format configuration on the log analyzer.
IIS: Use the time-taken variable in your log format directive of your server and on the log analyzer.

The log submission is done in two stages: upload and parsing.
You should upload your logs using either the form uploader of the website or the API (best for recurring uploads).
Your uploaded logs don't have to be sorted. You can upload them in any order, so it won't be a problem if you have multiple servers or a log rotation process.
However, there are some restrictions on the dates of the logs; you can't upload a log:
Also, for now, you can't upload multiple logs in parallel; we may remove this limitation later.
If you uploaded a wrong log file by mistake, you can delete it on the uploads page. Only logs which have not been submitted to the parser can be deleted.
The role of the parsing job is to split and sort your logs by day. It is done by our service, so you don't have to bother about that.
Once you have uploaded your logs, you can trigger the parsing via the website by clicking the "Parse uploaded logs" button on the uploads page, or by using the API.
Once the parsing is done, each day will be analyzed, except the last day. (That's why you see nothing when you upload only one day of logs.)
This last day will be your new currently parsed day: you can only upload logs newer than or equal to this date. Logs before this date will be ignored.
The currently parsed day is always specified on the uploads page.
You can reset your website if you want to re-upload your logs from the beginning.
After the parsing step, your logs will be deleted from the server (releasing your storage quota) and you will shortly see the result of the analysis on your dashboard.
However, if you don't trigger the parsing, logs will just be stored on the server and nothing will happen. You must trigger log parsing after each log upload session. Also, you can't store an infinite amount of logs on our server without triggering a parsing: websites & plans are subject to rate limits.
Sometimes, the log parser may encounter errors while parsing your logs. You can see these errors on the uploads page in the uploaded logs tab.
It is normal to have some errors in your logs; lines with errors are just skipped and don't prevent the logs from being processed.
However, logs with an abnormal amount of errors will be highlighted in red; click on the details button to see the list of errors.
Here is a list of the most common errors:
Error | Description |
---|---|
Date before currently parsed day | You tried to upload a log file which contains dates before your currently parsed day. Your currently parsed day is the most recent day extracted from your previous parsing session. You can only upload days equal to or after this day; read the log submission documentation thoroughly to understand this process. |
Date too old | You can't upload a log too far in the past: 30 days from now by default. You can contact the support if you need to upload old log archives. |
Date in future | Like above, you can't upload a log dated in the future: more than 3 days from now. |
Max hit per day exceeded | Some plans are limited to a maximum number of hits per day; once you reach this number for a specific day, lines of this day are skipped and marked with this error. |
Invalid IP | Most of the time this error happens because of an invalid log format (see the next error, invalid field). If you are sure your log format is correct, an IP can be invalid for multiple reasons: reverse DNS instead of an IP, IPv6 (we don't support IPv6 yet), obfuscated IP. Also, if you use a reverse proxy or don't provide the real client IP in your logs, the main bots (Google, Bing) won't be detected. You can disable IP validation on the configuration page. |
Invalid field | The field can be user-agent, uri, referrer... If you see a lot of errors like this, it's very likely your log format directive is incompatible with the log you provided. You should use the log tester to debug this kind of problem. |
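A minimal sketch summarizing the date checks described above (the 30-day and 3-day windows are the defaults mentioned in this documentation, not guaranteed values):

```python
from datetime import date, timedelta

def line_date_accepted(line_date: date, currently_parsed_day: date, today: date) -> bool:
    """Illustrative summary of the date restrictions listed in the table above."""
    if line_date < currently_parsed_day:         # "Date before currently parsed day"
        return False
    if line_date < today - timedelta(days=30):   # "Date too old" (default window)
        return False
    if line_date > today + timedelta(days=3):    # "Date in future" (default window)
        return False
    return True
```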
The easiest way to upload logs is to use the form located on the uploads page. However, this form is limited to files up to 2 GB in size. For bigger files and recurring uploads, you have to use the API.
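As a purely illustrative sketch of a recurring upload via the API: the base URL, endpoint paths, and authentication header below are placeholders, not the actual API; refer to the API documentation for the real calls.

```python
import requests

# Placeholders: replace API_BASE, the endpoint paths and the Authorization header
# with the values from the official API documentation.
API_BASE = "https://api.example.com"      # hypothetical base URL
API_KEY = "YOUR_API_KEY"                  # hypothetical credential
WEBSITE_ID = "your-website-id"            # hypothetical website identifier
HEADERS = {"Authorization": API_KEY}

# 1) Upload a compressed log file (gzip, bzip2, zip and rar are accepted).
with open("access.log.gz", "rb") as f:
    requests.post(f"{API_BASE}/websites/{WEBSITE_ID}/uploads",   # hypothetical path
                  headers=HEADERS, files={"file": f}).raise_for_status()

# 2) Trigger the parsing of the uploaded logs, the equivalent of the
#    "Parse uploaded logs" button on the uploads page.
requests.post(f"{API_BASE}/websites/{WEBSITE_ID}/parse",         # hypothetical path
              headers=HEADERS).raise_for_status()
```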
One of the most useful features is the ability to compare the bot crawl to a sitemap.
Since the sitemap is an exhaustive list of the pages of your website, we use it to compute:
By default, we try to auto-discover the sitemap of your website by parsing robots.txt and fetching sitemaps on a regular basis.
However, you can specify an alternative URL for the sitemap on the configuration page.
A sitemap is not limited to an XML file. We also accept a simple text file with one URL per line (and so does Googlebot!).
So if you use an on-site tool or a crawler, you can dump the crawled URLs from your tool and specify the remote URL of this dump in the custom sitemap configuration option.
This way, you can compare the crawl of your tool with the crawl of Googlebot.
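For instance, a crawler dump can be turned into such a text sitemap with a few lines of Python (a minimal sketch; `crawled_urls` stands for whatever list of URLs your crawling tool produces):

```python
# Write one URL per line, then host this file and point the custom sitemap
# configuration option to its URL.
crawled_urls = [
    "http://example.com/",
    "http://example.com/category/c-12123.html",
    "http://example.com/2015/02/04/title.html",
]

with open("crawl-sitemap.txt", "w") as f:
    f.write("\n".join(crawled_urls) + "\n")
```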
Tags allow you to categorize your site URLs and reports.
Tag modifications will be applied on the next log analysis. You can use the log tester to check tag correctness instantly; this way, you don't have to wait for the next analysis to see if your tags are correct.
Tag order matters: a URL can only be tagged once. If the URL matches multiple tags, the first one will be used. That's why you can re-order your tags.
The TagEx syntax is like regular expressions, but simpler and more restricted. We look forward to improving the TagEx syntax, but for now, it's enough to match most URL formats.
A TagEx applies to the path and query part of the URL, without the hostname part:
http://example.com/matching?area=1
Token | Meaning |
---|---|
^ | URL begins with |
$ | URL ends with (be careful: when using this one, never forget the query). |
::num:: | A number (RegEx equivalent: [0-9]+) |
::alphanum:: | An alphanumeric string (RegEx equivalent: [a-zA-Z0-9]+) |
::alpha:: | An alphabetic string (RegEx equivalent: [a-zA-Z]+) |
::alphalower:: | A string of lowercase characters (RegEx equivalent: [a-z]+) |
::alphaupper:: | A string of uppercase characters (RegEx equivalent: [A-Z]+) |
^/20::num::/
MATCHES /2015/02/04/title.html
DOES NOT MATCH /page/2015/02/04/title.html

/20::num::/
MATCHES /2015/02/04/title.html
MATCHES /page/2015/02/04/title.html

^/category/c-::num::.html
MATCHES /category/c-12123.html?query=1

?id=::num::$
MATCHES /page?id=123
DOES NOT MATCH /?id=123&page=1
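To make the semantics of these examples concrete, here is a rough Python sketch that translates a TagEx into a regular expression using the equivalents from the table above and tests it against the path + query part of a URL (an illustration only; the real TagEx engine may differ):

```python
import re

# RegEx equivalents taken from the table above.
TOKENS = {
    "::num::": "[0-9]+",
    "::alphanum::": "[a-zA-Z0-9]+",
    "::alpha::": "[a-zA-Z]+",
    "::alphalower::": "[a-z]+",
    "::alphaupper::": "[A-Z]+",
}

def tagex_to_regex(tagex: str) -> str:
    """Translate a TagEx into a regex; everything except ^, $ and the tokens is literal."""
    starts = tagex.startswith("^")
    ends = tagex.endswith("$")
    body = tagex[1 if starts else 0 : len(tagex) - (1 if ends else 0)]
    pattern = re.escape(body)
    for token, regex in TOKENS.items():
        pattern = pattern.replace(re.escape(token), regex)
    return ("^" if starts else "") + pattern + ("$" if ends else "")

def tagex_matches(tagex: str, path_and_query: str) -> bool:
    # A TagEx is applied to the path + query part of the URL, without the hostname.
    return re.search(tagex_to_regex(tagex), path_and_query) is not None

print(tagex_matches("^/20::num::/", "/2015/02/04/title.html"))       # True
print(tagex_matches("^/20::num::/", "/page/2015/02/04/title.html"))  # False
print(tagex_matches("?id=::num::$", "/page?id=123"))                 # True
print(tagex_matches("?id=::num::$", "/?id=123&page=1"))              # False
```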
Website reset will erase all the data of your website. Only website configuration and shared access will be kept.
Website reset is the only way to remove a tracked bot.
A website reset may also be useful if you messed up your log uploads: it will reset your oldest uploadable day.
Each website is subject to rate limits. If you are too restricted by a rate limit, upgrade your plan or contact the support. The main rate limits are defined on the plans & pricing page. You can also check your rate limit status on the uploads page.
Maximum number of different URLs we will store for your website. Once you reach this limit (either in your sitemap or in your logs), additional URLs will be ignored.
Maximum number of bots/crawlers you can track on your panel.
You can't upload logs older than 30 days by default; if you need to upload older logs, contact the support (paid service).
The following limits only apply to free DEMO plans.
Maximum number of log lines we will accept for a single day. When parsing a log, once this limit is reached for a day, the remaining lines of that day are skipped.
The total size of compressed files you can upload per day. Only the compressed size counts towards this quota. For example, if you upload a 20 MB gzipped log which is 100 MB uncompressed, only 20 MB of your quota will be consumed. Once your daily quota is totally depleted, further uploads will consume extra quota. The daily quota is reset every day. There is also a limit on the number of files you can upload per day.
Total size of compressed files you can upload once your daily quota is totally depleted. Mainly used to upload log archives when you have just created a new website. Extra quota is not reset, but you can buy more by contacting the support.
Total size of compressed files you can store on the server without triggering a log parsing. There is also a limit on the number of files you can store without parsing.