Indexing AJAX sites

While developing the interface of a web application, the task came up of getting pages that are built by AJAX requests indexed by search engines. Yandex and Google have a mechanism for indexing such pages ( https://developers.google.com/webmasters/ajax-crawling/ http://help.yandex.ru/webmaster/robot-workings/ajax-indexing.xml ). The idea is fairly simple: to tell the robot that an HTML version of a page exists, you include a special tag in the page. The tag can be used on every AJAX page. The HTML version must then be available at www.example.com/?_escaped_fragment_=. That is, for the page http://widjer.net/posts/posts-430033, the HTML version is served at http://widjer.net/posts/posts-430033?_escaped_fragment_=.
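The mapping from a page address to its snapshot address can be sketched in a few lines of JavaScript (the language of the PhantomJS script later in this post, so it runs standalone under Node.js). This is only an illustration: toSnapshotUrl is a hypothetical helper name, and the sketch covers just the case described above, where an empty _escaped_fragment_ parameter is appended. In Google's published scheme the opt-in tag is <meta name="fragment" content="!">, and URLs with a "#!" fragment follow a separate rule that is not handled here.

```javascript
// Sketch: build the address a robot requests for the HTML version of a page.
// toSnapshotUrl is a hypothetical name, not part of any library.
function toSnapshotUrl(pageUrl) {
  // Append as the first parameter, or after the existing query string.
  const separator = pageUrl.includes('?') ? '&' : '?';
  return pageUrl + separator + '_escaped_fragment_=';
}

console.log(toSnapshotUrl('http://widjer.net/posts/posts-430033'));
```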






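The application is built on ASP MVC and durandaljs (http://durandaljs.com/). The durandal documentation has a page on making such apps crawlable (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). For taking snapshots this post uses the Blitline service (http://www.blitline.com/docs/seo_optimizer), which saves the rendered HTML to an Amazon S3 bucket.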





Amazon S3 ( http://aws.amazon.com/s3/ ) serves as the storage for the snapshots.



S3

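Three S3 buckets are used: day, month and weak (the names appear in this spelling in the code below). Each bucket has a Lifecycle rule that removes stored snapshots after a set period, for example 7 or 30 days depending on the bucket.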



Blitline needs permission to write to the buckets. Add the following policy to each bucket.

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AddCannedAcl",
      "Effect": "Allow",
      "Principal": {
        "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"
      },
      "Action": [ "s3:PutObjectAcl", "s3:PutObject" ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}

Replace YOUR_BUCKET_NAME with the name of your bucket.
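Since the same policy has to be added to several buckets, the substitution can be scripted. The following is a small sketch, not part of the original setup: blitlinePolicyFor is a hypothetical helper, and the bucket names are the three used in this post.

```javascript
// Hypothetical helper: fills a bucket name into the policy shown above.
function blitlinePolicyFor(bucketName) {
  return {
    Version: '2008-10-17',
    Statement: [
      {
        Sid: 'AddCannedAcl',
        Effect: 'Allow',
        Principal: {
          CanonicalUser: 'dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a'
        },
        Action: ['s3:PutObjectAcl', 's3:PutObject'],
        Resource: 'arn:aws:s3:::' + bucketName + '/*'
      }
    ]
  };
}

// Print one policy per bucket, ready to paste into the bucket settings.
['day', 'month', 'weak'].forEach(function (name) {
  console.log(JSON.stringify(blitlinePolicyFor(name), null, 2));
});
```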




MVC Controller

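The application is a SPA, so a single HomeController serves the page that hosts durandal. The snapshot logic goes into the Index action of the Home controller.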

// The action signature is assumed; the original shows only the method body.
public async Task<ActionResult> Index()
{
    // Regular visitors get the normal SPA page.
    if (Request.QueryString["_escaped_fragment_"] == null)
    {
        return View();
    }

    try
    {
        // We'll crawl the normal url without _escaped_fragment_
        var result = await _crawler.SnaphotUrl(
            Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", ""));
        return Content(result);
    }
    catch (Exception ex)
    {
        Trace.TraceError("CrawlError: {0}", ex.Message);
        return View("FailedCrawl");
    }
}
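The Replace call in the controller only matches when _escaped_fragment_ is the first query parameter. A more tolerant variant can be sketched with the standard URL class in JavaScript; stripEscapedFragment is a hypothetical name used for illustration, and the same idea would need to be ported to C# to be used in the controller.

```javascript
// Sketch: remove _escaped_fragment_ wherever it appears in the query string.
// stripEscapedFragment is a hypothetical helper name.
function stripEscapedFragment(url) {
  const parsed = new URL(url);
  // Deleting the parameter keeps every other query parameter intact.
  parsed.searchParams.delete('_escaped_fragment_');
  return parsed.toString();
}

console.log(stripEscapedFragment('http://widjer.net/posts/posts-430033?_escaped_fragment_='));
```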





Here _crawler is an implementation of the following interface:

public interface ICrawl
{
    Task<string> SnaphotUrl(string url);
}

It takes a url and returns the html of the page.



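public class Crawl : ICrawl
{
    private readonly IUrlStorage _storage;   // snapshot storage (S3 here)
    private readonly ISpaSnapshot _snapshot; // service that takes snapshots

    public Crawl(IUrlStorage st, ISpaSnapshot ss)
    {
        Debug.Assert(st != null);
        Debug.Assert(ss != null);
        _storage = st;
        _snapshot = ss;
    }

    public async Task<string> SnaphotUrl(string url)
    {
        // Try the stored copy first (S3 here).
        string res = await _storage.Get(url);
        // If a snapshot already exists, return it.
        if (!string.IsNullOrWhiteSpace(res))
            return res;

        // Otherwise request a new snapshot.
        await _snapshot.TakeSnapshot(url, _storage);

        // Poll the storage: up to 3 attempts, 5 seconds apart.
        // (The original loop never incremented i and blocked the thread with Thread.Sleep.)
        for (var i = 0; i < 3; i++)
        {
            res = await _storage.Get(url);
            if (!string.IsNullOrWhiteSpace(res))
                return res;
            await Task.Delay(5000);
        }

        // No snapshot appeared in time.
        throw new CrawlException("Snapshot was not created in time");
    }
}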
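The wait-and-retry step in SnaphotUrl is the part that is easiest to get wrong, so here is the same idea as a standalone JavaScript sketch. The function name pollForSnapshot and its parameters are assumptions made for this illustration; read stands in for the storage lookup.

```javascript
// Sketch: call read() until it yields non-empty text or the attempts run out.
// pollForSnapshot is a hypothetical name; read is any async lookup function.
async function pollForSnapshot(read, attempts, delayMs) {
  for (let i = 0; i < attempts; i++) {
    const body = await read();
    if (body && body.trim() !== '') {
      return body;
    }
    // Wait before the next attempt, but not after the last one.
    if (i < attempts - 1) {
      await new Promise(function (resolve) { setTimeout(resolve, delayMs); });
    }
  }
  throw new Error('snapshot not available after ' + attempts + ' attempts');
}
```

Bounding the loop by an attempt counter keeps a missing snapshot from holding the request open forever; the caller decides what to show when the error is thrown.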




S3

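The storage is described by the IUrlStorage interface:

public interface IUrlStorage
{
    Task<string> Get(string url);       // read the stored snapshot for a url
    Task Put(string url, string body);  // store a snapshot for a url

    IUrlToBucketNameStrategy BuckName { get; } // maps a url to a bucket name
    IUrlToKeyStrategy KeyName { get; }         // maps a url to an object key
}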



The S3 implementation of this interface:

public class S3Storage : IUrlStorage
{
    private readonly IUrlToBucketNameStrategy _buckName; // maps a url to a bucket
    public IUrlToBucketNameStrategy BuckName { get { return _buckName; } }

    private readonly IUrlToKeyStrategy _keyName; // maps a url to an object key
    public IUrlToKeyStrategy KeyName { get { return _keyName; } }

    // Amazon credentials; when they are empty the SDK reads them from appSettings.
    private readonly string _amazonS3AccessKeyID;
    private readonly string _amazonS3secretAccessKeyID;
    private readonly AmazonS3Config _amazonConfig;

    public S3Storage(string S3Key = null, string S3SecretKey = null,
        IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null)
    {
        _amazonS3AccessKeyID = S3Key;
        _amazonS3secretAccessKeyID = S3SecretKey;
        _buckName = bns ?? new UrlToBucketNameStrategy(); // default strategy
        _keyName = kn ?? new UrlToKeyStrategy();          // default strategy
        _amazonConfig = new AmazonS3Config
        {
            RegionEndpoint = Amazon.RegionEndpoint.USEast1 // the buckets are in US Default
        };
    }

    public async Task<string> Get(string url)
    {
        // Map the url to a bucket and a key.
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url),
               res = string.Empty;

        var client = CreateClient();
        GetObjectRequest request = new GetObjectRequest
        {
            BucketName = bucket,
            Key = key,
        };

        try
        {
            var S3response = await client.GetObjectAsync(request);
            using (var reader = new StreamReader(S3response.ResponseStream))
            {
                res = await reader.ReadToEndAsync();
            }
        }
        catch (AmazonS3Exception ex)
        {
            // A missing object is not an error: return the empty string.
            if (ex.ErrorCode != "NoSuchKey")
                throw; // rethrow without resetting the stack trace
        }
        return res;
    }

    private IAmazonS3 CreateClient()
    {
        // Use explicit credentials when they were passed to the constructor.
        var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID)
            ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) // from appSettings
            : Amazon.AWSClientFactory.CreateAmazonS3Client(
                _amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig);
        return client;
    }

    public async Task Put(string url, string body)
    {
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url);

        var client = CreateClient();
        PutObjectRequest request = new PutObjectRequest
        {
            BucketName = bucket,
            Key = key,
            ContentType = "text/html",
            ContentBody = body
        };
        await client.PutObjectAsync(request);
    }
}



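Next, a url has to be mapped to a location in S3. The bucket is chosen by a strategy that looks at which part of the site the url belongs to, since different parts change at different rates.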

public interface IUrlToBucketNameStrategy
{
    string Get(string url); // takes a url, returns the bucket name
}

public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        var bucketName = "day"; // default bucket

        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            // parts[1] is the second non-empty segment of the string passed in.
            // Note: for an absolute url such as http://host/posts/... that segment is the host.
            switch (parts[1])
            {
                case "posts":
                    bucketName = "month";
                    break;
                case "users":
                    bucketName = "weak";
                    break;
            }
        }
        return bucketName;
    }
}
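To make the bucket choice easy to try out, here is a rough JavaScript sketch of the same idea. It is not a port of the class above: it assumes the section of the site is the first segment of the url path (for example /posts/...), and bucketForUrl and BUCKET_BY_SECTION are hypothetical names.

```javascript
// Sketch: pick a bucket from the first path segment of an absolute url.
// The section-to-bucket table mirrors the switch in UrlToBucketNameStrategy.
const BUCKET_BY_SECTION = { posts: 'month', users: 'weak' };

function bucketForUrl(url) {
  // Work on the path only, so the scheme and host never count as segments.
  const segments = new URL(url).pathname.split('/').filter(function (s) {
    return s !== '';
  });
  const section = segments[0];
  return Object.prototype.hasOwnProperty.call(BUCKET_BY_SECTION, section)
    ? BUCKET_BY_SECTION[section]
    : 'day'; // default bucket for everything else
}

console.log(bucketForUrl('http://widjer.net/posts/posts-430033'));
```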



With the bucket chosen, the object key remains. That is the job of IUrlToKeyStrategy.

public interface IUrlToKeyStrategy
{
    string Get(string url);
}

public class UrlToKeyStrategy : IUrlToKeyStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        string key = "mainpage"; // key used when the url has no segments

        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            // Url-encode every segment and join the segments with ".".
            key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
        }
        return key;
    }
}
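The key construction can be tried out with a short JavaScript sketch of the same steps. keyForUrl is a hypothetical name, and encodeURIComponent is only an approximation of HttpUtility.UrlEncode (the two differ in details such as the case of hex digits and the encoding of spaces), so the exact keys may not match the C# output.

```javascript
// Sketch: split the url on "/", encode each segment, join with ".".
// keyForUrl is a hypothetical helper mirroring UrlToKeyStrategy.
function keyForUrl(url) {
  const parts = url.split('/').filter(function (p) { return p !== ''; });
  if (parts.length === 0) {
    return 'mainpage'; // same fallback as the C# class
  }
  return parts.map(function (p) { return encodeURIComponent(p); }).join('.');
}

console.log(keyForUrl('http://widjer.net/posts/posts-430033'));
```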






AJAX

Taking a snapshot is described by the ISpaSnapshot interface:

public interface ISpaSnapshot
{
    Task TakeSnapshot(string url, IUrlStorage storage);
}



The first implementation uses Blitline.

// BlitlineRequest, BlitlineBatchResponse and the other DTO classes are not shown in this post.
public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private string _appId;           // Blitline application id
    private IUrlStorage _storage;    // storage whose strategies name the bucket and key
    private int _regTimeout = 30000; // 30s (not used in the code shown)

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    // Note: the storage parameter is ignored here; the instance from the constructor is used.
    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the request body.
        string jsonData = FormatCrawlRequest(url);
        // Send the job to Blitline.
        var resp = await Crawl(url, jsonData);
        // A non-empty response carries the error text.
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string>
                {
                    { "json", jsonData }
                }));
            var o = await result.Content.ReadAsStringAsync();
            // Parse the response.
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the error messages if the job failed.
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Describe the job and serialize it to JSON.
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





An alternative that does not depend on an external service is to render the page locally with PhantomJS.



public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Run PhantomJS with the script and the url as arguments.
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("{0} {1}", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            // stderr is redirected but never read; heavy stderr output could block the child.
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };

        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // The script prints the rendered html to stdout.
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            // Save the result to the storage.
            return storage.Put(url, output);
        }
    }
}

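The script that _jsPath points to:

var resourceWait = 13000,   // quiet period after the last resource has loaded
    maxRenderWait = 13000;  // hard limit before rendering anyway

var page = require('webpage').create(),
    system = require('system'),
    count = 0,              // requests currently in flight
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the current DOM as html and quit.
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// A new request started: the page is still loading.
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// A request finished: if nothing is left in flight, start the quiet period.
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        // Nothing is printed on failure, so the stored snapshot would be empty.
        phantom.exit();
    } else {
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});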
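The core of the script is the counter of in-flight requests combined with a quiet-period timer. Isolated from PhantomJS, that logic can be sketched as a small tracker that runs under plain Node.js; createIdleTracker and its method names are hypothetical and exist only for this illustration.

```javascript
// Sketch: call onIdle once no request has been in flight for quietMs.
// createIdleTracker is a hypothetical name used only for this illustration.
function createIdleTracker(onIdle, quietMs) {
  let pending = 0;  // requests started but not yet finished
  let timer = null; // timer for the current quiet period, if any

  return {
    // A request started: the page is busy again, cancel any quiet period.
    requested() {
      pending += 1;
      clearTimeout(timer);
    },
    // A request finished: start the quiet period when nothing is left.
    received() {
      pending -= 1;
      if (pending === 0) {
        timer = setTimeout(onIdle, quietMs);
      }
    }
  };
}
```

Cancelling the timer on every new request is what lets late AJAX calls, started by scripts that loaded earlier, push the render back until the page has really settled.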



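The result can be seen on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ serves the regular AJAX page, while http://widjer.net/timeline/%23_?_escaped_fragment_= serves the stored snapshot, which can be read without javascript.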




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



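The first implementation uses Blitline. It sends a job to the service, which renders the page and saves the HTML to the S3 bucket itself. If the service reports a failure, an exception is thrown.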

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private string _appId;           // Blitline application id
    private IUrlStorage _storage;    // supplies the bucket and key names
    private int _regTimeout = 30000; // 30s

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the job description
        string jsonData = FormatCrawlRequest(url);
        // Submit the job
        var resp = await Crawl(url, jsonData);
        // A non-empty response holds the error messages
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
            // Await the body instead of blocking on .Result
            var o = await result.Content.ReadAsStringAsync();
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the errors, if any
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Describe the job and serialize it to JSON
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





A second implementation takes the snapshot locally with PhantomJS.



using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Run PhantomJS with the script and the url as arguments.
        // Only stdout is redirected: a redirected stream that is never read
        // can block the child process once its buffer fills up.
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("\"{0}\" \"{1}\"", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };

        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // The script prints the rendered HTML to stdout
            string output = await p.StandardOutput.ReadToEndAsync();
            p.WaitForExit();
            // Save the snapshot
            await storage.Put(url, output);
        }
    }
}

The script referenced by _jsPath:

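var resourceWait = 13000,  // ms to wait after the last resource has arrived
    maxRenderWait = 13000; // ms after page load when rendering is forced

var page = require('webpage').create(),
    system = require('system'),
    count = 0, // number of requests still in flight
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the current DOM as HTML and quit
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// A new request cancels the pending render
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// When the last request completes, schedule the render
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

// The url is passed as the first script argument
page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        // Render anyway once maxRenderWait has passed
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});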



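With this in place, AJAX pages can be indexed. The approach is running on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ is the regular AJAX page, while http://widjer.net/timeline/%23_?_escaped_fragment_= returns the HTML snapshot, which can be read without javascript.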
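The two urls above differ only by the `?_escaped_fragment_=` suffix. As a minimal sketch of that convention, the JavaScript below detects a crawler request and recovers the regular url, the same way the controller strips the marker before looking up a snapshot. The names `MARKER`, `isCrawlerRequest` and `originalUrl` are made up for the example.

```javascript
// Hypothetical helpers for the "_escaped_fragment_" convention.
var MARKER = '?_escaped_fragment_=';

// True when the url is the crawler's version of a page.
function isCrawlerRequest(url) {
  return url.indexOf(MARKER) !== -1;
}

// Recover the url a regular visitor would open.
function originalUrl(url) {
  return url.replace(MARKER, '');
}

var crawled = 'http://widjer.net/timeline/%23_?_escaped_fragment_=';
console.log(isCrawlerRequest(crawled)); // true
console.log(originalUrl(crawled));      // http://widjer.net/timeline/%23_
```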




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



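As the code shows, the storage relies on strategies that map a url to an S3 address. The bucket name is chosen by the following strategy.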

public interface IUrlToBucketNameStrategy
{
    // Takes a url, returns the name of the bucket
    string Get(string url);
}

public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        // Default bucket
        var bucketName = "day";
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            // Pick the bucket by the second part of the url
            switch (parts[1])
            {
                case "posts":
                    bucketName = "month";
                    break;
                case "users":
                    bucketName = "weak";
                    break;
            }
        }
        return bucketName;
    }
}



Within a bucket, every page needs a unique key. That is the job of IUrlToKeyStrategy.

public interface IUrlToKeyStrategy
{
    string Get(string url);
}

public class UrlToKeyStrategy : IUrlToKeyStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        // Key used when the url has no parts
        string key = "mainpage";
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            // Join the encoded parts with "."
            key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
        }
        return key;
    }
}
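One detail is easy to miss: when an absolute url such as http://widjer.net/posts/posts-430033 is split on '/', the first two parts are "http:" and the host, so the path segments start at index 2. The sketch below is one possible way to make the mapping independent of that, by working on path segments only. The class SnapshotAddress and its methods are hypothetical names used for illustration, and Uri.EscapeDataString stands in for HttpUtility.UrlEncode so the example needs no System.Web reference; it is not the code used on the site.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class SnapshotAddress
{
    private static readonly char[] Sep = { '/' };

    // Returns the path segments of an absolute http(s) url or of a relative path.
    public static string[] PathSegments(string url)
    {
        string path = url;
        if (Uri.TryCreate(url, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            // Drops scheme, host, query and fragment
            path = uri.AbsolutePath;
        }
        else
        {
            // Relative input: cut off the query string or fragment by hand
            int cut = path.IndexOfAny(new[] { '?', '#' });
            if (cut >= 0) path = path.Substring(0, cut);
        }
        return path.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
    }

    // Same bucket names as in the article: day, month, weak.
    public static string Bucket(string url)
    {
        var segments = PathSegments(url);
        if (segments.Length == 0) return "day";
        switch (segments[0])
        {
            case "posts": return "month";
            case "users": return "weak";
            default: return "day";
        }
    }

    // Joins the encoded segments with "."; the site root maps to "mainpage".
    public static string Key(string url)
    {
        var segments = PathSegments(url);
        return segments.Length == 0
            ? "mainpage"
            : string.Join(".", segments.Select(s => Uri.EscapeDataString(s)));
    }
}
```

With this sketch, http://widjer.net/posts/posts-430033 and /posts/posts-430033 resolve to the same bucket and key.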






AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



The first implementation uses Blitline.

public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private string _appId;            // Blitline application id
    private IUrlStorage _storage;     // supplies the bucket and key strategies
    private int _regTimeout = 30000;  // 30s

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the job description
        string jsonData = FormatCrawlRequest(url);
        // Send it to Blitline
        var resp = await Crawl(url, jsonData);
        // A non-empty response contains error messages
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
            // Await the body instead of blocking on .Result
            var o = await result.Content.ReadAsStringAsync();
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the errors if the job failed
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Describe the job and serialize it to JSON
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





A second implementation uses PhantomJS.



public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Run PhantomJS with the script and the url as arguments
        // (quoted, so that paths with spaces survive)
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("\"{0}\" \"{1}\"", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };
        string output;
        using (Process p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // The script prints the rendered html to standard output
            output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
        }
        // Save the snapshot
        return storage.Put(url, output);
    }
}

The script at _jsPath:

var resourceWait = 13000,
    maxRenderWait = 13000;

var page = require('webpage').create(),
    system = require('system'),
    count = 0,
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the rendered html to standard output and quit
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// Every new request postpones the render
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// Once no requests are pending, wait resourceWait ms and render
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        // Render after maxRenderWait ms even if requests are still pending
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});



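That covers the indexing of AJAX pages. A working example is available on widjer.net (DEMO). Compare the url http://widjer.net/timeline/%23_ with http://widjer.net/timeline/%23_?_escaped_fragment_=, which returns the page without javascript.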
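The controller recovers the original address by replacing the literal "?_escaped_fragment_=" in the request url. That works when it is the only query parameter, but a crawler request such as http://example.com/?page=2&_escaped_fragment_= would not match. A minimal sketch of a more tolerant approach follows; the class and method names are hypothetical and the example URLs are only illustrations.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class EscapedFragment
{
    private const string Param = "_escaped_fragment_";

    // Removes the _escaped_fragment_ parameter wherever it sits in the query string.
    public static string Strip(string url)
    {
        int q = url.IndexOf('?');
        if (q < 0) return url; // no query string at all

        string basePart = url.Substring(0, q);
        string[] kept = url.Substring(q + 1)
            .Split('&')
            .Where(p => p.Length > 0
                        && p != Param
                        && !p.StartsWith(Param + "=", StringComparison.Ordinal))
            .ToArray();

        // Re-attach the remaining parameters, if any
        return kept.Length == 0 ? basePart : basePart + "?" + string.Join("&", kept);
    }
}
```

For example, EscapedFragment.Strip("http://widjer.net/posts/posts-430033?_escaped_fragment_=") gives back http://widjer.net/posts/posts-430033, and other query parameters are left in place.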




_crawler is an instance of the following interface:



public interface ICrawl { Task<string> SnaphotUrl(string url); }



It takes a url and returns the html snapshot of the page.



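public class Crawl : ICrawl
{
    private IUrlStorage _storage;   // snapshot storage (S3 in this case)
    private ISpaSnapshot _snapshot; // service that takes the snapshots

    public Crawl(IUrlStorage st, ISpaSnapshot ss)
    {
        Debug.Assert(st != null);
        Debug.Assert(ss != null);
        _storage = st;
        _snapshot = ss;
    }

    public async Task<string> SnaphotUrl(string url)
    {
        // Look in the storage first (S3 in this case)
        string res = await _storage.Get(url);
        // If a snapshot is already stored, return it
        if (!string.IsNullOrWhiteSpace(res))
            return res;
        // Otherwise request a new snapshot
        await _snapshot.TakeSnapshot(url, _storage);
        // Poll the storage up to 3 times, 5 seconds apart,
        // without blocking the request thread
        for (var i = 0; i < 3; i++)
        {
            res = await _storage.Get(url);
            if (!string.IsNullOrWhiteSpace(res))
                return res;
            await Task.Delay(5000);
        }
        // The snapshot did not appear in time
        throw new CrawlException("Snapshot was not created in time");
    }
}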
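The wait-and-check step in SnaphotUrl is a bounded polling loop: it needs a counter that actually advances and a delay that does not block a thread. A generic version could look like the sketch below. The helper name Polling.UntilNotEmpty and its parameters are assumptions made for this example, not part of the original code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original project.
public static class Polling
{
    // Calls probe up to `attempts` times, waiting `delay` between calls.
    // Returns the first non-blank result, or null if every attempt came back empty.
    public static async Task<string> UntilNotEmpty(
        Func<Task<string>> probe, int attempts, TimeSpan delay)
    {
        for (int i = 0; i < attempts; i++)
        {
            string res = await probe();
            if (!string.IsNullOrWhiteSpace(res))
                return res;
            // No point in waiting after the last attempt
            if (i < attempts - 1)
                await Task.Delay(delay);
        }
        return null;
    }
}
```

A caller could pass () => storage.Get(url) as the probe and treat a null result as a failed crawl.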

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



The S3 implementation of this interface uses the AWS SDK for .NET.

    public class S3Storage : IUrlStorage
    {
        private IUrlToBucketNameStrategy _buckName; // maps a url to a bucket
        public IUrlToBucketNameStrategy BuckName { get { return _buckName; } }

        private IUrlToKeyStrategy _keyName; // maps a url to an object key
        public IUrlToKeyStrategy KeyName { get { return _keyName; } }

        // Amazon credentials and client configuration
        private readonly string _amazonS3AccessKeyID;
        private readonly string _amazonS3secretAccessKeyID;
        private readonly AmazonS3Config _amazonConfig;

        public S3Storage(string S3Key = null, string S3SecretKey = null,
                         IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null)
        {
            _amazonS3AccessKeyID = S3Key;
            _amazonS3secretAccessKeyID = S3SecretKey;
            _buckName = bns ?? new UrlToBucketNameStrategy(); // default strategy unless one is supplied
            _keyName = kn ?? new UrlToKeyStrategy();          // default strategy unless one is supplied
            _amazonConfig = new AmazonS3Config
            {
                RegionEndpoint = Amazon.RegionEndpoint.USEast1 // the buckets live in the default US region
            };
        }

        public async Task<string> Get(string url)
        {
            // Convert the url into a bucket name and an object key
            string bucket = _buckName.Get(url),
                   key = _keyName.Get(url),
                   res = string.Empty;

            var client = CreateClient();
            GetObjectRequest request = new GetObjectRequest
            {
                BucketName = bucket,
                Key = key,
            };

            try
            {
                // Read the object body as text
                var S3response = await client.GetObjectAsync(request);
                using (var reader = new StreamReader(S3response.ResponseStream))
                {
                    res = await reader.ReadToEndAsync();
                }
            }
            catch (AmazonS3Exception ex)
            {
                // A missing key just means there is no snapshot yet
                if (ex.ErrorCode != "NoSuchKey")
                    throw; // rethrow, preserving the stack trace
            }
            return res;
        }

        private IAmazonS3 CreateClient()
        {
            // Without explicit keys, the SDK takes credentials from appSettings
            var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID)
                ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig)
                : Amazon.AWSClientFactory.CreateAmazonS3Client(
                      _amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig);
            return client;
        }

        public async Task Put(string url, string body)
        {
            string bucket = _buckName.Get(url),
                   key = _keyName.Get(url);

            var client = CreateClient();
            PutObjectRequest request = new PutObjectRequest
            {
                BucketName = bucket,
                Key = key,
                ContentType = "text/html",
                ContentBody = body
            };
            await client.PutObjectAsync(request);
        }
    }



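As the code shows, strategies convert a url into an S3 location. The bucket strategy picks a bucket by the kind of page, so that different kinds of pages can have different lifetimes.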

    public interface IUrlToBucketNameStrategy
    {
        string Get(string url); // takes a url, returns a bucket name
    }

    public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);

            var bucketName = "day"; // default bucket
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 1)
            {
                // Choose the bucket by the second non-empty segment of the url.
                // Note: for an absolute url such as http://host/posts/... that
                // segment is the host, so check which form of url is passed in.
                switch (parts[1])
                {
                    case "posts":
                        bucketName = "month";
                        break;
                    case "users":
                        bucketName = "weak";
                        break;
                }
            }
            return bucketName;
        }
    }
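When the controller passes an absolute url, splitting the whole string on `/` makes the host one of the segments. A variant that looks only at the path avoids depending on the url form. The sketch below assumes that the section name (`posts`, `users`) is the first path segment; that assumption, as well as the names `PathBucketSketch` and `BucketFor`, is illustrative and not taken from the original project.

```csharp
using System;

// Hypothetical variant of the bucket strategy, not part of the original code.
public static class PathBucketSketch
{
    public static string BucketFor(string url)
    {
        // Use the path of an absolute http(s) url; otherwise treat the
        // input as a bare path and drop any query string.
        Uri uri;
        bool isHttp = Uri.TryCreate(url, UriKind.Absolute, out uri)
                      && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
        string path = isHttp ? uri.AbsolutePath : url.Split('?')[0];

        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        string section = segments.Length > 0 ? segments[0] : string.Empty;

        // Same bucket names as in the article; "day" is the default.
        switch (section)
        {
            case "posts": return "month";
            case "users": return "weak";
            default: return "day";
        }
    }
}
```

Under these assumptions `http://widjer.net/posts/posts-430033` maps to the `month` bucket, and a url without a recognized section falls back to `day`.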



Once the bucket is known, the object also needs a key. That is the job of IUrlToKeyStrategy.

    public interface IUrlToKeyStrategy
    {
        string Get(string url); // takes a url, returns an object key
    }

    public class UrlToKeyStrategy : IUrlToKeyStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);

            string key = "mainpage"; // key used when the url has no segments
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
            {
                // Encode each segment and join the segments with "."
                key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
            }
            return key;
        }
    }



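The key is built by URL-encoding each segment of the url and joining the segments with dots; a url with no segments maps to the key mainpage.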
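To see what such keys look like, the same split, encode, and join steps can be run outside of ASP.NET. In this sketch `Uri.EscapeDataString` stands in for `HttpUtility.UrlEncode` so that the example does not need a reference to System.Web; the two methods differ in details such as the letter case of hex escapes, so the exact keys may differ slightly from those the original class produces. The names `KeySketch` and `KeyFor` are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical stand-alone version of the key strategy, for illustration only.
public static class KeySketch
{
    public static string KeyFor(string url)
    {
        string[] parts = url.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // A url with no segments gets the fixed key used in the article.
        if (parts.Length == 0) return "mainpage";

        // Assumption: Uri.EscapeDataString replaces HttpUtility.UrlEncode here.
        return string.Join(".", parts.Select(p => Uri.EscapeDataString(p)));
    }
}
```

For the absolute url `http://widjer.net/posts/posts-430033` this produces the key `http%3A.widjer.net.posts.posts-430033`, because the scheme and the host are segments too.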



AJAX

ISpaSnapshot

    public interface ISpaSnapshot
    {
        Task TakeSnapshot(string url, IUrlStorage storage);
    }



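The first implementation uses Blitline. It posts a job to the Blitline API, asking it to render the page and save the html to the bucket and key computed for the url.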

    public class BlitlineSpaSnapshot : ISpaSnapshot
    {
        private string _appId;           // Blitline application id
        private IUrlStorage _storage;    // storage that supplies the bucket and key
        private int _regTimeout = 30000; // 30s

        public BlitlineSpaSnapshot(string appId, IUrlStorage st)
        {
            _appId = appId;
            _storage = st;
        }

        public async Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // Build the job description
            string jsonData = FormatCrawlRequest(url);
            // Send it to Blitline
            var resp = await Crawl(url, jsonData);
            // A non-empty response contains error messages
            if (!string.IsNullOrWhiteSpace(resp))
                throw new CrawlException(resp);
        }

        private async Task<string> Crawl(string url, string jsonData)
        {
            string crawlResponse = string.Empty;
            using (var client = new HttpClient())
            {
                var result = await client.PostAsync("http://api.blitline.com/job",
                    new FormUrlEncodedContent(new Dictionary<string, string>
                    {
                        { "json", jsonData }
                    }));
                // Await the body instead of blocking on .Result
                var o = await result.Content.ReadAsStringAsync();
                var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
                // Collect the errors if the job failed
                if (response.Failed)
                    crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
            }
            return crawlResponse;
        }

        private string FormatCrawlRequest(string url)
        {
            // Fill in the request object and serialize it to JSON
            var reqData = new BlitlineRequest
            {
                ApplicationId = _appId,
                Src = url,
                SrcType = "screen_shot_url",
                SrcData = new SrcDataDto
                {
                    ViewPort = "1200x800",
                    SaveHtml = new SaveDest
                    {
                        S3Des = new StorageDestination
                        {
                            Bucket = _storage.BuckName.Get(url),
                            Key = _storage.KeyName.Get(url)
                        }
                    }
                },
                Functions = new[] { new FunctionData { Name = "no_op" } }
            };
            return JsonConvert.SerializeObject(new[] { reqData });
        }
    }





A second implementation does not depend on an external service: it runs PhantomJS locally.



    public class PhantomJsSnapShot : ISpaSnapshot
    {
        private readonly string _exePath; // path to the PhantomJS executable
        private readonly string _jsPath;  // path to the script that renders the page

        public PhantomJsSnapShot(string exePath, string jsPath)
        {
            _exePath = exePath;
            _jsPath = jsPath;
        }

        public Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // Run PhantomJS with the script and the url as arguments
            var startInfo = new ProcessStartInfo
            {
                Arguments = String.Format("{0} {1}", _jsPath, url),
                FileName = _exePath,
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true,
                StandardOutputEncoding = System.Text.Encoding.UTF8
            };
            Process p = new Process { StartInfo = startInfo };
            p.Start();

            // The script prints the rendered html to standard output
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();

            // Save the html to the storage
            return storage.Put(url, output);
        }
    }

The script at _jsPath:

    var resourceWait = 13000,
        maxRenderWait = 13000;

    var page = require('webpage').create(),
        system = require('system'),
        count = 0,
        forcedRenderTimeout,
        renderTimeout;

    page.viewportSize = { width: 1280, height: 1024 };

    // Print the rendered html to standard output and quit
    function doRender() {
        console.log(page.content);
        phantom.exit();
    }

    // Count outstanding requests; a new request cancels the pending render
    page.onResourceRequested = function (req) {
        count += 1;
        clearTimeout(renderTimeout);
    };

    // When the last outstanding request finishes, render after resourceWait ms
    page.onResourceReceived = function (res) {
        if (!res.stage || res.stage === 'end') {
            count -= 1;
            if (count === 0) {
                renderTimeout = setTimeout(doRender, resourceWait);
            }
        }
    };

    page.open(system.args[1], function (status) {
        if (status !== "success") {
            phantom.exit();
        } else {
            // Render after maxRenderWait ms even if requests are still pending
            forcedRenderTimeout = setTimeout(function () {
                doRender();
            }, maxRenderWait);
        }
    });



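That is all it takes to make an AJAX site indexable. The result can be seen on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ opens the regular page, while http://widjer.net/timeline/%23_?_escaped_fragment_= returns the pre-rendered html, which can be read without running javascript.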




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




The _crawler field used in the action implements the following interface:



public interface ICrawl { Task<string> SnaphotUrl(string url); }



It takes the URL to snapshot and returns the HTML of that page.



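public class Crawl : ICrawl
{
    private readonly IUrlStorage _storage;   // snapshot storage (S3 in this article)
    private readonly ISpaSnapshot _snapshot; // service that renders the snapshot

    public Crawl(IUrlStorage st, ISpaSnapshot ss)
    {
        Debug.Assert(st != null);
        Debug.Assert(ss != null);
        _storage = st;
        _snapshot = ss;
    }

    public async Task<string> SnaphotUrl(string url)
    {
        // Look for an existing snapshot in storage (S3).
        string res = await _storage.Get(url);
        // If one is found, return it.
        if (!string.IsNullOrWhiteSpace(res))
            return res;

        // Otherwise ask the snapshot service to create one.
        await _snapshot.TakeSnapshot(url, _storage);

        // Poll the storage until the snapshot appears, at most 3 attempts.
        for (var i = 0; i < 3; i++)
        {
            res = await _storage.Get(url);
            if (!string.IsNullOrWhiteSpace(res))
                return res;
            await Task.Delay(5000); // wait without blocking the request thread
        }

        // The snapshot never showed up.
        throw new CrawlException("Snapshot was not found in storage after waiting.");
    }
}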
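The waiting step in SnaphotUrl can be expressed as a bounded poll: try, wait, try again, and give up after a fixed number of attempts. The following is a minimal sketch of that pattern as a reusable helper. The names Polling and PollAsync are hypothetical and do not come from the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class Polling
{
    // Calls fetch up to maxAttempts times, waiting `delay` between attempts,
    // and returns the first non-blank result; returns null if none arrives.
    public static async Task<string> PollAsync(
        Func<Task<string>> fetch, int maxAttempts, TimeSpan delay)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var value = await fetch();
            if (!string.IsNullOrWhiteSpace(value))
                return value;

            // Do not wait after the final attempt.
            if (attempt < maxAttempts)
                await Task.Delay(delay);
        }
        return null;
    }
}
```

With such a helper the storage lookup would be passed in as the fetch delegate, and a null result would be turned into a CrawlException by the caller.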




S3

The storage is described by the IUrlStorage interface:

public interface IUrlStorage
{
    Task<string> Get(string url);      // read the stored snapshot for a URL
    Task Put(string url, string body); // store a snapshot for a URL

    IUrlToBucketNameStrategy BuckName { get; } // maps a URL to a bucket name
    IUrlToKeyStrategy KeyName { get; }         // maps a URL to an object key
}



An implementation backed by S3:

public class S3Storage : IUrlStorage
{
    private readonly IUrlToBucketNameStrategy _buckName; // maps a URL to a bucket
    public IUrlToBucketNameStrategy BuckName { get { return _buckName; } }

    private readonly IUrlToKeyStrategy _keyName; // maps a URL to an object key
    public IUrlToKeyStrategy KeyName { get { return _keyName; } }

    // Amazon credentials and client configuration
    private readonly string _amazonS3AccessKeyID;
    private readonly string _amazonS3secretAccessKeyID;
    private readonly AmazonS3Config _amazonConfig;

    public S3Storage(string S3Key = null, string S3SecretKey = null,
        IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null)
    {
        _amazonS3AccessKeyID = S3Key;
        _amazonS3secretAccessKeyID = S3SecretKey;
        _buckName = bns ?? new UrlToBucketNameStrategy(); // default strategy
        _keyName = kn ?? new UrlToKeyStrategy();          // default strategy
        _amazonConfig = new AmazonS3Config
        {
            RegionEndpoint = Amazon.RegionEndpoint.USEast1 // the buckets are in the default US region
        };
    }

    public async Task<string> Get(string url)
    {
        // Derive the bucket and key from the URL.
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url),
               res = string.Empty;

        using (var client = CreateClient())
        {
            var request = new GetObjectRequest { BucketName = bucket, Key = key };
            try
            {
                var s3Response = await client.GetObjectAsync(request);
                using (var reader = new StreamReader(s3Response.ResponseStream))
                {
                    res = await reader.ReadToEndAsync();
                }
            }
            catch (AmazonS3Exception ex)
            {
                // A missing key only means there is no snapshot yet; rethrow anything else.
                if (ex.ErrorCode != "NoSuchKey")
                    throw;
            }
        }
        return res;
    }

    private IAmazonS3 CreateClient()
    {
        // Without explicit keys the SDK reads the credentials from appSettings.
        var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID)
            ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) // from appSettings
            : Amazon.AWSClientFactory.CreateAmazonS3Client(
                _amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig);
        return client;
    }

    public async Task Put(string url, string body)
    {
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url);

        using (var client = CreateClient())
        {
            var request = new PutObjectRequest
            {
                BucketName = bucket,
                Key = key,
                ContentType = "text/html",
                ContentBody = body
            };
            await client.PutObjectAsync(request);
        }
    }
}
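For local runs and unit tests it can be convenient to replace S3 with something that needs no network. Below is a minimal sketch of such a stand-in. InMemorySnapshotStore is a hypothetical class that mirrors only the Get and Put part of IUrlStorage; it is not part of the original project.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in that mirrors only the Get/Put part of IUrlStorage.
public class InMemorySnapshotStore
{
    private readonly ConcurrentDictionary<string, string> _items =
        new ConcurrentDictionary<string, string>();

    // Returns the stored body, or an empty string when nothing is stored,
    // which matches how S3Storage.Get treats a missing key.
    public Task<string> Get(string url)
    {
        string body;
        return Task.FromResult(_items.TryGetValue(url, out body) ? body : string.Empty);
    }

    // Stores or overwrites the snapshot for the URL.
    public Task Put(string url, string body)
    {
        _items[url] = body;
        return Task.CompletedTask;
    }
}
```

To plug it into Crawl it would also have to expose the BuckName and KeyName properties required by IUrlStorage.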



The URL has to be translated into an S3 location. The bucket name is chosen by the following strategy.

public interface IUrlToBucketNameStrategy
{
    string Get(string url); // takes a URL, returns the bucket name
}

public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        var bucketName = "day"; // default bucket
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            // Choose the bucket from the second '/'-separated part of the URL.
            switch (parts[1])
            {
                case "posts": // stored in the monthly bucket
                    bucketName = "month";
                    break;
                case "users": // stored in the weekly bucket (the bucket is named "weak")
                    bucketName = "weak";
                    break;
            }
        }
        return bucketName;
    }
}
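One thing to watch: when the strategy receives an absolute URL such as http://widjer.net/posts/posts-430033, splitting on '/' yields "http:", "widjer.net", "posts" and "posts-430033", so the second part is the host and not the section name. The sketch below assumes absolute URLs are passed in and routes on the first path segment instead. BucketBySection is a hypothetical name; the bucket names are the ones used in this article.

```csharp
using System;

public static class BucketBySection
{
    // Returns "month" for /posts/..., "weak" for /users/..., otherwise "day".
    public static string Get(string absoluteUrl)
    {
        var path = new Uri(absoluteUrl).AbsolutePath; // e.g. "/posts/posts-430033"
        var segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (segments.Length == 0)
            return "day";

        switch (segments[0])
        {
            case "posts": return "month";
            case "users": return "weak";
            default: return "day";
        }
    }
}
```

Parsing with Uri keeps the scheme and host out of the decision, so the same code works regardless of the domain the site is served from.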



Inside a bucket every snapshot needs its own key. That is the job of IUrlToKeyStrategy.

public interface IUrlToKeyStrategy
{
    string Get(string url);
}

public class UrlToKeyStrategy : IUrlToKeyStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        string key = "mainpage"; // key used when the URL has no parts
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            // URL-encode each part and join the parts with "."
            key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
        }
        return key;
    }
}
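To make the key derivation concrete, here is a self-contained sketch of the same idea. It swaps HttpUtility.UrlEncode for Uri.EscapeDataString so that it runs without a reference to System.Web; that substitution and the name SnapshotKey are assumptions made for the example only.

```csharp
using System;
using System.Linq;

public static class SnapshotKey
{
    // Joins the '/'-separated parts of the URL with ".", escaping each part;
    // a URL with no parts maps to "mainpage".
    public static string FromUrl(string url)
    {
        var parts = url.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length == 0
            ? "mainpage"
            : string.Join(".", parts.Select(p => Uri.EscapeDataString(p)));
    }
}
```

For example, "widjer.net/posts/posts-430033" becomes "widjer.net.posts.posts-430033", and "/" becomes "mainpage".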






AJAX

Taking a snapshot is described by the ISpaSnapshot interface:

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



An implementation that uses the Blitline service:

public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private readonly string _appId;        // Blitline application id
    private readonly IUrlStorage _storage; // supplies the bucket and key names
    private int _regTimeout = 30000;       // 30s

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the job request.
        string jsonData = FormatCrawlRequest(url);
        // Submit it to Blitline.
        var resp = await Crawl(url, jsonData);
        // A non-empty response carries error messages.
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
            var o = await result.Content.ReadAsStringAsync();
            // Parse the response.
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the error messages if the job failed.
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Describe the job; it is serialized to JSON below.
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





An alternative is to render the page locally with PhantomJS:



public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Run PhantomJS with the script and the URL as arguments.
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("{0} {1}", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };

        string output;
        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // Read the rendered HTML that the script writes to stdout.
            output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
        }

        // Save the snapshot.
        return storage.Put(url, output);
    }
}
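Reading stdout synchronously blocks the calling thread, and a redirected stderr that nobody reads can stall the child process once its pipe buffer fills. The following is a sketch of one way to avoid both, under the assumption that a timeout is acceptable. ProcessRunner and CaptureStdoutAsync are hypothetical names, not part of the original code.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

public static class ProcessRunner
{
    // Starts the executable, drains stdout and stderr concurrently so neither
    // pipe can fill up and stall the child, and returns what was written to stdout.
    public static async Task<string> CaptureStdoutAsync(
        string exePath, string arguments, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = arguments,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            StandardOutputEncoding = Encoding.UTF8
        };

        using (var process = new Process { StartInfo = startInfo })
        {
            process.Start();
            var stdout = process.StandardOutput.ReadToEndAsync();
            var stderr = process.StandardError.ReadToEndAsync();

            // Both reads complete when the child closes its pipes, i.e. on exit.
            var finished = Task.WhenAll(stdout, stderr);
            if (await Task.WhenAny(finished, Task.Delay(timeoutMs)) != finished)
            {
                try { process.Kill(); }
                catch (InvalidOperationException) { /* the process already exited */ }
                throw new TimeoutException("The process did not finish in time.");
            }

            process.WaitForExit();
            return await stdout;
        }
    }
}
```

In TakeSnapshot the result would then be awaited and passed to storage.Put, with the script path and URL supplied as the arguments string.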

The script that _jsPath points to:

var resourceWait = 13000,
    maxRenderWait = 13000;

var page = require('webpage').create(),
    system = require('system'),
    count = 0,
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the rendered HTML to stdout and quit.
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// Count requests in flight; a new request cancels the pending render.
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// When no requests remain, render after resourceWait ms of quiet.
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        // Render anyway once maxRenderWait ms have passed.
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});



The AJAX site described here is widjer.net (DEMO). Take the URL http://widjer.net/timeline/%23_ as an example: requesting http://widjer.net/timeline/%23_?_escaped_fragment_= returns the version of the page that needs no JavaScript.
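The controller removes ?_escaped_fragment_= with a plain string replace, which is enough when the parameter is empty and is the only one in the query. Under the AJAX crawling scheme the parameter can also follow other query parameters and carry the escaped "#!" fragment. Below is a sketch of a more general reverse mapping, assuming the parameter comes last in the query string as the scheme specifies. EscapedFragment and ToOriginalUrl are hypothetical names.

```csharp
using System;

public static class EscapedFragment
{
    private const string Param = "_escaped_fragment_=";

    // Maps a crawler URL back to the URL a browser would use:
    //   http://host/page?_escaped_fragment_=        -> http://host/page
    //   http://host/page?a=1&_escaped_fragment_=k=v -> http://host/page?a=1#!k=v
    public static string ToOriginalUrl(string crawlerUrl)
    {
        var index = crawlerUrl.IndexOf(Param, StringComparison.Ordinal);
        if (index < 0)
            return crawlerUrl; // not a crawler request

        // Drop the parameter together with the '?' or '&' in front of it.
        var prefix = crawlerUrl.Substring(0, index).TrimEnd('?', '&');
        var fragment = Uri.UnescapeDataString(crawlerUrl.Substring(index + Param.Length));

        return fragment.Length == 0 ? prefix : prefix + "#!" + fragment;
    }
}
```

For the demo URL above, the result is http://widjer.net/timeline/%23_ with the query removed.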




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





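The second implementation runs PhantomJS as a separate process, reads the rendered HTML from its standard output, and puts it into the storage.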



using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Start PhantomJS with the script and the url as arguments;
        // both are quoted so that spaces do not split them
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("\"{0}\" \"{1}\"", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };
        string output;
        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // Read the rendered HTML from stdout before waiting for the process to exit
            output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
        }
        // Save the snapshot to the storage
        return storage.Put(url, output);
    }
}
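TakeSnapshot above blocks the calling thread on ReadToEnd and WaitForExit, and it waits forever if the process never exits. One possible variation, sketched here under the assumption that a hard timeout is acceptable, reads the output asynchronously and kills the process when the timeout expires. The ProcessRunner class, its RunAsync method, and the timeoutMs parameter are hypothetical names, not part of the original code.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ProcessRunner
{
    // Runs an executable and returns everything it wrote to stdout.
    // Throws TimeoutException if stdout is not closed within timeoutMs.
    public static async Task<string> RunAsync(string exePath, string arguments, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = arguments,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };

        using (var process = new Process { StartInfo = startInfo })
        {
            process.Start();

            // ReadToEndAsync completes when the process closes its stdout.
            Task<string> readTask = process.StandardOutput.ReadToEndAsync();
            Task finished = await Task.WhenAny(readTask, Task.Delay(timeoutMs));

            if (finished != readTask)
            {
                // The process may have exited in the meantime, so Kill can fail.
                try { process.Kill(); } catch (InvalidOperationException) { }
                throw new TimeoutException("The process did not finish in time.");
            }

            string output = await readTask;
            process.WaitForExit();
            return output;
        }
    }
}
```

With such a helper, a snapshot implementation could await RunAsync(_exePath, arguments, timeout) and then await storage.Put(url, output).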

The script passed as _jsPath:

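// Time to wait after the last resource has loaded, and the overall limit, in ms
var resourceWait = 13000,
    maxRenderWait = 13000;

var page = require('webpage').create(),
    system = require('system'),
    count = 0,
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the rendered HTML to stdout and quit
function doRender() {
    // Cancel whichever timer is still pending so the page is printed only once
    clearTimeout(renderTimeout);
    clearTimeout(forcedRenderTimeout);
    console.log(page.content);
    phantom.exit();
}

// Count every request that starts and cancel a pending render
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// When the last pending resource has finished, schedule the render
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        // The page could not be opened: exit with a non-zero code and no output
        phantom.exit(1);
    } else {
        // Render anyway once maxRenderWait has passed
        forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait);
    }
});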



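With this in place the AJAX pages can be indexed. You can check the result on widjer.net (DEMO). The URL http://widjer.net/timeline/%23_ is the regular page, and http://widjer.net/timeline/%23_?_escaped_fragment_= returns its snapshot, which works without javascript.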
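The crawler URL differs from the regular one only by the _escaped_fragment_ query parameter, so the server has to map one to the other before taking a snapshot. A plain string replace of "?_escaped_fragment_=" covers the URLs shown here, but it misses the case where the parameter follows other query parameters. The sketch below is one way to handle both cases; the EscapedFragment class and the ToOriginalUrl method are hypothetical names, and it assumes the parameter value is empty, as in the URLs above.

```csharp
using System;
using System.Web;

public static class EscapedFragment
{
    // Returns the url without the _escaped_fragment_ query parameter,
    // keeping any other query parameters.
    public static string ToOriginalUrl(string crawlerUrl)
    {
        var builder = new UriBuilder(crawlerUrl);
        var query = HttpUtility.ParseQueryString(builder.Query);
        query.Remove("_escaped_fragment_");
        // ToString() on the parsed collection yields "a=1&b=2" without the leading '?'.
        builder.Query = query.ToString();
        return builder.Uri.AbsoluteUri;
    }

    public static void Main()
    {
        Console.WriteLine(ToOriginalUrl("http://widjer.net/posts/posts-430033?_escaped_fragment_="));
        Console.WriteLine(ToOriginalUrl("http://www.example.com/list?page=2&_escaped_fragment_="));
    }
}
```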
















. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



With the bucket chosen, the object still needs a key. That is the job of IUrlToKeyStrategy.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Web;

    public interface IUrlToKeyStrategy
    {
        string Get(string url);
    }

    public class UrlToKeyStrategy : IUrlToKeyStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);
            string key = "mainpage"; // key used when the url has no parts

            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
            {
                // url-encode every part and join the parts with "." instead of "/"
                key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
            }
            return key;
        }
    }






AJAX

Snapshots are taken through the ISpaSnapshot interface:

    public interface ISpaSnapshot
    {
        Task TakeSnapshot(string url, IUrlStorage storage);
    }



The first implementation uses the Blitline service.

    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    public class BlitlineSpaSnapshot : ISpaSnapshot
    {
        private readonly string _appId;        // Blitline application id
        private readonly IUrlStorage _storage; // storage whose strategies name the bucket and key
        private int _regTimeout = 30000;       // 30s

        public BlitlineSpaSnapshot(string appId, IUrlStorage st)
        {
            _appId = appId;
            _storage = st;
        }

        public async Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // build the job description
            string jsonData = FormatCrawlRequest(url);
            // send it to Blitline
            var resp = await Crawl(url, jsonData);
            // a non-empty response carries the error messages
            if (!string.IsNullOrWhiteSpace(resp))
                throw new CrawlException(resp);
        }

        private async Task<string> Crawl(string url, string jsonData)
        {
            string crawlResponse = string.Empty;
            using (var client = new HttpClient())
            {
                var result = await client.PostAsync("http://api.blitline.com/job",
                    new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
                // await instead of .Result, so the request thread is not blocked
                var o = await result.Content.ReadAsStringAsync();
                var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
                // collect the errors, if any
                if (response.Failed)
                    crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
            }
            return crawlResponse;
        }

        private string FormatCrawlRequest(string url)
        {
            // describe the job and serialize it to JSON
            var reqData = new BlitlineRequest
            {
                ApplicationId = _appId,
                Src = url,
                SrcType = "screen_shot_url",
                SrcData = new SrcDataDto
                {
                    ViewPort = "1200x800",
                    SaveHtml = new SaveDest
                    {
                        S3Des = new StorageDestination
                        {
                            Bucket = _storage.BuckName.Get(url),
                            Key = _storage.KeyName.Get(url)
                        }
                    }
                },
                Functions = new[] { new FunctionData { Name = "no_op" } }
            };
            return JsonConvert.SerializeObject(new[] { reqData });
        }
    }
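The request classes used by FormatCrawlRequest are not shown in the article. A minimal sketch of what they might look like follows. The C# property names are taken from the code above, but the JSON field names in the JsonProperty attributes are an assumption based on the Blitline job format and should be checked against the Blitline documentation before use.

```csharp
using Newtonsoft.Json;

// Assumed shape of a Blitline job; the JSON names are NOT confirmed by the article.
public class BlitlineRequest
{
    [JsonProperty("application_id")] public string ApplicationId { get; set; }
    [JsonProperty("src")] public string Src { get; set; }
    [JsonProperty("src_type")] public string SrcType { get; set; }
    [JsonProperty("src_data")] public SrcDataDto SrcData { get; set; }
    [JsonProperty("functions")] public FunctionData[] Functions { get; set; }
}

public class SrcDataDto
{
    [JsonProperty("viewport")] public string ViewPort { get; set; }
    [JsonProperty("save_html")] public SaveDest SaveHtml { get; set; }
}

public class SaveDest
{
    [JsonProperty("s3_destination")] public StorageDestination S3Des { get; set; }
}

public class StorageDestination
{
    [JsonProperty("bucket")] public string Bucket { get; set; }
    [JsonProperty("key")] public string Key { get; set; }
}

public class FunctionData
{
    [JsonProperty("name")] public string Name { get; set; }
}
```

Keeping the wire names in attributes lets the C# side use ordinary property names while the serialized job keeps the snake_case fields.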





The second implementation runs PhantomJS locally.



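    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    public class PhantomJsSnapShot : ISpaSnapshot
    {
        private readonly string _exePath; // path to the PhantomJS executable
        private readonly string _jsPath;  // path to the script that renders the page

        public PhantomJsSnapShot(string exePath, string jsPath)
        {
            _exePath = exePath;
            _jsPath = jsPath;
        }

        public async Task TakeSnapshot(string url, IUrlStorage storage)
        {
            var startInfo = new ProcessStartInfo
            {
                // quote both arguments so that paths with spaces survive
                Arguments = String.Format("\"{0}\" \"{1}\"", _jsPath, url),
                FileName = _exePath,
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true,
                StandardOutputEncoding = System.Text.Encoding.UTF8
            };

            string output;
            using (var p = new Process { StartInfo = startInfo })
            {
                p.Start();
                // drain stderr concurrently so that a full pipe cannot block PhantomJS
                var errors = p.StandardError.ReadToEndAsync();
                // the script prints the rendered HTML to stdout
                output = await p.StandardOutput.ReadToEndAsync();
                await errors;
                p.WaitForExit();
            }

            // save the snapshot to the storage
            await storage.Put(url, output);
        }
    }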

The script at _jsPath:

    // how long to wait after the last resource arrives, and the hard limit (ms)
    var resourceWait = 13000,
        maxRenderWait = 13000;

    var page = require('webpage').create(),
        system = require('system'),
        count = 0, // requests still in flight
        forcedRenderTimeout,
        renderTimeout;

    page.viewportSize = { width: 1280, height: 1024 };

    // print the rendered HTML to stdout, where the C# code reads it
    function doRender() {
        console.log(page.content);
        phantom.exit();
    }

    page.onResourceRequested = function (req) {
        count += 1;
        clearTimeout(renderTimeout);
    };

    page.onResourceReceived = function (res) {
        if (!res.stage || res.stage === 'end') {
            count -= 1;
            if (count === 0) {
                // nothing is pending: render unless a new request starts in time
                renderTimeout = setTimeout(doRender, resourceWait);
            }
        }
    };

    page.open(system.args[1], function (status) {
        if (status !== "success") {
            phantom.exit();
        } else {
            // render no later than maxRenderWait after the page has opened
            forcedRenderTimeout = setTimeout(doRender, maxRenderWait);
        }
    });



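This is how the AJAX pages get indexed. The result can be checked on widjer.net (DEMO). The page at the url http://widjer.net/timeline/%23_ is built with AJAX. The url http://widjer.net/timeline/%23_?_escaped_fragment_= returns its HTML snapshot, which does not need javascript.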
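The mapping between a regular url and the url a crawler requests can be sketched as a pair of helpers. This is a minimal sketch, assuming the scheme used in this article (an empty _escaped_fragment_ parameter appended as the last parameter). The EscapedFragment class and its method names are hypothetical.

```csharp
using System;

public static class EscapedFragment
{
    private const string Param = "_escaped_fragment_=";

    // Builds the url a crawler requests for a page that opted in to AJAX crawling.
    public static string ToCrawlerUrl(string url)
    {
        if (url == null) throw new ArgumentNullException("url");
        var separator = url.IndexOf('?') >= 0 ? "&" : "?";
        return url + separator + Param;
    }

    // Removes a trailing, empty _escaped_fragment_ parameter together with
    // the '?' or '&' in front of it; any other url is returned unchanged.
    public static string ToOriginalUrl(string url)
    {
        if (url == null) throw new ArgumentNullException("url");
        if (!url.EndsWith(Param, StringComparison.Ordinal)) return url;

        var start = url.Length - Param.Length - 1;
        if (start < 0) return url;

        var separator = url[start];
        return separator == '?' || separator == '&' ? url.Substring(0, start) : url;
    }
}
```

Unlike a plain Replace of "?_escaped_fragment_=", ToOriginalUrl also handles urls that already carry a query string, where the parameter is appended with '&'.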




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



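As you can see, a URL is translated into an S3 address by two strategies. The bucket strategy decides which bucket, and therefore which retention period, a page belongs to. The default implementation chooses by site section.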

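    using System;
    using System.Diagnostics;

    public interface IUrlToBucketNameStrategy
    {
        string Get(string url);  // takes a URL, returns a bucket name
    }

    public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);

            var bucketName = "day";  // default: shortest retention
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 1)
            {
                // Note: for an absolute URL (http://host/posts/...) parts[1] is the host,
                // so adjust the index to the URL form you pass in.
                switch (parts[1])
                {
                    case "posts":   // post pages are kept for a month
                        bucketName = "month";
                        break;
                    case "users":   // user pages go to the "weak" (weekly) bucket
                        bucketName = "weak";
                        break;
                }
            }
            return bucketName;
        }
    }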
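UrlToBucketNameStrategy picks the section by its position in the split URL, so the right index depends on whether the URL is absolute or relative. A minimal sketch of a variant that parses the URL with System.Uri instead, assuming absolute URLs (the controller passes Request.Url.AbsoluteUri). The class name `UriBucketDemo` is hypothetical; the bucket names are the ones used in this article.

```csharp
using System;

// Hypothetical variant of the bucket strategy for absolute URLs.
public static class UriBucketDemo
{
    public static string ToBucket(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl);

        // For http://host/posts/1 the segments are "/", "posts/", "1".
        var section = uri.Segments.Length > 1
            ? uri.Segments[1].Trim('/')
            : string.Empty;

        switch (section)
        {
            case "posts": return "month";
            case "users": return "weak";
            default:      return "day";
        }
    }

    public static void Main()
    {
        Console.WriteLine(ToBucket("http://widjer.net/posts/posts-430033")); // month
        Console.WriteLine(ToBucket("http://widjer.net/"));                   // day
    }
}
```

Going through Uri keeps the scheme and host out of the comparison, so the same switch works whatever the host name is.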



Within a bucket every page needs its own key. That is the job of IUrlToKeyStrategy.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Web;

    public interface IUrlToKeyStrategy
    {
        string Get(string url);
    }

    public class UrlToKeyStrategy : IUrlToKeyStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);

            string key = "mainpage";  // key used when the URL has no path parts
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
            {
                // URL-encode each part and join the parts with "."
                key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
            }
            return key;
        }
    }
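To make the mapping concrete, here is a small worked example. It is a sketch that repeats the logic of UrlToKeyStrategy.Get in a hypothetical static helper (`KeyDemo.ToKey`) so that it runs on its own. The inputs are relative paths chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Web;

// Hypothetical helper that mirrors UrlToKeyStrategy.Get, for illustration only.
public static class KeyDemo
{
    private static readonly char[] Sep = { '/' };

    public static string ToKey(string url)
    {
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 0
            ? string.Join(".", parts.Select(p => HttpUtility.UrlEncode(p)))
            : "mainpage";
    }

    public static void Main()
    {
        Console.WriteLine(ToKey("/"));                    // mainpage
        Console.WriteLine(ToKey("/posts/posts-430033"));  // posts.posts-430033
    }
}
```

Each path segment becomes one dot-separated part of the key, and a URL without segments falls back to "mainpage".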






Snapshots of AJAX pages

The ISpaSnapshot interface

    public interface ISpaSnapshot
    {
        Task TakeSnapshot(string url, IUrlStorage storage);
    }



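The first implementation uses Blitline. It only submits the job: Blitline renders the page and writes the resulting HTML to the S3 bucket itself.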

    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    public class BlitlineSpaSnapshot : ISpaSnapshot
    {
        private string _appId;            // Blitline application id
        private IUrlStorage _storage;     // supplies the bucket and key for the result
        private int _regTimeout = 30000;  // 30s

        public BlitlineSpaSnapshot(string appId, IUrlStorage st)
        {
            _appId = appId;
            _storage = st;
        }

        public async Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // build the job description
            string jsonData = FormatCrawlRequest(url);
            // submit it
            var resp = await Crawl(url, jsonData);
            // a non-empty response carries the error text
            if (!string.IsNullOrWhiteSpace(resp))
                throw new CrawlException(resp);
        }

        private async Task<string> Crawl(string url, string jsonData)
        {
            string crawlResponse = string.Empty;
            using (var client = new HttpClient())
            {
                var result = await client.PostAsync("http://api.blitline.com/job",
                    new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
                var o = await result.Content.ReadAsStringAsync();  // await instead of blocking on .Result
                var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
                // collect the error messages if the job was rejected
                if (response.Failed)
                    crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
            }
            return crawlResponse;
        }

        private string FormatCrawlRequest(string url)
        {
            // the request object, serialized to JSON below
            var reqData = new BlitlineRequest
            {
                ApplicationId = _appId,
                Src = url,
                SrcType = "screen_shot_url",
                SrcData = new SrcDataDto
                {
                    ViewPort = "1200x800",
                    SaveHtml = new SaveDest
                    {
                        S3Des = new StorageDestination
                        {
                            Bucket = _storage.BuckName.Get(url),
                            Key = _storage.KeyName.Get(url)
                        }
                    }
                },
                Functions = new[] { new FunctionData { Name = "no_op" } }
            };
            return JsonConvert.SerializeObject(new[] { reqData });
        }
    }





Blitline is an external service. An alternative is to render the page yourself with PhantomJS.



    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    public class PhantomJsSnapShot : ISpaSnapshot
    {
        private readonly string _exePath;  // path to the PhantomJS executable
        private readonly string _jsPath;   // path to the script that renders the page

        public PhantomJsSnapShot(string exePath, string jsPath)
        {
            _exePath = exePath;
            _jsPath = jsPath;
        }

        public Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // run "phantomjs <script> <url>" with redirected output and no window
            var startInfo = new ProcessStartInfo
            {
                Arguments = String.Format("{0} {1}", _jsPath, url),
                FileName = _exePath,
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true,
                StandardOutputEncoding = System.Text.Encoding.UTF8
            };

            string output;
            using (var p = new Process { StartInfo = startInfo })
            {
                p.Start();
                // the script prints the rendered HTML to stdout
                output = p.StandardOutput.ReadToEnd();
                p.WaitForExit();
            }

            // store the snapshot
            return storage.Put(url, output);
        }
    }

The script passed as _jsPath:

    var resourceWait = 13000,   // ms without pending requests before rendering
        maxRenderWait = 13000;  // ms after load before rendering regardless

    var page = require('webpage').create(),
        system = require('system'),
        count = 0,              // requests still in flight
        forcedRenderTimeout,
        renderTimeout;

    page.viewportSize = { width: 1280, height: 1024 };

    // Print the rendered DOM to stdout and quit.
    function doRender() {
        console.log(page.content);
        phantom.exit();
    }

    page.onResourceRequested = function (req) {
        count += 1;
        clearTimeout(renderTimeout);
    };

    page.onResourceReceived = function (res) {
        if (!res.stage || res.stage === 'end') {
            count -= 1;
            if (count === 0) {
                // nothing pending: render unless a new request starts first
                renderTimeout = setTimeout(doRender, resourceWait);
            }
        }
    };

    page.open(system.args[1], function (status) {
        if (status !== "success") {
            phantom.exit();
        } else {
            // render after maxRenderWait even if requests never settle
            forcedRenderTimeout = setTimeout(function () {
                doRender();
            }, maxRenderWait);
        }
    });



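That is all it takes to make an AJAX site indexable. You can see it at work on widjer.net (DEMO). The URL http://widjer.net/timeline/%23_ serves the regular AJAX page, while http://widjer.net/timeline/%23_?_escaped_fragment_= returns the rendered HTML snapshot, which does not depend on JavaScript.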




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

The IUrlStorage interface:

    public interface IUrlStorage
    {
        Task<string> Get(string url);               // read a stored snapshot
        Task Put(string url, string body);          // store a snapshot
        IUrlToBucketNameStrategy BuckName { get; }  // maps a url to a bucket name
        IUrlToKeyStrategy KeyName { get; }          // maps a url to an object key
    }

The S3 implementation:

    public class S3Storage : IUrlStorage
    {
        private IUrlToBucketNameStrategy _buckName; // maps a url to a bucket
        public IUrlToBucketNameStrategy BuckName { get { return _buckName; } }

        private IUrlToKeyStrategy _keyName; // maps a url to an object key
        public IUrlToKeyStrategy KeyName { get { return _keyName; } }

        // amazon credentials and client settings
        private readonly string _amazonS3AccessKeyID;
        private readonly string _amazonS3secretAccessKeyID;
        private readonly AmazonS3Config _amazonConfig;

        public S3Storage(string S3Key = null, string S3SecretKey = null,
            IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null)
        {
            _amazonS3AccessKeyID = S3Key;
            _amazonS3secretAccessKeyID = S3SecretKey;
            _buckName = bns ?? new UrlToBucketNameStrategy(); // default strategy
            _keyName = kn ?? new UrlToKeyStrategy();          // default strategy
            _amazonConfig = new AmazonS3Config
            {
                RegionEndpoint = Amazon.RegionEndpoint.USEast1 // the buckets are in the default US region
            };
        }

        public async Task<string> Get(string url)
        {
            // map the url to a bucket and a key
            string bucket = _buckName.Get(url),
                   key = _keyName.Get(url),
                   res = string.Empty;
            var client = CreateClient();
            GetObjectRequest request = new GetObjectRequest
            {
                BucketName = bucket,
                Key = key,
            };
            try
            {
                // read the object body
                var S3response = await client.GetObjectAsync(request);
                using (var reader = new StreamReader(S3response.ResponseStream))
                {
                    res = await reader.ReadToEndAsync();
                }
            }
            catch (AmazonS3Exception ex)
            {
                // a missing key just means there is no snapshot yet
                if (ex.ErrorCode != "NoSuchKey")
                    throw;
            }
            return res;
        }

        private IAmazonS3 CreateClient()
        {
            // without explicit keys the credentials come from appSettings
            var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID)
                ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig)
                : Amazon.AWSClientFactory.CreateAmazonS3Client(
                    _amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig);
            return client;
        }

        public async Task Put(string url, string body)
        {
            string bucket = _buckName.Get(url),
                   key = _keyName.Get(url);
            var client = CreateClient();
            PutObjectRequest request = new PutObjectRequest
            {
                BucketName = bucket,
                Key = key,
                ContentType = "text/html",
                ContentBody = body
            };
            await client.PutObjectAsync(request);
        }
    }
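Because storage sits behind IUrlStorage, the crawl logic can be exercised without touching S3. The sketch below is a hypothetical in-memory stand-in (InMemoryStorage is not part of the original code); the interfaces are repeated only so that it compiles on its own. It returns null for both strategies, so it suits snapshot implementations that only call Put, not ones that read BuckName or KeyName.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Interfaces repeated from the article so the sketch compiles on its own.
public interface IUrlToBucketNameStrategy { string Get(string url); }
public interface IUrlToKeyStrategy { string Get(string url); }
public interface IUrlStorage
{
    Task<string> Get(string url);
    Task Put(string url, string body);
    IUrlToBucketNameStrategy BuckName { get; }
    IUrlToKeyStrategy KeyName { get; }
}

// Hypothetical test double: keeps snapshots in memory, keyed by the raw url.
public class InMemoryStorage : IUrlStorage
{
    private readonly ConcurrentDictionary<string, string> _items =
        new ConcurrentDictionary<string, string>();

    // The strategies are not needed for an in-memory lookup.
    public IUrlToBucketNameStrategy BuckName { get { return null; } }
    public IUrlToKeyStrategy KeyName { get { return null; } }

    public Task<string> Get(string url)
    {
        string body;
        // Mirror S3Storage.Get: a missing snapshot yields an empty string.
        return Task.FromResult(
            _items.TryGetValue(url, out body) ? body : string.Empty);
    }

    public Task Put(string url, string body)
    {
        _items[url] = body;
        return Task.FromResult(0);
    }
}

public static class StorageDemo
{
    public static void Main()
    {
        var storage = new InMemoryStorage();
        const string url = "http://widjer.net/posts/posts-430033";

        Console.WriteLine(storage.Get(url).Result.Length); // nothing stored yet
        storage.Put(url, "<html>snapshot</html>").Wait();
        Console.WriteLine(storage.Get(url).Result);
    }
}
```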



Two strategies map a url to its place in S3. The first one picks the bucket.

    public interface IUrlToBucketNameStrategy
    {
        string Get(string url); // takes a url, returns a bucket name
    }

    public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);
            var bucketName = "day"; // default bucket
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 1)
            {
                // the second segment selects the bucket
                switch (parts[1])
                {
                    case "posts":
                        bucketName = "month";
                        break;
                    case "users":
                        bucketName = "weak";
                        break;
                }
            }
            return bucketName;
        }
    }

Within a bucket, the object key is derived from the url by IUrlToKeyStrategy.

    public interface IUrlToKeyStrategy
    {
        string Get(string url);
    }

    public class UrlToKeyStrategy : IUrlToKeyStrategy
    {
        private static readonly char[] Sep = new[] { '/' };

        public string Get(string url)
        {
            Debug.Assert(url != null);
            string key = "mainpage"; // key for a url with no segments
            var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
            {
                // url-encode each segment and join them with "."
                key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
            }
            return key;
        }
    }
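To see what keys this rule produces, here is a small standalone sketch. KeyDemo and ToKey are hypothetical names; the body of ToKey repeats the rule from UrlToKeyStrategy.Get so the example runs without the rest of the project.

```csharp
using System;
using System.Linq;
using System.Web;

public static class KeyDemo
{
    private static readonly char[] Sep = { '/' };

    // Same rule as UrlToKeyStrategy.Get, copied so the sketch runs alone.
    public static string ToKey(string url)
    {
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 0
            ? string.Join(".", parts.Select(p => HttpUtility.UrlEncode(p)))
            : "mainpage";
    }

    public static void Main()
    {
        var urls = new[] { "/", "/posts/posts-430033", "http://widjer.net/timeline/#_" };
        foreach (var url in urls)
            Console.WriteLine("{0} -> {1}", url, ToKey(url));
    }
}
```

A bare "/" maps to mainpage and "/posts/posts-430033" maps to posts.posts-430033. For an absolute url the scheme and host become segments of the key too, with characters such as ':' and '#' percent-encoded.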






AJAX

The ISpaSnapshot interface:

    public interface ISpaSnapshot
    {
        Task TakeSnapshot(string url, IUrlStorage storage);
    }

The first implementation uses Blitline.

    public class BlitlineSpaSnapshot : ISpaSnapshot
    {
        private string _appId;           // Blitline application id
        private IUrlStorage _storage;    // tells Blitline where to save the snapshot
        private int _regTimeout = 30000; // 30s

        public BlitlineSpaSnapshot(string appId, IUrlStorage st)
        {
            _appId = appId;
            _storage = st;
        }

        public async Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // build the request
            string jsonData = FormatCrawlRequest(url);
            // send it
            var resp = await Crawl(url, jsonData);
            // a non-empty response carries the error text
            if (!string.IsNullOrWhiteSpace(resp))
                throw new CrawlException(resp);
        }

        private async Task<string> Crawl(string url, string jsonData)
        {
            string crawlResponse = string.Empty;
            using (var client = new HttpClient())
            {
                var result = await client.PostAsync("http://api.blitline.com/job",
                    new FormUrlEncodedContent(new Dictionary<string, string>
                    {
                        { "json", jsonData }
                    }));
                var o = await result.Content.ReadAsStringAsync();
                // parse the response
                var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
                // collect the errors, if any
                if (response.Failed)
                    crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
            }
            return crawlResponse;
        }

        private string FormatCrawlRequest(string url)
        {
            // build the request object and serialize it to JSON
            var reqData = new BlitlineRequest
            {
                ApplicationId = _appId,
                Src = url,
                SrcType = "screen_shot_url",
                SrcData = new SrcDataDto
                {
                    ViewPort = "1200x800",
                    SaveHtml = new SaveDest
                    {
                        S3Des = new StorageDestination
                        {
                            Bucket = _storage.BuckName.Get(url),
                            Key = _storage.KeyName.Get(url)
                        }
                    }
                },
                Functions = new[]
                {
                    new FunctionData { Name = "no_op" }
                }
            };
            return JsonConvert.SerializeObject(new[] { reqData });
        }
    }





The second implementation uses PhantomJS.

    public class PhantomJsSnapShot : ISpaSnapshot
    {
        private readonly string _exePath; // path to the PhantomJS executable
        private readonly string _jsPath;  // path to the script that renders the page

        public PhantomJsSnapShot(string exePath, string jsPath)
        {
            _exePath = exePath;
            _jsPath = jsPath;
        }

        public Task TakeSnapshot(string url, IUrlStorage storage)
        {
            // run PhantomJS with the script and the url as arguments
            var startInfo = new ProcessStartInfo
            {
                Arguments = String.Format("{0} {1}", _jsPath, url),
                FileName = _exePath,
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                RedirectStandardInput = true,
                StandardOutputEncoding = System.Text.Encoding.UTF8
            };
            Process p = new Process { StartInfo = startInfo };
            p.Start();
            // the script prints the rendered html to standard output
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            // store the snapshot
            return storage.Put(url, output);
        }
    }

The script at _jsPath:

    var resourceWait = 13000,
        maxRenderWait = 13000;

    var page = require('webpage').create(),
        system = require('system'),
        count = 0,
        forcedRenderTimeout,
        renderTimeout;

    page.viewportSize = { width: 1280, height: 1024 };

    // print the rendered html and quit
    function doRender() {
        console.log(page.content);
        phantom.exit();
    }

    // a new request is in flight, so postpone rendering
    page.onResourceRequested = function (req) {
        count += 1;
        clearTimeout(renderTimeout);
    };

    // once nothing is in flight, render after resourceWait
    page.onResourceReceived = function (res) {
        if (!res.stage || res.stage === 'end') {
            count -= 1;
            if (count === 0) {
                renderTimeout = setTimeout(doRender, resourceWait);
            }
        }
    };

    page.open(system.args[1], function (status) {
        if (status !== "success") {
            phantom.exit();
        } else {
            // render after maxRenderWait even if requests are still pending
            forcedRenderTimeout = setTimeout(function () {
                doRender();
            }, maxRenderWait);
        }
    });



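The result can be seen on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ is the regular AJAX page. The url http://widjer.net/timeline/%23_?_escaped_fragment_= returns the snapshot and can be viewed with javascript disabled.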




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

_crawler is an implementation of the following interface.



public interface ICrawl { Task<string> SnaphotUrl(string url); }



It takes a url and returns the html snapshot of that page.



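public class Crawl : ICrawl
{
    private readonly IUrlStorage _storage;   // snapshot storage (S3)
    private readonly ISpaSnapshot _snapshot; // service that takes the snapshot

    public Crawl(IUrlStorage st, ISpaSnapshot ss)
    {
        Debug.Assert(st != null);
        Debug.Assert(ss != null);
        _storage = st;
        _snapshot = ss;
    }

    public async Task<string> SnaphotUrl(string url)
    {
        // Look for an existing snapshot in storage (S3)
        string res = await _storage.Get(url);
        // Found one: return it
        if (!string.IsNullOrWhiteSpace(res)) return res;
        // Nothing stored: request a new snapshot
        await _snapshot.TakeSnapshot(url, _storage);
        // Poll storage until the snapshot appears, at most 3 attempts
        for (var i = 0; i < 3; i++)
        {
            res = await _storage.Get(url);
            if (!string.IsNullOrWhiteSpace(res)) return res;
            await Task.Delay(5000); // do not block the request thread while waiting
        }
        // The snapshot never appeared
        throw new CrawlException("Snapshot was not created in time: " + url);
    }
}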
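The waiting step in SnaphotUrl is a bounded poll: ask storage, wait, ask again, give up after a fixed number of attempts. Below is a minimal sketch of that pattern in JavaScript (the language of the PhantomJS script used later), not the article's code. The name pollUntilReady and its parameters are hypothetical; get stands for any async function that returns the stored html or an empty string.

```javascript
// Hypothetical helper: polls `get` until it yields a non-empty string.
// `attempts` and `delayMs` are illustrative parameters, not from the article.
async function pollUntilReady(get, attempts, delayMs) {
  for (let i = 0; i < attempts; i += 1) {
    const body = await get();
    if (typeof body === "string" && body.trim() !== "") {
      return body; // the snapshot is available
    }
    if (i < attempts - 1) {
      // wait before the next attempt without blocking the event loop
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("snapshot not ready after " + attempts + " attempts");
}
```

The counter advances on every pass, so the loop always terminates, either with the snapshot or with an error the caller can log.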




S3

IUrlStorage

public interface IUrlStorage
{
    Task<string> Get(string url);               // read the stored snapshot for a url
    Task Put(string url, string body);          // store a snapshot for a url
    IUrlToBucketNameStrategy BuckName { get; }  // maps a url to a bucket name
    IUrlToKeyStrategy KeyName { get; }          // maps a url to an object key
}



The S3 implementation of this interface follows.

public class S3Storage : IUrlStorage
{
    private IUrlToBucketNameStrategy _buckName; // maps a url to a bucket
    public IUrlToBucketNameStrategy BuckName { get { return _buckName; } }

    private IUrlToKeyStrategy _keyName; // maps a url to an object key
    public IUrlToKeyStrategy KeyName { get { return _keyName; } }

    // Amazon credentials and client configuration
    private readonly string _amazonS3AccessKeyID;
    private readonly string _amazonS3secretAccessKeyID;
    private readonly AmazonS3Config _amazonConfig;

    public S3Storage(string S3Key = null, string S3SecretKey = null,
        IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null)
    {
        _amazonS3AccessKeyID = S3Key;
        _amazonS3secretAccessKeyID = S3SecretKey;
        _buckName = bns ?? new UrlToBucketNameStrategy(); // default strategy unless one is supplied
        _keyName = kn ?? new UrlToKeyStrategy();          // default strategy unless one is supplied
        _amazonConfig = new AmazonS3Config
        {
            RegionEndpoint = Amazon.RegionEndpoint.USEast1 // the buckets are in the default US region
        };
    }

    public async Task<string> Get(string url)
    {
        // Map the url to a bucket and a key
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url),
               res = string.Empty;
        var client = CreateClient();
        GetObjectRequest request = new GetObjectRequest
        {
            BucketName = bucket,
            Key = key,
        };
        try
        {
            // Read the object body
            var S3response = await client.GetObjectAsync(request);
            using (var reader = new StreamReader(S3response.ResponseStream))
            {
                res = await reader.ReadToEndAsync();
            }
        }
        catch (AmazonS3Exception ex)
        {
            // A missing key means "no snapshot yet"; anything else is rethrown
            if (ex.ErrorCode != "NoSuchKey") throw;
        }
        return res;
    }

    private IAmazonS3 CreateClient()
    {
        // Without explicit keys the credentials are taken from appSettings
        var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID)
            ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) // from appSettings
            : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig);
        return client;
    }

    public async Task Put(string url, string body)
    {
        string bucket = _buckName.Get(url),
               key = _keyName.Get(url);
        var client = CreateClient();
        PutObjectRequest request = new PutObjectRequest
        {
            BucketName = bucket,
            Key = key,
            ContentType = "text/html",
            ContentBody = body
        };
        await client.PutObjectAsync(request);
    }
}



Next, the strategies that map a url to a location in S3. The bucket is chosen from the url.

public interface IUrlToBucketNameStrategy
{
    string Get(string url); // takes a url, returns the bucket name
}

public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        var bucketName = "day"; // default bucket
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            // parts[1] is read as the site section
            switch (parts[1])
            {
                case "posts":
                    bucketName = "month";
                    break;
                case "users":
                    bucketName = "weak";
                    break;
            }
        }
        return bucketName;
    }
}



Within a bucket, the object key is also derived from the url. This is done by IUrlToKeyStrategy.

public interface IUrlToKeyStrategy
{
    string Get(string url);
}

public class UrlToKeyStrategy : IUrlToKeyStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        string key = "mainpage"; // key used when the url has no segments
        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            // Join the url-encoded segments with "."
            key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
        }
        return key;
    }
}
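To see what the two strategies produce for a concrete url, here is a small JavaScript trace of the same logic. It is a sketch for illustration only: splitUrl, bucketFor and keyFor are hypothetical names, and encodeURIComponent stands in for HttpUtility.UrlEncode, so the exact escaping of special characters may differ from the C# version.

```javascript
// Splits on "/" and drops empty segments, like Split(Sep, RemoveEmptyEntries).
function splitUrl(url) {
  return url.split("/").filter(function (part) { return part !== ""; });
}

// Mirrors UrlToBucketNameStrategy: the segment at index 1 selects the bucket.
function bucketFor(url) {
  var parts = splitUrl(url);
  if (parts.length > 1) {
    if (parts[1] === "posts") return "month";
    if (parts[1] === "users") return "weak";
  }
  return "day";
}

// Mirrors UrlToKeyStrategy: encoded segments joined with ".".
function keyFor(url) {
  var parts = splitUrl(url);
  if (parts.length === 0) return "mainpage";
  // Assumption: encodeURIComponent approximates HttpUtility.UrlEncode here.
  return parts.map(function (part) { return encodeURIComponent(part); }).join(".");
}

console.log(bucketFor("widjer.net/posts/posts-430033")); // month
console.log(keyFor("widjer.net/posts/posts-430033"));    // widjer.net.posts.posts-430033
console.log(bucketFor("http://widjer.net/posts/posts-430033")); // day
```

The section is read from index 1 of the split, so the first input is written without a scheme. With a leading http:// that index holds the host name and the url falls into the default day bucket, as the last call shows.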






AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



The implementation that uses Blitline.

public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private string _appId;          // Blitline application id
    private IUrlStorage _storage;   // storage that supplies the bucket and key names
    private int _regTimeout = 30000; // 30s

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the job request
        string jsonData = FormatCrawlRequest(url);
        // Send it to the service
        var resp = await Crawl(url, jsonData);
        // A non-empty response carries error text
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
            var o = await result.Content.ReadAsStringAsync();
            // Parse the service response
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the errors, if any
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Build the request object and serialize it to JSON
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





An alternative is to take the snapshot locally with PhantomJS.



public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that PhantomJS runs

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Start PhantomJS with the script and the url as arguments
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("{0} {1}", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };
        Process p = new Process { StartInfo = startInfo };
        p.Start();
        // Read the rendered html from standard output
        string output = p.StandardOutput.ReadToEnd();
        p.WaitForExit();
        // Store the snapshot
        return storage.Put(url, output);
    }
}

The script at _jsPath:

var resourceWait = 13000,
    maxRenderWait = 13000;

var page = require('webpage').create(),
    system = require('system'),
    count = 0,
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the rendered html to standard output and quit
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// Count requests in flight; a new request cancels a pending render
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// When the last request finishes, schedule the render
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        // Render after maxRenderWait even if requests are still pending
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});



The result can be seen on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ is the normal AJAX page. The url http://widjer.net/timeline/%23_?_escaped_fragment_= returns the snapshot, which does not need javascript.
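The relation between the two forms of a url can be written down in a few lines. This is a minimal JavaScript sketch with hypothetical function names. Like the controller above, it only handles the empty ?_escaped_fragment_= suffix appended to a url that has no other query string.

```javascript
// The marker a crawler appends when it asks for the html version of a page.
var MARKER = "?_escaped_fragment_=";

// Hypothetical helper: the url a robot requests for a given page url.
function toCrawlerUrl(url) {
  return url + MARKER;
}

// Hypothetical helper: the page url to snapshot for a robot's request,
// the same replacement the controller performs.
function toPageUrl(crawlerUrl) {
  return crawlerUrl.split(MARKER).join("");
}

console.log(toCrawlerUrl("http://widjer.net/posts/posts-430033"));
// http://widjer.net/posts/posts-430033?_escaped_fragment_=
console.log(toPageUrl("http://widjer.net/posts/posts-430033?_escaped_fragment_="));
// http://widjer.net/posts/posts-430033
```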




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





, . . , . PhantomJS



public class PhantomJsSnapShot : ISpaSnapshot { private readonly string _exePath; // PhantomJS private readonly string _jsPath; // , public PhantomJsSnapShot(string exePath, string jsPath) { _exePath = exePath; _jsPath = jsPath; } public Task TakeSnapshot(string url, IUrlStorage storage) { // var startInfo = new ProcessStartInfo { Arguments = String.Format("{0} {1}", _jsPath, url), FileName = _exePath, UseShellExecute = false, CreateNoWindow = true, RedirectStandardOutput = true, RedirectStandardError = true, RedirectStandardInput = true, StandardOutputEncoding = System.Text.Encoding.UTF8 }; Process p = new Process { StartInfo = startInfo }; p.Start(); // string output = p.StandardOutput.ReadToEnd(); p.WaitForExit(); // return storage.Put(url, output); } }

_jsPath

var resourceWait = 13000, maxRenderWait = 13000; var page = require('webpage').create(), system = require('system'), count = 0, forcedRenderTimeout, renderTimeout; page.viewportSize = { width: 1280, height: 1024 }; function doRender() { console.log(page.content); phantom.exit(); } page.onResourceRequested = function (req) { count += 1; clearTimeout(renderTimeout); }; page.onResourceReceived = function (res) { if (!res.stage || res.stage === 'end') { count -= 1; if (count === 0) { renderTimeout = setTimeout(doRender, resourceWait); } } }; page.open(system.args[1], function (status) { if (status !== "success") { phantom.exit(); } else { forcedRenderTimeout = setTimeout(function () { doRender(); }, maxRenderWait); } });



AJAX , . widjer.net ( DEMO). url http://widjer.net/timeline/%23_ . http://widjer.net/timeline/%23_?_escaped_fragment_= javascript. , .




        .       AJAX . HTML       www.example.com/?_escaped_fragment_=.  ,      http://widjer.net/posts/posts-430033,       http://widjer.net/posts/posts-430033?_escaped_fragment_=. 
      

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



bucket , . IUrlToKeyStrategy.

public interface IUrlToKeyStrategy { string Get(string url); } public class UrlToKeyStrategy: IUrlToKeyStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); string key = "mainpage"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); // if(parts.Length > 0) { // "" key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x))); } return key; } }



, .



AJAX

ISpaSnapshot

public interface ISpaSnapshot { Task TakeSnapshot(string url, IUrlStorage storage); }



Blitline. , , .

public class BlitlineSpaSnapshot : ISpaSnapshot { private string _appId; //id private IUrlStorage _storage; // private int _regTimeout = 30000; //30s // public BlitlineSpaSnapshot(string appId, IUrlStorage st) { _appId = appId; _storage = st; } public async Task TakeSnapshot(string url, IUrlStorage storage) { // string jsonData = FormatCrawlRequest(url); // var resp = await Crawl(url, jsonData); // , if (!string.IsNullOrWhiteSpace(resp)) throw new CrawlException(resp); } private async Task<string> Crawl(string url, string jsonData) { // string crawlResponse = string.Empty; using (var client = new HttpClient()) { var result = await client.PostAsync("http://api.blitline.com/job", new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } })); var o = result.Content.ReadAsStringAsync().Result; // var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o); // if(response.Failed) crawlResponse = string.Join("; ", response.Results.Select(x => x.Error)); } return crawlResponse; } private string FormatCrawlRequest(string url) { // , JSON var reqData = new BlitlineRequest { ApplicationId = _appId, Src = url, SrcType = "screen_shot_url", SrcData = new SrcDataDto { ViewPort = "1200x800", SaveHtml = new SaveDest { S3Des = new StorageDestination { Bucket = _storage.BuckName.Get(url), Key = _storage.KeyName.Get(url) } } }, Functions = new[] { new FunctionData { Name = "no_op" } } }; return JsonConvert.SerializeObject(new[] { reqData }); } }





A second implementation of ISpaSnapshot uses PhantomJS.



public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that PhantomJS runs

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // run PhantomJS with the script and the url as arguments
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("{0} {1}", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };
        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // the script prints the rendered HTML to standard output
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            // save the snapshot
            return storage.Put(url, output);
        }
    }
}

The script passed as _jsPath

var resourceWait = 13000,
    maxRenderWait = 13000;

var page = require('webpage').create(),
    system = require('system'),
    count = 0,
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

function doRender() {
    console.log(page.content);
    phantom.exit();
}

page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});
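The script decides that the page is ready by counting requests in flight: every new request cancels the pending render, and the render is scheduled again once the counter drops back to zero, with a second timer as an upper bound. The same idea can be sketched outside PhantomJS so that it runs in plain Node. The helper name `createIdleWatcher` and its `timers` parameter are assumptions made for this sketch and are not part of PhantomJS.

```javascript
// Hypothetical helper: calls onIdle once no request has been in flight for
// quietMs, or after maxMs at the latest. Not a PhantomJS API.
function createIdleWatcher(onIdle, quietMs, maxMs, timers) {
  // timers is injectable so the logic can be exercised without real delays
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var pending = 0, quietTimer = null, forcedTimer = null, done = false;

  function finish() {
    if (done) { return; }       // make sure onIdle runs only once
    done = true;
    timers.clear(quietTimer);
    timers.clear(forcedTimer);
    onIdle();
  }

  forcedTimer = timers.set(finish, maxMs); // upper bound, like maxRenderWait

  return {
    requested: function () {    // counterpart of page.onResourceRequested
      pending += 1;
      timers.clear(quietTimer);
    },
    received: function () {     // counterpart of page.onResourceReceived
      pending = Math.max(0, pending - 1);
      if (pending === 0) {
        timers.clear(quietTimer);
        quietTimer = timers.set(finish, quietMs); // like resourceWait
      }
    }
  };
}

var watcher = createIdleWatcher(function () { console.log("page is idle"); }, 50, 500);
watcher.requested();
watcher.received(); // "page is idle" is printed about 50 ms later
```

Compared with the script above, the sketch guards against rendering twice and clears both timers when it finishes. The quiet period is worth keeping well below the upper bound, otherwise the forced timer always wins.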



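The result of indexing AJAX pages can be seen on widjer.net (DEMO). The url http://widjer.net/timeline/%23_ is the regular page, and http://widjer.net/timeline/%23_?_escaped_fragment_= returns the snapshot of the same page, rendered to plain HTML that does not need javascript.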
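The relation between a regular url and its `_escaped_fragment_` counterpart can be written down as two small functions. This is a sketch of the AJAX crawling scheme as Google documented it: a `#!fragment` moves into the `_escaped_fragment_` query parameter, and a page without a hash-bang gets the parameter with an empty value. The function names are hypothetical, and the reverse function assumes `_escaped_fragment_` is the last query parameter.

```javascript
var PARAM = "_escaped_fragment_=";

// Escape only the characters the scheme requires to be escaped:
// control characters and space, "#", "%", "&", "+", and everything from 0x7F up.
function escapeFragment(fragment) {
  return fragment.replace(/[\u0000-\u0020#%&+\u007f-\u{10ffff}]/gu, function (ch) {
    return encodeURIComponent(ch);
  });
}

// What the crawler requests for a given page url.
function toEscapedFragmentUrl(url) {
  var bang = url.indexOf("#!");
  var base = bang === -1 ? url.split("#")[0] : url.slice(0, bang);
  var fragment = bang === -1 ? "" : url.slice(bang + 2);
  var sep = base.indexOf("?") === -1 ? "?" : "&";
  return base + sep + PARAM + escapeFragment(fragment);
}

// What the server can do to recover the page url it has to snapshot.
function fromEscapedFragmentUrl(url) {
  var m = /[?&]_escaped_fragment_=(.*)$/.exec(url);
  if (!m) { return url; }
  var base = url.slice(0, m.index);
  var fragment = decodeURIComponent(m[1]);
  return fragment === "" ? base : base + "#!" + fragment;
}

console.log(toEscapedFragmentUrl("http://widjer.net/posts/posts-430033"));
// http://widjer.net/posts/posts-430033?_escaped_fragment_=
console.log(fromEscapedFragmentUrl("http://www.example.com/?_escaped_fragment_=key=value"));
// http://www.example.com/#!key=value
```

For pages without a hash-bang, such as the ones in this article, the reverse step amounts to removing the trailing `?_escaped_fragment_=` from the url.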




. AJAX . HTML www.example.com/?_escaped_fragment_=. , http://widjer.net/posts/posts-430033, http://widjer.net/posts/posts-430033?_escaped_fragment_=.

, , ajax , .





ASP MVC durandaljs (http://durandaljs.com/). durandal (http://durandaljs.com/documentation/Making-Durandal-Apps-SEO-Crawlable.html). , Blitline (http://www.blitline.com/docs/seo_optimizer). , . , Amazon S3 bucket. , .





http://aws.amazon.com/s3/ . , . , , .



S3

S3 buckets: day, month, weak. , . bucket Lifecycle. , , 7 30 bucket.



Blitline . bucket .

{ "Version": "2008-10-17", "Statement": [ { "Sid": "AddCannedAcl", "Effect": "Allow", "Principal": { "CanonicalUser": "dd81f2e5f9fd34f0fca01d29c62e6ae6cafd33079d99d14ad22fbbea41f36d9a"}, "Action": [ "s3:PutObjectAcl", "s3:PutObject" ], "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*" } ] }

YOUR_BUCKET_NAME bucket.

S3 , .



, MVC Controller

SPA, HomeController, durandal . Index Home .

if (Request.QueryString["_escaped_fragment_"] == null) { return View(); } try { //We┬┤ll crawl the normal url without _escaped_fragment_ var result = await _crawler.SnaphotUrl( Request.Url.AbsoluteUri.Replace("?_escaped_fragment_=", "") ); return Content(result); } catch (Exception ex) { Trace.TraceError("CrawlError: {0}", ex.Message); return View("FailedCrawl"); }





_crawler



public interface ICrawl { Task<string> SnaphotUrl(string url); }



url, , html .



public class Crawl: ICrawl { private IUrlStorage _sorage; // S3 private ISpaSnapshot _snapshot; // public Crawl(IUrlStorage st, ISpaSnapshot ss) { Debug.Assert(st != null); Debug.Assert(ss != null); _sorage = st; _snapshot = ss; } public async Task<string> SnaphotUrl(string url) { // (S3 ) string res = await _sorage.Get(url); // , if (!string.IsNullOrWhiteSpace(res)) return res; // , await _snapshot.TakeSnapshot(url, _sorage); // var i = 0; do { res = await _sorage.Get(url); if(!string.IsNullOrWhiteSpace(res)) return res; Thread.Sleep(5000); } while(i < 3); // throw new CrawlException(" "); } }

, .



S3

IUrlStorage

public interface IUrlStorage { Task<string> Get(string url); // Task Put(string url, string body); // // IUrlToBucketNameStrategy BuckName { get; } // url bucketname IUrlToKeyStrategy KeyName { get; } // url }



S3 , .

public class S3Storage: IUrlStorage { private IUrlToBucketNameStrategy _buckName; // url bucket public IUrlToBucketNameStrategy BuckName { get { return _buckName;} } private IUrlToKeyStrategy _keyName; // url public IUrlToKeyStrategy KeyName { get { return _keyName; } } // , amazon private readonly string _amazonS3AccessKeyID; private readonly string _amazonS3secretAccessKeyID; private readonly AmazonS3Config _amazonConfig; public S3Storage(string S3Key = null, string S3SecretKey = null, IUrlToBucketNameStrategy bns = null, IUrlToKeyStrategy kn = null) { _amazonS3AccessKeyID = S3Key; _amazonS3secretAccessKeyID = S3SecretKey; _buckName = bns ?? new UrlToBucketNameStrategy(); // , _keyName = kn ?? new UrlToKeyStrategy(); // , _amazonConfig = new AmazonS3Config { RegionEndpoint = Amazon.RegionEndpoint.USEast1 // bucket US Default, }; } public async Task<string> Get(string url) { // url bucket string bucket = _buckName.Get(url), key = _keyName.Get(url), res = string.Empty; // var client = CreateClient(); // GetObjectRequest request = new GetObjectRequest { BucketName = bucket, Key = key, }; try { // var S3response = await client.GetObjectAsync(request); using (var reader = new StreamReader(S3response.ResponseStream)) { res = reader.ReadToEnd(); } } catch (AmazonS3Exception ex) { if (ex.ErrorCode != "NoSuchKey") throw ex; } return res; } private IAmazonS3 CreateClient() { // var client = string.IsNullOrWhiteSpace(_amazonS3AccessKeyID) // ? Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonConfig) //from appSettings : Amazon.AWSClientFactory.CreateAmazonS3Client(_amazonS3AccessKeyID, _amazonS3secretAccessKeyID, _amazonConfig); return client; } public async Task Put(string url, string body) { string bucket = _buckName.Get(url), key = _keyName.Get(url); var client = CreateClient(); PutObjectRequest request = new PutObjectRequest { BucketName = bucket, Key = key, ContentType = "text/html", ContentBody = body }; await client.PutObjectAsync(request); } }



, url S3 . Bucket . , .

public interface IUrlToBucketNameStrategy { string Get(string url); // url, bucket } public class UrlToBucketNameStrategy : IUrlToBucketNameStrategy { private static readonly char[] Sep = new[] { '/' }; public string Get(string url) { Debug.Assert(url != null); var bucketName = "day"; // var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries); if(parts.Length > 1) { // switch(parts[1]) { case "posts": // , , bucketName = "month"; break; case "users": // , bucketName = "weak"; break; } } return bucketName; } }



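Once the bucket is known, the url has to be turned into an object key. That is the job of IUrlToKeyStrategy. For example, the default implementation turns widjer.net/posts/posts-430033 into the key widjer.net.posts.posts-430033.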

using System;
using System.Diagnostics;
using System.Linq;
using System.Web;

public interface IUrlToKeyStrategy
{
    string Get(string url);
}

public class UrlToKeyStrategy : IUrlToKeyStrategy
{
    private static readonly char[] Sep = new[] { '/' };

    public string Get(string url)
    {
        Debug.Assert(url != null);
        string key = "mainpage"; // key used when the url has no segments at all

        var parts = url.Split(Sep, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 0)
        {
            // Encode every segment and join the segments with "."
            key = string.Join(".", parts.Select(x => HttpUtility.UrlEncode(x)));
        }
        return key;
    }
}
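To see what the two default strategies return for concrete urls, the split logic can be mirrored in a few lines of JavaScript (the language of the rendering script) and tried in Node. This is only a sketch: the function names are made up for illustration, and encodeURIComponent does not escape exactly the same characters as HttpUtility.UrlEncode, so keys for urls with special characters may differ from the C# result.

```javascript
// Hypothetical helpers that mirror the default C# strategies for experimentation.

// Same as Split('/') with StringSplitOptions.RemoveEmptyEntries
function splitUrl(url) {
  return url.split('/').filter(function (part) { return part.length > 0; });
}

// Mirrors UrlToBucketNameStrategy: the second segment selects the bucket
function bucketFor(url) {
  var parts = splitUrl(url);
  switch (parts[1]) {
    case 'posts': return 'month';
    case 'users': return 'weak'; // bucket name as spelled in the S3 setup
    default: return 'day';
  }
}

// Mirrors UrlToKeyStrategy: encoded segments joined with "."
function keyFor(url) {
  var parts = splitUrl(url);
  if (parts.length === 0) return 'mainpage';
  return parts.map(encodeURIComponent).join('.');
}
```

With a url that starts with the host, bucketFor('widjer.net/posts/posts-430033') gives 'month'. With a leading scheme the second segment is the host, so bucketFor('http://widjer.net/posts/posts-430033') falls back to 'day'. It is worth deciding which form of the url is passed to the strategies and keeping it consistent.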






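Snapshots of AJAX pages

The ISpaSnapshot interface:

public interface ISpaSnapshot
{
    // Renders the page at the given url and saves the resulting HTML to the storage
    Task TakeSnapshot(string url, IUrlStorage storage);
}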



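The first implementation uses Blitline. The service renders the page and writes the resulting HTML straight into the S3 bucket, so the application only has to submit the job.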

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class BlitlineSpaSnapshot : ISpaSnapshot
{
    private readonly string _appId;        // Blitline application id
    private readonly IUrlStorage _storage; // supplies the bucket and key for the snapshot
    private int _regTimeout = 30000;       // 30 s (not used in this listing)

    public BlitlineSpaSnapshot(string appId, IUrlStorage st)
    {
        _appId = appId;
        _storage = st;
    }

    public async Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Build the job description
        string jsonData = FormatCrawlRequest(url);
        // Submit the job
        var resp = await Crawl(url, jsonData);
        // A non-empty response is an error description
        if (!string.IsNullOrWhiteSpace(resp))
            throw new CrawlException(resp);
    }

    private async Task<string> Crawl(string url, string jsonData)
    {
        string crawlResponse = string.Empty;
        using (var client = new HttpClient())
        {
            var result = await client.PostAsync("http://api.blitline.com/job",
                new FormUrlEncodedContent(new Dictionary<string, string> { { "json", jsonData } }));
            // Await the body instead of blocking on .Result
            var o = await result.Content.ReadAsStringAsync();
            // Parse the response
            var response = JsonConvert.DeserializeObject<BlitlineBatchResponse>(o);
            // Collect the error messages if the job failed
            if (response.Failed)
                crawlResponse = string.Join("; ", response.Results.Select(x => x.Error));
        }
        return crawlResponse;
    }

    private string FormatCrawlRequest(string url)
    {
        // Job description that is serialized to JSON
        var reqData = new BlitlineRequest
        {
            ApplicationId = _appId,
            Src = url,
            SrcType = "screen_shot_url",
            SrcData = new SrcDataDto
            {
                ViewPort = "1200x800",
                SaveHtml = new SaveDest
                {
                    S3Des = new StorageDestination
                    {
                        Bucket = _storage.BuckName.Get(url),
                        Key = _storage.KeyName.Get(url)
                    }
                }
            },
            Functions = new[] { new FunctionData { Name = "no_op" } }
        };
        return JsonConvert.SerializeObject(new[] { reqData });
    }
}





An alternative implementation renders the page locally with PhantomJS.



using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class PhantomJsSnapShot : ISpaSnapshot
{
    private readonly string _exePath; // path to the PhantomJS executable
    private readonly string _jsPath;  // path to the script that renders the page

    public PhantomJsSnapShot(string exePath, string jsPath)
    {
        _exePath = exePath;
        _jsPath = jsPath;
    }

    public Task TakeSnapshot(string url, IUrlStorage storage)
    {
        // Run "phantomjs <script> <url>" and capture its standard output.
        // Only stdout is redirected: a redirected stream that is never read
        // can fill its buffer and block the child process.
        var startInfo = new ProcessStartInfo
        {
            Arguments = String.Format("{0} {1}", _jsPath, url),
            FileName = _exePath,
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            StandardOutputEncoding = System.Text.Encoding.UTF8
        };

        string output;
        using (var p = new Process { StartInfo = startInfo })
        {
            p.Start();
            // The script prints the rendered HTML to stdout
            output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
        }

        // Save the snapshot to the storage
        return storage.Put(url, output);
    }
}

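The script passed as _jsPath:

var resourceWait = 13000,  // quiet period after the last response, ms
    maxRenderWait = 13000; // upper limit after the page has loaded, ms

var page = require('webpage').create(),
    system = require('system'),
    count = 0, // number of requests still in flight
    forcedRenderTimeout,
    renderTimeout;

page.viewportSize = { width: 1280, height: 1024 };

// Print the rendered DOM to stdout, where the C# wrapper reads it, and exit
function doRender() {
    console.log(page.content);
    phantom.exit();
}

// A new request has started: postpone rendering
page.onResourceRequested = function (req) {
    count += 1;
    clearTimeout(renderTimeout);
};

// A response has finished; once nothing is pending, render after the quiet period
page.onResourceReceived = function (res) {
    if (!res.stage || res.stage === 'end') {
        count -= 1;
        if (count === 0) {
            renderTimeout = setTimeout(doRender, resourceWait);
        }
    }
};

page.open(system.args[1], function (status) {
    if (status !== "success") {
        phantom.exit();
    } else {
        // Render after maxRenderWait at the latest, even if requests are still pending
        forcedRenderTimeout = setTimeout(function () {
            doRender();
        }, maxRenderWait);
    }
});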
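The waiting logic of the script (count the requests in flight, render once nothing has been pending for a while) can be pulled out into a small helper that is easy to check outside PhantomJS. The following is a minimal sketch under that assumption: createIdleDetector and its timers parameter are hypothetical names, not part of PhantomJS, and the timers are injectable only so the behaviour can be tested without real delays.

```javascript
// Hypothetical helper: calls onIdle once no request has been pending for quietMs.
function createIdleDetector(onIdle, quietMs, timers) {
  // Real timers by default; a fake pair can be passed in for tests
  var t = timers || {
    setTimeout: function (fn, ms) { return setTimeout(fn, ms); },
    clearTimeout: function (handle) { clearTimeout(handle); }
  };
  var pending = 0;
  var handle = null;

  function cancel() {
    if (handle !== null) {
      t.clearTimeout(handle);
      handle = null;
    }
  }

  return {
    // Call from page.onResourceRequested
    requested: function () {
      pending += 1;
      cancel();
    },
    // Call from page.onResourceReceived when the response has ended
    received: function () {
      pending = Math.max(0, pending - 1);
      if (pending === 0) {
        cancel();
        handle = t.setTimeout(onIdle, quietMs);
      }
    },
    pendingCount: function () { return pending; }
  };
}
```

In the script above, onIdle would be doRender and quietMs would be resourceWait; the forced timeout after page.open stays as a separate safety net.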



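This is how indexing of AJAX pages is set up on widjer.net (DEMO). A regular request to the url http://widjer.net/timeline/%23_ returns the JavaScript application, while http://widjer.net/timeline/%23_?_escaped_fragment_= returns the stored HTML snapshot, which can be read without running JavaScript.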
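The relation between the two urls can be written down as a pair of functions. This is a sketch that covers only the case used in this article, pages without a hash fragment, where the crawler appends an empty _escaped_fragment_ parameter; the function names are made up for illustration. Unlike the Replace call in the Index action shown earlier, the reverse function also accepts the form with "&", which occurs when the url already has a query string.

```javascript
var FRAGMENT_PARAM = '_escaped_fragment_';

// Regular url -> the url a crawler requests for the HTML snapshot
function toCrawlerUrl(url) {
  var separator = url.indexOf('?') === -1 ? '?' : '&';
  return url + separator + FRAGMENT_PARAM + '=';
}

// Crawler url -> the regular url that has to be rendered
function toRenderUrl(url) {
  return url.replace(/[?&]_escaped_fragment_=$/, '');
}
```

For example, toCrawlerUrl('http://widjer.net/posts/posts-430033') returns 'http://widjer.net/posts/posts-430033?_escaped_fragment_=', and toRenderUrl applied to that result returns the original url.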
















