Once again about phpQuery

Hello, everyone. Recently I took on a job that required automatically pulling data from another site into a website (in other words, writing a parser).



Foreword



Since I work in PHP, my choice fell on the phpQuery library. Of course, there are many other libraries, including the DOM tools built into PHP by default, but an ordinary programmer who freelances on weekends needs something close to a miracle. Fortunately, laziness drives us all, and it was one developer's laziness that gave us phpQuery.



I could not find any Russian-language documentation for this library (maybe I didn't look hard enough?). After coming across a bunch of forum questions from newcomers who cannot read the English documentation, I decided to write this article. Please note that it is aimed mainly at beginners.



Let's get started



phpQuery is not the fastest library, but it is one of the faster ones. On newer PHP versions its overhead is barely noticeable; as before, most of the time goes into downloading the page itself.

It has many features that most Russian-language tutorials never mention.

Some programmers, without bothering to understand phpQuery, rush to write their own libraries (just like our colleagues in the JS world). Yes, the library has a major drawback: its code is outdated. But it still does its job.



Getting to work



It is quite hard for beginners to grasp right away how phpQuery works, but I will try to spell out all the tricky parts as thoroughly as possible.



Many of the library's methods let you work with the DOM the same way you would in jQuery, and their names match jQuery's as closely as possible.



So, first we need to choose the site we will fetch the HTML from. By the way, it does not have to be a site: if we already have HTML (or XML) in a file or a variable, we can load it from there.



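/**
 * For example: $siteName = "https://site.com/";
 * or:          $siteName = "index.html";
 * (a URL needs its scheme, otherwise file_get_contents() treats it as a local path)
 */
$html = file_get_contents($siteName);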
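
Keep in mind that file_get_contents() does not throw when the page or file cannot be read: it emits a warning and returns false, which would then be passed on to the parser. A minimal guard might look like this, assuming you would rather get an exception (loadHtml() is a hypothetical helper name, not part of phpQuery):

```php
<?php

/**
 * Hypothetical helper: load HTML from a URL or a local file.
 * file_get_contents() returns false on failure instead of throwing,
 * so we turn that into an exception.
 */
function loadHtml(string $source): string
{
    // "@" hides the E_WARNING PHP emits on failure; we report the error ourselves.
    $html = @file_get_contents($source);
    if ($html === false) {
        $error = error_get_last();
        throw new RuntimeException(
            "Could not load $source: " . ($error['message'] ?? 'unknown error')
        );
    }
    return $html;
}

// Usage: the same call works for "https://site.com/" and for a local file.
$file = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($file, '<p>Hello</p>');
echo loadHtml($file), PHP_EOL; // prints <p>Hello</p>
unlink($file);
```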

Next, we pass the resulting HTML to phpQuery:



 $dom = phpQuery::newDocument($html);

The newDocument() method returns a phpQuery document object that we can work with.
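
For comparison, the DOM extension built into PHP (the default option mentioned in the foreword) does this step with DOMDocument::loadHTML(). The sketch below assumes ext-dom is enabled, as it is in most PHP builds; newDomDocument() is a hypothetical name. It also shows two chores phpQuery spares you: silencing libxml's warnings about HTML5 or sloppy markup, and forcing UTF-8, since loadHTML() otherwise treats input without a charset declaration as ISO-8859-1.

```php
<?php

// Built-in counterpart of phpQuery::newDocument(): parse an HTML string into a DOMDocument.
function newDomDocument(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // libxml warns about HTML5 tags and broken markup; collect the warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix makes libxml read the string as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = newDomDocument('<div class="product-name"><h1>Jeans Denim</h1></div>');
echo $doc->getElementsByTagName('h1')->item(0)->textContent, PHP_EOL; // prints Jeans Denim
```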



Now we can search the $dom object. Let's say the page we are fetching contains a block like this:



<div class="product-essential">
    <a class="brand-link" href="https://-_.com/-_" title="- ">
        <span class="brand-name">- </span>
    </a>
    <div class="product-name">
        <h1>Jeans Denim</h1>
    </div>
    <div class="price-info">
        <div class="price-box">
            <span class="regular-price" id="product-price-424337">
                <span class="price">€ 200</span>
            </span>
        </div>
    </div>
    <div class="description">
        <span class="product-description"> </span>
        <div class="sku">
            <span> ID :</span>
            <span>830214303</span>
        </div>
    </div>
</div>

This block contains a link to the brand page, the brand name, the product name, the price, a description and the product ID.



Practical part



Let's try to get all of the above data.



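// Load phpQuery (adjust the path to wherever the library is installed)
require_once 'phpQuery/phpQuery.php';
// Fetch the page
$html = file_get_contents("https://-_.com/");
// Create the DOM object
$dom = phpQuery::newDocument($html);
$productHref = [];
// Iterate over every element with the .product-essential class using find(), just as in jQuery.
foreach ($dom->find(".product-essential") as $key => $value) {
    // Iteration yields plain DOM elements, so we wrap each one back into a phpQuery object.
    // That is what pq() is for; it is the counterpart of $() in jQuery.
    $pq = pq($value);
    // Find the .brand-link element and read its "href" attribute with attr().
    $productHref[$key]["brand-href"] = $pq->find(".brand-link")->attr("href");
    // Get the brand name, which sits in <span class="brand-name">- </span>.
    // We only want the text without the <span> tags, so we use text().
    $productHref[$key]["brand-name"] = $pq->find(".brand-name")->text();
    // Get the product name.
    // Selectors can be nested: the name is in the <h1> inside <div class="product-name">.
    $productHref[$key]["product-name"] = $pq->find(".product-name h1")->text();
    // phpQuery supports descendant selectors to any depth, just like jQuery.
    // Here we walk four levels down to the price.
    $productHref[$key]["product-price"] = $pq->find(".price-info .price-box .regular-price .price")->text();
    // Get the description
    $productHref[$key]["product-description"] = $pq->find(".description .product-description")->text();
    // The ID is in the second <span> of .sku, right after the " ID :" label.
    // next() moves from the matched <span> elements to their next sibling elements;
    // only the label has a following sibling, so we end up with the ID span.
    $productHref[$key]["product-id"] = $pq->find(".description .sku span")->next()->text();
}
print_r($productHref);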

As output, we get the following array:



Array
(
    [0] => Array
        (
            [brand-href] => https://-_.com/-_
            [brand-name] => - 
            [product-name] => Jeans Denim
            [product-price] => € 200
            [product-description] =>  
            [product-id] => 830214303
        )

)
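
For comparison, here is roughly the same extraction written with PHP's built-in DOM extension and DOMXPath. It shows how much work phpQuery's CSS selectors save: XPath has no class selector, so every .class becomes a contains(concat(...)) expression. This is a sketch assuming PHP 8.0 or later (for the ?-> operator and the nextElementSibling property); extractProducts() and byClass() are hypothetical helper names, and the price is read with a single .price lookup rather than the four-level chain used above.

```php
<?php

// XPath 1.0 has no ".class" selector; this is the usual token-matching equivalent.
function byClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy/HTML5 markup
    $doc->loadHTML('<?xml encoding="UTF-8" ?>' . $html); // force UTF-8 so "€" survives
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text of the first node matching $query under $context, or null if nothing matches.
    $text = function (string $query, DOMNode $context) use ($xpath): ?string {
        return $xpath->query($query, $context)->item(0)?->textContent;
    };

    $products = [];
    foreach ($xpath->query('//div[' . byClass('product-essential') . ']') as $block) {
        $link = $xpath->query('.//a[' . byClass('brand-link') . ']', $block)->item(0);
        // First <span> in .sku is the " ID :" label; its next element sibling holds the ID,
        // which plays the role of next() in the phpQuery version.
        $label = $xpath->query('.//*[' . byClass('sku') . ']/span', $block)->item(0);

        $products[] = [
            'brand-href' => $link?->getAttribute('href'),
            'brand-name' => $text('.//*[' . byClass('brand-name') . ']', $block),
            'product-name' => $text('.//*[' . byClass('product-name') . ']//h1', $block),
            'product-price' => $text('.//*[' . byClass('price') . ']', $block),
            'product-description' => $text('.//*[' . byClass('product-description') . ']', $block),
            'product-id' => $label?->nextElementSibling?->textContent,
        ];
    }
    return $products;
}

// Usage: print_r(extractProducts($html));
```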

Conclusion



phpQuery is a very handy library, but unfortunately it is rather heavy on memory. So once you have finished going through the elements, it is a good idea to unload the loaded documents:



 phpQuery::unloadDocuments();
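
When one script parses many pages, this advice applies inside the loop: free each document before loading the next one, and keep only plain strings in your results. Below is a sketch of that pattern using PHP's built-in DOM extension, so that it runs without phpQuery installed; with phpQuery you would call phpQuery::unloadDocuments() at the end of each iteration instead. extractTitles() is a hypothetical helper, and reading only the first <h1> is an arbitrary stand-in for real extraction logic.

```php
<?php

// Sketch: parse many pages while keeping only one document in memory at a time.
function extractTitles(iterable $pages): array
{
    $titles = [];
    foreach ($pages as $html) {
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // silence markup warnings
        $doc->loadHTML('<?xml encoding="UTF-8" ?>' . $html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $h1 = $doc->getElementsByTagName('h1')->item(0);
        // Copy the value out as a plain string: a stored node object would keep its whole document alive.
        $titles[] = $h1 !== null ? trim($h1->textContent) : null;

        // Drop the references now instead of waiting for the next iteration to overwrite them.
        unset($doc, $h1);
    }
    return $titles;
}
```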

Convenient as the library is, I advise you not to get too attached to it. For small tasks it is probably the best option, but it is still a somewhat outdated library.



The library can also add elements on the fly, but we will cover that in the next article.


