Diffbot class
The Diffbot class is the first class a developer must instantiate when using this client. It serves as a global container for configuration and as a factory for the various API classes.
class Swader\Diffbot\Diffbot
The Diffbot class takes a single optional parameter, $token, which can be obtained here. Instantiate it like this:
    $diffbot = new Diffbot("my_token");
Alternatively, set the token globally and instantiate the class without a parameter:
    Diffbot::setToken("my_token");
    $diffbot = new Diffbot();
Note that if you instantiate without a global token set and don't pass in a token while instantiating either, a Swader\Diffbot\Exceptions\DiffbotException is thrown.
setToken
static Swader\Diffbot\Diffbot::setToken($token)
Parameters:
- $token (string) – The token.
Returns: void; throws \InvalidArgumentException if the token is not in a valid format.
Sets the token for all future instances of the Diffbot class.
Usage:
    Diffbot::setToken("my_token");
getToken
Swader\Diffbot\Diffbot::getToken()
Returns: null or string
Returns the token of the given instance, or the globally defined token, or null if neither exists.
Usage:
    echo $diffbot->getToken(); // "my_token"
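The lookup order described above (instance token, then global token, then failure or null) can be illustrated with a small self-contained sketch. TokenHolderSketch is a hypothetical stand-in written for this example; it is not the real Swader\Diffbot\Diffbot class, it skips token format validation, and it throws a plain RuntimeException where the library is documented to throw a DiffbotException. The precedence of the instance token over the global one is an assumption based on the order in which getToken lists its return values.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in, NOT the real Swader\Diffbot\Diffbot class.
// It only models the token lookup order described in the text.
final class TokenHolderSketch
{
    /** Token shared by all instances created without their own token. */
    private static ?string $globalToken = null;

    /** Token passed to this particular instance, if any. */
    private ?string $instanceToken;

    public function __construct(?string $token = null)
    {
        // Documented rule: with neither an instance token nor a
        // global token, instantiation fails with an exception.
        if ($token === null && self::$globalToken === null) {
            throw new RuntimeException('No token provided');
        }
        $this->instanceToken = $token;
    }

    /** Sets (or, with null, clears) the token used by future instances. */
    public static function setToken(?string $token): void
    {
        self::$globalToken = $token;
    }

    public function getToken(): ?string
    {
        // Assumed precedence: instance token first, then the global one.
        return $this->instanceToken ?? self::$globalToken;
    }
}

TokenHolderSketch::setToken('global_token');
$a = new TokenHolderSketch();             // relies on the global token
$b = new TokenHolderSketch('own_token');  // carries its own token

echo $a->getToken(), PHP_EOL;
echo $b->getToken(), PHP_EOL;
```

With the real client, the same pattern would apply to Diffbot::setToken and new Diffbot(); wrapping the constructor call in a try/catch for DiffbotException is a reasonable way to surface a missing token early.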
setHttpClient
Swader\Diffbot\Diffbot::setHttpClient(GuzzleHttp\Client $client)
Parameters:
- $client (GuzzleHttp\Client) – The HTTP client.
Returns: $this
Replaces the HTTP client used internally to send calls to the Diffbot API. It is intended for use in testing and is of little use outside that scenario. You do not need to call this method to use the Diffbot class; by default it uses a Guzzle HTTP client.
Usage:
    $client = new GuzzleHttp\Client();
    $diffbot->setHttpClient($client);
getHttpClient
Swader\Diffbot\Diffbot::getHttpClient()
Returns the currently set HTTP client. It can be changed via Swader\Diffbot\Diffbot::setHttpClient.
Returns: GuzzleHttp\Client
setEntityFactory
Swader\Diffbot\Diffbot::setEntityFactory($factory)
Parameters:
- $factory (Swader\Diffbot\Interfaces\EntityFactory) – A Swader\Diffbot\Interfaces\EntityFactory implementation.
Returns: $this
Replaces the default entity factory class with another one. The entity factory converts the data received from Diffbot into entities that have capabilities specific to the type of data processed. For example, a custom entity factory could return Author entities for a Custom API that fetches author portfolios from SitePoint.com. This yields directly usable data from Diffbot calls, with no need for additional processing.
If not explicitly set, defaults to the built-in Swader\Diffbot\Factory\Entity.
Usage:
    $newEntityFactory = new \My\Custom\EntityFactory();
    $diffbot = new Diffbot('my_token');
    $diffbot->setEntityFactory($newEntityFactory);
    // @todo: Full tutorial about a custom Entity and EntityFactory
getEntityFactory
Swader\Diffbot\Diffbot::getEntityFactory()
Returns: Swader\Diffbot\Interfaces\EntityFactory
Returns the currently defined Swader\Diffbot\Interfaces\EntityFactory instance. This method generally isn't needed outside of testing scenarios. See above for usage of the setter.
createProductApi
Swader\Diffbot\Diffbot::createProductApi($url)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
The Product API turns web shops, catalogs, etc. into structured JSON (think eBay, Amazon...). This method creates and returns an instance of the Swader\Diffbot\Api\Product class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a detailed directory of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Product documentation.
Usage:
    $api = $diffbot->createProductApi("http://www.amazon.com/Oh-The-Places-Youll-Go/dp/0679805273/");
    $result = $api->call();
    echo $result->offerPrice; // $11.99
    echo $result->getIsbn(); // 0679805273
createArticleApi
Swader\Diffbot\Diffbot::createArticleApi($url)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
The Article API turns online news posts, blog articles, etc. into structured JSON. This method creates and returns an instance of the Swader\Diffbot\Api\Article class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a detailed directory of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Article documentation.
Usage:
    $api = $diffbot->createArticleApi("http://techcrunch.com/2012/05/31/diffbot-raises-2-million-seed-round-for-web-content-extraction-technology/");
    $result = $api->call();
    echo $result->publisherCountry; // United States
    echo $result->getAuthor(); // Sarah Perez
createImageApi
Swader\Diffbot\Diffbot::createImageApi($url)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
The Image API finds images in a post and returns them as JSON. This method creates and returns an instance of the Swader\Diffbot\Api\Image class. It accepts a single string parameter: either the URL to process for images, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a detailed directory of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Image documentation.
Note that unlike Product and Article, the Image API can return several Image entities (see usage below). If the result is not iterated through, it refers to the first image only.
Usage:
    $api = $diffbot->createImageApi("http://smittenkitchen.com/blog/2012/01/buckwheat-baby-with-salted-caramel-syrup/");
    $result = $api->call();
    echo $result->naturalHeight; // 333 (first image only)
    foreach ($result as $image) {
        echo $image->title;
        echo $image->getXPath();
    }
createAnalyzeApi
Swader\Diffbot\Diffbot::createAnalyzeApi($url)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
The Analyze API tries to autodetect the content it's dealing with (image, product, article...) and extracts it into structured JSON. This method creates and returns an instance of the Swader\Diffbot\Api\Analyze class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). The Analyze API is the default API used in Swader\Diffbot\Diffbot::crawl mode.
Usage:
    $api = $diffbot->createAnalyzeApi("http://techcrunch.com/2012/05/31/diffbot-raises-2-million-seed-round-for-web-content-extraction-technology/");
    $result = $api->call();
    echo $result->publisherCountry; // United States
    echo $result->getAuthor(); // Sarah Perez
createDiscussionApi
Swader\Diffbot\Diffbot::createDiscussionApi($url)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
The Discussion API turns online comments, forum topics or pages of reviews into structured JSON. Think Amazon review sections, YouTube comments, Disqus comments on articles, etc. This method creates and returns an instance of Swader\Diffbot\Api\Discussion. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). Like the Image API above, this one also returns several Swader\Diffbot\Api\Discussion entities per call, if available, along with other data (see usage below).
Usage:
    $api = $diffbot->createDiscussionApi("http://boards.straightdope.com/sdmb/showthread.php?t=740315");
    $result = $api->call();
    echo $result->numPosts; // 43
    echo $result->getParticipants(); // 23
    foreach ($result as $post) {
        echo $post->getAuthor();
        echo $post->votes;
    }
createCustomApi
Swader\Diffbot\Diffbot::createCustomApi($url, $name)
Parameters:
- $url (string) – The URL to process, or the word “crawl”
- $name (string) – The name of the Custom API as shown in the Diffbot interface
Diffbot customers can define Custom APIs. For a tutorial on doing this, see here. In short, you can tell Diffbot how to recognize certain areas of a web page and have it translate them into JSON for you if none of the standard APIs do the trick. This allows for much more lightweight and specific calls, resulting in a quicker turnaround and (usually) more precise data.
This method creates and returns an instance of Swader\Diffbot\Api\Custom. It accepts two parameters: first, either the URL to process or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below); second, the name of the Custom API to use. Unlike other APIs, this one has no specific entity to return and instead returns a Swader\Diffbot\Entity\Wildcard entity, which matches anything.
Usage:
    $api = $diffbot->createCustomApi("http://sitepoint.com/author/bskvorc", "AuthorFolio");
    $result = $api->call();
    echo $result->bio; // Bruno is a coder from Croatia with Master's Degrees in...
crawl
Swader\Diffbot\Diffbot::crawl($name = null, Swader\Diffbot\Api $api = null)
Parameters:
- $name (string) – The name of the new crawljob. If omitted, a read-only mode is activated that returns data about all crawljobs on the given Diffbot token.
- $api (Swader\Diffbot\Api) – Instance of the API to process the crawled URLs. If omitted, defaults to Swader\Diffbot\Api\Analyze.
The crawl method is used to create a new Crawlbot task (crawljob). For more information about Crawlbot, including what it does, why and how, see here. To avoid confusion, it is also worth reading the Crawlbot API documentation and the Crawlbot support pages.
In a nutshell, Crawlbot crawls a set of seed URLs for links (even if a subdomain is passed to it as a seed URL, it still looks through the entire main domain and all other subdomains it can find) and then processes all the pages it finds using the API you define (or the Analyze API by default). The result of the call is a collection of Swader\Diffbot\Entity\JobCrawl objects, each with details about a defined job. To actually get the data obtained by crawling and processing, use the Swader\Diffbot\Diffbot::search API.
Here's how you can create a crawljob (see the detailed Swader\Diffbot\Api\Search documentation for a step-by-step guide with explanations):
    $url = 'crawl';
    $articleApi = $diffbot->createArticleApi($url)->setDiscussion(false);
    $crawl = $diffbot->crawl('mycrawl_01', $articleApi);
    $crawl->setSeeds(['http://sitepoint.com']);
    $job = $crawl->call();
    // See the JobCrawl class to find out which getters are available
    dump($job->getDownloadUrl("json")); // outputs the download URL of the JSON dataset of the job's result
search
Swader\Diffbot\Diffbot::search($q)
Parameters:
- $q (string) – The query the Search API should run against the Diffbot database
The Search API is used to search through sets of crawled and processed data obtained through the Crawl or Bulk API. It accepts a simple string query and returns an array of all matching entities. For a live example of a crawl + search implementation, see here, and for a full walkthrough of the Search API, see the Swader\Diffbot\Api\Search docs.
Usage:
    $search = $diffbot->search('author:"Miles Johnson" AND type:article');
    $result = $search->call();
    foreach ($result as $article) {
        echo $article->getTitle();
    }