Diffbot class

The Diffbot class is the first class a developer needs to instantiate when using this client. It serves as a global configuration container and as a factory for the various API classes.

class Swader\Diffbot\Diffbot

The Diffbot class takes one optional parameter, $token, which you can obtain here. Instantiate it like this:

$diffbot = new Diffbot("my_token");

Alternatively, set the token globally and instantiate the class without a parameter:

Diffbot::setToken("my_token");
$diffbot = new Diffbot();

Note that if no global token has been set and you don’t pass a token to the constructor either, a Swader\Diffbot\Exceptions\DiffbotException is thrown.

static Swader\Diffbot\Diffbot::setToken($token)
Parameters:
  • $token (string) – Token.
Returns:

void, or throws \InvalidArgumentException if the token is malformed

Sets the token for all future instances of the Diffbot class.

Usage:

Diffbot::setToken("my_token");

Swader\Diffbot\Diffbot::getToken()
Returns: null or string

Returns the token of the given instance. If the instance has none, it returns the globally defined token, or null if neither exists.

Usage:

echo $diffbot->getToken(); // "my_token"
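The lookup order described above (instance token first, then the global token, then null) and the constructor exception mentioned earlier in this section can be sketched in plain PHP. This is a minimal illustration under that reading of the docs. `TokenHolder` is a hypothetical stand-in, and `MissingTokenException` stands in for Swader\Diffbot\Exceptions\DiffbotException. Neither is the library's actual code, and the real implementation may differ.

```php
<?php
// Hypothetical stand-in for the documented token lookup;
// NOT the actual Swader\Diffbot\Diffbot implementation.

// Stands in for Swader\Diffbot\Exceptions\DiffbotException
class MissingTokenException extends RuntimeException {}

class TokenHolder
{
    /** @var string|null Token shared by all future instances */
    private static ?string $globalToken = null;

    /** @var string|null Token passed to this particular instance */
    private ?string $instanceToken;

    public static function setToken(string $token): void
    {
        self::$globalToken = $token;
    }

    public function __construct(?string $token = null)
    {
        // Documented rule: no instance token and no global token => exception
        if ($token === null && self::$globalToken === null) {
            throw new MissingTokenException('No token set globally or for this instance.');
        }
        $this->instanceToken = $token;
    }

    public function getToken(): ?string
    {
        // The instance token wins, then the global token, otherwise null
        return $this->instanceToken ?? self::$globalToken;
    }
}
```

An instance created with its own token keeps that token even after a global token is set later.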

Swader\Diffbot\Diffbot::setHttpClient(GuzzleHttp\Client $client)
Parameters:
  • $client (GuzzleHttp\Client) – HTTP client.
Returns:

$this

Replaces the HTTP client used internally to send requests to the Diffbot API. It is intended for testing and is of little use outside that scenario. You do not need to call this method to use the Diffbot class, because it defaults to GuzzleHttp\Client.

Usage:

$client = new GuzzleHttp\Client();
$diffbot->setHttpClient($client);

Swader\Diffbot\Diffbot::getHttpClient()

Returns the currently set HTTP client. It can be changed via Swader\Diffbot\Diffbot::setHttpClient.

Returns: GuzzleHttp\Client

Swader\Diffbot\Diffbot::setEntityFactory($factory)
Parameters:
  • $factory (Swader\Diffbot\Interfaces\EntityFactory) – Entity factory.
Returns:

$this

Replaces the default entity factory with a different one. An entity factory converts the data returned by Diffbot into entities with capabilities specific to that type of processed data. For example, a custom entity factory could return Author entities for a Custom API that fetches author portfolios from SitePoint.com. The data returned by a Diffbot call is then directly usable and needs no further processing.

If not explicitly set, it defaults to the built-in Swader\Diffbot\Factory\Entity.

Usage:

$newEntityFactory = new \My\Custom\EntityFactory();

$diffbot = new Diffbot('my_token');
$diffbot->setEntityFactory($newEntityFactory);

// @todo: Full tutorial about a custom Entity and EntityFactory
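Until that tutorial exists, the core idea of an entity factory can be sketched in plain PHP: decoded Diffbot data goes in, and a typed entity with its own methods comes out. This is a minimal sketch only. `Author`, `AuthorFactory` and the `name`/`articles` keys are hypothetical. These classes do not implement Swader\Diffbot\Interfaces\EntityFactory, whose exact methods are not described in this section.

```php
<?php
// Hypothetical illustration of what an entity factory does; these classes do
// NOT implement Swader\Diffbot\Interfaces\EntityFactory.

final class Author
{
    private string $name;
    /** @var string[] */
    private array $articleUrls;

    public function __construct(string $name, array $articleUrls)
    {
        $this->name = $name;
        $this->articleUrls = $articleUrls;
    }

    public function getName(): string
    {
        return $this->name;
    }

    // A type-specific capability that the raw JSON does not offer directly
    public function countArticles(): int
    {
        return count($this->articleUrls);
    }
}

final class AuthorFactory
{
    /**
     * @param array $data Decoded JSON object; the "name" and "articles" keys are assumed
     */
    public function create(array $data): Author
    {
        return new Author(
            (string) ($data['name'] ?? ''),
            array_values($data['articles'] ?? [])
        );
    }
}
```

Missing keys fall back to empty values, so a partial response still yields a usable (if empty) Author.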

Swader\Diffbot\Diffbot::getEntityFactory()
Returns: Swader\Diffbot\Interfaces\EntityFactory

Returns the currently defined Swader\Diffbot\Interfaces\EntityFactory instance. This method is generally not needed outside of testing scenarios. See above for usage of the setter.

Swader\Diffbot\Diffbot::createProductApi($url)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
Returns:

Swader\Diffbot\Api\Product

The Product API turns web shops, catalogs, etc. into structured JSON (think eBay, Amazon...). This method creates an instance of the Swader\Diffbot\Api\Product class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a full list of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Product documentation.

Usage:

$api = $diffbot->createProductApi("http://www.amazon.com/Oh-The-Places-Youll-Go/dp/0679805273/");
$result = $api->call();

echo $result->offerPrice; // $11.99
echo $result->getIsbn(); // 0679805273
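Every create*Api method in this section takes the same kind of $url argument: an absolute URL or the literal word “crawl”. The following minimal sketch checks that argument yourself before creating an API object. `isValidApiTarget` is a hypothetical helper, and the library's own validation, if any, may differ.

```php
<?php
// Hypothetical pre-check for the $url argument of the create*Api methods:
// accepts an absolute URL or the literal word "crawl". The library itself
// may validate differently (or not at all).
function isValidApiTarget(string $url): bool
{
    if ($url === 'crawl') {
        return true; // placeholder used together with Diffbot::crawl
    }
    // FILTER_VALIDATE_URL requires a scheme such as http:// or https://
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}
```

For example, `isValidApiTarget('amazon.com/dp/0679805273')` is false because the scheme is missing.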

Swader\Diffbot\Diffbot::createArticleApi($url)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
Returns:

Swader\Diffbot\Api\Article

The Article API turns online news posts, blog articles, etc. into structured JSON. This method creates an instance of the Swader\Diffbot\Api\Article class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a full list of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Article documentation.

Usage:

$api = $diffbot->createArticleApi("http://techcrunch.com/2012/05/31/diffbot-raises-2-million-seed-round-for-web-content-extraction-technology/");
$result = $api->call();

echo $result->publisherCountry; // United States
echo $result->getAuthor(); // Sarah Perez

Swader\Diffbot\Diffbot::createImageApi($url)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
Returns:

Swader\Diffbot\Api\Image

The Image API finds the images in a post and returns them as structured JSON. This method creates an instance of the Swader\Diffbot\Api\Image class. It accepts a single string parameter: either the URL to process for images, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). For a full list of available methods and in-depth usage examples, see the Swader\Diffbot\Api\Image documentation. Note that unlike Product and Article, the Image API can return several Image entities (see usage below). If you do not iterate over the result, it refers to the first image only.

Usage:

$api = $diffbot->createImageApi("http://smittenkitchen.com/blog/2012/01/buckwheat-baby-with-salted-caramel-syrup/");
$result = $api->call();

echo $result->naturalHeight; // 333

// Iterate to access every image found on the page
foreach ($result as $image) {
    echo $image->title;
    echo $image->getXPath();
}
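The "first image unless iterated" behaviour described above follows a common PHP pattern: a collection that implements IteratorAggregate for `foreach` and forwards plain property reads to its first element via `__get`. The sketch below shows that pattern. `MultiEntityResult` is a hypothetical stand-in, not the library's actual result class.

```php
<?php
// Hypothetical stand-in for a multi-entity result: iterable over all
// entities, while direct property access falls through to the first one.
final class MultiEntityResult implements IteratorAggregate
{
    /** @var object[] */
    private array $items;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    // foreach ($result as $image) visits every entity
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->items);
    }

    // $result->naturalHeight reads from the first entity only (null if empty)
    public function __get(string $name)
    {
        return $this->items[0]->$name ?? null;
    }
}
```

With two images, `$result->naturalHeight` reports the first image's height, while a `foreach` loop sees both.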

Swader\Diffbot\Diffbot::createAnalyzeApi($url)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
Returns:

Swader\Diffbot\Api\Analyze

The Analyze API tries to autodetect the type of content it is dealing with (image, product, article...) and extracts it into structured JSON. This method creates an instance of the Swader\Diffbot\Api\Analyze class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). The Analyze API is the default API used in Swader\Diffbot\Diffbot::crawl mode.

Usage:

$api = $diffbot->createAnalyzeApi("http://techcrunch.com/2012/05/31/diffbot-raises-2-million-seed-round-for-web-content-extraction-technology/");
$result = $api->call();

echo $result->publisherCountry; // United States
echo $result->getAuthor(); // Sarah Perez
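Because Analyze can return different kinds of content, calling code often branches on the detected type. The following minimal sketch works on decoded JSON. It assumes each extracted object carries a "type" field such as "article" or "product". `describeObject` is hypothetical, and unknown types fall through to a generic branch, much like a wildcard entity.

```php
<?php
// Hypothetical type-based dispatch over a decoded Analyze object.
// Assumes a "type" field per object; unknown types get a generic branch.
function describeObject(array $object): string
{
    switch ($object['type'] ?? '') {
        case 'article':
            return 'Article: ' . ($object['title'] ?? '(untitled)');
        case 'product':
            return 'Product: ' . ($object['title'] ?? '(untitled)');
        case 'image':
            return 'Image: ' . ($object['url'] ?? '(no url)');
        default:
            return 'Other: ' . ($object['type'] ?? 'unknown');
    }
}
```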

Swader\Diffbot\Diffbot::createDiscussionApi($url)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
Returns:

Swader\Diffbot\Api\Discussion

The Discussion API turns online comments, forum topics, or pages of reviews into structured JSON. Think Amazon review sections, YouTube comments, Disqus comments on articles, etc. This method creates an instance of the Swader\Diffbot\Api\Discussion class. It accepts a single string parameter: either the URL to process, or the word “crawl” when used in conjunction with the Swader\Diffbot\Diffbot::crawl method (see below). Like the Image API above, this one can also return several Swader\Diffbot\Api\Discussion entities per call, if available, along with other data (see usage below).

Usage:

$api = $diffbot->createDiscussionApi("http://boards.straightdope.com/sdmb/showthread.php?t=740315");
$result = $api->call();

echo $result->numPosts; // 43
echo $result->getParticipants(); // 23

foreach ($result as $post) {
    echo $post->getAuthor();
    echo $post->votes;
}
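Once posts have been fetched, a typical follow-up is to aggregate them, for instance by counting posts per author. The following minimal sketch operates on posts as decoded arrays, with an "author" key assumed per post. `postsPerAuthor` is a hypothetical helper and not part of the library.

```php
<?php
// Hypothetical post-processing helper: counts posts per author.
// Assumes each post is a decoded array with an "author" key.
function postsPerAuthor(array $posts): array
{
    $authors = array_map(
        static fn(array $post): string => $post['author'] ?? 'anonymous',
        $posts
    );
    // Keys are author names in order of first appearance, values are post counts
    return array_count_values($authors);
}
```

The number of entries in the returned array is the number of distinct participants.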

Swader\Diffbot\Diffbot::createCustomApi($url, $name)
Parameters:
  • $url (string) – URL to process, or the word “crawl”
  • $name (string) – Name of the Custom API, as shown in the Diffbot interface
Returns:

Swader\Diffbot\Api\Custom

Diffbot customers can define Custom APIs. For a tutorial on doing this, see here. In short, when none of the standard APIs does the trick, you can tell Diffbot how to recognize certain areas of a web page and have it translate them into JSON for you. This allows for much more lightweight and specific calls, resulting in quicker turnaround and (usually) more precise data. This method creates an instance of the Swader\Diffbot\Api\Custom class. It accepts two parameters: the URL to process (or the word “crawl” if used in conjunction with the Swader\Diffbot\Diffbot::crawl method, see below), and the name of the Custom API to use. Unlike other APIs, this one has no specific entity to return. Instead, it returns a Swader\Diffbot\Entity\Wildcard entity, which matches anything.

Usage:

$api = $diffbot->createCustomApi("http://sitepoint.com/author/bskvorc", "AuthorFolio");
$result = $api->call();

echo $result->bio; // Bruno is a coder from Croatia with Master's Degrees in...
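A "matches anything" entity essentially exposes whatever fields the Custom API returned, without a fixed schema. The following minimal sketch shows that idea with PHP's `__get`/`__isset` magic methods. `AnyFields` is a hypothetical illustration, not the library's Swader\Diffbot\Entity\Wildcard class.

```php
<?php
// Hypothetical schema-less entity: any field present in the decoded
// Custom API response can be read as a property; absent fields read as null.
final class AnyFields
{
    /** @var array<string, mixed> */
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function __get(string $name)
    {
        return $this->data[$name] ?? null;
    }

    // Makes isset($entity->field) work for dynamic fields
    public function __isset(string $name): bool
    {
        return isset($this->data[$name]);
    }
}
```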

Swader\Diffbot\Diffbot::crawl($name = null, Swader\Diffbot\Api $api = null)
Parameters:
  • $name (string) – Name of the new crawljob. If omitted, a read-only mode is activated that returns data about all crawljobs on the given Diffbot token.
  • $api (Swader\Diffbot\Api) – Instance of the API used to process the crawled URLs. If omitted, defaults to Swader\Diffbot\Api\Analyze.
Returns:

Swader\Diffbot\Api\Crawl

The crawl method creates a new Crawlbot task (crawljob). For more information about Crawlbot, including what it does, why, and how it works, see here. To avoid confusion, it also helps to read the Crawlbot API documentation and the Crawlbot support pages.

In a nutshell, Crawlbot crawls a set of seed URLs for links (even if a subdomain is passed as a seed URL, it still looks through the entire main domain and all other subdomains it can find). It then processes every page it finds using the API you define, or the Analyze API by default. The result of the call is a collection of Swader\Diffbot\Entity\JobCrawl objects, each with details about a defined job. To get the data actually obtained by crawling and processing, use the Swader\Diffbot\Diffbot::search API.

Here’s how to create a crawljob (see the Swader\Diffbot\Api\Search documentation for a detailed, step-by-step guide with explanations):

$url = 'crawl';
$articleApi = $diffbot->createArticleApi($url)->setDiscussion(false);

$crawl = $diffbot->crawl('mycrawl_01', $articleApi);

$crawl->setSeeds(['http://sitepoint.com']);

$job = $crawl->call();

// See the JobCrawl class to find out which getters are available
echo $job->getDownloadUrl("json"); // outputs the download URL of the job's result as a JSON dataset
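The crawl scope described above, where a subdomain seed still covers the whole main domain, can be illustrated with a naive same-domain check. This is a simplified sketch, not Crawlbot's actual logic. It treats the last two host labels as the "main domain", which is wrong for suffixes such as .co.uk; a real crawler would consult the Public Suffix List. `mainDomain` and `inCrawlScope` are hypothetical helpers.

```php
<?php
// Naive illustration of the documented crawl scope: a link is in scope when
// it shares the seed's main domain, approximated here as the last two host
// labels (incorrect for multi-part suffixes like .co.uk).
function mainDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host could be extracted
    }
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, -2));
}

function inCrawlScope(string $seedUrl, string $linkUrl): bool
{
    $seed = mainDomain($seedUrl);
    return $seed !== null && $seed === mainDomain($linkUrl);
}
```

Under this approximation, a seed of http://blog.sitepoint.com also puts http://www.sitepoint.com pages in scope.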