Article API
This API is used to turn content like blog posts, news articles, and other prose into JSON.
For examples of data that might be returned, please see http://diffbot.com and run the Article API demo.
The Article API part of the Diffbot PHP client consists of two main classes: the API class and the Article Entity class. We’ll describe them in order. Note that the API class extends Swader\Diffbot\Abstracts\Api, so be sure to read its documentation first if you haven’t already.
Article API Class
class Swader\Diffbot\Api\Article
Basic Usage:
use Swader\Diffbot\Diffbot;
$url = 'http://some-article-to-process.com';
$diffbot = new Diffbot('my_token');
$api = $diffbot->createArticleApi($url);
setSentiment
Swader\Diffbot\Api\Article::setSentiment($bool)
Parameters:
- $bool (bool) – Either true or false
Returns: $this
This method sets the optional sentiment field, which determines whether the result includes the sentiment score of the analyzed article text: a value ranging from -1.0 (very negative) to 1.0 (very positive). Sentiment analysis is powered by Semantria for advanced features like keyword and entity extraction, but basic sentiment analysis (score only) is enabled for everyone, even those without Semantria accounts.
Usage:
$url = 'http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/';
// ...
$api->setSentiment(true);
$result = $api->call();
// ...
echo $result->sentiment; // -0.0979
setPaging
Swader\Diffbot\Api\Article::setPaging($bool = true)
Parameters:
- $bool (bool) – Either true or false
Returns: $this
If set to false, Diffbot will not auto-concatenate the pages of a multi-page article into one. Defaults to true; at most 20 pages are concatenated.
For more information about auto-concatenation, see Diffbot’s own documentation on the subject.
While practical, auto-concatenation is less reliable than finding out the number of pages yourself and processing them one by one. It often fails to recognize the links to the next page, and if the series is longer than 20 parts, everything after page 20 is ignored. This is a limitation of Diffbot, not of the client, and it is unlikely to change: concatenating more than 20 pages would likely trigger timeouts, since processing time grows with every page added.
If you need to process a multi-page article, it is therefore recommended that you collect the page links yourself, pass them to the Article API one by one, and concatenate the results afterwards. If you’d like to analyze the entire concatenated post after the fact, it’s best to merge the content manually and then send it to Diffbot as a POST value for processing.
Usage:
$url = 'http://www.some-seven-part-article.com/';
// ...
$api->setPaging(true);
$result = $api->call();
// ...
echo $result->numPages; // 7
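For the manual, page-by-page approach recommended above, the final merge step might look like the sketch below. It assumes you have already processed each page separately and collected its plaintext (for example via getText) into an array keyed by page number; the function name concatenatePages and its separator default are hypothetical and not part of the client.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Diffbot client):
 * merge per-page plaintext into one document.
 * $pages maps page number => text, or null if a page yielded no text.
 */
function concatenatePages(array $pages, string $separator = "\n\n"): string
{
    // Sort by page number so pages processed out of order still merge correctly.
    ksort($pages, SORT_NUMERIC);

    // Drop pages for which no text was extracted.
    $texts = array_filter($pages, fn ($text) => $text !== null && $text !== '');

    return implode($separator, $texts);
}
```

For instance, concatenatePages([2 => 'Second page.', 1 => 'First page.', 3 => null]) yields "First page." and "Second page." separated by a blank line. The merged string is what you would then send to Diffbot as POST content if you want the whole post analyzed at once.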
setMaxTags
Swader\Diffbot\Api\Article::setMaxTags($max = 5)
Parameters:
- $max (int) – The maximum number of tags to generate and return
Returns: $this
Sets the maximum number of automatically generated tags to return. By default, at most five tags are returned. Tags are a built-in feature of Diffbot, and two calls to the same URL may return different tags if enough time has passed between them, because Diffbot’s engine evolves as it processes more and more content.
For an example of what the tags might look like, run the demo at https://diffbot.com or see Swader\Diffbot\Entity\Article::getTags.
setDiscussion
Swader\Diffbot\Api\Article::setDiscussion($bool = true)
Parameters:
- $bool (bool) – Either true or false
Returns: $this
Whether or not to use the Discussion API to additionally process any detected comment or review threads in the article. It behaves as if Swader\Diffbot\Api\Discussion had been run on the page, and merges the returned data with the Article API’s results by means of a discussion field in the result. That field contains all the sub-fields of a regular Swader\Diffbot\Api\Discussion call; i.e. you can access the Swader\Diffbot\Entity\Discussion entity and all its sub-entities via the Swader\Diffbot\Entity\Article::getDiscussion method.
Article Entity Class
When the Article API is done processing an article (or several), the result will be an Article entity (more precisely, a collection of one or more Article entities inside an instance of Swader\Diffbot\Entity\EntityIterator).
For an overview of the abstract class all entities build on, see Swader\Diffbot\Abstracts\Entity.
Note that the Article entity can also be returned by the Swader\Diffbot\Api\Analyze API in “article” mode, or in default mode when processing a URL that contains an article (auto-determined).
class Swader\Diffbot\Entity\Article
__construct
Swader\Diffbot\Entity\Article::__construct(array $data)
Parameters:
- $data (array) – The data from which to build the Article entity
The Article entity’s constructor needs the data to populate its properties (see the getters below). This class is automatically instantiated after a Swader\Diffbot\Api\Article or Swader\Diffbot\Api\Analyze call, so you will probably never need to create an instance of it manually.
In the case of the Article entity, the constructor differs from the abstract one (Swader\Diffbot\Abstracts\Entity::__construct) in that it also looks for the discussion key in the result, in order to build a Swader\Diffbot\Entity\Discussion sub-entity (see Swader\Diffbot\Entity\Article::getDiscussion).
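The discussion-key check described above can be pictured with the simplified stand-in below. This is not the client’s actual code: ArticleSketch and DiscussionSketch are hypothetical classes, and the numPosts and provider keys are assumed from the getNumPosts and getProvider getters used in the getDiscussion example in this document.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the Discussion sub-entity; it only wraps raw data.
final class DiscussionSketch
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function getNumPosts(): ?int
    {
        return $this->data['numPosts'] ?? null;
    }

    public function getProvider(): ?string
    {
        return $this->data['provider'] ?? null;
    }
}

// Hypothetical stand-in illustrating the Article entity's extra constructor step.
final class ArticleSketch
{
    private array $data;
    private ?DiscussionSketch $discussion = null;

    public function __construct(array $data)
    {
        $this->data = $data;

        // Build the sub-entity only when the API result carries a "discussion" array.
        if (isset($data['discussion']) && is_array($data['discussion'])) {
            $this->discussion = new DiscussionSketch($data['discussion']);
        }
    }

    public function getDiscussion(): ?DiscussionSketch
    {
        return $this->discussion;
    }
}
```

The point of the design is that callers get either a ready-made sub-entity or null, rather than having to inspect the raw discussion array themselves.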
getType
Swader\Diffbot\Entity\Article::getType()
Returns: string
Always returns “article” for articles:
// ... API setup ... //
$result = $api->call();
echo $result->getType(); // "article"
getText
Swader\Diffbot\Entity\Article::getText()
Returns: string | null
Returns the plaintext content of the processed article. HTML tags are stripped completely and images are removed. If the text property is missing from the result, returns null.
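Since getText can return null, any post-processing should handle that case explicitly. As an illustration only, the hypothetical excerpt() function below builds a short word-based teaser from the plaintext and passes null through unchanged; the 30-word default is an arbitrary choice, not a client setting.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Diffbot client):
 * build a short excerpt from article plaintext.
 * Returns null when no text was extracted, mirroring getText().
 */
function excerpt(?string $text, int $maxWords = 30): ?string
{
    if ($text === null) {
        return null;
    }

    // Split on any run of whitespace; fall back to no words if the input is not valid UTF-8.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    if (count($words) <= $maxWords) {
        return implode(' ', $words);
    }

    // Keep the first $maxWords words and mark the truncation.
    return implode(' ', array_slice($words, 0, $maxWords)) . ' ...';
}
```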
getHtml
Swader\Diffbot\Entity\Article::getHtml()
Returns: string | null
Returns the full HTML content of the article. If the html property is missing from the result, returns null.
getDate
getAuthor
Swader\Diffbot\Entity\Article::getAuthor()
Returns: string | null
Returns the name of the author as written on the page. If Diffbot was unable to figure out who the author is, null is returned.
getTags
Swader\Diffbot\Entity\Article::getTags()
Returns: array
Returns an array of tags/entities, generated from analysis of the extracted text and cross-referenced with DBpedia and other data sources. Note that these are not the meta tags defined by the author, but machine-learned ones:
// ... API setup ... //
// URL: "http://www.sitepoint.com/diffbot-crawling-visual-machine-learning" //
$result = $api->call();
echo count($result->tags); // 5
var_dump($result->tags);
/** Output:
array (size=5)
  0 =>
    array (size=4)
      'count' => int 1
      'score' => float 0.62
      'label' => string 'Machine learning' (length=16)
      'uri' => string 'http://dbpedia.org/resource/Machine_learning' (length=44)
  1 =>
    array (size=4)
      'count' => int 4
      'score' => float 0.61
      'label' => string 'Web crawler' (length=11)
      'uri' => string 'http://dbpedia.org/resource/Web_crawler' (length=39)
  2 =>
    array (size=4)
      'count' => int 4
      'score' => float 0.59
      'label' => string 'Lexical analysis' (length=16)
      'uri' => string 'http://dbpedia.org/resource/Lexical_analysis' (length=44)
  3 =>
    array (size=4)
      'count' => int 7
      'score' => float 0.54
      'label' => string 'Uniform resource locator' (length=24)
      'uri' => string 'http://dbpedia.org/resource/Uniform_resource_locator' (length=52)
  4 =>
    array (size=5)
      'count' => int 2
      'score' => float 0.52
      'label' => string 'JavaScript' (length=10)
      'rdfTypes' =>
        array (size=3)
          0 => string 'http://dbpedia.org/ontology/ProgrammingLanguage' (length=47)
          1 => string 'http://dbpedia.org/ontology/Software' (length=36)
          2 => string 'http://dbpedia.org/ontology/Work' (length=32)
      'uri' => string 'http://dbpedia.org/resource/JavaScript' (length=38)
**/
Returns a maximum of 5 tags by default, though this can be changed with Swader\Diffbot\Api\Article::setMaxTags.
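Because each tag is a plain associative array with label, score and uri keys, plus rdfTypes on some entries (as in the dump above), it can be post-processed with ordinary array functions. The two helpers below, topTagLabels and tagsWithType, are hypothetical and not part of the client; the score threshold is whatever you choose to pass in.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: labels of tags scoring at least $minScore,
 * ordered from highest to lowest score.
 */
function topTagLabels(array $tags, float $minScore = 0.0): array
{
    // Keep only tags that have both a label and a score at or above the threshold.
    $kept = array_filter(
        $tags,
        fn (array $tag) => isset($tag['label'], $tag['score']) && $tag['score'] >= $minScore
    );

    // Sort by score, descending.
    usort($kept, fn (array $a, array $b) => $b['score'] <=> $a['score']);

    return array_column($kept, 'label');
}

/**
 * Hypothetical helper: labels of tags whose rdfTypes list contains $rdfType.
 */
function tagsWithType(array $tags, string $rdfType): array
{
    $matches = array_filter(
        $tags,
        fn (array $tag) => in_array($rdfType, $tag['rdfTypes'] ?? [], true)
    );

    return array_column($matches, 'label');
}
```

Applied to the five tags in the dump above with a threshold of 0.6, topTagLabels would return Machine learning and Web crawler, and tagsWithType with the Software ontology URI would return only JavaScript.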
getNumPages
Swader\Diffbot\Entity\Article::getNumPages()
Returns: int
Returns the number of pages if the article is a multi-page one. For more details, see Diffbot’s documentation on auto-concatenation and the Swader\Diffbot\Api\Article::setPaging method.
getNextPages
Swader\Diffbot\Entity\Article::getNextPages()
Returns: array
If the article is a multi-page one, returns the list of absolute URLs of the pages that follow the one that was processed. If the article is a single-page one, an empty array is returned.
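The URLs returned by getNextPages can feed the manual, page-by-page approach recommended under setPaging. The hypothetical helper below only builds the ordered list of URLs to process, starting with the page that was already analyzed and skipping duplicates; actually calling the Article API for each URL is left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Diffbot client):
 * ordered, de-duplicated list of page URLs to process,
 * starting with the page that was already analyzed.
 */
function pagesToProcess(string $currentUrl, array $nextPages): array
{
    $urls = [$currentUrl];

    foreach ($nextPages as $url) {
        // Skip non-strings and URLs already queued (e.g. a repeated link).
        if (is_string($url) && !in_array($url, $urls, true)) {
            $urls[] = $url;
        }
    }

    return $urls;
}
```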
getSentiment
Swader\Diffbot\Entity\Article::getSentiment()
Returns: float | null
Returns the sentiment score of the analyzed article text, a value ranging from -1.0 (very negative) to 1.0 (very positive). If the sentiment score is absent (because Diffbot was unable to determine it, or because Swader\Diffbot\Api\Article::setSentiment was set to false), returns null.
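Because the score is either a float in the -1.0 to 1.0 range or null, a coarse label can be derived from it. In the sketch below, the function name sentimentLabel, the label names and the default cut-off of 0.25 are all arbitrary assumptions, not values defined by Diffbot or the client.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a sentiment score in [-1.0, 1.0] to a label.
 * The threshold is illustrative only.
 */
function sentimentLabel(?float $score, float $threshold = 0.25): string
{
    if ($score === null) {
        // Sentiment was not requested or could not be determined.
        return 'unknown';
    }
    if ($score < -1.0 || $score > 1.0) {
        throw new InvalidArgumentException("Sentiment score out of range: $score");
    }
    if ($score <= -$threshold) {
        return 'negative';
    }
    if ($score >= $threshold) {
        return 'positive';
    }
    return 'neutral';
}
```

With these cut-offs, the -0.0979 score from the setSentiment example would be labelled neutral.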
getDiscussion
Swader\Diffbot\Entity\Article::getDiscussion()
Returns: Swader\Diffbot\Entity\Discussion | null
Returns the Swader\Diffbot\Entity\Discussion found on the article’s page (comments section). See Swader\Diffbot\Api\Article::setDiscussion for details, and the following for usage:
use Swader\Diffbot\Diffbot;

$url = "http://www.sitepoint.com/quick-tip-get-homestead-vagrant-vm-running/";
$diffbot = new Diffbot("my_token");
$api = $diffbot->createArticleApi($url);
$result = $api->call();

// getDiscussion() returns null when no discussion data is present
$discussion = $result->getDiscussion();
if ($discussion !== null) {
    echo $discussion->getNumPosts(); // 7
    echo $discussion->getProvider(); // Disqus
}
For other methods exposed on the Swader\Diffbot\Entity\Discussion entity, see its documentation.
getImages
Swader\Diffbot\Entity\Article::getImages()
Returns: array
An array of images found in the article, with their details. The elements of the array are arrays like this one:
/**
array (size=7)
  'height' => int 512
  'diffbotUri' => string 'image|3|-851701004' (length=18)
  'naturalHeight' => int 727
  'width' => int 749
  'primary' => boolean true
  'naturalWidth' => int 1063
  'url' => string 'http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/07/140624455201.png' (length=79)
**/
Unlike the Swader\Diffbot\Api\Discussion API, which returns details about discussion posts even when used with the Swader\Diffbot\Api\Article API, the image data returned with this method is minimal. For fuller details about images, use the Swader\Diffbot\Api\Image API.
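Each image entry is a plain array like the one shown above, so picking the main image mostly comes down to checking the primary flag. The hypothetical helper below returns the URL of the image marked primary and otherwise falls back to the image with the largest natural dimensions; that fallback rule is an assumption for illustration, not Diffbot behavior.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Diffbot client):
 * URL of the primary image, else of the largest image
 * (by naturalWidth * naturalHeight), else null when there are no images.
 */
function mainImageUrl(array $images): ?string
{
    $best = null;
    $bestArea = -1;

    foreach ($images as $image) {
        if (!isset($image['url'])) {
            continue;
        }
        // An image explicitly flagged as primary wins outright.
        if (!empty($image['primary'])) {
            return $image['url'];
        }
        // Otherwise remember the image with the largest natural area.
        $area = ($image['naturalWidth'] ?? 0) * ($image['naturalHeight'] ?? 0);
        if ($area > $bestArea) {
            $best = $image['url'];
            $bestArea = $area;
        }
    }

    return $best;
}
```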