Scribe
/subscription
Register a URL and optional navigation config.
Update multiple subscriptions in a batch
Get multiple subscriptions using correlation_id
post /subscription
Register a URL and optional navigation config.
Body
Media type: application/json
Type: object
Properties
- url: required(string)
Example:
mindlin.com
- country_code: (string - default: USA)
An optional ISO 3166-1 alpha-3 country code that the site expects to be visited from. This is NOT regional crawling; it only guarantees that we visit the site from a proxy that makes sense.
Example:
GBR
- correlation_id: (string)
An optional correlation_id tied to this url
Example:
12345
- frequency: (integer - default: 180)
The number of minutes to wait between scrapes
Example:
60
- requires_cart: (boolean - default: false)
Whether to navigate to the cart when "add-to-cart" vocabulary is detected. Defaults to false.
Example:
true
- routing_key: (string - default: #)
An optional routing key to send results to. It is appended to the subscriber's namespace. Defaults to #.
Example:
foo
- config: (object)
Navigation configs for the subscription.
Example:
{ [ { "action": "click", "element": "#hikashop_product_characteristic_2243_chzn > a > span", "representsOption": false, "step": 1 }, { "action": "click", "element": ":nth-child(2) > ul > :nth-child(5)", "selector": "#hikashop_product_characteristic_2243_chzn > :nth-child(2) > ul > :nth-child(1), #hikashop_product_characteristic_2243_chzn > :nth-child(2) > ul > :nth-child(2), #hikashop_product_characteristic_2243_chzn > :nth-child(2) > ul > :nth-child(3), #hikashop_product_characteristic_2243_chzn > :nth-child(2) > ul > :nth-child(4), :nth-child(2) > ul > :nth-child(5), :nth-child(2) > ul > :nth-child(6), :nth-child(2) > ul > :nth-child(7)", "representsOption": true, "step": 1, "definition": "color", "displayContent": { "content": "Ice", "hasStaticLabel": false, "selector": ":nth-child(2) > ul > :nth-child(5)", "isDropdown": true, "contentLastMatched": "Ice" } } ] }
- params: (object)
An optional set of key value pairs representing SyphonX params for the subscription
Example:
{ "key1": "value1", "key2": 2 }
- request_type: (string)
The request type of the crawl. Defaults to "product_page" if none is provided. The available request types are "product_page", "review_page", "category_page", and "search_page".
Example:
product_page
Example:
{
"url": "example.com/p/1234",
"frequency": 180,
"correlation_id": "D&DFF7",
"request_type": "product_page",
"requires_cart": false,
"routing_key": "foo",
"params": {
"seller_id": 862411,
"zip_code": "94066",
"store_number": 1019
}
}
HTTP status code 200
Body
Media type: application/json
Type: object
Properties
- subscription_id: required(integer)
Example:
121
- requested_url: required(string)
Example:
mindlin.com
- resolved_url: required(string)
Example:
mindlin.com
- correlation_id: (string)
An optional correlation_id tied to this url
Example:
12345
- frequency: (integer - default: 180)
The number of minutes to wait between scrapes
Example:
60
- request_type: (string - default: product_page)
The type of crawl to perform.
Example:
review_page
Example:
{
"subscription_id": 4,
"requested_url": "example.com/p/1234",
"resolved_url": "example.com/p/1234",
"correlation_id": "D&DFF7",
"frequency": 180,
"request_type": "product_page"
}
HTTP status code 403
HTTP status code 500
HTTP status code 504
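The request body described above can be assembled as in the following minimal sketch. The function name `build_subscription_body` is hypothetical, the defaults are copied from the property list, and the client-side `request_type` check is an assumption (how the service treats other values is not documented here). The `config` property is left out, and nothing is sent over the network.

```python
from typing import Any, Dict, Optional

# The four request types named in the request_type description.
REQUEST_TYPES = ("product_page", "review_page", "category_page", "search_page")


def build_subscription_body(
    url: str,
    *,
    country_code: str = "USA",
    correlation_id: Optional[str] = None,
    frequency: int = 180,
    requires_cart: bool = False,
    routing_key: str = "#",
    params: Optional[Dict[str, Any]] = None,
    request_type: str = "product_page",
) -> Dict[str, Any]:
    """Return a dict ready to be serialized as the post /subscription body."""
    if not url:
        raise ValueError("url is required")
    # Client-side check only (assumption); server behaviour is not documented.
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unsupported request_type: {request_type!r}")

    body: Dict[str, Any] = {
        "url": url,
        "country_code": country_code,
        "frequency": frequency,
        "requires_cart": requires_cart,
        "routing_key": routing_key,
        "request_type": request_type,
    }
    # Optional fields without a documented default are omitted when unset.
    if correlation_id is not None:
        body["correlation_id"] = correlation_id
    if params is not None:
        body["params"] = params
    return body
```

Passing the result through `json.dumps` gives a payload shaped like the request example above.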
patch /subscription
Update multiple subscriptions in a batch
Body
Media type: application/json
Type: object
Properties
- correlation_id: (string)
Example:
1337BRO
- requires_cart: (boolean)
Example:
true
- requires_inventory: (boolean)
Example:
true
- routing_key: (string)
Example:
results.#
- params: (object)
An optional set of key value pairs representing SyphonX params for the subscription
Example:
{ "key1": "value1", "key2": 2 }
- request_type: (string)
An optional string used to set the type of crawl for the subscription
Example:
{
"requires_cart": true,
"request_type": "product_page",
"params": {
"seller_id": 862411,
"zip_code": "94066",
"store_number": 1019
}
}
get /subscription
Get multiple subscriptions using correlation_id
Query Parameters
- correlation_id: required(string)
HTTP status code 200
Body
Media type: application/json
Type: array of object
Items: SubscriptionRecord
- date_created: required(string)
- date_updated: required(string)
- date_archived: required(string)
- correlation_id: required(string)
- id: required(integer)
- instruction_id: required(integer)
- namespace: required(string)
- requires_cart: required(boolean)
- requires_inventory: required(boolean)
- request_url: required(string)
- routing_key: required(string)
- subscriber_id: required(integer)
- request_type: required(string)
HTTP status code 400
HTTP status code 403
HTTP status code 404
HTTP status code 500
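As a rough illustration of this lookup, the sketch below builds the query string and maps each returned item onto the SubscriptionRecord fields listed above. The names `subscriptions_query` and `parse_records` are hypothetical, and the host, authentication, and date format are not documented here, so only a relative path is produced and dates stay as plain strings.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List
from urllib.parse import urlencode


@dataclass
class SubscriptionRecord:
    date_created: str
    date_updated: str
    date_archived: str
    correlation_id: str
    id: int
    instruction_id: int
    namespace: str
    requires_cart: bool
    requires_inventory: bool
    request_url: str
    routing_key: str
    subscriber_id: int
    request_type: str


def subscriptions_query(correlation_id: str) -> str:
    """Build the relative URL; urlencode escapes characters such as '&'."""
    return "/subscription?" + urlencode({"correlation_id": correlation_id})


def parse_records(payload: List[Dict[str, Any]]) -> List[SubscriptionRecord]:
    """Convert the decoded JSON array into SubscriptionRecord objects."""
    names = [f.name for f in fields(SubscriptionRecord)]
    records = []
    for item in payload:
        missing = [n for n in names if n not in item]
        if missing:
            raise ValueError(f"record is missing required fields: {missing}")
        # Keys outside the documented schema are ignored.
        records.append(SubscriptionRecord(**{n: item[n] for n in names}))
    return records
```

Escaping matters for identifiers like the `D&DFF7` used in the examples: an unescaped `&` would split the query string into two parameters.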
Register a URL for multi-nav.
post /subscription/multi-nav
Register a URL for multi-nav.
Body
Media type: application/json
Type: object
Properties
- country_code: (string)
Example:
USA
- correlation_id: (string)
Example:
123TLW
- frequency: (integer)
Example:
180
- requires_cart: (boolean)
Example:
true
- requires_inventory: (boolean)
Example:
true
- routing_key: required(string)
Example:
results.#
- slugs: required(object)
- name: required(string)
Example:
foo
HTTP status code 200
Body
Media type: application/json
Type: object
Properties
- subscription_id: required(integer)
Example:
121
- requested_url: required(string)
Example:
mindlin.com
- resolved_url: required(string)
Example:
mindlin.com
- correlation_id: (string)
An optional correlation_id tied to this url
Example:
12345
- frequency: (integer - default: 180)
The number of minutes to wait between scrapes
Example:
60
- request_type: (string - default: product_page)
The type of crawl to perform.
Example:
review_page
Example:
{
"subscription_id": 4,
"requested_url": "example.com/p/1234",
"resolved_url": "example.com/p/1234",
"correlation_id": "D&DFF7",
"frequency": 180,
"request_type": "product_page"
}
HTTP status code 400
HTTP status code 403
HTTP status code 404
HTTP status code 500
Get a subscription
Unsubscribe from a subscription
Update a subscription
get /subscription/{subscription_id}
Get a subscription
URI Parameters
- subscription_id: required(string)
HTTP status code 200
Body
Media type: application/json
Type: object
Properties
- date_created: required(string)
- date_updated: required(string)
- date_archived: required(string)
- correlation_id: required(string)
- id: required(integer)
- instruction_id: required(integer)
- namespace: required(string)
- requires_cart: required(boolean)
- requires_inventory: required(boolean)
- request_url: required(string)
- routing_key: required(string)
- subscriber_id: required(integer)
- request_type: required(string)
HTTP status code 400
HTTP status code 403
HTTP status code 404
HTTP status code 500
delete /subscription/{subscription_id}
Unsubscribe from a subscription
patch /subscription/{subscription_id}
Update a subscription
URI Parameters
- subscription_id: required(string)
Body
Media type: application/json
Type: object
Properties
- correlation_id: (string)
Example:
1337BRO
- requires_cart: (boolean)
Example:
true
- requires_inventory: (boolean)
Example:
true
- routing_key: (string)
Example:
results.#
- params: (object)
An optional set of key value pairs representing SyphonX params for the subscription
Example:
{ "key1": "value1", "key2": 2 }
- request_type: (string)
An optional string used to set the type of crawl for the subscription
Example:
{
"requires_cart": true,
"request_type": "product_page",
"params": {
"seller_id": 862411,
"zip_code": "94066",
"store_number": 1019
}
}
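Since every property in this body is optional, a client would typically send only the fields it wants to change. The sketch below shows one way to do that; `build_patch_request` and `PATCHABLE_FIELDS` are hypothetical names, and rejecting an empty update is a choice made for this example rather than documented service behaviour.

```python
import json
from typing import Any, Tuple

# Properties listed in the patch body above.
PATCHABLE_FIELDS = (
    "correlation_id",
    "requires_cart",
    "requires_inventory",
    "routing_key",
    "params",
    "request_type",
)


def build_patch_request(subscription_id: Any, **changes: Any) -> Tuple[str, str]:
    """Return (relative path, JSON body) for patch /subscription/{subscription_id}."""
    unknown = sorted(set(changes) - set(PATCHABLE_FIELDS))
    if unknown:
        raise ValueError(f"not a patchable field: {unknown}")
    if not changes:
        # Assumption: an empty patch is treated as a caller mistake.
        raise ValueError("no fields to update")
    path = f"/subscription/{subscription_id}"
    return path, json.dumps(changes, sort_keys=True)
```

For instance, `build_patch_request(121, requires_cart=True, request_type="product_page")` yields the path `/subscription/121` and a body containing just those two keys.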
POST a subscriptionID to add instructionID to IPU set in cache
/configs
Create a new domain navigation config
Get the config for a particular domain
post /configs/domain/{domain_id}/navigation
Create a new domain navigation config
URI Parameters
- domain_id: required(string)
Body
Media type: application/json
Type: object
Properties
- engine: required(string)
Example:
curl
- cart_engine: required(string)
Example:
puppeteer
Example:
{
"engine": "curl"
}
get /configs/domain/{domain_id}/navigation
Get the config for a particular domain
URI Parameters
- domain_id: required(string)
Delete a navigation config from a domain's configuration
delete /configs/domain/{domain_id}/navigation/{config_id}
Delete a navigation config from a domain's configuration
URI Parameters
- domain_id: required(string)
- config_id: required(string)
Create a new scrape config for a domain
Get the scrape config for a domain
post /configs/domain/{domain_id}/scrape
Create a new scrape config for a domain
URI Parameters
- domain_id: required(string)
Body
Media type: application/json
Type: object
Properties
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
get /configs/domain/{domain_id}/scrape
Create scrape configs in bulk
post /configs/domain/{domain_id}/scrape/batch
Create scrape configs in bulk
URI Parameters
- domain_id: required(string)
Body
Media type: application/json
Type: array of object
Items: NewScrapeConfigRequest
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
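The batch body is an array of NewScrapeConfigRequest items. A minimal sketch of building it follows, using only the three required properties; the optional `created_at` and `deleted_at` strings are left out. The helper name `build_domain_batch` is hypothetical, and because the field descriptions are still marked TODO in this reference, the validation below is limited to presence and type.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class NewScrapeConfigRequest:
    field_name: str
    selector: str
    priority: int


def build_domain_batch(
    domain_id: str, configs: List[NewScrapeConfigRequest]
) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (relative path, JSON-serializable array) for the domain batch endpoint."""
    for cfg in configs:
        if not cfg.field_name or not cfg.selector:
            raise ValueError("field_name and selector are required")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(cfg.priority, bool) or not isinstance(cfg.priority, int):
            raise ValueError("priority must be an integer")
    path = f"/configs/domain/{domain_id}/scrape/batch"
    return path, [asdict(cfg) for cfg in configs]
```

The returned list can be passed directly to `json.dumps` to produce the request body.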
Delete a scrape config from a domain's configuration
delete /configs/domain/{domain_id}/scrape/{config_id}
Delete a scrape config from a domain's configuration
Create a new scrape config for a URL
post /configs/url/{url_id}/scrape
Create a new scrape config for a URL
URI Parameters
- url_id: required(string)
Body
Media type: application/json
Type: object
Properties
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
Create scrape configs in bulk
post /configs/url/{url_id}/scrape/batch
Create scrape configs in bulk
URI Parameters
- url_id: required(string)
Body
Media type: application/json
Type: object
Properties
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
HTTP status code 200
Body
Media type: application/json
Type: object
Properties
- errors: required(boolean)
- succeeded: required(array of myLib.CreatedScrapeConfigResponse)
Items: CreatedScrapeConfigResponse
- id: required(integer)
- field_name: required(string)
- selector: required(string)
- failed: required(array of myLib.NewScrapeConfigFailure)
Items: NewScrapeConfigFailure
- error: required(string)
- request: required(object)
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
- archived: required(array of myLib.NewScrapeConfigRequest)
Items: NewScrapeConfigRequest
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
HTTP status code 400
HTTP status code 403
HTTP status code 404
HTTP status code 500
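A caller will usually want to reduce the 200 response to a short summary. The sketch below does that for the `errors`, `succeeded`, `failed`, and `archived` properties listed above. The function name `summarize_batch` is hypothetical, and reading `errors` as "at least one request failed" is an assumption, since the reference does not describe the flag.

```python
from typing import Any, Dict


def summarize_batch(response: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a decoded batch response into ids, failures and archived fields."""
    for key in ("errors", "succeeded", "failed", "archived"):
        if key not in response:
            raise ValueError(f"response is missing required property: {key}")
    return {
        # Assumption: errors == True means some configs were not created.
        "had_errors": bool(response["errors"]),
        "created_ids": [item["id"] for item in response["succeeded"]],
        # Pair each failed request's field_name with its error message.
        "failures": [
            (item["request"]["field_name"], item["error"])
            for item in response["failed"]
        ],
        "archived_fields": [item["field_name"] for item in response["archived"]],
    }
```

The summary keeps the failed `field_name` next to its error so a caller can decide which requests to retry.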
Delete a scrape config from a URL's configuration
delete /configs/url/{url_id}/scrape/{config_id}
Delete a scrape config from a URL's configuration
post /configs/subscription/scrape/batch
For each subscription, adds scrape configs to the URL associated with that subscription. These scrape configs apply to that particular URL only, not to every URL on its domain. An example of this is PROWL's pinning feature.
Body
Media type: application/json
Type: object
Properties
- subscription_id: required(integer)
Example:
1892
- scrape_configs: required(array of myLib.NewScrapeConfigRequest)
Items: NewScrapeConfigRequest
- created_at: (string)
Example:
TODO
- deleted_at: (string)
Example:
TODO
- field_name: required(string)
TODO
- selector: required(string)
TODO
- priority: required(integer)
TODO
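The body for this endpoint pairs one `subscription_id` with a list of scrape configs. A minimal sketch of assembling it from plain dicts follows; `build_subscription_scrape_batch` is a hypothetical helper, and whether several such objects can be sent in one call is not stated here, so the sketch builds a single object as the "Type: object" line suggests.

```python
from typing import Any, Dict, List

REQUIRED_CONFIG_KEYS = ("field_name", "selector", "priority")


def build_subscription_scrape_batch(
    subscription_id: int, scrape_configs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the body for post /configs/subscription/scrape/batch."""
    for index, cfg in enumerate(scrape_configs):
        missing = [key for key in REQUIRED_CONFIG_KEYS if key not in cfg]
        if missing:
            raise ValueError(f"scrape_configs[{index}] is missing {missing}")
    return {
        "subscription_id": subscription_id,
        # Copy each config so later edits by the caller do not alter the body.
        "scrape_configs": [dict(cfg) for cfg in scrape_configs],
    }
```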
Sets up a multi-nav subscription's specific nav config
Sets up a multi-nav subscription's specific scrape config
/lookup
Lookup internal values for a given name string
get /lookup
/debug
Send a debug request to the launch.* exchange in RabbitMQ
post /debug/request
Send a debug request to the launch.* exchange in RabbitMQ
Body
Media type: application/json
Type: object
/instant
Sends a request directly to the queue for crawling using the configs attached in the message
post /instant/raw
Sends a request directly to the queue for crawling using the configs attached in the message
Body
Media type: application/json
Type: object
/engines
Get the list of available crawling_engines