API Documentation

Last updated:

Adopt Admin users can create API keys in order to use the API to push data from your Adopt accounts to another source.

Navigate to Settings > Integrations. If the Integrations tab is unavailable, the API isn't enabled for your account. Contact your Pendo Adopt provider to inquire about API access.

Note: If you sign into your account from somewhere other than adopt.pendo.io you must update the endpoint URLs to match the custom hostname.

Aggregation API documentation

Aggregations are a query language for accessing your data. They take sources of data and apply operators to do computations, and are written in JSON. 

You can reach the Aggregation API can through the following endpoint:

https://adopt.pendo.io/api/v1/aggregation

The aggregations endpoint isn't intended to be a bulk export feature, so you might need to break up aggregations by different time ranges if you experience sizing or timeout limits.

Data querying is performed using a flexible aggregation pipeline modeled after MongoDB. Each step of the pipeline takes rows as its input and outputs zero or more rows. The basic unit is a row, which contains named fields that are possibly nested. At a high level, an aggregation consists of:

  • A row source and time specification.
  • A pipeline for row processing.

All input data rows for the specified time period are fed into the aggregation pipeline. The rows that those pipelines yield are the result of the aggregation. A pipeline can result in no rows or in many rows depending on the pipeline and the input.

If the time specification expands into twenty-four separate time periods, one for each hour of a day, then twenty-four independent pipelines are run, and the aggregation will have twenty-four sets of results. Each of those results are independent of the others.

Nested Fields

Wherever a field name is needed, a dotted list of field names can be used to specify nested objects. Nested specifiers can be used to create new fields as well. Any missing intermediate objects will be initialized.

In the following example, the field name.last has the value bunny.

{ "name": { "first": "bugs", "last": "bunny" } }

Data Stores

Data Stores  
Visitors General information on all visitors ever seen.
Events Each interaction with your application by a visitor. This includes clicks, page loads, metadata, and guide events.

 

Expressions

A number of pipeline operators support expressions. Expressions follow basic C expression syntax along with function calls. Supported operators include: ==!=&&||<><=>=+-/*%, and !.

All numeric value are converted to floats for computational purposes, and C order of operation is used.

Parenthesis () may be used for grouping expressions.

The comparison operators can be used to compare strings, the logical operators and comparison work on boolean values, and the equality operators can compare two lists.

For the || and && operators, null is treated equivalently to false.

The == and != operators treat null as a distinct value (so null == null but null != false) and any other operator that has null as a parameter returns null as its value (so 1 + null == null). Looking up a field that doesn't exist in a row results in null.

Values may be extracted from list by using normal JavaScript style indexes such as listField[index]. Sublists can be created using JavaScript style slices such as listField[start:end]. Additionally, negative indexes can be used to look up items from the end. listField[-2] will return the next to last item in the list. This follows Python style syntax.

Arrays of values are supported, such as ["a","b","c"] or [1,1+2,sum(a)]. All array elements must be of the same type. You may select an element from the array using normal Javascript style indexes (i.e. ["a","b","c"][2]) as defined in the preceding paragraph.

Expression primitives are numbers (with or without a decimal point), strings enclosed in double quotation marks, truefalse, and field names (using dotted notation to specify nested fields). The currently supported functions are listed below.

Functions  
ceil(val) returns the ceiling of numeric value val 
null if val == null 
error if val is non-numeric/non-null
contains(listValue, item) returns true if item is in the specified list
contains(haystackString, needleString) returns true if the needle is contained in the haystack (both as strings)
floor(val) returns the floor of numeric value val 
null if val == null 
error if val is non-numeric/non-null
formatTime(goFormatString, timeInMillis) formats the given time using the go-style go format string examples.
hash(string) returns a 32 bit hash value for the given string
if(testExpression, ifTrue, [ifFalse]) returns the ifTrue value if testExpression is true 
returns the ifFalse value if testExpression is false
isNull(field) returns true if field is null or does not exist 
returns false otherwise
len(listField) returns the length of the specified list
list(field1, field2, field3) returns a list containing the named fields
listInRange(listField, minValue, maxValue) returns a list containing the elements in listField which are between minValue and maxValue (inclusive)
map() explained in greater detail below
now() returns the current time in milliseconds past the epoch
reverse(list) returns the specified list, reversed
round(val) returns the the numeric value val rounded to the nearest integer 
returns null if val == null 
returns error if val is non-numeric/non-null
split(s, sep) splits the string s into a list of strings using string sep as the field separator
startOfPeriod(periodName, timeStamp) returns the first timestamp in the period which includes the passed timestamp 
period can be one of "hourly", "daily", "weekly", or "monthly"
startsWith(haystack, needle) returns true if the string haystack begins with the string needle 
returns false if haystack is nil
sum(listValue) returns the sum of the items in the specified list, ignoring non-numeric elements

 

The map() function

The map() function comes in two variants, which take either two or three parameters. The simpler form is map(listField, expression), and it applies the given expression to each item in the specified list, and returns a list of the results. The listField must specify a list of objects. This means a list of integers won't work.

Two parameter function

map(users, users.name)

When applied to the row:

{ 
"users": [
{ "id": 5, "name": "mickey" },
{ "id": "2", "name": "minnie" }
]
}

Example response:

["mickey", "minnie"]

Three parameter function

map(x, names, len(x))

When applied to the row:

{ "names": ["bugs", "daffy"] }

Example response:

[ 4, 5 ]

The three parameter form of map() is more general and works on lists of any type. While it works on any type, it processes lists of objects slower than the two parameter form.

The first parameter is an identifier which the third parameter (the expression) can use to reference the particular value being mapped.

Both versions will return null if the list parameter is null.

Source specification

Definition

{ 
"source": {
"sourceType": {SOURCE PARAMS}
},
"timeSeries": {
"first": timeInMsOrExpression,
"count": numberOfPeriods,
"period": periodName
}
}

The source key must be specified as the first step in the pipeline unless spawn is used, then no source can exist. This tells the aggregation what data to feed into the pipeline.

Details  
sourceType Source of data for the aggregation.

 

Row source specification

The short answer to “what can we access with the API” is: all the major entities, day/hour summaries for page usage, and event level data for guide activity and polls.

Row sources are like tables. When you return row source data, you can dissect it in a number of ways.

The sourceType specifies what kind of data is being aggregated. Each type has specific parameters. If no parameters are provided, the empty {} can be replaced with null.

Row sources specify what types of rows should go into the pipeline, and are normally parameterized. The time period for the rows is not specified here, and is part of the time series specification instead.

Sources  
events Returns summary data for all events on the system in a time period. Accepts an event class or list of event classes { "eventClass" : "web"|"ios"|["web","ios"] } to return, "web" by default.
guideEvents Returns all guide events (guideSeen/dismissed/advanced) for requested time period. Can be limited by guideId or guideId and guideStepId.
guidesSeen Returns firstSeenAtlastSeenAtlastState, and seenCount for each visitor that saw the requested guide(s) in a time period. Parameter must be { "guideId" : guideId, "guideStepId" : guideStepId }; leaving out guide (blank/null/unsupplied) shows rows for all steps on all guides, and leaving out step shows all steps on the requested guide.
pageEvents Covers all pages unless pageId is provided to narrow the results to a single pageId. Returns results for requested time period. Accepts an event class or list of event classes { "eventClass" : "web"|"ios"|["web","ios"] } to return, "web" by default.
pollEvents Returns all poll responses for requested time period. Can be limited by guideId or guideId and pollId.
pollsSeen Returns poll responses in a time period. If visitor has multiple responses to the same poll, only the most recent is returned. Parameter must be { "guideId" : guideId, "pollId" : pollId }
visitors All visitors with their metadata. Parameter may be { "identified" : true } to eliminate anonymous visitors

 

Time Series

Example time series for July 10, 2015 through July 12, 2015:

{ 
"timeSeries": {
"period": "dayRange",
"first": 1436553419000,
"count": 3
}
}

Example time series for three hours ago to (and including) the current hour:

{ 
"timeSeries": {
"period": "hourRange",
"first": "now() - 3 * 60601000",
"last": "now()"
}
}

The timeSeries consists of the length of each time period and a timestamp for the first period in milliseconds since the epoch. If first is a string it will be evaluated as an expression to derive a timestamp.

It also requires either a count of the number of periods to include in the aggregation or an expression which evaluates to the last time period to include in the aggregation.

Details  
period dayRange or hourRange
first The timestamp in milliseconds after the epoch for the start of period.
count Number of periods to include in the aggregation

 

Full time periods are always evaluated. Even if the first time does not coincide with the beginning of a period, all of the data for the period which contains time first will be used.

For the dayRange period, the day aligns with the time zone for your account. This does mean that there will occasionally by dayRange periods which include 23 or 25 hours of data because of the switch to and from daylight savings time. If this is undesirable, hourRange may be used, though there will be a significant performance impact when large amounts of data are being evaluated.

Events, grouped

The following row sources require a date range (relative or absolute) and period (day). They return a row of data for each unique combination of day/hour, visitorId, accountId, server name, and IP address.

Attributes  
events All recorded click and pageview events (tagged or untagged)
pageEvents All recorded pageviews matching tagged pages

 

The events row source

Definition:

{
    "source": {
        "events": null,
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"events":null,"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}'

Example response:

{
    "results": [
        {
            "accountId": "Kappa Trust",
            "visitorId": "59",
            "numEvents": 12,
            "numMinutes": 1,
            "server": "www.some-domain.com",
            "remoteIp": "59.89.251.103",
            "parameters": null,
            "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
            "day": 1506830400000,
            "pageId": "allevents"
        }
    ],
    "startTime": 1504324800000
}
  • Returns a row for each unique combination of day/hour, visitorId, accountId, server name, IP address, and user agent.
  • Encompasses all events, whether tagged or untagged (note the pageId “allevents”)
  • Useful for measuring total time, days active, etc.
  • Optionally pass an eventClass of web or iOS to specify which event bucket to return.

The pageEvents row source

Definition:

{
    "source": {
        "pageEvents": {
            "pageId": "{pageId}"
        },
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pageEvents":{"pageId":"aJ0KOADQZVL5h9zQhiQ0q26PNTM"},"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}'

Example response:

{
    "results": [
        {
            "accountId": "Acme Corp",
            "visitorId": "10",
            "numEvents": 1,
            "numMinutes": 1,
            "server": "www.some-domain.com",
            "remoteIp": "110.227.245.175",
            "parameters": null,
            "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
            "day": 1506484800000,
            "pageId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM"
        }
    ]
}
  • Includes events which can be matched to tagged page rules
  • Returns a row for each unique combination of page ID, day/hour, visitorId, accountId, server name, IP address, and user agent.
  • Accepted parameters are null for all page data or pageId:PAGE_ID for a specific page.

Events, ungrouped

For some events, we expose ungrouped events. These are visible for a specified time period.

Sources  
guideEvents Interactions your visitors have with guides
guidesSeen Summary of when visitors see guides
guidesSeenEver Summary of when visitors see guides
pollEvents Interactions your visitors have with polls
pollsSeen Summary of when and how visitors respond to polls
pollsSeenEver Summary of when and how visitors respond to polls

 

The guideEvents row source

Definition:

{
    "source": {
        "guideEvents": null,
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guideEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'

Example response:

    { "accountIds": [ "Acme" ], "browserTime": 1462796392383, "country": "US", "type": "guideAdvanced", "guideId": "He3BLNt44i0xwO5AEJVIijEswro", "guideStepId": "4TaMV6TLY1JmKIFkprqOcQllU8k", "latitude": 35.823483, "loadTime": 0, "longitude": -78.825562, "region": "nc", "remoteIp": "209.136.209.34", "serverName": "your.app.com", "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.5.17 (KHTML, like Gecko) Version/9.1 Safari/601.5.17", "visitorId": "johndoe@acme.com", "accountId": "Acme" }
  • Returns one event row for every guide event that occurs. This is not a summary or bucket structure like previous examples.
  • Type field values include guideAdvancedguideSeen, and guideDismissed
  • Accepted parameters include guideId and guideStepId to drill down into a specific guide or step.
  • pollEvents behaves similiarly and provides one row for every poll response that occurs. You can specify pollId as a parameter to that source.

The guideSeen row source

Definition:

{
    "source": {
        "guidesSeen": {
            "guideId": "{guideId}",
            "guideStepId": "{guideStepId}"
        },
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guidesSeen":{"guideId":"nWtgylnMqWeB4DynO5duF9ako-M","guideStepId":"mHO55y6WJvJ0CnsmUgAtiGtgobA"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'

Example response:

{
    "firstSeenAt": 1489673759222,
    "guideId": "nWtgylnMqWeB4DynO5duF9ako-M",
    "guideStepId": "mHO55y6WJvJ0CnsmUgAtiGtgobA",
    "lastAdvancedAutoAt": 0,
    "lastDismissedAutoAt": 0,
    "lastSeenAt": 1489674347693,
    "lastState": "advanced",
    "seenCount": 5,
    "visitorId": "avery"
}
  • Returns a row for each unique combination of Visitor ID, guide ID, and step ID
  • You can additionally supply guideId and guideStepId to drill into a specific guide or step.
  • The guidesSeenEver row source can also be used for the same output, but is time insensitive and does not require a timeSeries.

The pollEvents row source

Definition:

{
    "source": {
        "pollEvents": null,
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'

Example response:

{
    "accountIds": [
        "Acme"
    ],
    "browserTime": 1462796392383,
    "country": "US",
    "type": "pollResponse",
    "guideId": "He3BLNt44i0xwO5A201IijEswro",
    "latitude": 35.823483,
    "loadTime": 0,
    "longitude": -78.825562,
    "pollId": "z7y94y62v3c",
    "region": "nc",
    "remoteIp": "209.136.209.34",
    "serverName": "your.app.com",
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "visitorId": "user@acme.com",
    "accountId": "Acme",
    "pollResponse": "You guys rock!"
}
  • Returns one row for every poll response that occurs.
  • Accepted parameters include guideId and pollId to drill down into a specific poll.

The pollsSeen row source

Definition:

{
    "source": {
        "pollsSeen": {
            "guideId": "{guideId}",
            "pollId": "{pollId}"
        },
        "timeSeries": {
            "first": "{timeInMsOrExpression}",
            "count": "{numberOfPeriods}",
            "period": "{periodName}"
        }
    }
}

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollsSeen":{"guideId":"1wz09ZEAFp-HVl78Iw8oxpwbRZA","pollId":"3isny1qp7t2"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'

Example response:

{
    "accountId": "1",
    "guideId": "1wz09ZEAFp-HVl78Iw8oxpwbRZA",
    "pollId": "3isny1qp7t2",
    "response": 0,
    "time": 1500305598629,
    "visitorId": "1"
}
  • Returns poll responses in a time period
  • Only the most recent poll response for a visitor is displayed i.e. if a visitor has responded to the same poll multiple times, pollsSeen will only return most recent response. To return all responses, use pollEvents.
  • You must supply guideId and pollId as parameters to drill into specific poll events
  • The pollsSeenEver row source can also be used for the same output, but is time insensitive and does not require a timeSeries.

Entities

Entities are the nouns of Aggregations.

Entities  
guides In-app messages you show your visitors
pages Sets of rules that identify individual pages in your product (defined by a URL rule)
visitors Your product’s users

The guides row source

Definition:

{ "source": { "guides": null } }

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guides":null}}]}}'

Example response:

{
    "attributes": { ... },
    "audience": [ ... ],
    "createdAt": 1440449676899,
    "createdByUser": {
        "first": "Someone",
        "id": "5269565705551872",
        "last": "Lastname",
        "role": 8,
        "username": "someone@yourcompany.com"
    },
    "expiresAfter": 1440472740000,
    "id": "BF_p40WvYWC2o81pvsdt1kjcIkI",
    "isMultiStep": false,
    "lastUpdatedAt": 1460734696429,
    "lastUpdatedByUser": {
        "first": "",
        "id": "4985730428305408",
        "last": "",
        "role": 8,
        "username": "someone@yourcompany.com"
    },
    "launchMethod": "auto",
    "name": "2015-08-24 - Maintenance tonight.",
    "state": "disabled",
    "stateHistory": [ ... ],
    "steps": [
        {
            "advanceMethod": "button",
            "attributes": {
                "height": 395,
                "width": 440
            },
            "content": "",
            "createdByUser": {
                "first": "Someone",
                "id": "5269565705551872",
                "last": "Lastname",
                "role": 8,
                "username": "someone@yourcompany.com"
            },
            "elementPathRule": "",
            "guideId": "BF_p40WvYWC2o81pvsdt1kjcIkI",
            "id": "Itfe6aDRNtgx9J1XisQwaCxDS-A",
            "lastUpdatedAt": 1460734696429,
            "lastUpdatedByUser": {
                "first": "",
                "id": "4985730428305408",
                "last": "",
                "role": 8,
                "username": "someone@yourcompany.com"
            },
            "name": "",
            "rank": 5000000,
            "resetAt": 0,
            "type": "lightbox"
        }
    ]
}

steps include a single item for lightboxes, tooltips, banners, etc. and multiple items for walkthroughs.

The pages row source

Definition:

{ "source": { "pages": null } }

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pages":null}}]}}'

Example response:

{
    "results": [
        {
            "color": "gray",
            "createdAt": 1505938241596,
            "createdByUser": {
                "first": "Adam",
                "id": "5197072786522112",
                "last": "Lohner",
                "role": 8,
                "username": "somePerson@email.com"
            },
            "dirty": false,
            "group": {
                "color": ".groupColor01",
                "createdAt": 1505937447295,
                "createdByUser": {
                    "first": "",
                    "id": "",
                    "last": "",
                    "role": 0,
                    "username": ""
                },
                "description": "",
                "id": "DEFAULTGROUP_01_",
                "items": [

                ],
                "lastUpdatedAt": 1505938327653,
                "lastUpdatedByUser": {
                    "first": "Adam",
                    "id": "5197072786522112",
                    "last": "Lohner",
                    "role": 8,
                    "username": "somePerson@email.com"
                },
                "length": 6,
                "name": "Dashboard"
            },
            "id": "aJ0KOADQZVL5h9zQhiQ0q26PNTM",
            "kind": "Page",
            "lastUpdatedAt": 1505938241596,
            "lastUpdatedByUser": {
                "first": "Adam",
                "id": "5197072786522112",
                "last": "Lohner",
                "role": 8,
                "username": "somePerson@email.com"
            },
            "name": "Dashboard",
            "rootVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM",
            "rules": [
                {
                    "designerHint": "http://some-domain.com/",
                    "parsedRule": "^https?://[^/]/?(?:;[^#])?(?:\?[^#])?(?:#.)?$",
                    "rule": "//*"
                }
            ],
            "rulesjson": "",
            "stableVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM-20170920201041.596687476",
            "validThrough": 1506981600000
        }
    ]
}

Tagged pages. Returns one row for each tagged page.

 

Visitors

Definition:

{ "source": { "visitors": null } }

Example request:

curl -X POST 
https://adopt.pendo.io/api/v1/aggregation 
-H 'content-type: application/json' 
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"visitors":null}}]}}'

Example response:

{
    "results": [
        {
            "metadata": {
                "agent": {
                    "email": "ac@Cras.co.uk",
                    "ipaddress": "184.169.45.4",
                    "language": "en_US"
                },
                "auto": {
                    "accountid": "Epsilon united",
                    "accountids": [
                        "Epsilon united"
                    ],
                    "firstvisit": 1506965912312,
                    "id": "84",
                    "idhash": 2012371633,
                    "lastbrowsername": "Chrome",
                    "lastbrowserversion": "61.0.3163",
                    "lastoperatingsystem": "Mac OS X",
                    "lastservername": "some-domain.com",
                    "lastupdated": 1506976690418,
                    "lastuseragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
                    "lastvisit": 1506976690112,
                    "nid": 84
                }
            },
            "visitorId": "84"
        }
    ]
}

The visitors source returns one row per visitor ever seen and contains various metadata about the visitor. Each visitor has a unique visitorId.

Accepted parameters are null or {"identified":true} to eliminate anonymous visitors.

Metadata  
auto data automatically calculated
custom custom field data
salesforce data synced from Salesforce
segment data synced from Segment

 

Pipeline

Example sytnax:

{ "cat": null } 

Example sytnax:

{ "field": "firstName", "operator": "==", "value": "John" }

Each step is of the form { "name" : parameterObject }.

The source rows are transformed into result rows by the pipeline; the final rows are the result of the aggregation step. Pipelines can be arbitrarily long, and are represented as an ordered list of individual steps.

Operators

Operators  
cat Takes no parameters, and returns each row without changing it.
count Counts the number of input rows and returns a single row of the form { "count" : 456 }
accumulate Keeps a running total of more than one field, and includes the current value of the total in each row. Parameters are of the forum { "totalField1" : "sourceField1" }
eval Inserts new fields based on one or more expressions. Format is an { "fieldName1" : "expression" }
identified Eliminates rows whose named field looks like an anonymous visitorId { "identified" : "visitorId" } where visitorId is the field name to check.
limit Returns the first N rows and discards the rest { "limit" : 5 }
select Builds a new row (while discarding the old one) containing one or more fields. Format is an { "fieldName1" : "expression" }
set Sets fields to constant values { "set" : { "field1" : 5, "field2.nested" : 20 } }
sort Sorts the rows by the specified field list { "sort" : [ "fieldName", "anotherFieldName" ] } Prepend a - to the front of a field name to reverse sort by that field.
unmarshal Expands an embedded JSON string into sub-objects.
useragent Expands a user agent string into a parsed set of strings.

 

The spawn operation

Definition:

{
    "spawn": [
        [
            {
                "source": {
                    "events": null,
                    "timeSeries ": {
                        "period": "dayRange",
                        "first": 1404745200000,
                        "count": 7
                    }
                }
            },
            {
                "cat": null
            }
        ],
        [
            {
                "source": {
                    "events": null,
                    "timeSeries ": {
                        "period": "month",
                        "first": 1404745200000,
                        "count": 7
                    }
                }
            },
            {
                "count": null
            }
        ]
    ]
}

The spawn operator behaves like fork, but each nested pipeline must specify its own source. This allows multiple sources to be used to create a single stream of rows which are processed by the remainder of the pipe.

Only sources which cause a single pipeline to be instantiated may be used; this means dayRangehourRange, or any source whose count is 1 is available.

The timeSeries for the source is part of the source specification for the spawn pipeline.

The switch operation

Definition:

{
    "switch": {
        "newField": {
            "existingField": [
                {
                    "value": "small",
                    "<": 10
                },
                {
                    "value": "medium",
                    ">=": 10,
                    "<": 20
                },
                {
                    "value": "large",
                    ">=": 20,
                    "<": 50
                }
            ]
        }
    }
}

The switch operation allows a field to checked against any number of conditions, and have a new field set in the row based on which condition the original field matches.

For each row the existingField will be checked to see which set of conditions its value matches.

It must match all of the conditions for newField to be set to the value indicated. If no conditions match, or if existingField is not set at all, the row is skipped.

For the above input, the row { "existingField" : 15 } will become { "existingField" : 15, "newField" : "medium" }

The filter operation

Definition:

{ "filter": "fieldName == 5 || len(anotherField) > 3" }

The filter pipeline operator evaluates an expression and eliminates rows that for which the expression is false.

The fork operation

The following calculates the total time for each accountId and visitorId:

{
    "fork": [
        [
            {
                "group": {
                    "group": [
                        "visitorId"
                    ],
                    "fields": [
                        {
                            "time": {
                                "sum": "numMinutes"
                            }
                        }
                    ]
                }
            }
        ],
        [
            {
                "group": {
                    "group": [
                        "accountId"
                    ],
                    "fields": [
                        {
                            "time": {
                                "sum": "numMinutes"
                            }
                        }
                    ]
                }
            }
        ]
    ]
}

The fork operator allows the pipeline to be split into multiple pipelines, with each of the child pipelines receiving all of the rows for that step.

Note that the same row is sent to all of the child pipelines; it is not a copy of the row and any changes to the row made in one pipeline will be visible in all pipelines.

The rows which result from each pipeline are all sent through the remainder of the parent pipeline. Forks may be nested, however this has not been well-tested (aka - no promises).

Empty pipelines (which would pass through the rows without modifying them) are not permitted, but a pipeline with a single cat operation performs identically.

The join operation

Definition:

{ "join": { "fields": [ "firstName", "lastNme" ], "width": 3 } }

The join operator merges multiple rows together based on one or more keys. If there are field conflicts, the values from rows encountered later in the join will win. The optional widthargument allows the user to tell join how many rows are expected to make up the final result. Once that many rows with the same join key have been seen, the merged row is passed to the next operation in the pipeline. If more rows are later seen with that join key, they are treated as a new join operation. Any join keys which have had less than width rows are processed once all input rows have been seen.

The group operation

Example request to calculate how much time each identified visitor has spent on the site over 30 days:

{
    "requestId": "last30daysVisitorTime",
    "timeSeries": {
        "period": "dayRange",
        "count": 30,
        "first": 1404745200000
    },
    "source": {
        "events": null
    },
    "pipeline": [
        {
            "identified": "visitorId"
        },
        {
            "group": {
                "group": [
                    "visitorId"
                ],
                "fields": [
                    {
                        "minutes": {
                            "sum": "numMinutes"
                        }
                    }
                ]
            }
        }
    ]
}

NOTE: The count value of 180 has to be a long enough period to cover the entire time the guide has been in existence.

The group operator supports flexible grouping (similar to GROUP BY in SQL) and simple aggregations of fields. It supports the following parameters:

Parameters  
group List of fields whose values match in each grouping
fields Set of fields whose values are aggregated across the group
stringSort Result rows will be sorted by the group key. The group key is treated as a string for sort purposes (so "2" > "10"). This provides a stable ordering for the result.

 

Each resulting row consists of all of the fields specified by both the group and fields specification.

Aggregation fields consist of field name to use in the output row, an aggregation name, and a field name from the input row. For example, to count the rows in a group use { "outputField" : { "count" : "inputField" } } will count the number of rows which match (here the field name must be specified, but is irrelevant). The following row aggregators are supported:

Aggregators  
count Counts the number of rows in the group. If a field name is it counts the unique occurrences of that field.
listAverage Average the individual values in a list arguement and return a list of those averages.
max Returns the largest value of the specified (numeric) field.
min Returns the smallest value of the specified (numeric) field.
sum Adds up the values of the specified field.
first Returns the first value of the specified field (note: "first" is only defined as deterministically as the input order is).

 

Count the number of unique visitorIds use the following pipeline:

[
    {
        "group": {
            "group": [
                "visitorId"
            ]
        }
    },
    {
        "visitorCount": {
            "count": null
        }
    }
]

To get the number of events for each visitorIduse:

[
    {
        "group": {
            "group": [
                "visitorId"
            ],
            "fields": {
                "eventCount": {
                    "count": "event"
                }
            }
        }
    }
]

The reduce operation

Count the number of rows and add up the minutes fields in the rows:

[
    {
        "reduce": {
            "eventCount": {
                "count": null
            },
            "totalMinutes": {
                "sum": "numMinutes"
            }
        }
    }
]

The reduce operator collects all rows into a single row, and performs aggregations on fields. The aggregators and format are identical to the fields parameter for the group pipeline operation.

Example request to calculate the total number of events and the total number of minutes for a thirty day period.

{
    "requestId": "last30DaysEventAndMinuteTotals",
    "timeSeries": {
        "period": "dayRange",
        "count": 30,
        "first": 1404745200000
    },
    "source": {
        "events": null
    },
    "pipeline": [
        {
            "reduce": [
                {
                    "events": {
                        "sum": "numEvents"
                    }
                },
                {
                    "minutes": {
                        "sum": "numMinutes"
                    }
                }
            ]
        }
    ]
}

Count the number of unique visitors in a dataset:

{
    "requestId": "numberOfUniqueVisitors",
    "timeSeries": {
        "period": "dayRange",
        "count": 30,
        "first": 1404745200000
    },
    "source": {
        "events": null
    },
    "pipeline": [
        {
            "reduce": [
                {
                    "reduce": {
                        "visitors": {
                            "count": "visitorId"
                        }
                    }
                }
            ]
        }
    ]
}

Example request using group and reduce to count the number of visitors in a set of events and the total number of minutes used by those visitor:

{
    "requestId": "lastDaysEventAndMinuteTotals",
    "timeSeries": {
        "period": "dayRange",
        "count": 30,
        "first": 1404745200000
    },
    "source": {
        "events": null
    },
    "pipeline": [
        {
            "group": {
                "group": [
                    "visitorId"
                ],
                "fields": {
                    "visitorMinutes": {
                        "sum": "numMinutes"
                    }
                }
            },
            "reduce": {
                "visitorCount": {
                    "count": "numVisitors"
                },
                "totalMinutes": {
                    "sum": "visitorMinutes"
                }
            }
        }
    ]
}

The segment operation

Definition:

{ "segment": { "id": "id" } }

Segments are used to preselect groups of visitors. Preexisting segments filter rows via the segment operator.

An input row is passed if the visitorId in the input row match an item in the segment.

The bulkExpand operation

Expand visitors into a subdocument called info:

{ "bulkExpand": { "info": { "visitor": "visitorId" } } }

The following data types support expand: account and visitor.

The unwind operation

To unwind the following:

{ 
"name": "fred",
"items": [
{ "v": "first" },
{ "v": "second" },
{ "v": "third" }
]
}

Try:

{ 
"unwind": {
"field": "items",
"index": "itemIndex"
}
}

Response would yield:

[{ 
"name": "fred",
"items": { "v": "first" },
"itemIndex": 0
}, {
"name": "fred",
"items": { "v": "second" },
"itemIndex": 1
}, {
"name": "fred",
"items": { "v": "third" },
"itemIndex": 2
}]

The unwind operator expands a list of items into separate rows. All other items are preserved, and an index may optionally be added.

Add "prefix" : true and you will get:

[
{
"name": "fred",
"items": [{ "v": "first" }],
"itemIndex": 0
}, {
"name": "fred",
"items": [{ "v": "first" }, { "v": "second" }],
"itemIndex": 1
}, {
"name": "fred",
"items": [{ "v": "first" }, { "v": "second" }, { "v": "third" }],
"itemIndex": 2
}
]

Alternatively, if everything to the left of each items is important, unwind can preserve those items.

If the list to unwind is empty then no rows will be emitted unless { "keepEmpty" : true } is specified, in which case the original row will be preserved.

The useragent operation

Definition

{ "useragent": { "output": "input" } }

The useragent operation parses the user agent string in the field input and renders it into a object in output.

Output  
name the browser's name (e.g. "MSIE", "Chrome", etc.)
version the version of the browser
os the operating system (e.g. "Windows", "Mac OS X", "GNU/Linux", etc.)
type the type of user agent (e.g. "Browser", "Crawler", etc.)

 

Multi-App

Using the pages source as an example, without querying for a specific app, the source looks like this:

{ "source": { "pages": null } }

To execute Aggregations specific to an app, you'll need to change the source value within the aggregation from null - which is an indicator to return all records for all apps associated with the sub - to include an appId value.

Including the appId scopes the results to the data associated with just that application within the subscription.

For all multi-app Aggregations, use the primary app's Integration API key for the value for x-pendo-integration-key in the Header.

appId

To query on a specific app, source will look like this:

{
    "source": {
        "pages": {
            "appId": 5629499534213120
        }
    }
}