Extract any concept from any document.
With the SmartLens Document AI API, you can automatically convert unstructured information in your documents into actionable structured content. The SmartLens Text Analysis API is closely related to the Document AI API. The key difference is that it accepts your document as a text string instead of a PDF or image. Use the SmartLens Text Analysis API if your documents are already in text form, or if you already have a working OCR solution and want to extract information from its output. The API uses our proprietary AI technology to extract structured information (such as order totals, line items, and addresses) that can easily be organized into a database or ERP system. It accepts an HTTP POST request with a JSON body as its input and returns the extracted items along with the raw text.
API response example
{
  "predictions": [
    {
      "extracted_items": {
        "merchant_name": "East Repair Inc.",
        "due_date": "26/02/2019",
        "issue_date": "11/02/2019",
        "invoice_id": "us-001",
        "total_payment": "$154.06",
        "tax": "6.25%",
        "subtotal": "145.00",
        "billing_address": "1912 Harvest Lane New York, NY 12210",
        "shipping_address": "3787 Pingview Drive Cambridge, MA 12210",
        "items": [
          {
            "description": "Front and rear brake cables",
            "price": "100.00"
          },
          {
            "description": "New set of pedal arms",
            "price": "15.00"
          },
          {
            "description": "Labor",
            "price": "3hrs 5.00"
          }
        ],
        "billing_recipient_name": "John Smith",
        "shipping_recipient_name": "John Smith",
        "payment_terms": "Payment is due within 15 days",
        "po_number": "2312/2019"
      }
    }
  ]
}
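In the example response, every value is a string, and numeric fields come in different shapes ("$154.06", "145.00", "3hrs 5.00"). The minimal Python sketch below reads a response shaped like the example. It assumes the fields always sit under predictions[0]["extracted_items"], and its parse_amount helper is hypothetical: it simply takes the last number in a string, which will not suit every format.

```python
import json
import re
from decimal import Decimal

# Sample response, trimmed from the example above.
RESPONSE_JSON = """
{
  "predictions": [
    {
      "extracted_items": {
        "merchant_name": "East Repair Inc.",
        "total_payment": "$154.06",
        "subtotal": "145.00",
        "items": [
          {"description": "Front and rear brake cables", "price": "100.00"},
          {"description": "New set of pedal arms", "price": "15.00"},
          {"description": "Labor", "price": "3hrs 5.00"}
        ]
      }
    }
  ]
}
"""

_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_amount(value):
    """Hypothetical helper: return the last number in a string as a Decimal, or None."""
    matches = _NUMBER.findall(value.replace(",", ""))
    return Decimal(matches[-1]) if matches else None


def extracted_items(response):
    """Return the extracted_items dict of the first prediction (one input per request)."""
    predictions = response.get("predictions", [])
    return predictions[0].get("extracted_items", {}) if predictions else {}


fields = extracted_items(json.loads(RESPONSE_JSON))
print(fields["merchant_name"], parse_amount(fields["total_payment"]))
for item in fields.get("items", []):
    print(item["description"], parse_amount(item["price"]))
```

With this heuristic, "3hrs 5.00" resolves to the unit rate 5.00 rather than a line total, so check fields like this before summing them.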
The SmartLens Text Analysis API accepts a text string as its input. Like our other APIs, it accepts one input per request, so to analyze multiple documents you'll need to make one request per document. To call the API, make an HTTP POST request to https://api.smartlens.ai/v1/models/analyze-text/predict with your API key in the Authorization header, and send your document in the text parameter of the JSON request body.
SmartLens Text Analysis extracts many fields by default. For the full list, see the Automatic extractions section.
Request body example
{
  "text": "MY-DOCUMENT-TEXT"
}
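If you are not using a client library, the request can be built with Python's standard library alone. This is a minimal sketch, assuming the endpoint and headers used in the cURL example in this guide (https://api.smartlens.ai/v1/models/analyze-text/predict, with the raw API key in the Authorization header). The function names build_request, analyze, and analyze_many are hypothetical. Because the API accepts one input per request, multiple documents are handled by sending one request per document.

```python
import json
import urllib.request

# Endpoint taken from the cURL example in this guide.
ENDPOINT = "https://api.smartlens.ai/v1/models/analyze-text/predict"


def build_request(api_key, text, custom_extractions=None):
    """Build (but do not send) a POST request for a single document."""
    body = {"text": text}
    if custom_extractions:
        body["custom_extractions"] = custom_extractions
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze(api_key, text, custom_extractions=None, timeout=60):
    """Send one document and return the decoded JSON response."""
    request = build_request(api_key, text, custom_extractions)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def analyze_many(api_key, texts, custom_extractions=None):
    """One request per document, since each request accepts a single input."""
    return [analyze(api_key, text, custom_extractions) for text in texts]
```

Keeping build_request separate from the network call makes it easy to inspect or log the exact payload before it is sent.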
If you have unique documents with highly customized fields, SmartLens Text Analysis can still extract them with no additional AI training. Simply create a Custom Extraction item that describes, in natural language, the concept you want the API to extract. Custom Extraction items are JSON objects: put the description of your custom concept or form field in the natural_language_query key, and the name you want for that field in the response in the answer_key key.
For example, suppose you want to extract the payment terms of a contract. You can create a Custom Extraction item with natural_language_query set to a string such as "What are the payment terms?" and answer_key set to a string such as "payment_terms". You don't need to phrase your natural_language_query in any particular way, since SmartLens Text Analysis understands natural language well, but do try to be as specific as possible.
The value of answer_key does not affect the quality of the API results, so set it to whatever is most convenient when you parse the response JSON.
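Since answer_key is just a name you choose, a convenient pattern is to keep the list of answer_key values you sent and look each one up when reading the response. This sketch assumes that each custom answer is returned under its answer_key inside extracted_items. That is consistent with the response example on this page, which contains payment_terms and po_number (the answer_key values used in the cURL example), but it is an assumption. The helper name read_custom_answers is hypothetical.

```python
def read_custom_answers(extracted_items, answer_keys):
    """Look up each answer_key in extracted_items; missing answers come back as None.

    Assumes custom answers are returned under their answer_key, as in the
    response example on this page.
    """
    return {key: extracted_items.get(key) for key in answer_keys}


# The answer_key values you sent with the request.
answer_keys = ["payment_terms", "po_number"]

# extracted_items as returned for the example invoice.
sample_items = {
    "payment_terms": "Payment is due within 15 days",
    "po_number": "2312/2019",
}
answers = read_custom_answers(sample_items, answer_keys)
print(answers["po_number"])  # 2312/2019
```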
You can add up to 20 Custom Extraction items to a single request. The example request body below includes two Custom Extraction items.
Request body example with Custom Extractions
{
  "text": "MY-DOCUMENT-TEXT",
  "custom_extractions": [
    {
      "natural_language_query": "What are the payment terms",
      "answer_key": "payment_terms"
    },
    {
      "natural_language_query": "Extract the P.O. number",
      "answer_key": "po_number"
    }
  ]
}
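If you generate Custom Extraction items programmatically, it can help to check the stated limit of 20 items per request before sending. A minimal sketch: the hypothetical build_custom_extractions helper turns a dict of {answer_key: natural_language_query} into the custom_extractions list, so each answer_key is unique by construction, and it raises ValueError when the limit is exceeded or a query is empty.

```python
MAX_CUSTOM_EXTRACTIONS = 20  # per-request limit stated in this guide


def build_custom_extractions(queries):
    """Turn {answer_key: natural_language_query} into the custom_extractions list."""
    if len(queries) > MAX_CUSTOM_EXTRACTIONS:
        raise ValueError(
            f"at most {MAX_CUSTOM_EXTRACTIONS} custom extractions per request, "
            f"got {len(queries)}"
        )
    items = []
    for answer_key, query in queries.items():
        if not query.strip():
            raise ValueError(f"empty natural_language_query for {answer_key!r}")
        items.append({"natural_language_query": query, "answer_key": answer_key})
    return items


# Reproduces the request body example above.
body = {
    "text": "MY-DOCUMENT-TEXT",
    "custom_extractions": build_custom_extractions({
        "payment_terms": "What are the payment terms",
        "po_number": "Extract the P.O. number",
    }),
}
```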
Putting the request body and authorization header together, below is a complete HTTP request. Because the whole JSON body is passed to curl inside a double-quoted shell string, every quotation mark inside the body must be escaped with a backslash. Any dollar sign (as in \$154.06) must be escaped too, because the shell would otherwise expand it as a variable.
cURL
curl -X POST "https://api.smartlens.ai/v1/models/analyze-text/predict" \
  -H "accept: application/json" \
  -H "Authorization: MY-API-KEY" \
  -H "Content-Type: application/json" \
  -d "{ \"text\": \"East Repair Inc.\n\n1912 Harvest Lane\nNew York, NY 12210\nBILL TO\n\nJohn Smith\n\n2 Court Square\nNew York, NV 12210\n\nSHIP TO\n\nJohn Smith\n\n3787 Pingview Drive\nCambridge, MA 12210\n\nINVOICE # us-001\nINVOICE DATE 11/02/2019\nP.O# 2312/2019\n\nDUE DATE 26/02/2019\nInvoice Total \$154.06\nary DESCRIPTION UNIT PRICE AMOUNT\n1 Front and rear brake cables 100.00 100.00\n2 New set of pedal arms 15.00 30.00\n3 Labor 3hrs 5.00 15.00\nSubtotal 145.00\n\nSales Tax 6.25% 2.06\nTERMS & CONDITIONS\nPayment is due within 15 days\nPlease make checks payable to: East Repair Inc.\", \"custom_extractions\": [ { \"natural_language_query\": \"What are the payment terms\", \"answer_key\": \"payment_terms\" }, { \"natural_language_query\": \"Extract the P.O. number\", \"answer_key\": \"po_number\" } ]}"
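Hand-escaping a long document for the shell is error-prone: inside double quotes a POSIX shell still expands $ and backticks, so a value such as $154.06 can silently change. One way to sidestep this, sketched below in Python, is to let json.dumps produce the body and shlex.join (Python 3.8+) quote every argument, which wraps unsafe strings in single quotes that bash or zsh treat literally. The helper name curl_command is hypothetical. The endpoint and headers are copied from the cURL example in this guide, and the output is meant for a POSIX shell, not Windows cmd.

```python
import json
import shlex

ENDPOINT = "https://api.smartlens.ai/v1/models/analyze-text/predict"


def curl_command(api_key, text, custom_extractions=None):
    """Return a shell-safe curl command: json.dumps escapes the JSON, shlex.join the shell."""
    body = {"text": text}
    if custom_extractions:
        body["custom_extractions"] = custom_extractions
    return shlex.join([
        "curl", "-X", "POST", ENDPOINT,
        "-H", "accept: application/json",
        "-H", f"Authorization: {api_key}",
        "-H", "Content-Type: application/json",
        "-d", json.dumps(body),
    ])


invoice_text = "Invoice Total $154.06\nSales Tax 6.25% 2.06\nTERMS & CONDITIONS"
print(curl_command("MY-API-KEY", invoice_text))
```

Splitting the generated command back with shlex.split recovers the exact JSON body, which is a quick way to confirm nothing was mangled.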
While you can always call our REST API directly, we've built client libraries to make it even easier to integrate SmartLens into your app. Our Python client library is available now, with more coming soon. Supply the text payload, and any Custom Extraction items, exactly as you would in a direct HTTP request to the API. See the Making requests section for details.
Python
import smartlens

smartlens.api_key = "MY_API_KEY"

response = smartlens.runTextAnalysis(
    text="MY-DOCUMENT-AS-TEXT",
    # Custom extractions are optional
    customExtractions=[
        {"natural_language_query": "What is the salary?", "answer_key": "employee_salary"}
    ],
)
SmartLens Text Analysis extracts many types of concepts by default; the full list is below. If you have unique documents with highly customized fields, SmartLens Text Analysis can still extract them with no additional AI training. Please refer to the Making requests section for information on how to create Custom Extraction items.
We've built an API playground to make it even easier to explore our APIs. Check it out at https://api.smartlens.ai, which is also the base URL for the SmartLens API.
Please refer to the sidebar menu for API-specific quickstarts and detailed schema information. If you have any questions or feedback, please reach out to us at [email protected].