darzar - automatic data extraction

Mine the data around you with darzar. No manual work is needed: our AI adapts to each source on its own.

darzar automatically extracts data from databases, files, websites, emails and even images. It finds items such as products, services, company data, messages, conversations, forums, events, news, articles, books, movies and recommendations.

Benefits

  • Automatic: no manual work, no retraining when sites change
  • Immediate: no delays for specialized training, no manual tuning
  • Simple: start from a website address
  • Scale: fast, accurate extraction at high volumes
  • Insights: statistics, automatic catalogs, hierarchies, taxonomies
  • Ready to mine: your data becomes a knowledge base ready for machine learning, dashboards and BI
  • GDPR: preparation by locating sensitive personal data

Features

Automatic Bots

  • Automatic, adaptable Artificial Intelligence bots deliver results faster and more accurately, replacing error-prone, costly manual work

Extract Everything

  • Extract data valuable for your business: companies, images, discussions, opening hours, articles, products, services, company data, messages, conversations, forums, events, news, books, movies, recommendations

Data extraction

  • Structured: databases, NoSQL databases (graph, document, MongoDB, Cassandra, Redis)
  • Semi-structured: mobile apps, JSON, CSV, XLS, Excel, XML, exports, metadata, microdata, Open Graph, Twitter Cards
  • Unstructured text: emails, documents (PDF, PDF/A, Word), websites, files, plain text, Markdown, micro-languages
  • Fully unstructured: scans (JPG, PNG, TIFF, PDF) and mobile scans, extracted with OCR, multi-language text recognition, barcode and matrix-code reading, and classification
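To illustrate the semi-structured case, here is a minimal sketch of pulling Open Graph properties out of a page's <meta> tags. It assumes tags written with the property attribute before content, uses a regular expression instead of a real HTML parser, and the class and method names (OpenGraphSketch, extract) are hypothetical; it is not darzar's own extractor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenGraphSketch {

    // Matches <meta property="og:..." content="..."> (optionally self-closing),
    // assuming the property attribute comes first. A regex is only adequate for
    // a sketch; production code should use a proper HTML parser.
    private static final Pattern OG_META = Pattern.compile(
        "<meta\\s+property=\"(og:[^\"]+)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    /** Returns Open Graph properties in document order; a repeated key keeps its last value. */
    public static Map<String, String> extract(String html) {
        Map<String, String> props = new LinkedHashMap<>();
        Matcher m = OG_META.matcher(html);
        while (m.find()) {
            props.put(m.group(1), m.group(2));
        }
        return props;
    }

    public static void main(String[] args) {
        String html = "<head>"
            + "<meta property=\"og:title\" content=\"Blue Mug\" />"
            + "<meta property=\"og:type\" content=\"product\">"
            + "<meta name=\"description\" content=\"ignored\">"
            + "</head>";
        System.out.println(extract(html)); // {og:title=Blue Mug, og:type=product}
    }
}
```

The plain description tag is skipped because only og: properties are matched. Microdata and Twitter Cards could be handled the same way with their own patterns, though a DOM-based parser scales better to real-world markup.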

Phases

  • Seeding - data source discovery
  • Crawl - exhaustive mining of raw data
  • Scrape - data extraction from downloaded raw data
  • Correction - crop, smart crop, filtering, rescanning
  • Transform - consolidate, standardize, cleanse, reconcile data
  • Import sources - files (including Dropbox, Google Drive), Synology, FTP, links, emails, e-commerce shops, websites
  • Search - labels, metadata, filters, automatic classification
  • Store - backup, history, versioning, time series
  • Structure - clustering, automatic catalogs, hierarchies, taxonomies, website-specific structure, updates and changes, comparisons, history
  • Consume - dashboards, BI, export as structured data
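The phases can be pictured as a chain of functions, each consuming the previous phase's output. The following is a minimal sketch that covers only Seeding, Crawl, Scrape, Transform and Consume. It assumes an in-memory map in place of real HTTP crawling and a toy "name;price" page format. All class, method and field names are hypothetical and do not represent darzar's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class PhasePipelineSketch {

    /** A scraped item with two hypothetical fields. */
    static final class Item {
        final String name;
        final String price;
        Item(String name, String price) { this.name = name; this.price = price; }
    }

    // Crawl: stand-in for downloading pages; raw "pages" come from an in-memory map.
    static List<String> crawl(List<String> seeds, Map<String, String> web) {
        List<String> raw = new ArrayList<>();
        for (String url : seeds) {
            String page = web.get(url);
            if (page != null) raw.add(page); // unreachable seeds are skipped
        }
        return raw;
    }

    // Scrape: parse "name;price" lines out of each raw page, ignoring malformed lines.
    static List<Item> scrape(List<String> pages) {
        List<Item> items = new ArrayList<>();
        for (String page : pages) {
            for (String line : page.split("\n")) {
                String[] parts = line.split(";");
                if (parts.length == 2) items.add(new Item(parts[0], parts[1]));
            }
        }
        return items;
    }

    // Transform: standardize (trim, lowercase names, decimal comma to dot)
    // and reconcile duplicates by name; the first occurrence wins.
    static List<Item> transform(List<Item> items) {
        Map<String, Item> byName = new LinkedHashMap<>();
        for (Item it : items) {
            String name = it.name.trim().toLowerCase(Locale.ROOT);
            String price = it.price.trim().replace(',', '.');
            byName.putIfAbsent(name, new Item(name, price));
        }
        return new ArrayList<>(byName.values());
    }

    // Consume: export the cleaned items as CSV.
    static String toCsv(List<Item> items) {
        return items.stream()
            .map(it -> it.name + "," + it.price)
            .collect(Collectors.joining("\n", "name,price\n", ""));
    }

    // Seeding is just the list of source addresses passed in here.
    public static String run(List<String> seeds, Map<String, String> web) {
        return toCsv(transform(scrape(crawl(seeds, web))));
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
            "shop-a", "Blue Mug;4,50\nRed Pen ;1.20",
            "shop-b", "blue mug;4.50");
        System.out.println(run(List.of("shop-a", "shop-b", "missing"), web));
    }
}
```

Running main prints a "name,price" header followed by "blue mug,4.50" and "red pen,1.20". The two spellings of the mug are reconciled into one row, and the unreachable seed is skipped. Plain collections keep the sketch self-contained. In a Spark-based deployment, each stage would more plausibly be a distributed transformation.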

Demo

  • A dashboard for exploring the discovered data is available at www.darzar.com

Technology Stack

  • Big data: Spark
  • Scala, Akka, Play Framework, Java, Bootstrap 4, responsive design

Request a free quote for your project