DATA SCRAPING

Data has become a precious resource for all types of projects, especially in machine learning and artificial intelligence. One of the primary issues technologists run into is how to get good data for their projects. This is where scraping comes in. The internet holds an enormous amount of public data, in both structured and unstructured formats, that can be tapped into.

In a recent project, Pending Spark pulled together huge amounts of public data about video games for use in a recommendation and search engine. The project started by harvesting data from the Google Play Store, Steam, Twitch, and a variety of other sources, and storing its revision history in an AWS DynamoDB NoSQL document store. From there, the data was parsed and structured into PostgreSQL and a Neo4j graph database for understanding game relationships. It was also indexed with AWS Elasticsearch for fast, scalable querying by end users.
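To make the "store revision history, then parse and structure" step concrete, here is a minimal sketch in Python. It is not the project's actual code: the `RevisionStore` class, the `to_row` helper, and the `title` and `genres` fields are all hypothetical, and a plain in-memory dictionary stands in for the document store so the example runs without any cloud services.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RevisionStore:
    """In-memory stand-in (assumption) for a document store keyed by (source, item_id)."""

    revisions: dict = field(default_factory=dict)

    def put(self, source, item_id, raw):
        """Append `raw` as a new revision only if its content changed.

        Returns True when a new revision was stored, False when it was a duplicate.
        """
        key = (source, item_id)
        # Hash a canonical JSON form so key order does not affect the comparison.
        canonical = json.dumps(raw, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        history = self.revisions.setdefault(key, [])
        if history and history[-1]["digest"] == digest:
            return False
        history.append({"version": len(history) + 1, "digest": digest, "raw": raw})
        return True

    def latest(self, source, item_id):
        """Return the most recent raw record for an item."""
        return self.revisions[(source, item_id)][-1]["raw"]


def to_row(source, item_id, raw):
    """Flatten a raw scraped record into a relational-style row (hypothetical fields)."""
    return {
        "source": source,
        "item_id": item_id,
        "title": str(raw.get("title", "")).strip(),
        "genres": sorted(g.lower() for g in raw.get("genres", [])),
    }


# Example usage with made-up records.
store = RevisionStore()
store.put("steam", "42", {"title": " Example Game ", "genres": ["RPG"]})
store.put("steam", "42", {"title": " Example Game ", "genres": ["RPG"]})  # unchanged, skipped
store.put("steam", "42", {"title": "Example Game", "genres": ["RPG", "Indie"]})
row = to_row("steam", "42", store.latest("steam", "42"))
```

Keeping every changed version of the raw record means the parsing step can be rerun later against older snapshots, while the flattened row is what would be loaded into a relational or graph database.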

If you have a project in mind but are unsure how to harness the data, send us a message and we'll sort it out.
