I am Marcio, a data engineer, cloud architect, and co-founder of Camina trading and consulting group. Naturally curious and committed to keeping my technical skills at the cutting edge, I have spent more than 8 years working for some of the most prestigious brands, including AWS, Accenture, Bayer and others. Below are examples of open projects that I have worked on. PS: To see most of the projects in my portfolio, please visit the Camina group website.

Projects

An easy-to-understand yet highly efficient statistical analysis of the 'Used cars for sale' dataset from eBay: a large dataset with over 370,000 sale records from the German market through 2016. This is a deep cleaning and data mining exercise that walks through the steps of turning a seemingly dirty and unreliable dataset into a clean file and uncovering the hidden meaning in each variable. Tools used: #Python

You can find the code here.

An advanced analysis of one of the fiercest competitions ever to take place on Kaggle. This is an easy-to-read and visually appealing analysis of a large and very well anonymized dataset. Tools used: #R

You can find the code here.

A simple script run on a dataset from Microsoft and Bill Gates's foundation about women's gynecological problems in third-world countries. The dataset is a collection of records made by several doctors in the field. Some of the discoveries are a stark and shocking reminder of opposite realities (and maybe the reason why certain variables are anonymized). Also, as with any survey, there is a lot of erroneous and missing input, which forces you to find hidden meanings in the dataset in order to fill the gaps. Tools used: #R

You can find the code here.

Cut the informational bullwhip in your supply chain and detect deviations or fraud. This is a tool developed in cooperation with my former manager, Ronald Vissers. It is a powerful solution capable of detecting deviations in the logistics sector. It ingests data from multiple sources and compares the records with sequential, automated queries in order to flag or correct them when necessary (e.g. comparing payments against shipment specifications such as weight, volume, weight/volume ratio, customs duties, and so on). Tools used: #MySQL #Access
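
To give a feel for the kind of automated comparison query described above, here is a minimal sketch. It is not the actual tool: the table names, columns, sample rows, and the 5% tolerance are all hypothetical, and it uses Python's built-in sqlite3 in place of MySQL or Access so that it runs anywhere.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (shipment_id TEXT PRIMARY KEY, weight_kg REAL, rate_per_kg REAL);
    CREATE TABLE payments  (shipment_id TEXT, amount_paid REAL);
    INSERT INTO shipments VALUES ('S1', 100.0, 2.0), ('S2', 250.0, 2.0), ('S3', 80.0, 2.5);
    INSERT INTO payments  VALUES ('S1', 200.0), ('S2', 650.0), ('S3', 200.0);
""")

def flag_deviations(conn, tolerance=0.05):
    """Return shipments whose payment differs from weight * rate
    by more than `tolerance` (relative to the expected amount)."""
    query = """
        SELECT s.shipment_id,
               s.weight_kg * s.rate_per_kg AS expected,
               p.amount_paid
        FROM shipments AS s
        JOIN payments AS p ON p.shipment_id = s.shipment_id
        WHERE ABS(p.amount_paid - s.weight_kg * s.rate_per_kg)
              > ? * (s.weight_kg * s.rate_per_kg)
        ORDER BY s.shipment_id
    """
    return conn.execute(query, (tolerance,)).fetchall()

flagged = flag_deviations(conn)
for shipment_id, expected, paid in flagged:
    print(f"{shipment_id}: expected {expected:.2f}, paid {paid:.2f}")
```

In this made-up data only shipment S2 is flagged, because its payment is well above what its weight and rate imply. Similar queries could be chained for volume or customs duties.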

An upgrade of one of the most widely used tools for linking data records, the 'deduper'. This upgraded version not only allows you to standardize, deduplicate, and clean the dataset; it also uses machine learning to discover the reasons why you have wrong input in the first place. Tools used: #Python #Django. !Due to the value of this work, a demo is only possible by personal request.

A simple but efficient tool to verify VAT numbers, extracting additional company information and detecting invalid numbers with a single click. You can verify one number or upload a file with multiple numbers. The validation is done against the European Union database to ensure maximum reliability. It can also correct additional fields in your dataset by comparing them with the European records. The concept is simple, but the impact is big! Tools used: #Python #Django. !Demo only by request.
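
As a rough illustration of one step such a tool might include, the sketch below normalizes VAT numbers and checks their format locally before any lookup against the EU database. It is not the tool's code: the function names are hypothetical, the patterns are simplified assumptions covering only three member states, and a passing format check does not mean the number is actually registered.

```python
import re

# Simplified, assumed patterns for three member states (illustration only).
VAT_PATTERNS = {
    "DE": re.compile(r"\d{9}"),
    "NL": re.compile(r"\d{9}B\d{2}"),
    "BE": re.compile(r"[01]\d{9}"),
}

def normalize_vat(raw):
    """Uppercase the input and strip spaces, dots, and dashes."""
    return re.sub(r"[\s.\-]", "", raw).upper()

def precheck_vat(raw):
    """Return (normalized, ok); ok means the format looks plausible
    for a country listed in VAT_PATTERNS."""
    vat = normalize_vat(raw)
    pattern = VAT_PATTERNS.get(vat[:2])
    ok = bool(pattern and pattern.fullmatch(vat[2:]))
    return vat, ok

# Made-up inputs, as they might appear in an uploaded file.
results = [precheck_vat(v) for v in ["de 123.456.789", "NL123456789B01", "BE12345", "XX999"]]
for vat, ok in results:
    print(vat, "format ok" if ok else "format invalid")
```

Filtering out malformed numbers first keeps obviously bad rows from being sent to the remote validation service.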

When I switched from R to Python, I often struggled to find a good demo of hierarchical clustering on mixed data types (text and numerical values) in Python. Using the famous 'iris' flower subspecies dataset, you will be guided through constructing a beautiful hierarchical clustering dendrogram, 'R' style, in Python. Tools used: #Python

You can find the code here.
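
Separately from the linked code, here is a minimal sketch of one common way to measure distance between records that mix text and numbers: the Gower distance. Whether the linked demo uses this measure is an assumption on my part. The records are made up, with iris-like numeric columns plus a hypothetical text column.

```python
def numeric_ranges(records):
    """Range (max - min) of every numeric feature."""
    keys = [k for k, v in records[0].items() if isinstance(v, (int, float))]
    return {k: max(r[k] for r in records) - min(r[k] for r in records) for k in keys}

def gower_distance(a, b, ranges):
    """Average per-feature dissimilarity between two records.

    Numeric features contribute |a - b| / range;
    text features contribute 0 if equal, otherwise 1.
    """
    total = 0.0
    for key, x in a.items():
        y = b[key]
        if isinstance(x, (int, float)):
            total += abs(x - y) / ranges[key] if ranges[key] else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Made-up records, for illustration only.
records = [
    {"petal_length": 1.4, "petal_width": 0.2, "color": "white"},
    {"petal_length": 1.5, "petal_width": 0.2, "color": "white"},
    {"petal_length": 5.1, "petal_width": 1.8, "color": "purple"},
    {"petal_length": 5.9, "petal_width": 2.2, "color": "purple"},
]
ranges = numeric_ranges(records)
matrix = [[gower_distance(a, b, ranges) for b in records] for a in records]
```

A distance matrix like this can then be handed to a hierarchical clustering routine, for example SciPy's `linkage` and `dendrogram` after converting the matrix to condensed form with `squareform`, to draw the tree.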

An address verifier capable of rectifying the addresses in an entire file. Address parsing and verification has been one of the most tackled topics in logistics. Hint: just google the number of companies that offer partial solutions for this. Tools used: #Python #Django

Let's Socialize

Marcio Fernandes