Strata 2014 Retrospective

Introduction

This year I was fortunate enough to attend the Strata conference put on by O’Reilly and Associates.  I start this retrospective with a SWOT analysis of my experience, then expand with some logistics information and takeaway research ideas.
I had high expectations for this conference given its marketing.  The last conference of this caliber that I had attended was ÜberConf.

Strengths

  • The chairs in the session rooms were comfortable.
  • Healthy snack options were available at breaks.
  • Sponsored sessions were identified on the agenda.
  • There were lots of vendors at the expo.
  • Most if not all vendors had technical people on hand.
  • The mobile app had maps and the schedule.
  • A daily printout of the session schedule and map was also provided.

Weaknesses

  • The mobile app was more frustrating than useful.  It was always wanting to update.
  • Being registered in the directory means you’ll start getting spam before the conference even starts.
  • Navigating the website to find specific session information was difficult.  It was easier to find it through Google.
  • This is not an inexpensive conference to attend in terms of conference cost, travel and hotel expense.
  • Full day workshop
    • Late notice on software to pre-install
    • Not enough AC outlets
    • No tables!
Unless they address the logistics issues of the workshop environment, I cannot recommend attending one.

Opportunities

  • From the keynotes, sessions and vendors you get to learn what tools/processes the future holds.
  • Discern what tools/processes people are using now.
  • Talk with other attendees about the work they are doing and the approaches they are taking to it.
  • Some insights I had are:
    • Many are using Python and Julia for ETL.
    • R is being used for analysis.
    • Data people are starting to think about and discuss data patterns, such as the Side Kick Pattern presented by Abe Gong (@AbeGong), a Data Scientist from Jawbone.

Threats (or why wouldn’t I want to attend)

  • Fundamentals are potentially better learned with targeted training.
  • Attending this conference could prevent you from attending another more relevant conference.
  • The target audience for this conference is narrow.  People that identify with big data, data science and business intelligence are well served by this conference.
When attending it helps to have specific questions or problems you are looking to solve.  This gives you a good context when choosing sessions and meeting with vendors.  (There are lots of vendors!)

Internet

The conference wireless was acceptable given how many people were using it.  Internet in the hotel lobby was very good.  Free wired internet is available in the hotel rooms, but I did not have an opportunity to use it.  Pay-to-use wireless is also available in the rooms.

Research

I came away from the conference with much that I want to research and experiment with, and with people to connect with.

Tools and Libraries

Techniques

  • Adjacency Matrix
  • pivot and fold operations
  • hexagonal binning
  • use visualization for data quality checks
  • confusion matrix
  • predictive modeling fundamentals
  • machine learning
  • The work of John Tukey (Statistics)

Speakers

Joe and Jeffrey presented: Data Transformation: Skills of the Agile Data Wrangler
Can we make big data management easier?  Her 3 research threads are: effective, easier and cost effective.

Summary

I came away with a better appreciation of what constitutes data science, the skills needed, the tools utilized and the vendors in the different areas.  If I attend again in the future I would likely skip the workshop day.  I would also do additional prep work, thinking about specific questions I may have for the technical people that the vendors make available.

Tips for working with the MongoDB aggregation framework

  1. Review the data.
    What are the data types? What is the structure? What do the values mean?
  2. Build out the query one stage at a time.
  3. Start with one pipeline operation and a limit operation at the end. Review that you are getting the results you expect. Keep the limit operation in place as you add operations. Verify expected behavior every step of the way.
  4. Remove the limit operation as the last step.
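The stage-at-a-time approach in the tips above can be sketched in Python. This is only a minimal sketch under stated assumptions: the helper names with_limit and without_limit are hypothetical, and the field names state and city assume a collection of zip code documents. The pipeline is built as plain Python data so it can be inspected without a database; with the pymongo driver, each intermediate pipeline could be passed to collection.aggregate(pipeline) to review the results.

```python
# Hypothetical helpers for growing an aggregation pipeline one stage at a time.
# $match, $group, $sort and $limit are standard MongoDB aggregation stages;
# the field names "state" and "city" are assumptions for illustration.

def with_limit(stages, n=5):
    """Return a pipeline made of the given stages plus a trailing $limit."""
    return list(stages) + [{"$limit": n}]


def without_limit(pipeline):
    """Return the pipeline without its trailing $limit stage, if present."""
    if pipeline and "$limit" in pipeline[-1]:
        return pipeline[:-1]
    return list(pipeline)


# Start with one pipeline operation and a limit at the end.
stages = [{"$match": {"state": "MO"}}]
pipeline = with_limit(stages)

# Add a stage, keep the limit in place, and review the results again.
stages.append({"$group": {"_id": "$city", "zip_count": {"$sum": 1}}})
pipeline = with_limit(stages)

# Add another stage, still limited.
stages.append({"$sort": {"zip_count": -1}})
pipeline = with_limit(stages)

# Last step: remove the limit for the full run.
final_pipeline = without_limit(pipeline)

# With pymongo (not imported here), each pipeline could be run as:
#   results = list(collection.aggregate(pipeline))
```

Keeping the limit as a separate, always-last stage means a new operation can be appended without reordering anything, and the final pipeline differs from the tested one only by the missing limit.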
The online documentation will be very useful:
http://docs.mongodb.org/manual/aggregation/

http://docs.mongodb.org/manual/core/aggregation-pipeline/

http://docs.mongodb.org/manual/reference/operator/aggregation/


This reference includes some nice material regarding comparison with standard SQL operations: http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/

Understanding US Zip Codes



In the United States there are states and territories.  These states and territories have cities.  The same city name may be used in different states, for instance Springfield, MO (Missouri) and Springfield, IL (Illinois).  A single city may be covered by more than one zip code.  A zip code may span states.
The graphic above shows 13 of the 16 zip codes that are used for Springfield, MO.  Springfield, IL uses different zip codes; here 7 of Springfield, IL's 36 zip codes are shown.
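The relationships described above can be sketched as a small data structure. This is a minimal sketch, assuming each record is a (zip code, city, state) tuple; the variable names are hypothetical and the six zip codes are only a small illustrative subset of the full lists at the reference links.

```python
from collections import defaultdict

# Illustrative subset only, not the full list for either city.
# Each record is (zip code, city, state).
records = [
    ("65802", "Springfield", "MO"),
    ("65804", "Springfield", "MO"),
    ("65807", "Springfield", "MO"),
    ("62701", "Springfield", "IL"),
    ("62702", "Springfield", "IL"),
    ("62704", "Springfield", "IL"),
]

# A city is identified by (city, state), because the name alone is ambiguous.
zips_by_city = defaultdict(set)
# Track which states use a given city name.
states_by_city_name = defaultdict(set)

for zip_code, city, state in records:
    zips_by_city[(city, state)].add(zip_code)
    states_by_city_name[city].add(state)

# One city name, more than one state.
print(sorted(states_by_city_name["Springfield"]))
# One city, more than one zip code.
print(sorted(zips_by_city[("Springfield", "MO")]))
```

Grouping on the city name alone would merge the two Springfields into one, which is why the sketch keys on the (city, state) pair.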

Reference

http://zipcode.org/ 
http://zipcode.org/city/MO/SPRINGFIELD
http://zipcode.org/city/IL/SPRINGFIELD