9/3/2015
What I did over my Summer Vacation
Presented by: Ashish Kumar
Newfield Wireless - TekComms
Overview
• Introduction
• Geomarketing
• Motivation
• Data
• Analysis
  • Curve fitting
  • Local minima analysis
  • Weighted K-means clustering
• Conclusion
Core Hypothesis
What are we really looking for?
• Not the home location, but the First Location of Persistence (FLOP)
• The FLOP does not necessarily correlate with the reported home or billing address
• Challenges:
  • RAN usage may be lower at the FLOP
  • The median user may have a predictable FLOP time range, but many user profiles do not conform
  • A SLOP may have the same time duration and be indistinguishable from the FLOP
Geomarketing and Using FLOP
• Device location enables:
  1. Sending personalized offers and directions to targeted stores and venues
  2. Discovering mobility patterns
• Finding the persistent location for a user allows:
  1. Precise definition of how a venue or retailer fits into the daily life of consumers
  2. Determining the competition closest to the customer base, not to the retail location
  3. Improved marketing via stronger, more relevant demographics rather than push or individually targeted advertisements
  4. Detection of significant changes in targeted consumers
The Median User
Mean Location/Hour Distribution
Mean location distribution per user per hour
Approaches
• To algorithmically determine the location that corresponds to decreased mobility, three methods were investigated:
  • Curve fitting
  • Local minima analysis
  • Weighted K-means clustering
Curve Fitting
• Tried to fit a binomial curve, a Gaussian distribution, a beta distribution and a Rayleigh distribution
• Used the chi-square test for goodness of fit
• Poor performance

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where
$O_i$: observed value
$E_i$: expected value
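As a rough illustration of the goodness-of-fit check, the sketch below computes the Pearson chi-square statistic (the sum over bins of (O_i - E_i)^2 / E_i) between observed hourly counts and the counts expected under a Gaussian curve. The function names, the hourly binning and the sample numbers are assumptions made for illustration; the slides do not show the fitting code that was actually used.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O_i - E_i)^2 / E_i over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected value must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def gaussian_expected(n_bins, mean, std, total):
    """Expected count per bin for a Gaussian shape, scaled to sum to `total`."""
    weights = [math.exp(-0.5 * ((h - mean) / std) ** 2) for h in range(n_bins)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


# Hypothetical observed counts for 6 hourly bins (not data from the talk).
observed = [2, 9, 20, 22, 8, 3]
expected = gaussian_expected(len(observed), mean=2.5, std=1.0, total=sum(observed))
print(chi_square_statistic(observed, expected))
```

A smaller statistic means the candidate curve is closer to the observed histogram; comparing it against a chi-square critical value for the appropriate degrees of freedom would turn it into a formal test.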
Curve Fitting Pitfalls
• Does not work for sparse data (results are very misleading)
• Implies some level of symmetry in movements
• Only works on a single 24-hour window
• Needs a mixture of curves and more computational power for longer time ranges
• Can only find the FLOP
Local Minima Analysis Results
• Local minima analysis results for the most recent two days, showing the time and location spread for the persistent locations
• Matched for 735609 of 1187116 users (62%)
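The slides do not spell out the local minima algorithm, so the following is only a minimal sketch of one plausible reading: count the distinct locations a user reports in each hour, find the hour of peak mobility, and look for the lowest-mobility hour within 12 hours on either side of that peak. The record layout `(hour_index, location_id)`, the function names and the tie-breaking rule are all assumptions, and the sketch stops at finding the hour rather than choosing a location.

```python
def hourly_distinct_locations(records, n_hours):
    """Count distinct location ids seen in each hour.

    `records` is an iterable of (hour_index, location_id) pairs (assumed layout).
    Hours with no records get a count of 0, which sparse data can make misleading.
    """
    seen = [set() for _ in range(n_hours)]
    for hour, location in records:
        seen[hour].add(location)
    return [len(s) for s in seen]


def min_mobility_hour(counts, window=12):
    """Index of the lowest count within `window` hours of the peak count.

    Ties are resolved in favour of the earliest hour.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    peak = max(range(len(counts)), key=lambda i: counts[i])
    lo = max(0, peak - window)
    hi = min(len(counts), peak + window + 1)
    return min(range(lo, hi), key=lambda i: counts[i])


# Hypothetical records for three hours (not data from the talk).
records = [(0, "a"), (0, "b"), (1, "a"), (2, "a"), (2, "b"), (2, "c")]
counts = hourly_distinct_locations(records, n_hours=3)
print(counts, min_mobility_hour(counts))
```

Restricting the search to a window around the peak keeps the minimum tied to one daily cycle when the input spans several days.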
Local Minima Analysis Results
• Revisiting the previous plot with a smaller bin size to better appreciate the results
• The medians of the returned FLOPs were within 200 m of each other
Local Minima Analysis Results
• Local minima analysis results for all 5 days, showing the time and location spread for the persistent locations
• Matched successfully for 833453 of 1187116 users (70%)
Weighted K-Means Clustering
• Two approaches were taken
• The first is shown here:
  • Two clusters formed for data over two days
  • A cluster for each day, with the variation studied
  • Matched for 886228 of 1187116 users (75%)
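To make the idea of weighted K-means concrete, here is a minimal pure-Python sketch of Lloyd's algorithm in which each centroid is the weight-averaged mean of its assigned points. It assumes planar (x, y) coordinates in metres and caller-supplied starting centres; the weights stand in for whatever per-record weighting the talk used (for example dwell time), which the slides do not specify.

```python
def nearest(point, centers):
    """Index of the centre closest to `point` by squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2 + (point[1] - centers[j][1]) ** 2,
    )


def weighted_kmeans(points, weights, centers, iterations=20):
    """Weighted Lloyd iterations; returns (centres, labels)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each centre to the weighted mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(old)  # empty cluster: leave the centre in place
            else:
                new_centers.append((
                    sum(p[0] * w for p, w in members) / total,
                    sum(p[1] * w for p, w in members) / total,
                ))
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


# Hypothetical points and weights (not data from the talk).
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
weights = [3, 1, 1, 1]
centers, labels = weighted_kmeans(points, weights, centers=[(0, 0), (10, 10)])
print(centers, labels)
```

With these inputs the first centre is pulled towards (0, 0) because that point carries three times the weight of its neighbour, which is the effect weighting is meant to have on a persistent location.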
Weighted K-Means Clustering
• Second approach, for data over the most recent 2 days:
  • Different weighting and filtering method
  • 3-cluster problem reduced to 2 clusters by appropriate selection of the 24-hour window
  • Matched for 847317 of 1187116 users (71%)
  • Measure of typical time obtained
  • Typical duration for location and its
Weighted K-Means Clustering
• Intuition: clustering improves with a larger data set
• Tested whether analysing data over the entire 5 days produced better results
• Matched for 887682 of 1187116 users (75%)
• Better than the 2-day analysis, yet poorer than local minima analysis
Performance Comparison
| Weighted K-Means Clustering | Local Minima Analysis |
| --- | --- |
| Can measure typical time and duration at the persistent location for the user | Says nothing about typical time or duration at the location for the user |
| SLOP and TLOP can be obtained, given sufficiently large data | No SLOP or TLOP |
| Much post-processing of data needed (filtering and weighting). SLOW | Less post-processing required. FAST |
| High memory consumption due to the weighting method | Low memory requirement |
| Requires a larger data set | Works on sparse data sets as well |
| Calibration of the returned typical time and duration values is needed | A confidence measure is hard to establish |
Conclusion
• Tested 3 approaches to determine the FLOP of users
• Local minima analysis: very efficient and reasonably accurate
• 2 days of DLS data are sufficient to work on
• Use a search space of 12 hours before and after the maxima
• On average, a user generates 100 locations per day
• A location record is typically 85 bytes
• For one user, about 17 kB of data (2 days × 100 records × 85 bytes)
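The 17 kB figure is consistent with two days of records at the stated rate and record size. The short calculation below reproduces it and, as an assumption beyond the slides, scales it to the 1187116 users mentioned in the results to give a rough sense of the total input volume.

```python
def bytes_per_user(days, records_per_day, bytes_per_record):
    """Raw location data volume for one user over the given number of days."""
    return days * records_per_day * bytes_per_record


per_user = bytes_per_user(days=2, records_per_day=100, bytes_per_record=85)
total = per_user * 1_187_116  # user count taken from the results slides

print(per_user)  # 17000 bytes, i.e. about 17 kB
print(total)     # 20180972000 bytes, i.e. about 20.2 GB
```

These are averages, so actual volumes will vary with how active each user is.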
Future Prospects
• We may consider a "slope-analysis" based algorithm that can work on the fly, without post-processing
• Using GMMs (Gaussian Mixture Models) or other statistical models
• Other machine learning algorithms
• Picking the location for the minimum-mobility time in local minima analysis
THANK YOU!!!
