Ethical Analytics, Please

A couple of weeks ago, The Wall Street Journal ran an article entitled:
“You Give Apps Sensitive Personal Information. Then They Tell Facebook.”



More accurately, there are some developers who:




Use analytics services offered by firms (e.g., Google, Facebook) that some people believe behave unethically
Send personal data, such as whether the user is trying to get pregnant, to those analytics services


I hold out some hope that the developers fought these decisions and were overruled
by pointy-haired bosses.
Alas, I fear that too many people focus on what is being sent to their own
Web services and do not think through what is being handed over to analytics
services. And, since many places outsource analytics collection and analysis,
privacy and security concerns can get magnified.



Fortunately, or perhaps unfortunately, I have not had to deal with analytics
much personally. If I were implementing analytics, I would fight tooth and nail
to do so ethically. Here are some of the things that I would be advising.



Own the Analytics Server

The biggest problem that people will have with what the Journal reported
isn’t that apps collect private information. The problem is that they are perceived
to share that private information with the likes of Facebook and Google. Facebook
in particular has been slipshod, at best, over the years in terms of data privacy.



While you as a developer might think that Google and Facebook don’t look at your
app’s analytics data, and while it’s possible that this is true, the perception
is that you’re sharing the analytics data with Google and Facebook. After all,
you’re sending it to their servers.



Ideally, your analytics are managed by a server that you control directly.
The most likely option is to send the analytics to a server
that you provision by one means or another (Docker, VPS, etc.). This implies that
you license the analytics server software and host it yourself, which is more
complicated than simply outsourcing it. However, that complexity should be
manageable and would greatly reduce the privacy concerns that other parties have
with your analytics.



A theoretical option (one that I suspect may not exist) would be
end-to-end encryption (E2E) of the analytics data. The machines that allow you
to examine the analytics, generate reports, and so on do not have to be the
same machines that are collecting the analytics from your apps. In principle,
the “middleman” servers that collect the analytics could be collecting encrypted
payloads of analytics records. A separate “analytics analyzer” that you operate
yourself would collect those records, decrypt them, and let you see the results.
The “middleman” servers could then be an outsourced service, with the encryption
preventing that provider from getting at the actual analytics data itself.
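
To make the idea concrete, here is a minimal sketch of how such a scheme might work, using only the standard Java crypto APIs. It is hybrid encryption: each record is encrypted with a fresh AES-GCM key, and that key is wrapped with the analyzer's RSA public key, so only the holder of the private key can read the record. The names `SealedRecord` and `AnalyticsCrypto` are hypothetical, as is the JSON payload; no existing analytics service is implied to work this way.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical envelope: what the middleman stores without being able to read it. */
final class SealedRecord {
    final byte[] wrappedKey; // per-record AES key, encrypted with the analyzer's public key
    final byte[] iv;         // per-record AES-GCM nonce
    final byte[] ciphertext; // encrypted payload plus GCM authentication tag

    SealedRecord(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
        this.wrappedKey = wrappedKey;
        this.iv = iv;
        this.ciphertext = ciphertext;
    }
}

/** Hypothetical helper; a sketch, not a vetted cryptographic protocol. */
final class AnalyticsCrypto {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    /** Generated once by the analyzer; only the public half ships in the app. */
    static KeyPair newAnalyzerKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs in the app: encrypts one analytics record for the analyzer. */
    static SealedRecord seal(String payload, PublicKey analyzerKey) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);
            SecretKey recordKey = keyGen.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, recordKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(payload.getBytes(StandardCharsets.UTF_8));

            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.ENCRYPT_MODE, analyzerKey);
            byte[] wrappedKey = rsa.doFinal(recordKey.getEncoded());

            return new SealedRecord(wrappedKey, iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs only on the analyzer that you operate: unwraps the key and decrypts. */
    static String open(SealedRecord record, PrivateKey analyzerKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, analyzerKey);
            SecretKey recordKey = new SecretKeySpec(rsa.doFinal(record.wrappedKey), "AES");

            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, recordKey, new GCMParameterSpec(128, record.iv));
            return new String(aes.doFinal(record.ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The middleman only ever sees the three byte arrays in `SealedRecord`. The trade-off is that the middleman cannot aggregate or index anything, so all reporting has to happen on the analyzer side.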



Encrypt Data In Motion

Speaking of encryption, ensure that the app’s communication with the analytics
server uses an adequate level of TLS or other on-the-wire encryption, so that
nefarious people cannot sniff your network packets to steal information in
transit.
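
One cheap safeguard is to refuse to talk to an analytics endpoint that is not HTTPS in the first place. The sketch below is only that guard; it assumes a hypothetical `AnalyticsEndpoint` helper and a placeholder `analytics.example.com` host, and it does not by itself enforce a minimum TLS version or certificate pinning.

```java
import java.net.URI;

/** Hypothetical guard used wherever the analytics endpoint is configured. */
final class AnalyticsEndpoint {
    /** Returns the parsed endpoint if it uses HTTPS; otherwise throws. */
    static URI requireHttps(String endpoint) {
        URI uri = URI.create(endpoint);
        if (!"https".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException(
                "Analytics endpoint must use HTTPS: " + endpoint);
        }
        return uri;
    }
}
```

Calling `AnalyticsEndpoint.requireHttps("https://analytics.example.com/v1/events")` returns the URI, while an `http://` URL fails fast at configuration time rather than silently sending data in the clear.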



Only Log Constants

I suspect that most users would be reasonably comfortable with you recording
information about what “screens” (activities, fragments, etc.) the user visits
and including that as part of your analytics data. I suspect that most users
would be far less comfortable with you recording their location (e.g., GPS fix).
Unfortunately, there is no easy way to automatically detect that the app is
logging sensitive data.



However, you could try to enforce that you only log constants. For whatever
client-side API the analytics service offers, create a Lint rule that will complain
if you try logging something that is not a string literal or string constant.



(and hopefully there is a way for Lint rules to detect Kotlin string interpolation…)



Obviously, this would also block plenty of benign logging, not just user locations.
However:





It is safe to say that hard-coded constants are not going to contain user-sensitive data




Automatic detection is much better than relying on manual audits that may never happen
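
A Lint rule is one way to get there. A complementary approach, which is not the Lint rule itself, is to make the compiler do the enforcement: wrap the analytics SDK behind your own logger whose only parameter is an enum. The sketch below assumes hypothetical `AnalyticsEvent` and `ConstantOnlyLogger` types, and it uses an in-memory list as a stand-in for the real SDK call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed vocabulary of events; nothing user-supplied can appear here. */
enum AnalyticsEvent {
    SCREEN_HOME("screen_home"),
    SCREEN_SETTINGS("screen_settings"),
    EXPORT_TAPPED("export_tapped");

    final String wireName;

    AnalyticsEvent(String wireName) {
        this.wireName = wireName;
    }
}

/** Hypothetical wrapper; the only way app code is allowed to reach the analytics SDK. */
final class ConstantOnlyLogger {
    // Stand-in for the real SDK call, so that the sketch is self-contained.
    private final List<String> sent = new ArrayList<>();

    /** Accepts only an enum constant, so there is no String parameter to abuse. */
    void log(AnalyticsEvent event) {
        sent.add(event.wireName);
    }

    /** Returns a copy of what would have been sent. */
    List<String> sentForTesting() {
        return new ArrayList<>(sent);
    }
}
```

With this shape, `logger.log(AnalyticsEvent.SCREEN_SETTINGS)` compiles, while anything built from a location fix or a text field does not, because there is no overload that takes it. You would still want a Lint rule or code review to catch direct calls to the underlying SDK.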





Opt-In (Or At Least Opt-Out)

My guess is that current (e.g., GDPR) or future legislation will require apps
to allow users to control whether analytics get collected or not. Ideally, you
“get ahead of the curve” and offer this now, preferably as an opt-in
choice, so the default is that analytics are not collected. At worst, make it
an opt-out option in your PreferenceFragment or other settings screen.
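
In code, opt-in mostly means that every analytics call passes through a gate that defaults to "off". Here is a minimal sketch, assuming a hypothetical `ConsentGate` class; in a real app the flag would be backed by whatever your settings screen persists, and the `Consumer` would be the actual SDK call.

```java
import java.util.function.Consumer;

/** Hypothetical gate that sits between the app and the analytics SDK. */
final class ConsentGate {
    // Opt-in: nothing is sent until the user explicitly agrees.
    private volatile boolean optedIn = false;
    private final Consumer<String> sink; // stand-in for the real SDK call

    ConsentGate(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Called from the settings screen when the user changes the preference. */
    void setOptedIn(boolean optedIn) {
        this.optedIn = optedIn;
    }

    /** Forwards the event only if the user has opted in; returns whether it was sent. */
    boolean log(String eventName) {
        if (!optedIn) {
            return false;
        }
        sink.accept(eventName);
        return true;
    }
}
```

Switching this to opt-out is a one-word change to the default, which is exactly why the default deserves a deliberate decision.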



Decline Unnecessary Metadata

The analytics client library might provide APIs to automatically collect lots
of data about the environment: device model, OS version, screen resolution, and
so on. We see this a lot with crash logging, but analytics may offer to collect
similar stuff.



Try to minimize this. In particular, try to stop the collection of metadata
that you are not going to need.



While fixed values like device model are not user-sensitive, too much metadata
does start to make it possible to identify users across devices. The same sort
of stuff that Panopticlick uses for Web
browsers could be collected by analytics libraries in a native Android app.
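
If the client library does not let you switch off individual fields, one fallback is to filter the metadata yourself before it leaves the device. This sketch assumes a hypothetical `MetadataFilter` with a made-up allowlist of field names; the real names depend on the SDK that you use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter applied to metadata before it is attached to an event. */
final class MetadataFilter {
    // Hypothetical allowlist: only the fields that you actually report on.
    private static final Set<String> ALLOWED =
        new HashSet<>(Arrays.asList("os_version", "app_version"));

    /** Returns a new map containing only the allowlisted fields. */
    static Map<String, String> minimize(Map<String, String> collected) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : collected.entrySet()) {
            if (ALLOWED.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

An allowlist is deliberately stricter than a blocklist: a field that the SDK starts collecting in some future version is dropped by default instead of being sent by default.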



Use an Open Source Client Library (and Vet It)

Try to use services that open source their SDKs. This allows you or
some consultant to examine the library and
see if it is doing something that you or your users might regret.



With luck,
somebody else has already performed that analysis and has published a report
that you can use. Just bear in mind that any such report will be for a specific
version of the SDK, and so periodically you will need to find a newer report
or vet the updated SDK yourself.





I know that there are some open source analytics options, and so most of what
I recommend here should be possible. And, I will admit that this is more work
than a lot of organizations will want to deal with. With luck, an ethical
analytics service will emerge that emphasizes these sorts of features, and perhaps
more, to help you avoid charges of invasions of privacy.



But, in general, treat your analytics data the same way that you treat your
“real” data. And, treat your users the same, from a privacy and security
standpoint, for both types of data. Do not consider analytics privacy and security to be something
that you can ignore… unless you elect to skip analytics outright.

Published on March 07, 2019 05:42