Monday, November 23, 2015

Encryption and Decryption

After the Paris attacks, a lot is being said about encryption (and decryption) on mobile devices. Much of it borders on utterly fantastic nonsense. People are taking the hype written into marketing material seriously. That is not a good place for this debate to be.

Let's start at the beginning. Sometime after Sept 11, 2001, a massive electronic surveillance machine was set up. The machine is able to download large quantities of data in real time from all manner of devices. How exactly it does this has been discussed by Snowden et al., but frankly I don't care about that. As mobile devices are the most common means of communication, a lot of the data collected is from these devices. What happens next should be quite familiar to those of you who work with "big data".

The data collected is archived and sorted into two bins, "relevant" and "irrelevant". The terms are defined from the perspective of the national security mechanism; they have nothing to do with local security mechanisms (such as police agencies).

In the interest of conserving resources, "Irrelevant" data is largely discarded after some level of pattern analysis. The "relevant" data is obviously stored and processed till as much predictive information as possible is squeezed out of it.

Within the "relevant" data is a small subset of encrypted communications. As only the communication (and not its metadata) is encrypted, it is possible to perform a certain level of analysis on it. For example, Abu Jihad sends an encrypted message to his fedayeen Abu-soon-to-be-dead. The message itself is encrypted, but the fact that both the Abu-whatevers use an unregistered phone is in the metadata. A simple sort can catch conversations between unregistered numbers.
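A minimal sketch of that kind of metadata filter, under loud assumptions: the `CallRecord` fields, the `registered` set and the phone numbers are all hypothetical, and no message content is ever inspected.

```python
from typing import NamedTuple

class CallRecord(NamedTuple):
    """Hypothetical metadata record; the encrypted payload is not needed."""
    caller: str
    callee: str
    timestamp: int  # seconds since some epoch

def unregistered_pairs(records, registered):
    """Keep only records where neither endpoint is in the registry."""
    return [r for r in records
            if r.caller not in registered and r.callee not in registered]

# Made-up registry and traffic, purely for illustration.
registered = {"555-0100", "555-0101"}
records = [
    CallRecord("555-0100", "555-0101", 1000),  # both registered
    CallRecord("555-0199", "555-0198", 1010),  # both unregistered
    CallRecord("555-0100", "555-0199", 1020),  # mixed
]
flagged = unregistered_pairs(records, registered)
```

Only the second record survives the filter, since it is the only one where both ends are unregistered.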

Once unregistered numbers that talk to each other are identified, it is possible to look for patterns in the correlated metadata sets and build up a relational database between such numbers. A clustering algorithm can tell you if there is an anomalously high frequency of communication between two or more points in this database.
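As a rough illustration of spotting an anomalously chatty pair, here is a sketch that counts messages per pair of numbers and flags pairs far above the median. This is a simple robust outlier test standing in for a real clustering algorithm; the threshold `k` and the sample traffic are assumptions of mine.

```python
from collections import Counter
from statistics import median

def pair_counts(edges):
    """Count contacts per unordered pair, so (A, B) and (B, A) are the same."""
    return Counter(tuple(sorted(edge)) for edge in edges)

def anomalous_pairs(edges, k=5.0):
    """Flag pairs whose count exceeds median + k * MAD (median absolute deviation)."""
    counts = pair_counts(edges)
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0  # avoid a zero threshold width
    return {pair: n for pair, n in counts.items() if n > med + k * mad}

# Made-up traffic: one pair talks far more often than the rest.
edges = ([("A", "B")] * 12 + [("B", "A")] * 8 + [("C", "D")] * 2 +
         [("E", "F")] * 3 + [("G", "H")] * 2 + [("I", "J")] * 3)
suspects = anomalous_pairs(edges)
```

The median-based threshold is used here because a single very chatty pair inflates a mean and standard deviation enough to hide itself in a small data set.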

Once you have identified the target(s) - you can start the decryption. It doesn't make sense to start the decryption before you do all the necessary filtering because decryption is a resource intensive process.

So now - we talk about encryption and decryption.

Decryption is a lot easier when you know what kind of encryption is used. As indicated before, there are two main ways of encryption: a Vernam cipher, or a public-private key pair. The ways of dealing with a Vernam cipher are well discussed in books. It is rarely used except by major intelligence and military services. This is because the Vernam cipher uses unique secret keys. These are expensive to generate and one has to maintain the security of the entire secret key distribution chain. Only a major intelligence or military service has the budget to do things like that.
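For readers who have not seen one, a Vernam cipher (one-time pad) can be sketched in a few lines. The message below is made up, and the sketch assumes the pad is truly random, as long as the message, and never reused.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))

message = b"meet at dawn"
pad = secrets.token_bytes(len(message))  # the unique secret key
ciphertext = xor_bytes(message, pad)     # encrypt
recovered = xor_bytes(ciphertext, pad)   # decrypt with the same pad
```

The arithmetic is trivial; the expense is in generating the pads and getting a copy of each one safely to the other end.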

I want to focus on the issue of decrypting the more commonly used public-private key system (RSA being the standard example; the Data Encryption Standard, DES, is a symmetric-key cipher and works differently).

Assume for a moment that Abu-whatevers are talking to each other almost every 12 hours. If the message between them is intercepted then you have 12 hours to decrypt it. Without getting into too much detail, breaking RSA involves prime factorization (yes, that thing you learn in middle school and later forget). In RSA, a key pair is derived from a unique pair of large primes. One key (the so-called public key) is given to the transmitter, and the second (the private/secret key) is kept by the receiver. The product of the two primes is part of the public key and is used to encrypt the message, while the primes themselves stay secret. The encryption itself is computationally inexpensive (modular exponentiation), whereas recovering the primes from their product is not.
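A toy illustration of a prime-based key pair, using textbook RSA with deliberately tiny primes. The specific numbers are illustrative assumptions; real keys use primes hundreds of digits long plus padding schemes that are left out here.

```python
# Toy textbook RSA (illustrative only: tiny primes, no padding, not secure).
p, q = 61, 53                # the secret pair of primes (assumed values)
n = p * q                    # public modulus: the product of the two primes
phi = (p - 1) * (q - 1)      # only computable if you know p and q
e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(m, public_key):
    """Encrypt integer m < modulus with the public key (e, n)."""
    exponent, modulus = public_key
    return pow(m, exponent, modulus)

def decrypt(c, private_key):
    """Decrypt integer c with the private key (d, n)."""
    exponent, modulus = private_key
    return pow(c, exponent, modulus)

message = 65
ciphertext = encrypt(message, (e, n))
recovered = decrypt(ciphertext, (d, n))
```

Anyone can encrypt with (e, n), but producing d requires phi, and phi requires the two primes. That is where the factorization problem comes in.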

(Leaving out the pesky details.) If you have a large repository of prime numbers, you can generate all manner of products and iterate to see if they cause your message to be turned into plain text (there was a good brute force example of this at one of the talks at a PyCon). Given how labor intensive this is, you can have an AI do this (even a simple neural network will do wonderful things). However this is where the exact details of the encryption software come into play. Some software has a maximum number of tries it allows before it locks/destroys the encrypted information.
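To make the brute force idea concrete, here is a sketch that recovers a private exponent by trial division of a toy public modulus. The numbers (modulus 3233, exponent 17, ciphertext 2790) are my own toy values, not from any real system; against a real 2048-bit modulus this loop would never finish.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n by trial division."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    raise ValueError("n is prime, so it is not a product of two primes")

def crack_private_exponent(e, n):
    """Factor the public modulus, then rebuild the private exponent."""
    p = smallest_factor(n)
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

# Toy intercepted values (assumptions for illustration only).
n, e, ciphertext = 3233, 17, 2790
d = crack_private_exponent(e, n)
plaintext = pow(ciphertext, d, n)
```

The cost of the loop grows with the square root of the modulus, which is why the size of the primes, and not the cleverness of the search, decides how long the decryption takes.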

So to decrypt a message between two potential targets, you are actually limited by the number of tries the software allows. These limits can be defeated by a backdoor, essentially something that suspends the attempt counter on the software, but there is no guarantee that a hostile agency will not get access to that door and do something to the message. So such backdoors are usually avoided (see all the back and forth about encryption on the latest iOS, Android etc.).

Hope this helps you all in framing a more meaningful debate on the issues at hand.

ps. cracking the communication after the event can help in understanding the critical dynamics in the event and identifying organizational structures. Without this information you won't be able to distinguish between Abu Jihad and Abu-soon-to-be-dead.


At 8:23 PM, Blogger Ralphy said...

encryption works well for short pieces of data and not so well with reams of text. It just takes too long to encrypt lots of data. What you want to encrypt are the keys that grant you access to the data. It doesn't take a lot of time for the computer to do this, relatively speaking. more on this later.

At 8:34 PM, Blogger Ralphy said...

what has developed in the last few years is encryption server where all read and write access in the enterprise must go through an encryption server, no exceptions. but the data is still exposed before it passes through the encryption server from the external source.

At 8:43 PM, Blogger Ralphy said...

fbi pushes to weaken cell phone security

At 5:08 AM, Blogger maverick said...

Hi Ralphy,

Yes - there is a client-server model that comes into play. I didn't want to get into pesky details, because regardless of how you do it - you run into the same problems, either secure distribution of one-time pads, or a public-private key pair.

If you know one prime number (public key) - you still have to figure out what the exact product is and what the second prime number (private key) is.

It doesn't have to be a simple product i.e a x b, it can be something else like a x a x a... b x b x b...

Each system of encryption carries with it - a separate set of risks and vulnerabilities. For example a physical one-time pad transport favored by S-directorate people is vulnerable to interception of the courier or simple errors like reuse (as the Red Army once did during the Venona Decrypt days).
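The reuse error is easy to demonstrate. In the sketch below (messages are made up) the same pad encrypts two messages; XORing the two ciphertexts cancels the pad entirely, so anyone who knows or guesses one message gets the other for free.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"attack at dawn"
m2 = b"retreat at six"
pad = secrets.token_bytes(len(m1))

c1 = xor_bytes(m1, pad)
c2 = xor_bytes(m2, pad)          # the mistake: same pad used twice

leak = xor_bytes(c1, c2)         # the pad drops out: leak == m1 XOR m2
guessed_m2 = xor_bytes(leak, m1) # knowing m1 reveals m2 without the pad
```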

All DES systems (or for that matter Vernam systems) are vulnerable to impersonation attacks. The case becomes particularly poignant when you have an encryption server which can itself be spoofed. In some ways I feel that the encryption server is more vulnerable than a simple key generator but the community of internet security types doesn't seem to find that agreeable. Everyone wants to be on a cloud now even the security types which traditionally would loathe anything that wasn't air gapped.

I guess everyone is vulnerable to fashion trends.

At 5:13 AM, Blogger maverick said...

On an unrelated note.

If Putin succeeds in cutting ISIS oil transit and breaks up the Turkish bound "Sunni-Only-Captagon" routes, he will have very delicately put a knife inside Saudi Arabia's side.

The Saudis decided to keep production of oil high because they believed

a) that they had financial reserves that would carry them through the declining revenue period, and

b) new high volume routes (i.e. so called Sunni ONG pipelines through Iran and Syria to Europe) would open up and gradually raise revenue through higher sales volume.

If Putin can slip his knife into Saudi Arabia's side as I indicated above, he will be able to force a shift inside Saudi Arabia - a shift that will take Saudi Arabia toward reduced production and a global spike in oil prices.

Given how sensitive the oil market is to speculators, even the slightest indication that the Saudis are going to reduce production or that the Saudis have come out the worse for wear in their war of shadows with Putin might set off a speculation bubble in the oil price.

I guess I don't agree with everything from Oppenheimer analysts after all.

At 5:23 AM, Blogger maverick said...

FWIW it is important to recognize the critical dynamic in Syria.

The Assad regime has an oil-for-electricity deal with ISIS. In exchange for cheap oil from ISIS controlled sources, Assad agrees to a lower cut of the electricity produced at the Aleppo power plant.

If Assad starts buying Russian oil because ISIS oil is too expensive, then he will be tied at the hip to the Russians. He will stand solidly with Russia in opposition to the Sunni Gas pipelines.

This is the critical dynamic in Syria right now.

All the other stuff is irrelevant nonsense. For example, these ISIS spectaculars in Paris are just a reflection of their HR issues. Foreign recruitment into ISIS is down after the recent losses to the Kurds and war fatigue has set in among the existing foreign katibas.

At 3:58 PM, Blogger maverick said...

That was a stupid and impulsive move by the Turks.

Putin will give the Kurds air cover. There is a very high likelihood of Turkish airplanes being shot out of the sky if they get within range of Russian radar.

At 4:50 PM, Blogger Ralphy said...

the s-400 system covers well into Turkish air space. the overall coverage is up to 300 miles. Turkey is a member of NATO so Putin must be careful where and when he shoots. Turkey must be very careful when it crosses the border to bomb the Kurds or else Putin might take a shot.

At 4:41 AM, Blogger maverick said...

This is going to get really messy.

At 5:03 AM, Blogger maverick said...

Putin will take a shot - he can always say "oops" afterwards.

At 7:20 AM, Blogger maverick said...

It is probably a good idea to declare a no-fly zone around that part of the world.

You know how he gets when he is all worked up.

At 5:35 AM, Blogger Nanana said...


The Wisdom of a Grand Nuclear Bargain with Pakistan
December 14, 2015 - 3:30 pm

Atlantic Council, 1030 15th Street NW, 12th Floor (West Tower)
Washington, DC

Please join the Atlantic Council on Monday, December 14th at 3:30 PM for a conversation with a panel of experts to discuss policy options to address international concerns over Pakistan's nuclear arsenal.

At 6:03 AM, Blogger maverick said...

The San Bernardino thing is weird, there is something completely different from what people think going on there.

I wonder if this is the first case of a reversal of "who-wears-the-pants" in the Jihadist community. From emerging data, the woman - Tashfeen - held the comms in the assault; she is the one with bigger Jihadi cred than her husband.

There are still many unanswered questions but if one goes with the Arif Jamal perspective - that ISIS only really messages to pre-indoctrinated adults, then this Tashfeen woman literally fell into the cauldron (of radicalism) as a baby. Her family members are deep in the ass of Maulana Haq Nawaz Jhangvi's anti-apostasy crowd in Multan. After that she goes to KSA to live with her dad, and that place is like anti-Shia central in the world, so again more indoctrination. This woman had no real chance growing up around all those Sunni-uber-mensch - she became one with her surroundings.

There is another wrinkle - the six month old baby. You have to be more than just indoctrinated to pull that kind of shit off - you have to be completely fucking brainwashed. Like S-directorate level conditioning - not something you typically see in the Jihadi women.

Back in Red Mosque 2007 days, the women stood by their men no matter what stupid shit they pulled and the Jamia-Hafsa types staged loud but ineffective protests. Now it's like the women have taken the lead. Could be a sexual revolution thing inside Jihadi Islam - women tired of having the inbred men fuck up things are coming forward to set things right.

Then there is the issue of the target - was this a workplace conflict or not? What relevance does the target choice have?

It is no surprise that the relatives have "no clue" of their "relative's radicalization" - the CAIR is to Islam what the NRA is to guns. Every time someone does something stupid the NRA/CAIR jump forward to say "the dude is mental", there is nothing wrong with gun-ownership/Islam - again technically true but totally irrelevant.

So putting my counter-intel hat on, given the extent of comsec this woman practiced, how does one distinguish a self-radicalized woman steeped in Sunni sectarian thinking from a Pakistani/Saudi deep cover operative that went active?

At 7:21 AM, Blogger maverick said...

This provocation discrimination part of things is very hard.

The story in public so far is that Tashfeen was brought up in an environment where radicalization was easy. Between the Sipah Sahaba Pakistan, the Al Huda institute and Zakariya University - she had plenty of opportunities to go postal. Unlike so many people who become radical in their views, she didn't want to live in an Islamist paradise like KSA, but instead put herself in a marriage with Farook in the US (which is like the opposite of KSA as Islamist disneyland goes).

Syed Farook grew up in the US steeped in the traditions of liberalism. He was outwardly religious but preferred to live in a society where his view was one-among-many. When he found a bride in KSA, he didn't immigrate to KSA where he could practice his faith in a community that was more supportive, he stayed in the US.

When in the US, she and her husband self-segregated and undertook a detailed preparation for a terror strike. They implemented the strike and returned to their home where they were intercepted 4 hours after the event by police. They fought to the bitter end.

This makes no sense.

People usually turn to faith when there is a crisis they can't cope with in their lives.

Once they find some emotional resolution in their faith, they usually try to stick with a support community that keeps the resolution levels high. This is why religious groups and individuals self-organize and self-segregate.

If Tashfeen and/or Syed Farook had a personal crisis - we still don't know what it was.

If they found resolution of this crisis in Islam-of-some-kind, we still don't know why they wouldn't seek a life in a community that was contoured around their views - like KSA itself.

If we assume that they were somehow unable to emigrate to KSA, we still can't explain why they didn't seek the usual Islamic study circles or local support groups or the internet based Islamist support groups which are all under FBI/DHS surveillance.

If we assume that somehow they defeated the surveillance regime and created an invisible support group, we still don't know why this invisible support group pushed them towards terror acts.

If we assume that this invisible support group pushed them towards terror, why did they not follow the path of other fedayeen? Why not just stay at the conference center, draw fire from police (which would have escalated the carnage) and then die in the shootout there? Why bother with the escape and then return home?

This story doesn't make any sense.

At 7:23 AM, Blogger maverick said...

Erin Burnett brought up the issue of postpartum depression in Tashfeen. It seems likely that a mental health issue was at play, but in all the cases that people have seen of this so far - even when there is an overlap with something like bipolar disorder - the violence is directed inwards, usually towards the child.

To go from there to a full on shootout with police is outside the scope of what is seen in postpartum depression.

At 7:25 AM, Blogger maverick said...

There is something completely unknown about these people's lives - and the fact that it was not detected at all suggests that the correlation mappers in the electronic sea are failing. The AI isn't working properly.

The Paris attacks and now this in San Bernardino, it is almost like the correlations are not being weighted correctly. Data is being filtered out as noise when it should not be.

At 7:27 AM, Blogger maverick said...

After the discovery of the encrypted messaging app in the cellphone of the Paris assaulters, I thought perhaps this was a collection issue, that the decryption of the data presented an overhead that made it hard to come up with timely predictions.

But now after the San Bernardino affair, I wonder if the correlation mappers are just off in some random place. There is an optimization problem that you have to solve when determining a correlation mapping. A function has to be minimized. If that optimization is off - then the correlations are crap.

At 7:37 AM, Blogger maverick said...

If I construct a correlation, I also need to come up with a noise measure. Unless I can construct a valid noise measure, I can't call my correlation meaningful.
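As one possible reading of "a correlation plus a noise measure", here is a sketch that returns a two-point (Pearson) correlation together with its approximate standard error, sqrt((1 - r^2) / (n - 2)). The choice of Pearson's r and of this particular error estimate is my assumption; the sample data is made up.

```python
import math

def pearson_with_error(xs, ys):
    """Return (r, standard_error) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    # Approximate standard error of r; clamp guards against rounding above 1.
    se = math.sqrt(max(0.0, 1.0 - r * r) / (n - 2))
    return r, se

# Made-up data: a straight line with a small alternating wobble.
xs = list(range(20))
ys = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(xs)]
r, se = pearson_with_error(xs, ys)
```

A correlation reported without the second number gives no way to tell whether a value like 0.3 is a signal or just what twenty noisy points happen to produce.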

If I have a correlation and a noise measure, I need to be able to show that the inverse of the correlation also has a meaningful noise measure. If I can't do that - I can't claim the inverse of the correlation is meaningful.

Inverses are easy to construct for two-point correlations. If you go to three and four point correlations, the physics (or underlying dynamics) you construct out of that becomes very shaky.

A good AI will refrain from using multi-point correlations and always make sure that the inverse of each two point correlation has a meaningful error.

Unfortunately this is a time intensive process, if you can't construct the inverse quickly enough, you might be inclined to declare correlations as valid before you know they actually are.

At 9:02 AM, Blogger Ralphy said...

"It is no surprise that the relatives have "no clue" of their "relative's radicalization" - the CAIR is to Islam what the NRA is to guns."

I like this. May I use it?

At 5:13 AM, Blogger maverick said...

> I like this, May I use it?


It seems to me that neither the CAIR nor the NRA for all their love of the US constitution can really provide any meaningful inputs on how to cope with these mass-casualty gun crimes.

The discussion on strategies to contain domestic terrorism can proceed faster if one were to completely ignore both the CAIR and the NRA and proceed down a data-driven analysis.

At 5:22 AM, Blogger maverick said...

The bigger data related question for me is why isn't this stuff detected ahead of time if the Government is downloading all this data?

Either the AIs that surf the data for potential threats are crap! or the humans who are supposed to monitor the AI outputs are flaking out.

Which is it?

My money is on the AIs being crap. Every AI is a correlational database and as long as I can recall those have always been a total pile of steaming crap.

Here is an interesting article about databases by Paul Ford that gets at some of the stuff I am complaining about.

At 6:44 AM, Blogger maverick said...

is the correlation crap?

A question that one will ask oneself many many times, but I feel among physicists - that question usually has some relatively straight-forward answers.

Questions to ask yourself when a correlation is presented.

1) How much data is backing it up?

1a) does the data sound like BS? as in things that can't possibly be measured correctly?

2) If the total data has N points in it and the correlation is N1. How does N1 compare with sqrt(N)?

2a) is Sqrt(N) >> N1? (this could be a black swan)
2b) is Sqrt(N) << N1? (this could be a significant statistical departure)
2c) is Sqrt(N) ~ N1? (then you might just be seeing shot noise)

3) Is there an equally visible anti-correlation?

3a) is P(A) = N1/N ~ P(negative(A))? (you are most likely seeing shot noise)
3b) is P(A) = N1/N << P(negative(A))? (this is unusual - you may be seeing a black swan)
3c) is P(A) = N1/N >> P(negative(A))? (this may signal emergent behavior or imminent statistical shift).

4) How critical is P(A) to your analysis? Let O be the desired outcome of the analysis.

4a) Is dO/dp(A) small? (you don't have to give a shit)
4b) Is dO/dp(A) large? (refine the analysis to see what the exact functional form of the dependence is)

5) Is the correlation metric crap? Any >2 point correlation usually is crap. Look at the construction of a correlation and determine how many independent data points are being implicitly correlated.
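Question 2 from the list above can be turned into a rough helper. The cutoff `factor` that decides what counts as ">>" or "<<" is an arbitrary assumption of mine, as are the labels.

```python
import math

def compare_with_shot_noise(n1, n, factor=3.0):
    """Classify a count n1 out of n points against the sqrt(n) noise scale."""
    if n <= 0 or n1 < 0:
        raise ValueError("n must be positive and n1 non-negative")
    scale = math.sqrt(n)
    if n1 > factor * scale:
        return "significant departure"   # case 2b: sqrt(N) << N1
    if n1 < scale / factor:
        return "possible black swan"     # case 2a: sqrt(N) >> N1
    return "consistent with shot noise"  # case 2c: sqrt(N) ~ N1

labels = [compare_with_shot_noise(n1, 10_000) for n1 in (10, 100, 1000)]
```

With N = 10,000 the noise scale is 100, so counts of 10, 100 and 1000 land in the three different bins.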

At 7:02 AM, Blogger maverick said...

In the age of crowd-sourcing - one has to have a decent correlation analysis framework to get actionable intelligence.

Take for example the situation with Pathankot.

Most of the data from various sources looks normal. It is an old story now - told repeatedly - a fedayeen squad slips across the border (for some reason in Gurdaspur - again) and makes its way to a heavily guarded installation and attempts a strike. The strike predictably fails to make headway as reaction forces kick in and the attack stalls wiping out the Fedayeen.

But then there is a very odd outlier in the data. That bit about the SP Salwinder Singh's abduction and the seizure of his cell phone. It is bizarre that SP Salwinder Singh is left alive by the terrorists even though he is riding in a white SUV with government plates and a blue light on top. What is even more bizarre is that he is left alive and his phone is taken by the Fedayeen to make calls - specifically personal calls to the Bahawalpur area. This is very strange - but stranger still is the manner in which the SP's PSO calls up the phone and tells the Fedayeen that this is his CO's phone and they continue to hang on to it a full 24 hours after the initial contact. A taxi driver who is called to pick up the terrorists under a ruse however is killed.

Now if one digs deeper this SP is the same person who is accused by five of his female subordinates of sexual harassment and is the subject of an inquiry.

This data has such a large number of anti-correlations even within the same data set that it is difficult to imagine that something is not completely off with this story.

At 6:31 AM, Blogger maverick said...

A few more points about noise which I feel people sometimes forget.

If I have the world's most perfect operator and the greatest measurement apparatus in the world, and I use these to measure some quantity X, I will still see an error dX.

If I ask the operator to take N measurements and average them, then dX will be proportional to 1/sqrt(N).
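A quick simulation makes the scaling visible: for N independent measurements with the same spread, the error on their average falls off as 1/sqrt(N). The sketch assumes Gaussian measurement errors of unit width and a fixed random seed, both of which are my choices for illustration.

```python
import random
import statistics

def spread_of_mean(n, trials=2000, sigma=1.0, seed=0):
    """Empirical standard deviation of the mean of n noisy measurements."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        samples = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(samples))
    return statistics.pstdev(means)

spread_4 = spread_of_mean(4)      # expected near 1/sqrt(4)   = 0.5
spread_100 = spread_of_mean(100)  # expected near 1/sqrt(100) = 0.1
ratio = spread_4 / spread_100     # expected near sqrt(100/4) = 5
```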

To reduce this "shot noise" - I would prefer to take a lot of measurements (N will have to be large), and if I take those measurements at some rate f, then I will see a "pink" or "flicker" noise dX that will be proportional to 1/f^a where 1<a<2.

Apart from this I might see that as I increase the rate f, the dX my measurements report will not go to zero - even asymptotically. This is the sign that I am measuring "white noise".

All noise essentially reports a physics (i.e. dynamics) that is separate from what I am measuring. This physics can reach an "equilibrium" on some timescale, so that even with the most perfect operator and measurement, there will be a dependence of the error on the rate of measurements. The "pink" noise may represent non-equilibrium dynamics in a system that has yet to reach its desired state in the timescale of the measurement. The "white" noise is physical processes that have reached their desired states at rates much faster than anything explored by the measurement.

If I am able to take enough measurements quickly and at least some fraction of the "other physics" (i.e. the physics driving the noise) is allowed to equilibrate, then when I look at the statistical distribution of the errors (by making a histogram for example) I will see a Gaussian lineshape.

If your errors don't look Gaussian, then you are not giving enough time for the dominant physics in the error mechanisms to equilibrate. You may want to take the measurement at a slower rate.
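One crude way to check whether errors "look Gaussian" without eyeballing a histogram is to compute the sample skewness and excess kurtosis, both of which should sit near zero for a Gaussian. This is a sketch of one possible check, not a full normality test; the two synthetic error sets are assumptions for illustration.

```python
import random
from statistics import fmean

def skew_and_excess_kurtosis(samples):
    """Return (skewness, excess kurtosis) from central moments."""
    mu = fmean(samples)
    m2 = fmean([(x - mu) ** 2 for x in samples])
    m3 = fmean([(x - mu) ** 3 for x in samples])
    m4 = fmean([(x - mu) ** 4 for x in samples])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(0)
gaussian_errors = [rng.gauss(0.0, 1.0) for _ in range(20000)]
lopsided_errors = [rng.expovariate(1.0) for _ in range(20000)]

g_skew, g_kurt = skew_and_excess_kurtosis(gaussian_errors)
l_skew, l_kurt = skew_and_excess_kurtosis(lopsided_errors)
```

The Gaussian set gives both numbers close to zero, while the lopsided (exponential) set shows a large positive skewness, which is the kind of signature that says some error mechanism has not settled down.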

If there is a slow drift in the errors, it will show up as a small shift in the mean error as a function of the time interval size. That will allow you to catch some very subtle physics that is at play in the data.

At 10:28 AM, Blogger maverick said...

I feel there is a way of thinking about the Pathankot incident that makes sense, but it rests on the premise that ISI HQ deliberately detached itself from the away team after the op was launched.

This may explain why the away team chose to use a cell phone to re-establish contact instead of sticking to secure comms. It is interesting that ABHQ was established in Bahawalpur, although it could be that the call was forwarded to a line in I'bad.

It may also be that the GPS malfunctioned or was rendered inoperative - this would explain why the away team drove around in circles for hours.

At 11:12 AM, Blogger Nanana said...

"@20committee: Intel services that are states-within-states - Algeria's DRS, Pakistan's ISI, Soviet KGB - cannot be reformed, only killed. "

"Taliban have long referred to the ISI as The Black Snake. Even they think Pak spooks are deeply cynical & untrustworthy. Let that sink in."

