
# Treebeard's Stumper Answer

8 December 2000

Obfuscated URLs

We're all used to seeing Internet addresses or URLs, but there's more than one way to find a Web site! Try these links, exactly as shown: All of these obfuscated URLs will take you to the same Dunn School Web site! Spammers and scammers use these weird addresses to hide their tracks, but why do they work? (Hint: this is a math problem!)

The above URLs all work for me using Internet Explorer 5.0 with Win98. The HTML links are coded exactly as shown. Moving the cursor over the links in IE5 gives another hint!

A Web address is really an alias for a unique number that your computer looks up in a special network database called a Domain Name Server. 207.154.84.115 = (207 x 256^3) + (154 x 256^2) + (84 x 256) + 115, or 3,482,997,875. That's the real Internet address for Dunn School, and it works! You can add any multiple of 256^4 and get the same address since most computers just ignore larger powers. "%77" is the ASCII computer code for "w", so "%77%77%77" is just as good as "www". These obfuscated codes all end up as the same address-number for your computer!
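The place-value arithmetic above can be sketched in a few lines of Python. The function name `dotted_to_int` is made up for this illustration; it simply treats each dotted part as one base-256 digit.

```python
def dotted_to_int(address):
    """Evaluate a dotted-decimal address as a base-256 number."""
    value = 0
    for part in address.split("."):
        # Shift the running total one base-256 place, then add the next digit.
        value = value * 256 + int(part)
    return value


print(dotted_to_int("207.154.84.115"))  # 3482997875
```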

Notes:

Computers do their work with pulses of electricity that have an exact voltage that changes over brief periods of time. That's analog. We usually simplify that to a digital high or low voltage, and we don't care about the exact voltage level as long as it's above or below a certain threshold. That's already an abstraction. We can think of these pulses as strings of ones and zeros that represent numbers in binary or number base 2. That's another abstraction. Sometimes it's important to remember that computers really work with physical voltage levels that are not numbers at all, but it's usually OK to think that computers crunch numbers.

Numbers can represent many different things, like a text character, or the red/green/blue components of a colored dot on your screen, or the voltage level of a Grateful Dead sound sample at a particular microsecond-moment, or the modem sounds on your phone line, or even an Internet address. That's a higher level of abstraction.

Numbers can represent many things, and they can also be represented in many different ways. MCMLXVIII (Roman numerals), 11110110000 (base 2), 7B0 (base 16), and 0.7.176 (dotted-decimal base 256) are all perfectly good ways of writing 1968, the year that Stanley Kubrick's (now) timely movie 2001: A Space Odyssey was first released. The blue square on the top-right represents 1,968 as a particular color with 0 units of red, 7 units of green, and 176 units of blue. There are 256 levels of each primary color possible on most computer displays, so this translates to a unique number for that color: 0 x 256^2 + 7 x 256^1 + 176 x 256^0 = 1,968. We can also write 1,968 in Internet dotted-decimal fashion as "0.7.176", where it's implied that 256 is the place value. If we had better RGB color sense, we could do arithmetic with colors! Check out the addition on the right for an example. (Note that I carefully avoided any carry. What would it mean? What about multiplication and division??) We can write numbers any way we like, as long as we understand the rules for what we're doing. I admit I'm a Platonist. I think of numbers as ideal forms that are "out there" somewhere in the landscape of the mind. We don't invent math, we discover it. Don't confuse the number with how you find the number!
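As a rough illustration of "one number, many spellings", here is a small Python sketch that breaks 1968 into digits in several bases. The helper name `to_base` is an assumption of this example, not anything from the original programs.

```python
def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base,
    most significant digit first."""
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


year = 1968
print(to_base(year, 2))    # the binary digits of 11110110000
print(to_base(year, 16))   # [7, 11, 0], written 7B0 in hex
print(to_base(year, 256))  # [7, 176], written 7.176 in dotted-decimal

# Pad to three base-256 digits to read the number as a red/green/blue color.
red, green, blue = ([0, 0, 0] + to_base(year, 256))[-3:]
print(red, green, blue)    # 0 7 176
```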

Computers work in binary base 2, but long strings of zeros and ones are hard to remember. Phone numbers are hard to remember too, so we break them up with dashes into smaller chunks that have a rhythm. Number bases that are powers of two make it easy to work with binary, so number bases 8 (octal), 16 (hex), and 256 are usually used by programmers, and are represented in standard ways. In hexadecimal (base 16), digit place values represent powers of 16 ( = 2^4). With 16 digits to represent, we can use the letters A-F as extra digits: A=10, B=11, C=12, D=13, E=14, and F=15. Take any binary number and divide it right-to-left in blocks of four digits, and then translate block-by-block into hex. For example 10110101 (base 2) = 1011 0101 = B5 (base 16). This gets easy with practice. Divide the binary number into 3s for octal (base 8), and into 8s for base 256. You can also divide a hex number into 2s to convert it to base 256. There aren't enough standard characters for base 256, so the dotted-decimal form is used instead.
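The block-of-four method can be written out directly. This is only a sketch of the hand method described above (Python's built-in `hex` would do the same job); the name `binary_to_hex` is invented for the example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hex by translating 4-bit blocks."""
    # Pad with zeros on the left so the blocks line up from the right.
    width = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(width)
    blocks = [padded[i:i + 4] for i in range(0, width, 4)]
    # Each 4-bit block is one hex digit.
    return "".join(HEX_DIGITS[int(block, 2)] for block in blocks)


print(binary_to_hex("11110110000"))  # 7B0
print(binary_to_hex("10110101"))     # B5
```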

The Windows calculator (in scientific mode) is a useful tool for converting between common number bases, as Graybear illustrates:

The easiest way to convert from base ten to base 256 is to first convert to base 16 (hexadecimal) on the computer's calculator, then combine the digits into pairs and convert each pair to base ten.

3482997875 (base 10) = CF9A5473 (base 16)
CF (base 16) = 207 (base 10)
9A (base 16) = 154 (base 10)
54 (base 16) = 84 (base 10)
73 (base 16) = 115 (base 10)
3482997875 (base 10) = 207.154.84.115 (base 256)

I bet you could write a short program that would make it even easier.
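Here is one possible short program, following Graybear's recipe step by step: convert to hex, split into pairs, convert each pair back to decimal. The function name `int_to_dotted` is an assumption of this sketch, and it only handles numbers that fit in 32 bits.

```python
def int_to_dotted(n):
    """Convert a 32-bit number to dotted-decimal via hex digit pairs."""
    if not 0 <= n < 256**4:
        raise ValueError("expected a number that fits in 32 bits")
    hex_digits = format(n, "08X")  # eight hex digits, zero-padded
    pairs = [hex_digits[i:i + 2] for i in range(0, 8, 2)]
    # Each pair of hex digits is one base-256 digit.
    return ".".join(str(int(pair, 16)) for pair in pairs)


print(int_to_dotted(3482997875))  # 207.154.84.115
```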

My BIGNUM and BNC Basic programs can do all sorts of number base conversions. They are available with source code from Treebeard's BASIC Vault. There are also online calculators.

Here are the standard ways of writing numbers in these bases. These rules can be ambiguous, so don't trust to chance!

| Number Base | Binary | Standard | The rule: |
| --- | --- | --- | --- |
| 10 Decimal | - | 1968 | Start with a non-zero digit {1 .. 9}. |
| 2 Binary | 11110110000 | - (no way) | |
| 8 Octal | 011 110 110 000 = 3 6 6 0 | 03660; 03.06.060 (dotted-octal?) | |
| 16 Hex | 0111 1011 0000 = 7 B 0 | | |
| 256 base 256 | 00000111 10110000 = 7 176 (dec) = 07 B0 (hex) | 7.176; 0x07B0 | Use "." (a dot) to separate decimal numbers in the range {0 - 255}. |

Now we can examine each of the obfuscated URLs, and find a few more!

• http://www.dunnschool.com/
This is the regular URL. When you click on this link, the first thing that happens is that your computer looks up the address in a special network database called a Domain Name Server. If there's a problem, you might get a "DNS error", which means the DNS server couldn't complete the job for some reason.
• http://207.154.84.115/
This is the dotted-decimal base 256 representation of the address returned by the Domain Name Server. This is known as IP4 format since it uses four numbers between 0 and 255. Graybear observes that "Since IP4 only allows for 256^4 = 4,294,967,296 possible URLs, the internet is gearing up for IP6 which will allow for 256^6 = 281,474,976,710,656 possible URLs." (IP6 as standardized actually uses 128-bit addresses, so the real figure is 256^16, or about 3.4 x 10^38 addresses.)

I don't know if the DNS server actually returns the dotted-decimal form, or just a 4 byte (32 bit) binary number. Either way, these are all equivalent numbers, and some of them even work as URLs on my computer with Internet Explorer 5.0 with Win98.

| Address | Form | Result |
| --- | --- | --- |
| 207.154.84.115 | dotted-decimal (base 256) | works! |
| 0xCF.0x9A.0x54.0x73 | dotted-hex (base 16/256) | doesn't work |
| 0317.0232.0124.0163 | dotted-octal (base 8/256) | works! |
| 0xCF9A5473 | hex (base 16) | doesn't work |
| 031746452163 | octal (base 8) | works! |
| 11001111100110100101010001110011 | binary (base 2) | doesn't work |

I'm really puzzled why the hex forms don't work, but the octal forms DO work, on my computer at least. My impression is that programmers don't use octal much any more. Maybe the hex forms are protected as part of the transition to IP6? I also tried combining different number bases in dotted forms, with the same result that no URLs with hex forms are recognized:

| Address | Form | Result |
| --- | --- | --- |
| 0317.0232.84.115 | dotted-octal & decimal | works! |
| 00317.000232.00124.0000163 | dotted-octal with extra zeros | works! |
| 0xCF.154.84.115 | dotted-hex & decimal | doesn't work |
| 0317.0x9A.84.115 | dotted-octal, hex, & decimal | doesn't work |
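For comparison, here is a sketch of a parser that follows the classic C-style convention for each dotted part: a leading "0x" means hex, a leading "0" means octal, and anything else is decimal. This is an assumption about how such parsers commonly behave, not a description of what IE5 does (it accepts the hex forms that failed in the tests above). The names `parse_part` and `parse_dotted` are made up for the example.

```python
def parse_part(text):
    """Read one dotted part using C-style prefixes for the number base."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)   # hex: 0xCF
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)           # octal: 0317, 000232
    return int(text, 10)              # decimal: 207


def parse_dotted(address):
    """Turn a four-part dotted address (any mix of bases) into one number."""
    parts = [parse_part(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four parts, each in the range 0-255")
    value = 0
    for p in parts:
        value = value * 256 + p
    return value


print(parse_dotted("0317.0232.84.115"))            # 3482997875
print(parse_dotted("00317.000232.00124.0000163"))  # 3482997875
```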

• http://3482997875/
3,482,997,875 = (207 x 256^3) + (154 x 256^2) + (84 x 256) + 115

This is the dotted-decimal base 256 number converted to decimal. It works.

• http://432979727475/
256^4 = 256 x 256 x 256 x 256 = 4,294,967,296
432,979,727,475 = 100 x 4,294,967,296 + 3,482,997,875
= (100 x 256^4) + (207 x 256^3) + (154 x 256^2) + (84 x 256) + 115
= 100.207.154.84.115 (dotted-decimal base 256)
= 0x64CF9A5473 (base 16)

I added a higher power of 256 to the decimal URL number, and it still works, so my computer only looks at the lowest 32 bits of a binary address. It's pretty obvious what I did in the dotted-decimal and hex forms, but the decimal number seems pretty mysterious. This will no longer work if the Web adopts IP6 addresses. These big decimals do work, but adding an extra dotted-decimal (or octal) digit does not work:

| Address | Result |
| --- | --- |
| 100.207.154.84.115 | doesn't work |
| 0666.0317.0232.0124.0163 | doesn't work |
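The "only the lowest 32 bits count" behavior for big decimal numbers can be modeled as arithmetic modulo 256^4. This is a sketch of the behavior observed above with IE5 on Win98, not a claim about how any particular browser is written; other software may simply reject such numbers. The function names are invented for the example.

```python
WRAP = 256**4  # 4,294,967,296, one more than the largest 32-bit value


def disguise(address_number, multiple):
    """Add a multiple of 256^4 to make the decimal address look mysterious."""
    return address_number + multiple * WRAP


def low_32_bits(n):
    """Keep only the lowest 32 bits, discarding higher powers of 256."""
    return n % WRAP


big = disguise(3482997875, 100)
print(big)               # 432979727475
print(low_32_bits(big))  # 3482997875
```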

• http://%77%77%77%2E%64%75%6E%6E%73%63%68%6F%6F%6C%2E%63%6F%6D/
• http://%32%30%37%2E%31%35%34%2E%38%34%2E%31%31%35/
Numbers in a place-value system like decimal or binary are really expressions that evaluate to a number. 66 is (6 x 10^1) + (6 x 10^0) = 66. Codes are different since they are ultimately arbitrary. You look up the value in a table rather than evaluate it.

These cryptic URLs use the standard ASCII character code to represent the address. My browser automatically translates these when I drag the mouse over the links. The form for each character is a percent sign followed by a two-digit hex (base 16) number.
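The percent-sign form is easy to produce and to undo. In this sketch, `percent_encode_all` is a made-up helper that encodes every character (it assumes plain ASCII text), while `unquote` is the standard decoder from Python's `urllib.parse` module.

```python
from urllib.parse import unquote


def percent_encode_all(text):
    """Write every character as a percent sign plus its two-digit hex code."""
    return "".join(f"%{ord(ch):02X}" for ch in text)


encoded = percent_encode_all("www.dunnschool.com")
print(encoded)           # %77%77%77%2E%64%75%6E%6E%73%63%68%6F%6F%6C%2E%63%6F%6D
print(unquote(encoded))  # www.dunnschool.com
print(unquote("%32%30%37%2E%31%35%34%2E%38%34%2E%31%31%35"))  # 207.154.84.115
```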

ASCII is a seven-bit code that dates back to pre-computer teletype machines. ASCII uses the numbers 0-127 to represent typewriter characters and some near-obsolete control codes. Click on the above link for a complete ASCII table. ANSI and ISO standards keep the old codes and add new characters to bring the code to 256 characters. The success of the Web around the world is moving us to an extended 16 bit character set, which will allow 2^16 = 65536 different characters. At some point, these weird ASCII URLs won't work. Graybear reports that these URLs do not work with his current version of Netscape.

I just had to try these composite URLs:

| Address | Form | Result |
| --- | --- | --- |
| %32%30%37.0232.84.115 | ASCII & dotted-octal & decimal | works! |
| %32%30%37%2E0232.84.115 | ASCII & dotted-octal & decimal | works! |

• http://www.NotDunnSchool.com@%33%34%38%32%39%39%37%38%37%35/
This URL is the ASCII version of the decimal 3482997875 address, but it uses the little-used authentication feature of the URL spec to make it even more obfuscatory. The full structure of a URL is:
```
http://username:password@www.address.edu:1234/path/subdirs/file.html
|      |                 |               |    |                     |
|action| authentication  |    address    |port|  path and file name |
```
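Python's standard `urllib.parse.urlsplit` pulls a URL apart along the same lines as the diagram, which makes it a handy way to see which piece is really the address. This is just a sketch; the variable names are arbitrary.

```python
from urllib.parse import urlsplit, unquote

# The generic example from the diagram.
parts = urlsplit("http://username:password@www.address.edu:1234/path/subdirs/file.html")
print(parts.scheme)    # http
print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # www.address.edu
print(parts.port)      # 1234
print(parts.path)      # /path/subdirs/file.html

# The obfuscated link: everything before the "@" is authentication,
# and the real address is the percent-encoded part after it.
tricky = urlsplit("http://www.NotDunnSchool.com@%33%34%38%32%39%39%37%38%37%35/")
print(tricky.username)           # www.NotDunnSchool.com
print(unquote(tricky.hostname))  # 3482997875
```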
Since the authentication field is rarely used, anything can go there, and it will be ignored! I also tried to obfuscate the "@" character as ASCII %40, but these don't work for me even though they look right when I put my mouse over the link:

These are some truly obfuscated URLs! I expect things to make sense in nature, but with technology, I'm usually happy to know what works and leave it at that. A plain numeric address like http://207.154.84.115/ might just mean that the site is brand-new and hasn't yet been registered. But spammers and scammers use these techniques to hide their tracks when they send junk email. Is this also a way to get around firewalls and Net-Nanny type censorship? (Let me know!)

Here's my best shot at a truly obfuscated URL for dunnschool.com that uses all the tricks. See if you can figure out why this works!

http://Obfuscate!%64@%32%30%37%2e000%3232%2E84%2e%31%315/
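Spoiler: one way to unravel a URL like this is to undo the tricks in order. The sketch below drops the authentication part, decodes the percent codes, and then reads each dotted part with the assumed C-style base prefixes (leading "0x" for hex, leading "0" for octal). The function names are invented for this example, and it handles only four-part dotted addresses.

```python
from urllib.parse import urlsplit, unquote


def parse_part(text):
    """Read one dotted part: 0x means hex, a leading 0 means octal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def deobfuscate(url):
    """Return the plain dotted-decimal address hidden in an obfuscated URL."""
    host = urlsplit(url).hostname  # step 1: ignore anything before the "@"
    host = unquote(host)           # step 2: turn %xx codes back into characters
    parts = [parse_part(p) for p in host.split(".")]  # step 3: fix the bases
    if len(parts) != 4:
        raise ValueError("this sketch only handles four-part dotted addresses")
    return ".".join(str(p) for p in parts)


print(deobfuscate("http://Obfuscate!%64@%32%30%37%2e000%3232%2E84%2e%31%315/"))
# 207.154.84.115
```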

A stumper remains. I have my own Web server at treebeard.org, aka 204.48.153.235, which is hosted as a virtual server on Ray Ford's fine Santa Barbara Outdoors site. When I try the same obfuscated URLs on my address, I sometimes get my page and sometimes Ray's! Is there any rhyme or reason to this??

| URL | Page served |
| --- | --- |
| http://www.treebeard.org/ | treebeard.org |
| http://204.48.153.235/ | SB Outdoors |
| http://3425737195/ | SB Outdoors |
| http://%77%77%77%2E%74%72%65%65%62%65%61%72%64%2E%6F%72%67/ | treebeard.org |

Update (4 March 2001):
cLive hoLLoway emailed this explanation:

Your server (like most), runs virtual servers, ie mapping more than one domain to the same IP address. Typing the IP address alone gives you the *default* server. Your "actual" URL will be http://204.48.153.235/~yourlogin When the server is requested a URL, it converts the URL to the correct internal mapping. If your domain was the only one on the IP address, then your experiments would work.

Thanks Clive, this makes sense. But when I try various combinations of my login name and my treebeard.org domain name, I still can't get to my page. I either get a "File not Found" or a "You don't have permission" error. This must have something to do with the server mapping just as you say.
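Clive's explanation fits how name-based virtual hosting generally works: the browser sends the host name from the URL along with the request (the HTTP "Host" header), and the server uses that name to choose a site, falling back to a default when it doesn't recognize it. The toy model below is purely hypothetical (the table and the names `VIRTUAL_HOSTS`, `DEFAULT_SITE`, and `pick_site` are assumptions, not the real server's configuration), but it reproduces the results in the table above.

```python
# Hypothetical lookup table for one IP address shared by several sites.
DEFAULT_SITE = "SB Outdoors"
VIRTUAL_HOSTS = {
    "www.treebeard.org": "treebeard.org",
    "treebeard.org": "treebeard.org",
}


def pick_site(host_header):
    """Choose a site by the host name the browser asked for."""
    return VIRTUAL_HOSTS.get(host_header.lower(), DEFAULT_SITE)


print(pick_site("www.treebeard.org"))  # treebeard.org
print(pick_site("204.48.153.235"))     # SB Outdoors (no name, so the default)
print(pick_site("3425737195"))         # SB Outdoors
```

The percent-encoded form of www.treebeard.org still reaches the right page in this model because the browser decodes it back to the name before sending the request.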

Here are some Web links for further research:

• I first started thinking about this stumper after reading former-Byte magazine editor Fred Langa's LangaList (19 October 2000). This is a free email newsletter with lots of good info for PC users. Recommended!

• ASCII stands for the "American Standard Code for Information Interchange". There are tables of ASCII codes here and here. Steven Searle's A Brief History of Character Codes is a good history of ASCII and other computer character codes, with some thoughts about where it's going.

• The Internet began as a series of RFC ("request for comments") documents, including T. Berners-Lee's original spec for Uniform Resource Locators (URL) (RFC 1738). We live with these (ongoing) decisions. What an interesting and influential history is here for study!

• How to Obscure Any URL is a good tutorial on understanding obfuscated URLs. Karen Kenworthy of Winmag.com offers her URL Discombobulator program that discombobulates obfuscated URLs. She has a good tutorial about how it works.

• SamSpade.org has an online calculator that will trace obfuscated URLs, and give lots of info about Web addresses. They offer a free Windows program that's even better. Luc Neijens' CyberKit is another good program for tracking the Internet. The NSLookup gateway can track down obscure URLs online. Windows 98 includes the DOS utilities NSLOOKUP and PING that are also useful.

• I love that word "obfuscate". It's true of itself! It's long been a computer geek word. There's the International Obfuscated C Code Contest and the Obfuscated Perl Contest, where the goal is to write really tight but incomprehensible program code that works. The creators of obfuscated URLs have the same goals.

• The Digital Freedom Network has a page of often funny examples showing how unreliable censor-ware is.
