Down under the HTML

Just for something to do, I did a "wget" on a site that I visit for information. I was shocked at what I saw when I ran "cat webpage".


WARNING: DO NOT TRY THIS AT HOME

Among many other things, the page contained this as a comment. It also included diagnostic information that looked like novice work. Considering that the site belongs to a commercial enterprise, I am guessing that management is completely disconnected from web IT. In my experience that is very common in business, but eventually you get tired of doing silly things just because you can get away with it.


cat webpage | wc -w -L
   5247    4003

The page itself shows almost nothing, yet the source holds 5247 words and its longest line runs to 4003 characters. I know it is common to have lines that go on forever, but if you are going to do that, you might as well run the whole thing together as one long run-on sentence. What you see is not what you get, and I can also see an odd interaction between the cookies and their local database.
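The two numbers `wc -w -L` printed are the word count and the length of the longest line. For anyone who wants to poke at a page the same way, here is a rough Python sketch of those two measurements, plus a helper that lists which lines are the monsters. The function names and the 1000-character cutoff are my own choices for illustration, and the counts can differ slightly from GNU `wc`, which measures display width and expands tabs.

```python
def page_stats(text: str) -> tuple[int, int]:
    """Rough equivalent of `wc -w -L`: (word count, longest line length).

    Words are whitespace-separated runs. GNU wc -L measures display width
    (tabs expand to the next multiple of 8), so lines containing tabs or
    wide characters may come out differently here.
    """
    words = len(text.split())
    longest = max((len(line) for line in text.splitlines()), default=0)
    return words, longest


def long_lines(text: str, limit: int = 1000) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

For example, `page_stats(open("webpage", encoding="utf-8", errors="replace").read())` should come out close to what `wc` reported, assuming the saved file is mostly plain ASCII or UTF-8.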

If I use Beautiful Soup, I don't see any of this, because as far as I know it gets ignored as falling below the actual data threshold.
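A minimal sketch of why text extraction can miss this stuff, using Python's standard-library `html.parser` instead of Beautiful Soup (swapped in so it runs without extra packages). Comments arrive through their own callback, separate from ordinary text, so code that only collects text never sees them unless it asks. The class and function names here are hypothetical. In Beautiful Soup itself, comments are `bs4.Comment` objects, and `soup.find_all(string=lambda s: isinstance(s, Comment))` is a common way to list them.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Keep HTML comments apart from the text a reader would actually see."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []
        self.text: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Visible text only: skip script/style bodies and pure whitespace.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

    def handle_comment(self, data):
        # Everything between <!-- and --> lands here, never in handle_data.
        self.comments.append(data.strip())


def split_page(html: str) -> tuple[list[str], list[str]]:
    """Return (visible_text_chunks, comments) for an HTML document."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.text, parser.comments
```

Running `split_page` on the saved page and printing the second list is one way to see only the hidden comments, without wading through the 4000-character lines.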

Contributors

Automated Intelligence

Auftrag der unendlichen LOL katzen