Spaces in file names

Hi there,

Sorry to bug you with another question. A lot of my S3 files have spaces in them, and these are showing a bit funny in Webalizer (only shows the bit of the file name up to the first space).
You can see an example here:


There's a couple of things I think are contributing to this:

1. Amazon is 'double encoding' their Cloudfront log files. Spaces are encoded to '%20' (as normal), but then that is encoded again and the percentage sign becomes a '%25', forming a combined string of '%2520'.

2. Your log formatting script takes the Cloudfront logs and turns them into Apache common log format.

3. Webalizer imports the CLF log. It parses the request field and unencodes the string, turning the '%2520' back into '%20', but it doesn't unencode twice to turn that '%20' back into a space.

4. The problem: Percentage sign in Webalizer is a disallowed character, so when it parses the request string it only returns the characters before the percentage sign.
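
To make the chain above concrete, here is a minimal Python sketch of the double encoding and the single decode. It only illustrates the string transformations; the file name is made up, and it uses Python's standard urllib.parse rather than anything from Amazon, S3STAT, or Webalizer.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing a space (not from a real bucket).
name = "my file.mp3"

# First encoding: the space becomes %20, as normal.
once = quote(name)             # 'my%20file.mp3'

# Second encoding: the '%' itself becomes %25, giving %2520.
twice = quote(once)            # 'my%2520file.mp3'

# A single decode only gets back to the %20 form, which is
# what a log analyzer that decodes once ends up with.
decoded_once = unquote(twice)  # 'my%20file.mp3'

# A second decode would be needed to recover the space.
decoded_twice = unquote(decoded_once)  # 'my file.mp3'

print(once, twice, decoded_once, decoded_twice, sep="\n")
```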


I hope I explained that clearly.

It should be a simple fix, you just need to find and replace '%25' (percentage twenty five) to '%' (percentage) in the script that runs to convert the Amazon log file to Apache common log file format.
Then when Webalizer does its unencoding, it will correctly unencode spaces (and all other characters).
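
As a rough sketch of what I mean (I don't know what language or structure the conversion script actually uses, so the function name and the sample request below are hypothetical), the replacement could look something like this in Python:

```python
def undo_double_encoding(request_field: str) -> str:
    """Collapse one level of percent-encoding by turning '%25' back into '%'.

    Assumption: every '%25' in the field comes from the extra encoding
    pass, so '%2520' becomes '%20' and the log analyzer's own single
    decode can then turn it into a space.
    """
    return request_field.replace("%25", "%")


# Hypothetical request field, in the shape it would take in a CLF line.
raw = "GET /my%2520file.mp3 HTTP/1.1"
print(undo_double_encoding(raw))  # GET /my%20file.mp3 HTTP/1.1

# A literal percent sign in a file name would arrive as '%2525' and
# comes out as '%25', which a single decode turns back into '%'.
print(undo_double_encoding("GET /100%2525.txt HTTP/1.1"))  # GET /100%25.txt HTTP/1.1
```

This would only be safe to apply while the logs really are double encoded; if Amazon fixes it at their end, the replacement would need to come out again.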

Would it be possible for you to make this small change? It would make the stats much more useful to me, and maybe one or two other people are having the same problem.

Once again, many thanks for the one-of-a-kind incredibly useful service.

Dave Houlbrooke
Tuesday, June 30, 2009

Thanks for the clear and thorough report. I guess that's the nice thing about having a technical userbase.

Do you have a sample server access logfile that demonstrates this double encoding? If so, then we might be better off going straight to Amazon and seeing if they are willing to fix it at their end. They are remarkably helpful in issues like this, so if you have indeed identified a bug at their end, I wouldn't be surprised if they were able to get it fixed quickly.

Could you forward along a raw logfile with this malformation? If these spaced filenames are common enough in your bucket that you think they'll likely be in most of the logs, just let me know the bucket name and I'll grab one directly.

Jason Kester
Thursday, July 2, 2009

I never even considered going to Amazon!

This is an Amazon log file, with the 'double encoded' spaces (%2520):

This is one of the S3STAT converted CLF log files:

I wasn't sure if Amazon had done it on purpose as an undocumented 'feature'. I'll get in touch with them.

Dave Houlbrooke
Friday, July 3, 2009

Hi Jason,

I posted it up on Amazon's forums. You were right, one of their guys came back and said it 'appears to be a bug', and that they're looking into how to fix it.

Kind regards,

Dave Houlbrooke
Monday, July 6, 2009

Way to stick it to the man!

Reading that thread, it sounds like they're planning to fix the issue from their end. Works for me, since it saves me having to write a workaround. Thanks for helping to dig in to this.

Jason Kester
Wednesday, July 15, 2009

Hi, not to bring up old issues, but we're running into the same problem with our logs. We have a couple of directories with hundreds of files in them and it's skewing the reports. It doesn't look like Amazon did anything to fix it, but they did suggest a workaround. Is that something that could be added to S3STAT?


Tuesday, June 7, 2011


© 2024 Expat Software