Downloading Flash Videos
A friend of mine asked me whether I could download a Flash video for her that apparently could not be downloaded with the Firefox plugins normally used for this. As it was the weekend and I wanted to know how this Flash stuff works anyway, I started to poke around a bit. I figured that the Flash player embedded in the site had to store the video somewhere, and as the video was nearly three hours long, I was sure it was using a file rather than keeping it all in memory. Strangely, there was nothing in /tmp or /var that looked like video data. Let's have a closer look using strace (-f also traces child processes and threads, -tt prints timestamps with microsecond resolution, -o writes the trace to a file):
strace -tt -o firefox_strace.txt -f firefox http://engkino....
If we grep the resulting file for open syscalls in the /tmp folder, only one file pops up, and strangely the very same file gets deleted an instant later!
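The exact grep command is not given. The excerpt that follows looks like grep output with line numbers (-n) and two lines of context (-C 2), where ':' marks matching lines and '-' marks context lines. A minimal sketch that should produce output of that shape, assuming the trace was written to firefox_strace.txt by the strace call, is this hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print every open() call on
# a file in /tmp from an strace log, with line numbers (-n) and two lines of
# context (-C 2) around each match.
tmp_opens() {
    grep -n -C 2 'open("/tmp/' "$1"
}
```

Calling tmp_opens firefox_strace.txt lists the matches. On newer systems the C library issues openat() instead of open(), so the log shows openat(AT_FDCWD, "/tmp/... and the pattern needs adjusting.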
153321-8012 19:31:33.809867 read(5, "!", 1) = 1
153322-8012 19:31:33.810021 write(6, "!", 1) = 1
153323:8012 19:31:33.810297 open("/tmp/FlashXXGqccIP", O_RDWR|O_CREAT|O_EXCL, 0600) = 15
153324-8012 19:31:33.810479 close(15) = 0
153325-8012 19:31:33.810564 stat("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=12288, ...}) = 0
153326:8012 19:31:33.810720 open("/tmp/FlashXXGqccIP", O_RDWR|O_CREAT|O_TRUNC, 0666) = 15
153327-8012 19:31:33.810835 unlink("/tmp/FlashXXGqccIP") = 0
153328-8012 19:31:33.810984 read(12, 0x7f233b791074, 4096) = -1 EAGAIN (Resource temporarily unavailable)
Apparently the file /tmp/FlashXXGqccIP is created (line 153323; O_CREAT|O_EXCL
guarantees that a fresh file is made) and closed, then opened again and
truncated to zero bytes. A little over a hundred microseconds later it is
deleted! That is a cool misuse of POSIX file semantics: unlink() only removes
the name that refers to a file, while any file descriptor already open on it
stays valid, and the data is only freed once the last such descriptor is
closed. So file descriptor 15 can still be used by the process. And indeed, it
seems to be used to store binary data which is also read back from time to time.
264717-8022 19:32:16.903338 read(15, "\326\351Sd\212h\260G\227\210\366\326\242\330"..., 4040) = 4040
264718:8022 19:32:16.903623 write(15, "r0\32\24 #\305\355\22\31\270\236"..., 4096) = 4096
264719:8022 19:32:16.904525 write(15, "\320\27('31\202)\366\364\0\262K\241\5"..., 248 <unfinished ...>
264761-8022 19:32:16.905365 read(15, "O\277\312\Z\360L!\353\202\374\233\r"..., 4096) = 192
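The unlink-while-open behaviour is easy to reproduce from a shell. The following sketch creates a temporary file, keeps it open on descriptor 3, deletes its name and then keeps using the descriptor, roughly the open/unlink/write sequence seen in the trace. The file name template and payload are made up for the demo, and reading the data back through /proc assumes Linux.

```shell
#!/bin/sh
# Demo of the create/open/unlink pattern from the strace output.
# The /tmp/unlinkdemo.XXXXXX template and the payload are made up for this sketch.
tmpfile=$(mktemp /tmp/unlinkdemo.XXXXXX)   # create a fresh file (mktemp uses O_CREAT|O_EXCL)
exec 3<>"$tmpfile"                          # open it read/write as descriptor 3
rm "$tmpfile"                               # unlink(): the name vanishes from /tmp ...

printf 'payload\n' >&3                      # ... but descriptor 3 is still usable

if [ ! -e "$tmpfile" ]; then
    echo "no directory entry for $tmpfile any more"
fi
# Linux-specific: the data is still reachable through the process's fd table.
cat "/proc/$$/fd/3"
```

Once the last descriptor is closed (here, when the shell exits), the kernel frees the data.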
So how can we get the file? Of course we could use LD_PRELOAD trickery to
intercept the read calls, but that would be very messy. Linux offers a much
easier way to reach every file descriptor a process has open: the proc
filesystem contains a symlink for each of them. We look up the process ID of
the Flash player process and list its descriptors; ls -l /proc/8427/fd | grep tmp
gives us (apparently from a later session, hence the different PID and file name):
lrwx------ 1 timos timos 64 Feb 12 20:11 15 -> /tmp/FlashXXvX2DEE (deleted)
We can use that symlink to copy the file to our home directory: even though the
target is marked (deleted), opening /proc/8427/fd/15 opens the file itself.
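Copying by hand works, but the step can also be scripted. The sketch below is a hypothetical helper, not the method used in this post: given a PID and a destination directory, it copies every open file whose link target matches /tmp/Flash* and is marked (deleted). The Flash* name pattern is taken from the trace above and may differ between Flash player versions; the flash-fd-N output names are made up.

```shell
#!/bin/sh
# Hypothetical helper: copy every deleted-but-still-open /tmp/Flash* file held
# by process $1 into directory $2. Linux-specific, relies on /proc/PID/fd.
recover_flash_files() {
    pid=$1
    dest=$2
    for fd in /proc/"$pid"/fd/*; do
        # Skip descriptors that were closed while we were looping.
        target=$(readlink "$fd") || continue
        case $target in
            /tmp/Flash*' (deleted)')
                # Opening /proc/PID/fd/N reaches the file itself, even though
                # its name was already unlinked from /tmp.
                cp "$fd" "$dest/flash-fd-${fd##*/}" &&
                    echo "copied $target -> $dest/flash-fd-${fd##*/}"
                ;;
        esac
    done
}
```

For the listing above, recover_flash_files 8427 ~ would copy descriptor 15 to ~/flash-fd-15. The copy only contains what has been buffered so far, so it presumably makes sense to let the player buffer the whole video first.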
Alternativlos, Fun with GNU R
Frank Rieger and Felix von Leitner started a podcast called "Alternativlos" that I like to listen to. Lately I have been wondering when the next episode would come out, or whether they had lost interest in producing episodes: I felt that the delay between episodes was constantly growing.
However, the problem with such "feelings" is that they can be misleading, so I decided to investigate a little using my favorite data analysis tool, GNU R.
I used the R code below to download the website, which lists the release date and an abstract for each episode. The content is piped through a Perl snippet that extracts the episode numbers and release dates. R then calculates the difference between successive releases and plots it, together with a moving average over five episodes.
svg(filename = "alternativlos_issue_freq.svg")
# Download the episode list; Perl prints "<episode number> <release date>" per episode
data <- read.table(pipe("wget 'www.alternativlos.org' -q -O - | perl -n -e 'if (m/Folge (\\d+) vom (.+?)</) {print \"$1 $2\\n\";}'"))
data <- data[order(data$V1), ]
dates <- as.Date(data$V2, "%d.%m.%Y")
# Pretend the next episode comes out today
dates <- append(dates, Sys.Date())
# Days between successive releases
days_between <- as.numeric(diff(dates), units = "days")
issues <- seq(2, length(days_between) + 1)
plot(x = issues, y = days_between, type = "b", xlab = "Issue",
     ylab = "Time between issues [d]", lwd = 2, col = "red",
     main = "Alternativlos Podcast Issue Frequency",
     ylim = c(min(days_between), max(days_between) + 5))
# Centred moving average over 5 issues (stats:: avoids clashes with e.g. dplyr::filter)
ma5 <- stats::filter(days_between, rep(1/5, 5), sides = 2)
lines(x = issues, y = ma5, lwd = 3, col = "blue")
grid()
legend("topleft", lwd = c(2, 3), col = c("red", "blue"), bg = "white",
       legend = c("Alternativlos Issues", "Moving Average (over 5 Issues)"))
dev.off()
From the resulting plot we can observe that at the beginning of the podcast a new episode was released about every two weeks on average, while the average waiting time is now more than one month. Gnaa. So let's all hope there will be a new episode soon. Note that the script appends today's date as if the next episode came out today, so the last data point is simply the time elapsed since the most recent episode.
And if you have to do any kind of statistical data analysis, or just need to make some graphs, I highly recommend trying GNU R. When I wrote my first scientific articles I made my graphs with gnuplot. This worked nicely until I wanted to put slightly more "advanced" stuff in my graphs. Some things are just impossible with gnuplot, and I noticed that gnuplot added or changed features between minor versions, so when I worked with other people we always had to make sure that everybody used exactly the same gnuplot version. With R I have never encountered an "unsolvable" visualization problem, the documentation is excellent, and if I cannot find a solution by myself, the members of the R-help mailing list come up with good suggestions within minutes.