Re: How to convert a md5sum back to a timestamp?
From: Assaf Gordon
Subject: Re: How to convert a md5sum back to a timestamp?
Date: Thu, 1 Aug 2019 02:08:59 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

Hello,
On 2019-08-01 12:50 a.m., Stephane Chazelas wrote:
> 2019-07-31 22:36:18 -0500, Peng Yu:
>> Suppose that I know a md5sum that is derived from one of the timestamps
>> computed below. Is there a way to quickly derive what the original
>> timestamp is? I could make a database of all the timestamps and their
>> md5sums. But as the total number of entries increases, this solution
>> will not be scalable as the database can be big. Is there any
>> better solution to this problem?
>>
>> for i in {1..2563200}; do date -d "-$i minutes" +%Y%m%d_%I%M%p; done
> [...]
>
> seq -f '-%g minutes' 2563200 | date -f - +%Y%m%d_%I%M%p
>
> would be an improvement as it would only run one date
> invocation, but you'd still need to run one md5sum for each of
> those lines. coreutils md5sum in itself is not slow, but forking
> a process and loading a command and linking its libraries is;
> that's not a bug in coreutils itself.

"datamash" will calculate md5 on multiple lines in one invocation:
$ seq -f '-%g minutes' 2563200 \
| date -f - +%Y%m%d_%I%M%p \
| datamash md5 1
or to see the time AND the md5 sum, add "--full":
$ seq -f '-%g minutes' 2563200 \
| date -f - +%Y%m%d_%I%M%p \
| datamash --full md5 1
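
Once the "--full" output is saved to a file, the reverse lookup itself is a single scan of that file. A minimal sketch, assuming the table is tab-separated "timestamp<TAB>md5" as printed by "datamash --full md5 1"; the function name lookup_md5 and the file name table.tsv are hypothetical:

```shell
# Build the table once (shown as a comment, since it needs datamash):
#   seq -f '-%7.0f minutes' 2563200 \
#     | date -f - +%Y%m%d_%I%M%p \
#     | datamash --full md5 1 > table.tsv

# lookup_md5 TABLE HASH
# Print every timestamp in TABLE whose md5 column equals HASH.
# Exit status is 0 if at least one row matched, 1 otherwise.
lookup_md5() {
    awk -F '\t' -v h="$2" '
        $2 == h { print $1; found = 1 }
        END     { exit !found }
    ' "$1"
}
```

Usage would be something like: lookup_md5 table.tsv d0bf332197593b7c3f6d7757f7d5754a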
Three notes:
1. I would recommend using "-%7.0f minutes" format in "seq"
instead of "%g", as the latter will result in a scientific notation
for large values:
$ seq -f '-%7g minutes' 2563200 | tail -n1
-2.5632e+06 minutes
$ seq -f '-%7.0f minutes' 2563200 | tail -n1
-2563200 minutes
2. A "-N minutes" date string is relative to the current time.
Are you sure that's the value you want? You'll get different values
every time you run it.
To be more reproducible, consider starting with a known date, e.g.:
$ date -u -d "2019-08-01 01:53:22Z +55 minutes" +%Y%m%d_%I%M%p
20190801_0248AM
or
$ seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
| date -u -f - +%Y%m%d_%I%M%p | head
20190801_0154AM
3. "datamash md5" does not include the trailing newline in the md5
calculation, so be careful about this when comparing hashing results,
e.g.:
$ echo 20190731_0848PM | md5sum
deb75bda7f8e95d321897d181cbe2556 -
$ printf "%s\n" 20190731_0848PM | md5sum
deb75bda7f8e95d321897d181cbe2556 -
$ printf "%s" 20190731_0848PM | md5sum
d0bf332197593b7c3f6d7757f7d5754a -
$ printf "%s" 20190731_0848PM | datamash md5 1
d0bf332197593b7c3f6d7757f7d5754a
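
Notes 2 and 3 can be put together as a rough sketch using only coreutils. It assumes GNU seq and GNU date; the function names are hypothetical, and LC_ALL=C is an addition here so that %p prints AM/PM regardless of locale:

```shell
# gen_timestamps BASE COUNT
# Print COUNT timestamps, one per minute after BASE. BASE should carry an
# explicit zone (e.g. a trailing "Z") and must not contain a "%" character,
# because it is embedded in the seq format string.
gen_timestamps() {
    seq -f "$1 +%.0f minutes" "$2" | LC_ALL=C date -u -f - +%Y%m%d_%I%M%p
}

# md5_no_newline STRING
# Hash STRING without a trailing newline, as "datamash md5" does.
md5_no_newline() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# md5_with_newline STRING
# Hash STRING plus a trailing newline, as "echo STRING | md5sum" does.
md5_with_newline() {
    printf '%s\n' "$1" | md5sum | cut -d ' ' -f 1
}
```

Whichever variant produced the hash you are trying to reverse is the one the lookup table has to be built with; a table from "datamash md5" will never match a hash that came from "echo ... | md5sum".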
---
For reference, on my old desktop it takes:
$ time seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
| date -u -f - +%Y%m%d_%I%M%p \
| datamash --full md5 1 | wc -l -c
2563200 125596800
real 0m14.185s
user 0m17.739s
sys 0m0.527s
And results in ~125MB of data - reasonable for an ad-hoc reverse
lookup table for MD5 values.
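
The byte count follows directly from the row layout, so the table size for a different number of entries is easy to estimate. A small sketch (table_bytes is a hypothetical helper name), assuming the same 15-character timestamp format:

```shell
# table_bytes N
# Size in bytes of a table with N rows of "timestamp<TAB>md5<NEWLINE>".
table_bytes() {
    # 15 (timestamp) + 1 (tab) + 32 (hex md5) + 1 (newline) = 49 bytes per row
    echo $(( $1 * (15 + 1 + 32 + 1) ))
}

table_bytes 2563200   # prints 125596800, the same as the wc -c count
```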
If your key space gets larger, you should look into
https://en.wikipedia.org/wiki/Rainbow_table .
Hope this helps,
- assaf