Speed War :)

Discussion of the upcoming GPU accelerated rainbow table implementation


Postby Sc00bz » Thu Mar 15, 2012 1:03 pm

I did some optimizations, and yours is about 6.5% faster on a compute capability 1.1 card (probably the same for all 1.x cards), 24% faster on a compute capability 2.1 card, and probably somewhere in between on compute capability 2.0.

md5_loweralpha-numeric#1-7
9800 GTX+ (128 cores, 1836 MHz, compute capability 1.1)
  330 MLinks/sec 1.065x CryptoHaze (generation)
  310 MLinks/sec 1.00x  mine (generation)
  240 MLinks/sec 0.77x  rcrack's GPU version (pre-work 100k)
   82 MLinks/sec 0.26x  rcracki_mt 0.7b (pre-work 100k)

GTS 450 (192 cores, 1566 MHz, compute capability 2.1)
  360 MLinks/sec 1.24x CryptoHaze (generation)
  290 MLinks/sec 1.00x mine (generation)
  200 MLinks/sec 0.69x rcrack's GPU version (pre-work 100k)
   91 MLinks/sec 0.31x rcracki_mt 0.7b (pre-work 100k)
Sc00bz
 
Posts: 93
Joined: Thu Jan 22, 2009 9:31 pm

Re: Speed War :)

Postby Bitweasil » Thu Mar 15, 2012 3:52 pm

What algorithm are you using for your reduction function? That is the main factor to consider.

//EDIT: And Atom has apparently gotten into a speed war with me after I proved I could outrun hashcat in multihash brute forcing on nVidia.
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: Speed War :)

Postby Sc00bz » Fri Mar 16, 2012 4:22 am

This is the standard rcrack method. I guess it was more apparent in the context when I posted it on FRT because I had it next to CPU benchmarks.

This is a single 32-bit thread on a 2.5 GHz Q9300.
It looks like the winner is "divcfl-3" for the CPU version:
  10.24 MLinks/sec  md5_loweralpha-numeric#1-6
   9.35 MLinks/sec  md5_alpha-space#1-9
   9.87 MLinks/sec  md5_loweralpha#1-10
   9.54 MLinks/sec  md5_loweralpha-numeric-space#1-8
   9.56 MLinks/sec  md5_loweralpha-numeric-space#1-9
   9.70 MLinks/sec  md5_loweralpha-numeric-symbol32-space#1-7
  10.40 MLinks/sec  md5_loweralpha-numeric-symbol32-space#1-8
   9.32 MLinks/sec  md5_loweralpha-space#1-9
  10.37 MLinks/sec  md5_mixalpha-numeric#1-8
  10.28 MLinks/sec  md5_mixalpha-numeric-all-space#1-7
  10.69 MLinks/sec  md5_mixalpha-numeric-all-space#1-8
   9.89 MLinks/sec  md5_mixalpha-numeric-space#1-7
  10.38 MLinks/sec  md5_mixalpha-numeric-space#1-8
   8.95 MLinks/sec  md5_numeric#1-12
   9.03 MLinks/sec  md5_numeric#1-14
   8.97 MLinks/sec  md5_hybrid3(omni6.txt)#0-0
   8.70 MLinks/sec  md5_hybrid3(omni7.txt)#0-0
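
For readers who have not seen it, the "standard rcrack method" mentioned above is, as far as I understand it, a modulo reduction: take 64 bits of the hash, add a per-table offset and the chain position, and reduce modulo the key space size. The sketch below is an illustration under that assumption, not code from rcrack or from either generator in this thread. The function names, and the choice to order plaintexts by length with shorter ones first, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: map a hash to an index in [0, keyspace).
// Reads the first 8 hash bytes in native byte order, adds the table's
// reduce offset and the position in the chain, then takes the modulo.
inline std::uint64_t hash_to_index(const unsigned char* hash,
                                   std::uint64_t reduce_offset,
                                   std::uint64_t chain_pos,
                                   std::uint64_t keyspace) {
    assert(keyspace != 0);
    std::uint64_t h;
    std::memcpy(&h, hash, sizeof h);
    return (h + reduce_offset + chain_pos) % keyspace;  // wraps mod 2^64 first
}

// Hypothetical helper: map an index to a plaintext, assuming the key space
// lists all min_len-character strings first, then min_len + 1, and so on.
inline std::string index_to_plain(std::uint64_t index,
                                  const std::string& charset,
                                  unsigned min_len, unsigned max_len) {
    assert(!charset.empty());
    const std::uint64_t base = charset.size();
    std::uint64_t count = 1;
    for (unsigned i = 0; i < min_len; ++i) count *= base;  // base^min_len
    for (unsigned len = min_len; len <= max_len; ++len) {
        if (index < count) {
            std::string plain(len, charset[0]);
            for (unsigned i = len; i-- > 0;) {  // fill from the last character
                plain[i] = charset[index % base];
                index /= base;
            }
            return plain;
        }
        index -= count;  // skip the sub key space for this length
        count *= base;
    }
    return std::string();  // index was outside the key space
}
```

The `%` on a 64-bit value is the expensive part. For a sense of scale, loweralpha-numeric#1-7 has 36 + 36^2 + ... + 36^7 = 80,603,140,212 plaintexts, which does not fit in 32 bits.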
Sc00bz
 
Posts: 93
Joined: Thu Jan 22, 2009 9:31 pm

Re: Speed War :)

Postby Bitweasil » Fri Mar 16, 2012 4:27 am

Damn. Using the multiply to divide trick?
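
For anyone following along, the "multiply to divide" trick replaces a division by a fixed divisor with a multiplication by a precomputed reciprocal. Below is a minimal 32-bit sketch of the general idea, with one correction step. It is not taken from either implementation discussed here, the function names are made up for illustration, and real rainbow table key spaces usually need a 64-bit variant.

```cpp
#include <cassert>
#include <cstdint>

// Precompute once per divisor: m = floor(2^32 / d), valid for d >= 1.
inline std::uint64_t reciprocal_u32(std::uint32_t d) {
    assert(d != 0);
    return (std::uint64_t{1} << 32) / d;
}

// Compute n % d using one multiply, one shift, and one conditional subtract.
// m must come from reciprocal_u32(d).
inline std::uint32_t mod_by_reciprocal(std::uint32_t n, std::uint32_t d,
                                       std::uint64_t m) {
    // Because m <= 2^32 / d, this estimate is floor(n / d) or one less.
    std::uint64_t q = (std::uint64_t{n} * m) >> 32;
    std::uint64_t r = n - q * d;  // therefore 0 <= r < 2 * d
    if (r >= d) r -= d;           // single correction step
    return static_cast<std::uint32_t>(r);
}
```

In device code the high half of a 32-bit product is available through the `__umulhi` intrinsic, so the estimate step can avoid a full 64-bit multiply.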
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: Speed War :)

Postby Sc00bz » Fri Mar 16, 2012 7:31 am

Yes. I'm so glad I was too lazy to finish and properly test the fixed-point multiply (FPM) reduction function I came up with. I basically couldn't decide whether I should do a 32-bit or a 24-bit multiply. Now the difference in speed is probably negligible, but FPM is less uniformly distributed (and it has problems with sub key spaces that are small compared to the total key space, which is why I dropped password lengths 1-4).
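
The post does not show the FPM function, so the following is only a guess at the general shape of a fixed-point multiply reduction: treat the hash bits as a fraction in [0, 1) and scale that fraction by the size of the range. Both function names are hypothetical, and the 24-bit variant is limited to ranges of at most 256 here so the product fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Treat h as the fraction h / 2^32 and scale it into [0, n).
inline std::uint32_t fpm_reduce32(std::uint32_t h, std::uint32_t n) {
    return static_cast<std::uint32_t>((std::uint64_t{h} * n) >> 32);
}

// Same idea with only the low 24 bits of h as the fraction.
// Requires n <= 256 so that the product stays below 2^32.
inline std::uint32_t fpm_reduce24(std::uint32_t h, std::uint32_t n) {
    assert(n <= 256);
    return ((h & 0xFFFFFFu) * n) >> 24;
}
```

One plausible reading of the "small sub key spaces" problem: a k-bit fraction only has 2^k distinct values, so a sub-range that makes up a tiny share of the total receives very few of them, or none at all. With a 32-bit fraction spread over the roughly 8.06 × 10^10 plaintexts of loweralpha-numeric#1-7, the 36 one-character plaintexts together would get about two input values.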
Sc00bz
 
Posts: 93
Joined: Thu Jan 22, 2009 9:31 pm

Re: Speed War :)

Postby Sc00bz » Thu Apr 05, 2012 5:03 am

ARTGen 0.1a

GTS 450:
273 MLinks/second for md5_mixalpha-numeric#1-9
282 MLinks/second for md5_loweralpha-numeric#1-7

Too lazy to swap out the GPU for the 9800 GTX+. Also, if you use a compute capability 1.x card, you should recompile it without defining USE___fmul_rd, since it should be faster. On that note, does anyone know how to tell at compile time which compute capability a .cu file is being compiled for?
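
On the compile-time question: nvcc defines the `__CUDA_ARCH__` macro while it compiles device code, with a value such as 110 for compute capability 1.1 or 210 for 2.1. It is not defined during the host compilation pass. A minimal sketch of using it follows. The names `ARCH_WANTS_FMUL_RD` and `kTargetArch` are invented for this example and are not part of ARTGen, and the 2.0 threshold is only inferred from the advice above.

```cpp
// Pick the code path per target architecture at compile time.
// Assumption from the post above: the __fmul_rd path pays off on
// compute capability 2.x, while 1.x cards are better off without it.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
#define ARCH_WANTS_FMUL_RD 1
#else
#define ARCH_WANTS_FMUL_RD 0
#endif

// Architecture seen by the current compilation pass.
// This is 0 in host code, or when compiled by a plain C++ compiler.
#ifdef __CUDA_ARCH__
constexpr int kTargetArch = __CUDA_ARCH__;
#else
constexpr int kTargetArch = 0;
#endif
```

Since the macro only exists in device passes, the check belongs inside `__global__` or `__device__` functions. When a file is built for several architectures at once, each device pass sees its own value.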
Sc00bz
 
Posts: 93
Joined: Thu Jan 22, 2009 9:31 pm

Re: Speed War :)

Postby TheLostMind » Fri Apr 25, 2014 6:25 am

Sc00bz wrote:ARTGen 0.1a

GTS 450:
273 MLinks/second for md5_mixalpha-numeric#1-9
282 MLinks/second for md5_loweralpha-numeric#1-7

Too lazy to swap out the GPU for the 9800 GTX+. Also, if you use a compute capability 1.x card, you should recompile it without defining USE___fmul_rd, since it should be faster. On that note, does anyone know how to tell at compile time which compute capability a .cu file is being compiled for?


Your program is good, but it has a bug:

artgen rt MD5 div alpha-numeric#8-8 10 48000 1048576 0 .\ test 0 0
pause

Error: gpudivsb.cpp(468) : getLastCudaError() CUDA error : Failed kernel launch.
: (7) too many resources requested for launch.


If you can fix this, it will be a good rainbow table generator.
TheLostMind
 
Posts: 9
Joined: Thu Jun 27, 2013 8:14 am

