Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 06-24-2012, 09:09 AM   #1
HUSKER55
Registered User
 
Join Date: Jul 2007
Location: MILWAUKEE
Posts: 5,285
need your opinion

I was wondering if a PDF converter program would allow me to download a PDF, then put it into Excel and pick and choose the data.

I got some info on Nuance Pro8, which includes the Dragon speaking program.


Does anyone have any experience or thoughts you would share?


thanks!
__________________
Never tell your problems to anyone because 20% flat don't care and 80% are glad they are yours.

No Balls.......No baby!

Have you ever noticed that those who do not have a pot to piss in nor a window to throw it out of always seem to know how to handle the money of those who do.
Old 06-24-2012, 09:13 AM   #2
Capper Al
Registered User
 
Join Date: Dec 2005
Location: MI
Posts: 6,330
Doesn't sound practical to me. Most data streams can either be copied from the screen or uploaded from a comma-delimited file. Or I might not be understanding your question.
__________________


"The Law, in its majestic equality, forbids the rich, as well as the poor, to sleep under bridges, to beg in the streets, and to steal bread."

Anatole France


Old 06-24-2012, 09:31 AM   #3
wilderness
Registered User
 
Join Date: Dec 2004
Location: 45th parallel
Posts: 2,178
There are different types of PDFs:
1) A PDF created as an image.
2) A PDF created as text.

In addition, PDF creators are offered security options (restrictions) that prohibit or limit general file operations (copy and paste is one).

You cannot copy data from an image PDF because the text data does not exist in the file, at least not without the use of an OCR tool.
__________________
Best Don
Old 06-24-2012, 01:49 PM   #4
hcap
Registered User
 
Join Date: Nov 2002
Posts: 30,398
I doubt if any will do it well.
Having said that, here is a Google search that turned up quite a few claiming otherwise.

http://www.google.com/search?hl=en&s....0.Gp0vF-ih5UQ
Old 06-24-2012, 05:11 PM   #5
HUSKER55
Registered User
 
Join Date: Jul 2007
Location: MILWAUKEE
Posts: 5,285
THANK YOU GUYS FOR TAKING THE TIME TO HELP.
Old 06-25-2012, 12:54 AM   #6
Robert Goren
Racing Form Detective
 
Join Date: Jul 2007
Location: Lincoln, Ne but my heart is at Santa Anita
Posts: 16,316
I had a program that I bought a long time ago that allowed me to copy and paste the Bris PDF past performances that Twin Spires gives away into Excel. It wasn't as useful as I had hoped.
__________________
Some day in the not too distant future, horse players will be betting on computer generated races over the net. Race tracks will become casinos and shopping centers. And some crooner will be belting out "there used to be a race track here".
Old 06-25-2012, 10:01 AM   #7
SchagFactorToWin
Registered User
 
Join Date: Jul 2009
Location: WNY
Posts: 444
I've never been able to get the sub/superscripts in the running lines to convert properly.
Old 06-25-2012, 10:57 AM   #8
wilderness
Registered User
 
Join Date: Dec 2004
Location: 45th parallel
Posts: 2,178
Quote:
Originally Posted by Robert Goren
I had a program that I bought a long time ago that allowed me to copy and paste the Bris PDF past performances that Twin Spires gives away into Excel. It wasn't as useful as I had hoped.
Robert,
FWIW, with today's additional fields in the Tracmaster data, the software would be even less useful.


Quote:
Originally Posted by SchagFactorToWin
I've never been able to get the sub/superscripts in the running lines to convert properly.
Schag,
I have the same problem when OCRing small fonts of any kind. Fractions (race times) are difficult for OCR as well.

There is PDF OCR software that is quite effective; however, it 'tain't cheap.
Abbyy FineReader ships with a diverse set of fonts for recognition, so its recognition is far more accurate than the OCR software that I use with my scanner. Unfortunately, the bloated font set slows the OCR process down quite a lot.
I've never used Abbyy for tables (I try to avoid stats at every turn).

Last year I worked with some HS students using the scanner OCR software (same model scanners and software I use daily) and they thought it was slow as molasses.

The key to OCR is an accumulated supplemental dictionary (topic based); unfortunately, there is not much room for adding numerals to the dictionary.
Old 06-25-2012, 12:05 PM   #9
chickenhead
Lacrimae rerum
 
Join Date: Apr 2004
Location: at my house
Posts: 7,308
I experimented with building an automated setup for this for race charts. Since Equibase has historical charts, one could build an extensive PP collection.

OCR: Google Docs uses OCR to do PDF conversion, and it handles charts fine. However, the formatting is atrocious and would be difficult to deal with.

Cracking: breaking the password to allow text access proved much more fruitful. After unlocking, one can use a PDF parsing package like many languages have (I used pdfminer for Python) and pretty reasonably get what they need.
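A minimal sketch of the parsing step, assuming the file already permits text extraction and that the pdfminer.six package is installed. The helper names iter_glyphs and rows_by_baseline are hypothetical, not from the post; the point is only that pdfminer reports each character's position, font name, and size, which is the information the running lines need.

```python
from collections import defaultdict


def iter_glyphs(pdf_path):
    """Yield (x, y, font_size, font_name, text) for every character in the PDF.

    Assumes pdfminer.six is installed. It is imported inside the function so
    the grouping helper below can be used and tested without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for box in page:
            if not isinstance(box, LTTextContainer):
                continue
            for line in box:
                if not isinstance(line, LTTextLine):
                    continue
                for ch in line:
                    if isinstance(ch, LTChar):
                        yield (ch.x0, ch.y0, ch.size, ch.fontname, ch.get_text())


def rows_by_baseline(glyphs, ndigits=0):
    """Group glyphs into text rows keyed by rounded Y, each row sorted by X."""
    rows = defaultdict(list)
    for x, y, size, font, text in glyphs:
        rows[round(y, ndigits)].append((x, text))
    # PDF Y grows upward, so sort descending to read top to bottom.
    return ["".join(t for _, t in sorted(rows[y])) for y in sorted(rows, reverse=True)]
```

Something like rows_by_baseline(iter_glyphs("chart.pdf")) would then give one string per printed row, where "chart.pdf" is a placeholder file name. Rounding Y is a crude way to find rows and can split a line whose glyphs straddle a rounding boundary, so treat it as a starting point only.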
Old 06-25-2012, 12:51 PM   #10
wilderness
Registered User
 
Join Date: Dec 2004
Location: 45th parallel
Posts: 2,178
Quote:
Originally Posted by chickenhead
Cracking: breaking the password to allow text access proved much more fruitful. After unlocking, one can use a PDF parsing package like many languages have (I used pdfminer for Python) and pretty reasonably get what they need.
My understanding is that how vulnerable the password is to hacking depends on the number of pages the PDF contains.

A 1-4 page PDF is a much easier hack than a 10-20 page PDF, with the latter likely not possible at all.
The hacking also depends on which bit length is set by the PDF creator.

A 40-bit (early Adobe) is easily hacked.
A 128-bit is quite difficult.

I've seen some PDFs in the past where the creator added security settings but never added two levels of passwords, so the security settings were easily removed. Of course, you're not able to change these settings with the FREE Acrobat viewer or many of the other FREE PDF tools.
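For a sense of scale on the 40-bit versus 128-bit point, here is a rough back-of-the-envelope sketch. The guess rate of one billion keys per second is purely an assumption for illustration, not a measurement of any real tool, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def worst_case_seconds(key_bits, keys_per_second=1_000_000_000):
    """Seconds needed to try every key of the given length at an assumed rate."""
    return 2 ** key_bits / keys_per_second


for bits in (40, 128):
    secs = worst_case_seconds(bits)
    print(f"{bits}-bit: {secs:.3g} seconds ({secs / SECONDS_PER_YEAR:.3g} years)")
```

Under that assumed rate, the 40-bit keyspace is exhausted in roughly 18 minutes, while the 128-bit keyspace takes on the order of 10^22 years.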
Old 06-26-2012, 02:44 AM   #11
tupper
Registered User
 
Join Date: Jan 2007
Location: Los Angeles
Posts: 492
I just tested the PDF Import plugin in LibreOffice, and it opens PDF PPs in the Draw program with the same layout as the original PDF.

It is probably possible to then make a LibreOffice macro that would convert all of the fields to an ordered spreadsheet.
Old 06-24-2013, 04:16 AM   #12
dietant
Registered User
 
Join Date: Apr 2007
Posts: 18
a simple Tip

For a PDF created as text, the Foxit Software text converter can help a little bit.
Do the rest with PDFs downloaded from the Amwest programs link (gone today?) and an Excel VBA macro.
The Foxit output produces 3 rows for each line containing superscripts and subscripts in the racetrack fonts.
Example:
----------------------- Page 3-----------------------
Race 2 6 Furlongs BELMONT PARK - Wednesday, June 12, 2013 3+ FCLM, $20000
Purse: $ 28000. For Maidens, Fillies And Mares Three Years Old And Upward Foaled In NeYork State And Approved By The NeYork State-bred
2 Registry. Three Year Olds, 119 Lbs.; Older, 124 Lbs. CLM Price $20,000.
etc...13 Record: 5 0 0 1 $ 7220
1 Royal Blue, YelloTriangular Panel, YelloDiamonds On Sleeves, Blue And YelloCap 2/1 [19, 0, 2, 4, 0%] 12 Record: 1 0 0 0 $ 267
MAXANA 119 $ 20000 JUNIOR ALVARADO 12-13 Off: 1 0 0 0 $ 1650
-etc..
-etc..
1st Row: 52 50 08 38 5 6 6 7 6 nk nk
2nd Row: 21/04/13-2AQU ft 3+F MSW50000 6f 23 47 1:11 68 4 2 5 2 6 5 2 5 4 Tomas,P 112 Lb 30.75 ChinaGold113 AhGaga118 Vaid118
to rail 1/2, 4upper[6]
3rd Row: 113 118 118 1 2 4
-etc...
Superscript strings: 52 50 08 38 5 6 6 7 6 nk nk
Subscript strings: 113 118 118 1 2 4
Concatenating:
the superscripts 52 50 08 38 and 6f, 23, 47, 1:11 = 6f (52), ft1 = 23.50, ft2 = 47.08, ft3 = 1:11.38
the superscripts 5 6 6 7 and the subscripts 1 2 4 = 5 (5 ½), 6 6, 5 (6 ½), 5 (7 ¼)
the subscripts 113 118 118 and the superscripts 6 nk nk = ChinaGold113 6 AhGaga118 nk Vaid118 nk
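The last concatenation can be sketched in a few lines of Python. This assumes the converter has already produced the finisher names and the superscript row as separate whitespace-separated strings, as in the example, and that the margins are always the last tokens of the superscript row; that second assumption comes from this one sample only. The function name attach_margins is hypothetical.

```python
def attach_margins(names_row, super_row):
    """Pair each finisher token with the trailing margin tokens of the superscript row."""
    names = names_row.split()
    if not names:
        return ""
    supers = super_row.split()
    # Assumption: the margins are the last len(names) tokens of the superscript row.
    margins = supers[-len(names):]
    if len(margins) != len(names):
        raise ValueError("not enough superscript tokens for the finishers")
    return " ".join(f"{name} {margin}" for name, margin in zip(names, margins))


print(attach_margins("ChinaGold113 AhGaga118 Vaid118",
                     "52 50 08 38 5 6 6 7 6 nk nk"))
```

With the sample rows this prints ChinaGold113 6 AhGaga118 nk Vaid118 nk, matching the hand-built result.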

Last edited by dietant; 06-24-2013 at 04:27 AM.
Old 06-24-2013, 12:51 PM   #13
Longshot6977
Registered User
 
Join Date: Feb 2013
Location: Central New Jersey
Posts: 1,467
I have tried some OCR programs and have the same problems as others: they do not read the superscripts and subscripts correctly. They also put too much data in one row in Excel, which is hard to deal with since the amount varies.
Has anyone had good success with a program that properly imports or reads the PDF charts or PPs into Excel? Dietant, can you please elaborate a little more on your procedure? Thanks.

PS- I found ABLE2EXTRACT Pro v8 to be the best so far, but it requires too much finagling with the columns and won't always read sub/superscripts.
Old 06-24-2013, 04:10 PM   #14
vegasone
Registered User
 
Join Date: Aug 2007
Posts: 531
The HTML output of ABLE2EXTRACT Pro v8 looks like it would be the easiest to parse if you were able to do that.
Old 06-24-2013, 05:23 PM   #15
dietant
Registered User
 
Join Date: Apr 2007
Posts: 18
sharpen ur pencils

No big deal.
The superscripts, numbers, and subscripts have different sizes, occupy different PDF (X,Y) positions, and have different fonts.
The "PDF to text" translators write them in different rows depending on the value of Y, and use the X value as the offset from the beginning of the row.
Case study: finished 2nd, behind by 1 and 1/2
  • .............. 1
  • .............. -
  • .............. 2
  • .........1
  • ...2
The "2":
Position (X,Y); (258.40 ,636.50)
Font Name: Univers-Condensed-Medium
Font Size: 7
Text 2
------
The "1":
Position (X,Y); (261.82 ,638.40)
Font Name: Univers-Condensed-Medium
Font Size: 5.25
Text 1
-----
the (1/2)
Position (X,Y); (263.73 ,636.70)
Font Name: SansFractionsVerticalPlain
Font Size: 5.25
Text 2
----
The translators are unable to merge the different rows into one.
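The case study can be put back together by sorting the glyphs on X and treating anything close to the base glyph's Y as part of the same token. A minimal Python sketch using the three glyphs listed above; the 3-point Y tolerance, the fraction lookup table, and the names Glyph and merge_row are assumptions for illustration.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    x: float
    y: float
    font: str
    size: float
    text: str


# Assumed mapping: in the fraction font, the character "2" draws 1/2.
FRACTION_GLYPHS = {"2": "1/2"}


def merge_row(glyphs, y_tolerance=3.0):
    """Join glyphs whose Y lies within y_tolerance of the largest glyph, ordered by X."""
    base = max(glyphs, key=lambda g: g.size)
    parts = []
    for g in sorted(glyphs, key=lambda g: g.x):
        if abs(g.y - base.y) > y_tolerance:
            continue  # belongs to a different printed line
        if "Fractions" in g.font:
            parts.append(FRACTION_GLYPHS.get(g.text, g.text))
        else:
            parts.append(g.text)
    return " ".join(parts)


glyphs = [
    Glyph(258.40, 636.50, "Univers-Condensed-Medium", 7, "2"),
    Glyph(261.82, 638.40, "Univers-Condensed-Medium", 5.25, "1"),
    Glyph(263.73, 636.70, "SansFractionsVerticalPlain", 5.25, "2"),
]
print(merge_row(glyphs))  # 2 1 1/2
```

The output reads as finish position 2, beaten 1 and 1/2 lengths. A real past-performance line would need the tolerance tuned against the actual font sizes, and a larger fraction table.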