Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board (http://www.paceadvantage.com/forum/index.php)
-   Off Topic - Computers (http://www.paceadvantage.com/forum/forumdisplay.php?f=55)
-   -   need your opinion (http://www.paceadvantage.com/forum/showthread.php?t=95428)

HUSKER55 06-24-2012 09:09 AM

need your opinion
 
I was wondering whether a PDF converter program would let me download a PDF, put it into Excel, and then pick and choose the data.

I got some info on Nuance Pro 8, which includes the Dragon speaking program.


Does anyone have any experience or thoughts you would share?


thanks!

Capper Al 06-24-2012 09:13 AM

Doesn't sound practical to me. Most data streams can either be copied from the screen or uploaded from a comma-delimited file. Or I might not be understanding your question.

wilderness 06-24-2012 09:31 AM

There are different types of PDFs:
1) A PDF created as an image.
2) A PDF created as text.

In addition, PDF creators are offered security options (restrictions) that prohibit or limit general file operations (copy and paste is one).

You cannot copy data from an image PDF, because it contains no text data, at least not without the use of an OCR tool.

hcap 06-24-2012 01:49 PM

I doubt any will do it well.
Having said that, here is a Google search that turned up quite a few claiming otherwise.

http://www.google.com/search?hl=en&s....0.Gp0vF-ih5UQ

HUSKER55 06-24-2012 05:11 PM

THANK YOU GUYS FOR TAKING THE TIME TO HELP.:)

Robert Goren 06-25-2012 12:54 AM

I had a program that I bought a long time ago that allowed me to copy and paste the Bris PDF past performances that Twin Spires gives away into Excel. It wasn't as useful as I had hoped.

SchagFactorToWin 06-25-2012 10:01 AM

I've never been able to get the subscripts and superscripts in the running lines to convert properly.

wilderness 06-25-2012 10:57 AM

Quote:

Originally Posted by Robert Goren
I had a program that I bought a long time ago that allowed me to copy and paste the Bris PDF past performances that Twin Spires gives away into Excel. It wasn't as useful as I had hoped.

Robert,
FWIW, with today's additional fields in the TrackMaster data, the software would be even less useful.


Quote:

Originally Posted by SchagFactorToWin
I've never been able to get the subscripts and superscripts in the running lines to convert properly.

Schag,
I have the same problem when OCRing small fonts of any kind. Fractions (race times) are difficult for OCR as well.

There is PDF OCR software that is quite effective; however, it 'taint cheap.
Abbyy FineReader ships with a diverse set of fonts for recognition, which makes it far more accurate than the OCR software that I use with my scanner. Unfortunately, the bloated font set slows the OCR process down quite a lot.
I've never used Abbyy for tables (I try to avoid stats at every turn).

Last year I worked with some HS students using the scanner OCR software (the same model scanners and software I use daily), and they thought it was slow as molasses.

The key to OCR is an accumulated supplemental dictionary (topic based). Unfortunately, there is NOT much room for adding numerals to the dictionary.

chickenhead 06-25-2012 12:05 PM

I experimented with building an automated setup for this for race charts. Since Equibase has historical charts, one could build an extensive PP collection.

OCR: Google Docs uses OCR to do PDF conversion, and it handles charts fine. However, the formatting is atrocious and would be difficult to deal with.

Cracking: breaking the password to allow text access proved much more fruitful. After unlocking, one can use a PDF parsing package of the kind many languages have (I used pdfminer for Python) and pretty reasonably get what one needs.
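
For anyone who wants to try the parsing step, here is a minimal sketch. It assumes the pdfminer.six fork of pdfminer is installed and that the PDF already has an accessible text layer; the file name in the comment is a placeholder, and `nonblank_lines` is a hypothetical helper, not part of pdfminer.

```python
def extract_pdf_text(path, password=""):
    """Return the text layer of a PDF (assumes pdfminer.six is installed)."""
    # Imported lazily so the helper below can be used without the library.
    from pdfminer.high_level import extract_text
    return extract_text(path, password=password)


def nonblank_lines(text):
    """Split extracted text into stripped, non-empty lines (hypothetical helper)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example use (placeholder file name):
#   lines = nonblank_lines(extract_pdf_text("chart.pdf"))
```

The result is only a list of text lines; splitting each line into chart fields is a separate job that depends on the layout of the particular chart.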

wilderness 06-25-2012 12:51 PM

Quote:

Originally Posted by chickenhead
Cracking: breaking the password to allow text access proved much more fruitful. After unlocking, one can use a PDF parsing package of the kind many languages have (I used pdfminer for Python) and pretty reasonably get what one needs.

My understanding is that how vulnerable the password is to hacking depends on the number of pages the PDF contains.

A 1-4 page PDF is much easier to hack than a 10-20 page PDF, with the latter likely not possible at all.
The hacking also depends on which bit range is set by the PDF creator.

A 40-bit (early Adobe) is easily hacked.
A 128-bit is quite difficult.

I've seen some PDFs in the past where the creator added security settings but never added two levels of passwords, so the security settings were easily removed. Of course, you're not able to change these settings with the FREE Acrobat viewer or many of the other FREE PDF tools.
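
To put the 40-bit versus 128-bit gap in numbers, here is a rough sketch of the worst-case time to try every key. The guess rate is a made-up figure for illustration only, and this covers brute force on the key itself, not guessing a weak password.

```python
# Hypothetical guess rate, chosen only for illustration.
GUESSES_PER_SECOND = 10**9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def worst_case_seconds(key_bits, rate=GUESSES_PER_SECOND):
    """Seconds needed to try every possible key of the given bit length."""
    return 2**key_bits / rate


print(f"40-bit:  {worst_case_seconds(40) / 60:.1f} minutes")
print(f"128-bit: {worst_case_seconds(128) / SECONDS_PER_YEAR:.2e} years")
```

At that assumed rate the 40-bit keyspace is exhausted in well under an hour, while the 128-bit keyspace is far out of reach.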

tupper 06-26-2012 02:44 AM

I just tested the PDF Import plugin in LibreOffice, and it opens PDF PPs through the draw program in the same layout as the original PDF.

It is probably possible to then make a LibreOffice macro that would convert all of the fields to an ordered spreadsheet.

dietant 06-24-2013 04:16 AM

a simple Tip
 
For a PDF created as text, the Foxit text converter can help a little bit.
Do the rest with an Excel VBA macro, working on PDFs downloaded from the Amwest programs link (gone today?).
The Foxit converter produces 3 rows for each line that contains superscripts and subscripts in the racetrack fonts.
example:
----------------------- Page 3-----------------------
Race 2 6 Furlongs BELMONT PARK - Wednesday, June 12, 2013 3+ FCLM, $20000
Purse: $ 28000. For Maidens, Fillies And Mares Three Years Old And Upward Foaled In NeYork State And Approved By The NeYork State-bred
2 Registry. Three Year Olds, 119 Lbs.; Older, 124 Lbs. CLM Price $20,000.
etc...13 Record: 5 0 0 1 $ 7220
1 Royal Blue, YelloTriangular Panel, YelloDiamonds On Sleeves, Blue And YelloCap 2/1 [19, 0, 2, 4, 0%] 12 Record: 1 0 0 0 $ 267
MAXANA 119 $ 20000 JUNIOR ALVARADO 12-13 Off: 1 0 0 0 $ 1650
-etc..
-etc..
1st Row: 52 50 08 38 5 6 6 7 6 nk nk
2nd Row :21/04/13-2AQU ft 3+F MSW50000 6f 23 47 1:11 68 4 2 5 2 6 5 2 5 4 Tomas,P 112 Lb 30.75 ChinaGold113 AhGaga118 Vaid118
to rail 1/2, 4upper[6]
3rd Row: 113 118 118 1 2 4
-etc...
superscript strings: 52 50 08 38 5 6 6 7 6 nk nk
subscript strings: 113 118 118 1 2 4
Concatenating:
the super's 52 50 08 38 and 6f, 23, 47, 1:11 = 6f (52), ft1=23.50, ft2=47.08, ft3=1:11.38
the super's 5 6 6 7 and the sub's 1 2 4 = 5 (5 ½), 6 6, 5 (6 ½), 5 (7 ¼)
the sub's 113 118 118 and the super's 6 nk nk = ChinaGold113 6 AhGaga118 nk Vaid118 nk
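
The pairing above is positional: the first superscript goes with the first base value, and so on. A minimal sketch of that step follows; the function name and the "." separator are assumptions made for illustration, not part of the Foxit output or the VBA macro.

```python
def attach_superscripts(base_tokens, supers, sep="."):
    """Pair each base token with the superscript at the same position."""
    if len(base_tokens) != len(supers):
        raise ValueError("base tokens and superscripts must line up one to one")
    return [f"{base}{sep}{sup}" for base, sup in zip(base_tokens, supers)]


# Fractional times from the 2nd row, hundredths from the 1st row.
times = attach_superscripts(["23", "47", "1:11"], ["50", "08", "38"])
print(times)  # ['23.50', '47.08', '1:11.38']

# Finishers from the 2nd row, beaten margins from the 1st row.
finishers = attach_superscripts(
    ["ChinaGold113", "AhGaga118", "Vaid118"], ["6", "nk", "nk"], sep=" "
)
print(finishers)  # ['ChinaGold113 6', 'AhGaga118 nk', 'Vaid118 nk']
```

The length check matters because a line with a missing superscript would otherwise shift every later value onto the wrong field.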

Longshot6977 06-24-2013 12:51 PM

I have tried some OCR programs and have the same problems as others: they don't read the superscripts and subscripts correctly. They also put too much data in one row in Excel, which is hard to deal with since the amount varies.
Has anyone had good success with any program that allows proper importing/reading of the PDF charts or PPs into Excel? Dietant, can you please elaborate a little more on your procedure? Thanks.

PS- I found ABLE2EXTRACT Pro v8 to be the best so far, but it requires too much finagling with the columns and won't always read sub/superscripts.

vegasone 06-24-2013 04:10 PM

The HTML output of ABLE2EXTRACT Pro v8 looks like it would be the easiest to parse if you were able to do that.

dietant 06-24-2013 05:23 PM

sharpen ur pencils
 
No big deal.
The superscripts, regular numbers, and subscripts have different sizes, occupy different PDF (X,Y) positions, and use different fonts.
The "PDF to text" translators write them in different rows depending on the value of Y, and use the X value as the offset from the beginning of the row.
Case study: finished 2nd, behind by 1 and 1/2
  • .............. 1
  • .............. -
  • .............. 2
  • .........1
  • ...2
The "2":
Position (X,Y); (258.40 ,636.50)
Font Name: Univers-Condensed-Medium
Font Size: 7
Text 2
------
The "1":
Position (X,Y); (261.82 ,638.40)
Font Name: Univers-Condensed-Medium
Font Size: 5.25
Text 1
-----
the (1/2)
Position (X,Y); (263.73 ,636.70)
Font Name: SansFractionsVerticalPlain
Font Size: 5.25
Text 2
----
The translators are unable to merge the different rows into one :sleeping:
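
Given the positions, the merge can be done by hand. The sketch below groups glyphs whose Y values are close, orders them by X, and marks a smaller glyph that sits clearly above the baseline as a superscript. The tolerances, the `^` and `_` markers, and the mapping of "2" in the fractions font to "½" are all assumptions for illustration; the coordinates are the ones from the case study.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float
    y: float
    size: float
    font: str
    text: str


FRACTION_FONT = "SansFractionsVerticalPlain"
FRACTION_GLYPHS = {"2": "½"}  # assumed mapping for the fractions font


def group_rows(glyphs, y_tol=3.0):
    """Group glyphs into rows when their Y is within y_tol of the row's first glyph."""
    rows = []
    for g in sorted(glyphs, key=lambda g: -g.y):  # PDF Y grows upward
        if rows and abs(rows[-1][0].y - g.y) <= y_tol:
            rows[-1].append(g)
        else:
            rows.append([g])
    return rows


def render_row(row, raise_tol=1.0):
    """Join one row left to right, tagging raised or lowered small glyphs."""
    ordered = sorted(row, key=lambda g: g.x)
    body = max(ordered, key=lambda g: g.size)  # largest glyph sets the baseline
    parts = []
    for g in ordered:
        text = FRACTION_GLYPHS.get(g.text, g.text) if g.font == FRACTION_FONT else g.text
        if g.size < body.size and g.y - body.y > raise_tol:
            text = "^" + text
        elif g.size < body.size and body.y - g.y > raise_tol:
            text = "_" + text
        parts.append(text)
    return "".join(parts)


case = [
    Glyph(258.40, 636.50, 7.0, "Univers-Condensed-Medium", "2"),
    Glyph(261.82, 638.40, 5.25, "Univers-Condensed-Medium", "1"),
    Glyph(263.73, 636.70, 5.25, FRACTION_FONT, "2"),
]
print([render_row(r) for r in group_rows(case)])  # ['2^1½']
```

The tolerances would need tuning against real charts, since font sizes and line spacing differ between PP and chart layouts.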


All times are GMT -4. The time now is 08:58 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2024, vBulletin Solutions, Inc.
Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved
