Re: [O] Org Mode and PDF Notes!

On Wed, Nov 11, 2015 at 3:17 PM, Ramon Diaz-Uriarte <address@hidden> wrote:

Dear Matt,

On Wed, 11-11-2015, at 15:42, Matt Price <address@hidden> wrote:
> I've just written up a post on my workflow for PDFs. Since my blog has, I
> think, a readership of 0 (surely there's a way to get emacsers to follow
> me? ah well), I will post a link here in the hopes that someone will be

Add another 1 :-)

> interested:
>
> http://matt.hackinghistory.ca/2015/11/11/note-taking-with-pdf-tools/
>

Really neat! A few comments/questions/ramblings:

- The type of highlight you get from RepliGo contains the text itself. I
mean, when in your pdf I use C-c C-a l, the buffer showing the contents
of each highlight contains the highlighted text.

This is not what I get from, say, EzPDF (which is what I use on Android),
or from highlighting from pdf-tools itself using C-c C-a h, or from
highlighting from Okular. (The contents just give the rectangle.) Hummmm...

Because of this, when I use your code on my pdfs, I only get things
such as

Highlight
([[pdfview:/home/ramon/Zotero-data/storage/ESHHD4KW/Frank_2015_Commentary.pdf::5][Frank_2015_Commentary]],
5)

instead of the text. Bummer! I wonder if RepliGo gives you a lot more
than the rest, or if I am doing something silly.

I think that there is no standard way of storing the highlight contents. I chose RepliGo over EzPDF because it gives you access to the text of the highlights!

Okular, I think, stores your annotations in its own database, rather than in the pdf. You can (I think!) attach the annotations to the pdf from inside Okular. At least, that's what I remember from when I was looking around.

RepliGo stores the highlighted text in the "subject" field of the annotation. It's possible that, with other tools, the content of the annotation is stored in some other field, like "content". Maybe you can try:

M-: (pdf-annot-getannots) in the PDF buffer and look at the output in the *Messages* buffer. Can you see any evidence of the text? Can you share what you learn?

Politza and I are discussing this here:
https://github.com/politza/pdf-tools/issues/137

That might be a good place to continue the conversation.

- You have to call mwp/pdf-multi-extract on each file/set of files. I guess
if I knew elisp, I'd find it trivial to iterate over a set of directories
and subdirectories (and do this using a cron job at night), and also
place everything in one single org file. Would this be something
reasonable to do?

for sure. My elisp sucks too but I bet someone will answer you here on the list.

(This might be related to your second Todo)

well, wasn't what I was planning but would still be useful.
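
The directory walk itself is the easy part in any language. As a rough sketch only, in Python rather than elisp, and not something either of us is running: the function names find_pdfs, org_entry and write_org_index are made up for illustration. The code only lists each PDF as a pdfview link in one org file; the actual highlight extraction (mwp/pdf-multi-extract or anything else) is not reproduced and is marked as a placeholder.

```python
import os
from pathlib import Path


def find_pdfs(roots):
    """Return every PDF found under the given root directories, sorted."""
    found = []
    for root in roots:
        # os.walk descends into all subdirectories of each root.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(".pdf"):
                    found.append(Path(dirpath) / name)
    return sorted(found)


def org_entry(pdf_path, page=1):
    """Format one org heading holding a pdfview link to the given page."""
    pdf_path = Path(pdf_path)
    return "* [[pdfview:{}::{}][{}]]\n".format(pdf_path, page, pdf_path.stem)


def write_org_index(roots, org_file):
    """Write one heading per PDF into a single org file; return the count."""
    pdfs = find_pdfs(roots)
    with open(org_file, "w", encoding="utf-8") as out:
        for pdf in pdfs:
            out.write(org_entry(pdf))
            # Placeholder (assumption): the extracted highlights for this
            # PDF would be written under the heading here.
    return len(pdfs)
```

Something like write_org_index(["/path/to/pdfs"], "/path/to/notes.org") in a small script could then be run from cron at night, with the path arguments replaced by real ones.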

- I know nothing about how it works, and it does not use pdf-tools, but in
your first Todo you mention: "extend the pdfview link type (in
org-pdfview) to permit me to specify the precise location of an
annotation,". PDF.js (https://mozilla.github.io/pdf.js/), which is
used for instance by zotfile (http://zotfile.com/) does that and it works
out of the box with Okular (but I've not been able to get it to work with
pdf-tools).

Until I found pdf-tools, I had planned to write a node wrapper for pdf.js and grab the annotations that way. But I don't really know how to do that, so this turned out to be easier :-)

Anyway, I've updated the post, and it's now possible to create links to individual annotations, though you will have to use my updated version of org-pdfview, until/unless Markus accepts my patch.

- In case it matters, I have somewhat similar modus operandi. I do a lot
of PDF reading, including note-taking and highlighting, in android
tablets ---I use EzPDF, which also embeds the notes in the PDF. I have a
cron job that extracts all the highlights and annotations of all the PDFs
and places them in a single org file. The kludge is explained here:
https://github.com/rdiaz02/Adios_Mendeley#extracting-all-pdf-annotations-and-placing-them-in-an-org-mode-file
The truth is I use two mechanisms for PDF annotation and highlighting
extraction, since none is fully satisfactory to me, but the one that uses
Ruby (i.e., that does not depend on poppler) is able to actually extract
the text of the highlights.

ah, man, that looks really cool and I'm sorry I didn't know about it earlier! I haven't read through your whole document but it looks like there's a lot of useful stuff there.

Best, and thanks again for sharing,

you're welcome & thank you!

From:	Matt Price
Subject:	Re: [O] Org Mode and PDF Notes!
Date:	Wed, 11 Nov 2015 15:33:52 -0500