So the source document for your translation is a PDF, and it contains some complex tables! You just want to copy and paste them directly into your Word document so you can overtype the text with your translation…
Right?
Well, if you have the professional version of Adobe Acrobat, then you’re in luck [1]. But if you only have the free reader, like most people, then you are going to have to use your wits if you want to avoid retyping all the data…
The skills needed to pull the data out of a complex table in a PDF
and make it spring back to life in a Word document are actually very
basic. What can look like a complex task can be done with a few simple
tricks.
Here we break the problem down into a series of really simple steps. While each step seems to throw up yet another problem to be solved, each fix only ever requires really simple skills like Copy & Paste and Find & Replace. If you have mastered these simple skills, you don’t have to remember any “correct sequence” for doing the job: just solve each little problem, step by step, until you have achieved your goal!
1 Copy the table in the PDF, and paste the data into Word
Select all the text of the table, copy it and paste it directly into Word. The result may not be a pretty sight!
Most of the formatting in the table will be lost – you’ll just have plain data.
It will look a terrible mess because the columns will have disappeared! In the example above, the words in each of the column headings appear to be muddled up. Rather than wrapping within each cell, the words on each line run into the words of the next column.
But don’t worry about it! It’s not such a big problem to untangle this apparent mess…
2 Click the Show/Hide button
Make sure you have the formatting marks visible so you can see what is going on, and how the data in the table is structured.
The columns of data are clearly separated by spaces. We can use these spaces to reconstruct the columns. But the words in the column headings are also separated by spaces. Sometimes these spaces show where the columns are supposed to be, and sometimes they are just ordinary “spaces between words”. Figuring out which is which is the only part of this job that requires a little human intelligence. This is your job!
Leave the spaces which are supposed to be spaces as spaces, and
change the spaces which are meant to show where the columns are into
something else.
Easy!
Let’s use tabs to mark where the columns are supposed to be.
3 Spaces to tabs
- Although there are 7 columns of data in my example table, there are only two column headings in the top row. So in this row we only need to change one space to a tab to separate the two pieces of text. Select the space that separates the two headings and press the Tab key.
- We now do the same for the 7 column headings. Remember, you just have to decide whether each space is a “real space” or not. You don’t need to line everything up, and it doesn’t matter too much if you make a few mistakes: you can fix them later once the table has been made (when you’ll be able to see what you’re doing!). In the figure below, the blue circle shows a “real” space and the red circle shows a space replaced with a tab.
Now we have to do the same with the data in the body of the table. In
my example table, there are only letters and numbers in the data –
there are no “real spaces” separating words.
All the spaces mark where the data is to be separated into columns. Instead of changing them to tabs one by one, we can change all of them in one hit using Find & Replace:
- Select all the data;
- Open the Find & Replace dialog box;
- Type a space into the Find what field;
- Type Word’s code for a tab (^t) into the Replace with field;
- Hit Replace All.
All the text and the data should now have tabs to mark the columns (and spaces to mark the real spaces).
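If you would like to see exactly what Replace All is doing here, the short Python sketch below applies the same rule to plain text: every space in the selected data rows becomes a tab. It is only an illustration outside Word; the sample rows are made up and `spaces_to_tabs` is a hypothetical name.

```python
def spaces_to_tabs(lines):
    """Replace every space with a tab, like Replace All (' ' -> '^t')
    run on a selection of data rows that contain no "real" spaces."""
    return [line.replace(" ", "\t") for line in lines]


# Hypothetical data rows, as they might look after pasting from a PDF
data = [
    "A1 12 34 56 78 90 11",
    "B2 21 43 65 87 9 10",
]

for row in spaces_to_tabs(data):
    # Splitting on tabs shows the 7 values that will become 7 cells
    print(row.split("\t"))
# ['A1', '12', '34', '56', '78', '90', '11']
# ['B2', '21', '43', '65', '87', '9', '10']
```

One thing to watch for: if the PDF has left two spaces between some values, each one becomes a tab and you get an empty cell. Word’s Replace All behaves the same way, so it can be worth searching for double spaces first.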
4 Now make the table
As we now have tabs marking where the columns are supposed to be, we can use Word’s Convert Text to Table function to reconstruct a simple, regular table. (We can sort out the irregularities later.)
- Select all the text & data that are to go into the table;
- Go to Insert|Table|Convert text to Table;
- Word has correctly guessed that this is a 7-column table (from the highest number of tabs you’ve put into any line) and that the tabs are to be used to separate the columns;
- Click OK.
Magic!
We are almost there. There’s only a bit of tidying up to do!
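Here is a rough Python sketch of what Convert Text to Table does with tab-separated lines, assuming (as described above) that the number of columns comes from the line with the most tabs. The sample lines and the `text_to_table` name are hypothetical. Notice how the two top-row headings land in the first two cells, which is exactly the irregularity fixed in the next step.

```python
def text_to_table(lines, sep="\t"):
    """Split tab-separated lines into a grid of cells.

    The number of columns comes from the line with the most tabs (plus one);
    shorter rows are padded with empty cells so every row has the same width.
    """
    rows = [line.split(sep) for line in lines]
    n_cols = max(len(row) for row in rows)
    return [row + [""] * (n_cols - len(row)) for row in rows]


# Hypothetical text after step 3: two top-row headings, then a data row
lines = [
    "Heading one\tHeading two",
    "A1\t12\t34\t56\t78\t90\t11",
]

table = text_to_table(lines)
print(len(table[0]))  # 7
print(table[0])       # ['Heading one', 'Heading two', '', '', '', '', '']
```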
5 Fix the top row
We’ve now got a nice regular 7-column table, but there are some minor
irregularities to deal with. The column headings in the top row are not
only in the wrong place, but they are also supposed to span several
columns.
Easily fixed!
- Just select the text and drag it to the right place;
- Then select the cells the text is supposed to span;
- Right click the selected cells and click Merge Cells.
6 Fix the column headings
Look at the words in the 7 column headings. The PDF inconveniently
split them up over three rows rather than wrapping them within a single
cell.
We want all the words to be in a single cell at the top of each column. Easily fixed! We just need to merge these cells vertically.
- Select the cells containing the text for each column heading;
- Right click and select Merge Cells;
- Do the same for the other columns (or, to do it more quickly, select the cells and press Ctrl+Y, which repeats the last thing you did).
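The Python sketch below mimics the vertical merge on made-up heading fragments. It assumes, as the next paragraph describes, that Word keeps the text from each merged cell as a separate paragraph; here `\n` stands in for the paragraph mark, and `merge_vertically` is a hypothetical helper, not a Word feature.

```python
def merge_vertically(rows):
    """Merge several rows of heading fragments into one row, column by column.

    Each fragment is kept on its own line ("\n" standing in for Word's
    paragraph mark). Empty cells are skipped in this sketch.
    """
    n_cols = max(len(row) for row in rows)
    merged = []
    for col in range(n_cols):
        parts = [row[col] for row in rows if col < len(row) and row[col]]
        merged.append("\n".join(parts))
    return merged


# Hypothetical column headings split over three rows by the PDF
heading_rows = [
    ["Total", "Net", "Gross"],
    ["annual", "", "margin"],
    ["sales", "profit", ""],
]

print(merge_vertically(heading_rows))
# ['Total\nannual\nsales', 'Net\nprofit', 'Gross\nmargin']
```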
Each of the column headings is now in its own separate cell. But as
you can see in the red circle above, we have another small problem to
deal with – the words are separated by unnecessary paragraph markers. We
need to get rid of these and replace them with ordinary spaces. You
could just delete them one by one, but here’s a quicker way using
Find & Replace:
- Select all the column headings right across the table;
- Open Find & Replace and type the code for a paragraph mark (^p) in the Find what field;
- Type a space in the Replace with field;
- Click Replace All.
With the extra paragraph marks gone, each column heading will wrap normally within its own cell.
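In plain Python, again with made-up headings, the ^p to space replacement amounts to swapping each line break inside a heading cell for a space. Only the heading cells are passed in, just as only the heading row is selected in Word, so the data rows are left alone. The function name is hypothetical.

```python
def paragraph_marks_to_spaces(cells):
    """Replace each line break inside a heading cell with a space,
    like Find & Replace ('^p' -> ' ') on the selected heading row."""
    return [cell.replace("\n", " ") for cell in cells]


# Hypothetical merged headings, one paragraph per original fragment
headings = ["Total\nannual\nsales", "Net\nprofit", "Gross\nmargin"]

print(paragraph_marks_to_spaces(headings))
# ['Total annual sales', 'Net profit', 'Gross margin']
```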
(This is a good time to do a quick proofread of the column headings. Now that the data is in a table and we can see what we’re doing, it’s easy to move any words that have ended up in the wrong column. Just select any misplaced words and drag them to the right spot.)
7 Final tidy up
You should now have a table with everything in the right place. It
just needs a cosmetic make-over to make it look like the original:
- Select all the text in the table and click the “Centre button” to centre the text in the columns [2];
- Adjust the font, point size, paragraph and line spacing;
- Select the whole table [2] and get rid of all the borders; then
- Reinstate just those borders you need to match the original.
8 One last problem
My table is pretty much the same as the original in the PDF.
But wait! My column headings don’t line up horizontally…
We need to adjust the table property which controls how the text sits
in each cell. Select the offending row, right-click and select
Cell Alignment|Align top Centre.
PDF to Word… The job is done!
[1] How to copy and paste data and tables without the loss of formatting with the professional version of Adobe Acrobat:
http://www.wikihow.com/Copy-and-Paste-PDF-Content-Into-a-New-File
[2] Note the difference between “select the table” and “select all the text in the table”. If you “select the table” and hit the Centre button, the whole table will move to the centre of the page. If you just “select all the text”, the text in the table will be centred within each cell. To tell the difference, look at the very right-hand side of the table.
[3] Some PDF to Word converters are worth trying. I tried this one (using the example tables in this post) with some success:
http://www.pdfonline.com/pdf2word/index.asp
[4] The examples in this post were illustrated using Microsoft Word 2007 and Adobe Reader X.