How to convert a complex PDF to ePub format

by Dongsoft on 2011-12-23

E-book readers have become very popular. Although they all support PDF files, the reading experience is not very good. The main problem is that PDF files cannot adapt to the screen size automatically. ePub is another popular e-book format. Similar to HTML, ePub files adapt automatically to screens of any size, so you may want to convert your PDF files to ePub. In practice, the conversion is more complicated than it sounds. If the PDF is a novel that consists of plain text with no charts, it is easy to convert. But what happens after a simple conversion if it is a technical manual or a physics book full of pictures, tables, formulas, and even multi-column pages? The answer is that the content ends up in a jumble.

To convert PDF files the way you expect, you first have to understand that PDF is a fixed-size page layout format in which every element, whether text or picture, has a precise position. ePub is a stream format that displays content according to the screen size. Therefore, you should not expect the original layout to be preserved after conversion. The purpose of this article is to help you convert all the content of a PDF file correctly and keep the original structure of the document as far as possible. You will be able to read the correct content without scrolling around the screen, because the new ePub document adapts to your screen automatically. OK, let's get right to it.

Download the conversion software, PDF to ePub Converter. The Pro version is the best choice.

Download URL:

After downloading the software, install it.

In the first interface, select "Convert PDF files to ePub Files" and click "Next". In the second interface, select your PDF file and click "Next".


In the third interface, make sure to select "Strict" mode.

The "Strict" mode handles every element of the PDF file accurately.
See article:

Click the "More ..." button to open the option settings dialog box. Some of the option settings are shown in the figure:

A: Default page parameter settings. If you select "Strict" mode, you can also set these for each page individually. Meaning of the parameters:
Single / Multi Column - one column or several. If your document has a multi-column layout, select "Multi Columns".
Paragraph Check - automatically detect paragraphs in the document.
Align Check - automatically detect text alignment in the document.
Ignore Image - skip images; images are not output.
Ignore Vector Graph - skip vector graphics; vector graphics are not output.
Ignore Link - skip hyperlinks in the document.
B: Bookmark and TOC settings.
Import PDF Bookmarks (If exist) into TOC - extract the PDF bookmarks and convert them into the TOC of the ePub document.
Output Document Outline (bookmarks) - extract the PDF bookmarks and output them as an outline inside the ePub document. Use the drop-down box below it to select the insert position.

Click "Next" to enter the fourth interface, where you can control how the pages of your PDF file are converted.


Interface layout description:
A: PDF page list.
B: Page navigation.
C: Parse parameters for the current page.
D: Common parameters that apply to all pages.
E: Output preview.
F: Editing area.

The following sections describe common problems and how to deal with them:

Before discussing specific problems, we will cover the basic principle of the software. One problem has to be considered when converting PDF files to ePub stream files: how to determine the order of the PDF elements in the stream. Generally speaking, the logical order of a file is from top to bottom and left to right (the current software does not support PDFs with a right-to-left logical order). For simple documents such as novels, this is not a concern, because they can basically be converted in this order. However, for technical documentation, operating manuals, and other documents with complex structures, the software cannot always determine the order of the individual elements correctly, so manual intervention is needed. Users can set parameters that tell the software in what order to output elements to the stream to get the desired result. The following settings and operations help the software handle these complex documents.
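
The top-to-bottom, left-to-right order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the converter's actual algorithm: the `Element` class, the `reading_order` function, and the `line_tolerance` value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    x: float  # left edge of the element, in points
    y: float  # top edge, measured downward from the top of the page


def reading_order(elements, line_tolerance=3.0):
    """Sort elements top to bottom, then left to right within each line."""
    lines = []  # each entry is [reference_y, [elements on that line]]
    for el in sorted(elements, key=lambda e: e.y):
        # Elements whose tops are almost equal are treated as one line.
        if lines and abs(el.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(el)
        else:
            lines.append([el.y, [el]])
    ordered = []
    for _, members in lines:
        ordered.extend(sorted(members, key=lambda e: e.x))
    return ordered


page = [
    Element("world", x=120, y=50.5),
    Element("Second line", x=20, y=70),
    Element("Hello", x=20, y=50),
]
print([el.text for el in reading_order(page)])
# ['Hello', 'world', 'Second line']
```

A simple sort like this is enough for a novel, but it breaks down on pages with columns, charts, or tables, which is why the manual settings below exist.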
1. Remove duplicate page headers and footers.
Most formal PDF files have page headers and footers. After conversion to ePub there is no concept of a single page, and the entire file is presented in the reader as a stream, so these repeated headers and footers should generally be removed. To remove them, you only need to leave them outside the valid region. After you select the valid region, the content in the gray area is not output to the ePub, as shown below. Right-click in the gray area and select "Apply to all Pages" from the menu to apply the region to all pages.
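
In code, the valid-region idea amounts to dropping every element that lies outside a vertical band of the page. The following is a minimal sketch under assumed data: the dictionary keys `top` and `bottom` and the function names are hypothetical and are not taken from the software.

```python
def inside_valid_region(top, bottom, region_top, region_bottom):
    """True if an element lies completely inside the valid region."""
    return top >= region_top and bottom <= region_bottom


def strip_headers_footers(elements, region_top, region_bottom):
    """Keep only the elements inside the valid region of the page."""
    return [
        e for e in elements
        if inside_valid_region(e["top"], e["bottom"], region_top, region_bottom)
    ]


# Coordinates are measured downward from the top of the page, in points.
page = [
    {"text": "Chapter 3", "top": 20, "bottom": 32},    # page header
    {"text": "Body text", "top": 100, "bottom": 112},
    {"text": "Page 17", "top": 800, "bottom": 812},    # page footer
]
kept = strip_headers_footers(page, region_top=60, region_bottom=780)
print([e["text"] for e in kept])
# ['Body text']
```

Applying the same region to all pages corresponds to calling the function with the same two limits for every page.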

Reference link:

2. Paragraph check.
PDF files have no concept of a paragraph; each character has its own precise position. When the content is output to an ePub file, the software has to detect which lines logically form sentences and paragraphs. This is very important for reading. Without this function, each line in the output document becomes a separate paragraph. Generally speaking, this option should be set on the main text pages.
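
One possible heuristic for this kind of paragraph detection is sketched below. It assumes that a paragraph ends when a line stops well short of the right margin or when the vertical gap to the next line is unusually large. The function name, the dictionary keys, and the thresholds are assumptions for illustration; the software may work quite differently.

```python
def merge_lines(lines, right_margin, line_height, slack=0.15):
    """Group consecutive text lines into paragraphs."""
    paragraphs = []
    current = []
    prev = None
    for line in lines:
        if prev is not None:
            # A large vertical gap suggests a new paragraph.
            big_gap = line["top"] - prev["top"] > 1.5 * line_height
            # A line ending far from the right margin suggests the
            # previous paragraph has finished.
            short_prev = prev["right"] < right_margin * (1 - slack)
            if big_gap or short_prev:
                paragraphs.append(" ".join(current))
                current = []
        current.append(line["text"])
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


lines = [
    {"text": "PDF stores each line at a fixed", "top": 100, "right": 500},
    {"text": "position on the page.", "top": 114, "right": 260},
    {"text": "ePub reflows text to fit the", "top": 128, "right": 498},
    {"text": "screen.", "top": 142, "right": 90},
]
paragraphs = merge_lines(lines, right_margin=500, line_height=14)
for p in paragraphs:
    print(p)
# PDF stores each line at a fixed position on the page.
# ePub reflows text to fit the screen.
```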
Reference link:

3. Alignment check.
Some pages, such as the cover and the title page, contain centered or right-aligned sentences. You can set this option to detect the alignment, so that the formatting in the ePub is closer to the PDF. Generally speaking, this option should be set on pages that are not main text.
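
Alignment detection can be approximated by comparing the empty space on each side of a line. The sketch below is a hypothetical illustration, not the software's method; the function name and the `tolerance` value are assumptions.

```python
def detect_alignment(left, right, page_left, page_right, tolerance=5.0):
    """Guess the alignment of one line from its horizontal extent."""
    left_gap = left - page_left
    right_gap = page_right - right
    # Equal space on both sides (and not a full-width line) means centered.
    if abs(left_gap - right_gap) <= tolerance and left_gap > tolerance:
        return "center"
    # Flush against the right edge but indented on the left means right-aligned.
    if right_gap <= tolerance and left_gap > tolerance:
        return "right"
    return "left"


print(detect_alignment(200, 400, page_left=50, page_right=550))  # center
print(detect_alignment(350, 550, page_left=50, page_right=550))  # right
print(detect_alignment(50, 300, page_left=50, page_right=550))   # left
```

A result of this kind could then be written into the ePub as a CSS `text-align` value on the paragraph.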
Reference link:

4. Deal with the table of contents pages.
Most formal files have their own table of contents. On table of contents pages, each single line is one paragraph. Therefore, before conversion, you should clear the "Paragraph Check" option. Otherwise, the software will merge the lines into paragraphs. Set it as follows:
Select the table of contents pages (multiple selection is supported) in the "PDF page list", right-click to open the pop-up menu, choose "Set Parse Options", clear the "Paragraph Check" option, and click "OK".
5. Vector graphic conversion.
A PDF contains two kinds of graphics: compressed bitmaps such as JPEG, and vector graphics composed of points, lines, fills, and other geometric elements. Most charts mix the two, sometimes with text labels inside. Therefore, a simple and effective approach is to combine them and output them as one picture. To do so, first select all the elements of the chart, then right-click and choose "to Graphic" from the pop-up menu, as shown below:

6. Deal with mathematical formulas.
Converting mathematical formulas is similar to converting diagrams. Laying out mathematical formulas in HTML is very difficult, so converting them into images is the better method. First select all the parts of the formula, then right-click and choose "to Graphic" from the pop-up menu.
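
Conceptually, turning a selection of elements into one picture means finding the smallest rectangle that encloses all of them and rendering that area as a single image. The sketch below shows only the rectangle step; the function name and the `(left, top, right, bottom)` box layout are assumptions for this example, and the rendering itself is left out.

```python
def union_box(boxes):
    """Return the smallest box enclosing all (left, top, right, bottom) boxes."""
    if not boxes:
        raise ValueError("at least one box is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


# Three parts of one formula: a symbol, a superscript, and a fraction.
formula_parts = [
    (100, 200, 140, 220),
    (142, 195, 150, 205),
    (152, 200, 190, 220),
]
print(union_box(formula_parts))
# (100, 195, 190, 220)
```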
7. Deal with multi-column pages.
The reading order of multi-column pages differs from that of single-column pages, so you need to tell the software to analyze them in the proper order. The setting is very simple: select the "Multi Columns" option on the multi-column page. If several pages need this setting, select them all in the "PDF page list", right-click, and choose the "Multi Columns" option from the pop-up menu. To set all pages, select the "Multi Columns" option in the parse parameters for the current page and click the "Apply to all Pages" button.
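
To see why multi-column pages need their own setting, consider a two-column page. A plain top-to-bottom sort would interleave the two columns, whereas the correct order reads the whole left column first. The sketch below assumes exactly two columns split at the middle of the page; the function name and dictionary keys are hypothetical.

```python
def two_column_order(elements, page_width):
    """Order elements for a two-column page: left column first, then right."""
    middle = page_width / 2

    def top(element):
        return element["y"]

    left = [e for e in elements if e["x"] < middle]
    right = [e for e in elements if e["x"] >= middle]
    return sorted(left, key=top) + sorted(right, key=top)


page = [
    {"text": "L1", "x": 40, "y": 100},
    {"text": "R1", "x": 320, "y": 100},
    {"text": "L2", "x": 40, "y": 120},
    {"text": "R2", "x": 320, "y": 120},
]
print([e["text"] for e in two_column_order(page, page_width=600)])
# ['L1', 'L2', 'R1', 'R2']
```

Real pages may have more than two columns or headings that span all columns, so this is only the simplest case.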
8. Deal with tables.
There are two ways to deal with tables. The first is to convert the table into an image for output, as with vector graphics and mathematical formulas. The second is to convert it into real HTML tags such as "<table>..". If the table format is relatively simple, you can use the second method: first select all the elements in the table, then right-click and choose "to Table" from the pop-up menu, as shown below:

If the table format is complex, for example with many merged cells, use the first method. You can preview the output to decide which method to choose.
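
For a simple table, the second method produces markup roughly like the output of the sketch below. The `rows_to_html` function is a hypothetical helper written for this article, not part of the software, and it deliberately ignores merged cells, which is the case where the image method works better.

```python
from html import escape


def rows_to_html(rows):
    """Render a list of rows (each a list of cell strings) as an HTML table."""
    parts = ["<table>"]
    for row in rows:
        # Escape each cell so that characters like "<" and "&" stay valid HTML.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        parts.append(f"  <tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


print(rows_to_html([["Quantity", "Unit"], ["Length", "m"], ["a < b", "&"]]))
```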

All PDF page settings are now complete. Click the "OK" button in the bottom-right corner.


The software enters the fifth interface.

The software starts converting the PDF file, and the interface displays the current progress of the conversion.

When the conversion is complete, the software automatically enters the sixth interface.

Set the "title", "author", and "subject" of the output document. Click "OK", then click "Next".

Edit the ePub files before output

In the Pro version, the software has one more step that lets you edit the ePub files before output. Select the file name and click "Edit". An editor opens in which you can edit any content, including the toc, ncx, and HTML code. If you are not very familiar with the ePub format, please do not change anything.
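
If you want to look inside an ePub file without changing it, keep in mind that an ePub is a ZIP archive, so it can be listed with Python's standard `zipfile` module. The sketch below builds a tiny stand-in archive in memory so that it runs without a real book; the entry names in the demo (such as `OEBPS/toc.ncx`) are illustrative only and are not necessarily what the converter writes.

```python
import io
import zipfile


def list_epub_parts(source):
    """Return (mimetype, entry names) for an ePub path or binary file object."""
    with zipfile.ZipFile(source) as book:
        names = book.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = book.read("mimetype").decode("ascii").strip()
    return mimetype, names


# Build a small stand-in archive in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "application/epub+zip")
    demo.writestr("META-INF/container.xml", "<container/>")
    demo.writestr("OEBPS/toc.ncx", "<ncx/>")
buffer.seek(0)

mimetype, names = list_epub_parts(buffer)
print(mimetype)
for name in names:
    print(name)
```

To inspect a real book, pass the path of the output file to `list_epub_parts` instead of the in-memory buffer.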

The final step: get the output ePub files.

In the final step, click the link below "Output Folder:" to open the output folder, where you can see the final ePub files.

Copyright © 2009-2013 DONGSOFT. All Rights Reserved.
