[BUG] Segmentation fault on VOB #1128

Closed
9 tasks done
boog opened this issue Nov 28, 2019 · 15 comments

@boog

boog commented Nov 28, 2019

Please prefix your issue with one of the following: [BUG], [PROPOSAL], [QUESTION].

CCExtractor version (using the --version parameter preferably) :
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc

CCExtractor detailed version info
Version: 0.88
Git commit: 280b430
Compilation date: 2019-11-27
File SHA256: eec95999b58c1a22f7a0909844d58df1e6456ae87d6c34afd782ae3ab2173c6b
Libraries used by CCExtractor
libGPAC Version: 0.7.2-DEV
zlib: 1.2.11
utf8proc Version: 2.2.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.35
FreeType
libhash
nuklear
libzvbi

In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):

  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • I have used CCExtractor just a couple of times.

Necessary information

  • Is this a regression (did it work before)? NO
  • What platform did you use? Linux, Mac
  • What were the used arguments? -debug -o test.srt

Video links (replace text below with your links)
http://ccextractor.s3-website-us-east-1.amazonaws.com/example.vob

Additional information
Ripped and decrypted from a DVD with region code 2. The subtitles are image-based (dvd_subtitle).

I tried to export to both srt and spupng; both segfault partway through. I was able to successfully extract some PNGs with spupng. The debug flag doesn't output much helpful information.

On linux:

Unknown command in control sequence!
w:1 h:-79
Segmentation fault

On Mac:

search_start_code: bitsleft <= 0
read_pic_data: reached end of bitstream.
PACK header
Subtitle found Stream id:29
PES data read: 912
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
Illegal instruction: 4

PS: Make sure you set an alert in GitHub so you get notifications about your ticket. We may need to ask questions and we do everything inside GitHub's system.

@cfsmp3 cfsmp3 added the GCI19 label Nov 30, 2019
@NilsIrl
Contributor

NilsIrl commented Dec 5, 2019

Somehow, I suddenly can't reproduce it anymore.

@Sudoxo
Contributor

Sudoxo commented Dec 5, 2019

I compiled with the -g flag and ran valgrind with --leak-check=full --show-leak-kinds=all. It produced a big log file, but I still don't know in which source file ccextractor crashes.
Full log: log2.txt
Last lines of log file:

==18757== 18,635,728 bytes in 1 blocks are still reachable in loss record 396 of 397
==18757== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18757== by 0x5309978: tesseract::SquishedDawg::read_squished_dawg(_IO_FILE*, tesseract::DawgType, STRING const&, PermuterType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x530AC75: tesseract::DawgLoader::Load() (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x530AFD6: tesseract::DawgCache::GetSquishedDawg(STRING const&, char const*, tesseract::TessdataType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x53117B5: tesseract::Dict::Load(tesseract::DawgCache*) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x52D599D: tesseract::Wordrec::program_editup(char const*, bool, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x5214D68: tesseract::Tesseract::init_tesseract_internal(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector const*, GenericVector const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x521584C: tesseract::Tesseract::init_tesseract(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector const*, GenericVector const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x51C6247: tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector const*, GenericVector const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x51CF9D7: TessBaseAPIInit4 (in /usr/lib/libtesseract.so.3.0.4)
==18757== by 0x425F75: init_ocr (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x41A554: init_dvdsub_decode (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757==
==18757== 76,363,992 bytes in 1 blocks are still reachable in loss record 397 of 397
==18757== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18757== by 0x411B41: ccx_dtvcc_init (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x44B2EA: init_cc_decode (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x440A07: update_decoder_list_cinfo (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x4101BA: general_loop (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x407283: api_start (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757== by 0x407FB1: main (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757==
==18757== LEAK SUMMARY:
==18757== definitely lost: 291,811 bytes in 396 blocks
==18757== indirectly lost: 0 bytes in 0 blocks
==18757== possibly lost: 2,003,923 bytes in 135 blocks
==18757== still reachable: 130,648,548 bytes in 124,494 blocks
==18757== of which reachable via heuristic:
==18757== newarray : 3,987,112 bytes in 4,319 blocks
==18757== suppressed: 0 bytes in 0 blocks
==18757==
==18757== For counts of detected and suppressed errors, rerun with: -v
==18757== Use --track-origins=yes to see where uninitialised values come from
==18757== ERROR SUMMARY: 10000008 errors from 15 contexts (suppressed: 0 from 0)

@cfsmp3
Contributor

cfsmp3 commented Dec 5, 2019

I'd say that's not compiled with -g or you'd have line numbers.

But nevertheless that output looks HORRIBLE. For example:

==18757== Conditional jump or move depends on uninitialised value(s)
==18757==    at 0x40F693: process_data (in /home/patryk/Desktop/ccextractor/linux/ccextractor)

That's of course asking for trouble. What value is that?

==18757== Source and destination overlap in memcpy(0x98681e9, 0x9868040, 2019)
==18757==    at 0x4C32513: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

We should be using memmove() in this case, IF (that's a big if) we're actually trying to move inside the buffer.
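
To illustrate the memcpy()/memmove() point: memcpy() has undefined behaviour when source and destination overlap, while memmove() is defined for that case. The sketch below moves a run of bytes within one buffer. The helper name `shift_within_buffer` and its bounds checks are hypothetical and are not taken from the CCExtractor sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from CCExtractor): move `len` bytes from offset
 * `src_off` to offset `dst_off` inside the same buffer.
 * The two ranges may overlap, so memmove() is used; memcpy() would be
 * undefined behaviour for overlapping ranges.
 * Returns 0 on success, -1 if either range falls outside the buffer. */
int shift_within_buffer(unsigned char *buf, size_t buf_size,
                        size_t dst_off, size_t src_off, size_t len)
{
    if (buf == NULL || len > buf_size ||
        src_off > buf_size - len || dst_off > buf_size - len)
        return -1;

    memmove(buf + dst_off, buf + src_off, len);
    return 0;
}
```

In the valgrind report above the destination starts 0x1a9 (425) bytes after the source and 2019 bytes are copied, so the two ranges do overlap.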

The actual crash:

==18757== Process terminating with default action of signal 11 (SIGSEGV)
==18757==  Access not within mapped region at address 0x0
==18757==    at 0x4C3453F: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18757==    by 0x419654: get_bitmap (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757==    by 0x41A518: process_spu (in /home/patryk/Desktop/ccextractor/linux/ccextractor)
==18757==    by 0x40F6E1: process_data (in /home/patryk/Desktop/ccextractor/linux/ccextractor)

We're calling memset over a NULL pointer. Of course it's going to crash :-)

@Sudoxo
Contributor

Sudoxo commented Dec 6, 2019

Okay, I've run this correctly now, and I see that the issue is at dvd_subtitle_decoder.c:99

	ctx->bitmap = malloc(w*h);
	buffp = ctx->bitmap;
	memset(buffp, 0, w*h);

So I think that I have to change memset to memmove now, is that right?

==6603== Process terminating with default action of signal 11 (SIGSEGV) 
==6603==  Access not within mapped region at address 0x0
==6603==    at 0x4C3453F: memset (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6603==    by 0x419654: get_bitmap (dvd_subtitle_decoder.c:99)
==6603==    by 0x41A518: process_spu (dvd_subtitle_decoder.c:410)
==6603==    by 0x40F6E1: process_data (general_loop.c:669)
==6603==    by 0x4105D8: general_loop (general_loop.c:1028)
==6603==    by 0x407283: api_start (ccextractor.c:210)
==6603==    by 0x407FB1: main (ccextractor.c:534)

@cfsmp3
Contributor

cfsmp3 commented Dec 6, 2019

No. The trace shows that we're memset()ing a NULL pointer. The pointer comes from that malloc(), and malloc() returns NULL because we are asking for a negative amount of memory: the log already showed a negative height (w:1 h:-79), and a negative w*h converted to size_t becomes a huge request that cannot be satisfied.

So there are two things to do for now:

  1. Figure out why we are getting a negative height. Is it corrupt data, or is our parser broken?
  2. In any case, if we get a negative value for size, we should skip that block completely and try to recover.

Bonus - we should be checking the result from malloc() in any case.
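
A minimal sketch of points 2 and the bonus together, assuming the width and height arrive as plain `int` values as in the snippet quoted earlier. The function name `alloc_bitmap` is hypothetical; it is not the fix that was committed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from CCExtractor): allocate a zero-filled
 * w*h bitmap. Returns NULL when a dimension is not positive, when w*h
 * would overflow size_t, or when malloc() fails, so the caller can skip
 * the block instead of crashing in memset(). */
unsigned char *alloc_bitmap(int w, int h)
{
    if (w <= 0 || h <= 0)
        return NULL;                       /* e.g. w:1 h:-79 from the log */

    if ((size_t)w > SIZE_MAX / (size_t)h)
        return NULL;                       /* product would overflow */

    size_t size = (size_t)w * (size_t)h;
    unsigned char *bitmap = malloc(size);
    if (bitmap == NULL)
        return NULL;                       /* always check malloc() */

    memset(bitmap, 0, size);
    return bitmap;
}
```

The caller owns the returned buffer and releases it with free(). Checking the dimensions before the multiplication matters because a negative product is silently converted to a very large size_t when passed to malloc().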

@Sudoxo
Contributor

Sudoxo commented Dec 8, 2019

After I added a check for a NULL pointer:

	ctx->bitmap = malloc(w*h);
	buffp = ctx->bitmap;
	
	if(!buffp)
	{
		dbg_print(CCX_DMT_VERBOSE, "Error!");
		return;
	}
	memset(buffp, 0, w*h);	

There is no longer signal 11, but it is still crashing.
Valgrind log after that change:

==6893== Memcheck, a memory error detector
==6893== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6893== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6893== Command: ./ccextractor /home/patryk/Desktop/example.vob -debug
==6893== Parent PID: 6873
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x40F693: process_data (general_loop.c:664)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Source and destination overlap in memcpy(0x98681e9, 0x9868040, 2019)
==6893==    at 0x4C32513: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6893==    by 0x41A3E7: process_spu (dvd_subtitle_decoder.c:380)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x4199D3: decode_packet (dvd_subtitle_decoder.c:210)
==6893==    by 0x41A554: process_spu (dvd_subtitle_decoder.c:415)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x4199DC: decode_packet (dvd_subtitle_decoder.c:210)
==6893==    by 0x41A554: process_spu (dvd_subtitle_decoder.c:415)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x4199E1: decode_packet (dvd_subtitle_decoder.c:210)
==6893==    by 0x41A554: process_spu (dvd_subtitle_decoder.c:415)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x4199E6: decode_packet (dvd_subtitle_decoder.c:210)
==6893==    by 0x41A554: process_spu (dvd_subtitle_decoder.c:415)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== Conditional jump or move depends on uninitialised value(s)
==6893==    at 0x4199EF: decode_packet (dvd_subtitle_decoder.c:210)
==6893==    by 0x41A554: process_spu (dvd_subtitle_decoder.c:415)
==6893==    by 0x40F6E1: process_data (general_loop.c:669)
==6893==    by 0x4105D8: general_loop (general_loop.c:1028)
==6893==    by 0x407283: api_start (ccextractor.c:210)
==6893==    by 0x407FB1: main (ccextractor.c:534)
==6893== 
==6893== 
==6893== More than 10000000 total errors detected.  I'm not reporting any more.
==6893== Final error counts will be inaccurate.  Go fix your program!
==6893== Rerun with --error-limit=no to disable this cutoff.  Note
==6893== that errors may occur in your program without prior warning from
==6893== Valgrind, because errors are no longer being displayed.
==6893== 
==6893== 
==6893== HEAP SUMMARY:
==6893==     in use at exit: 133,019,609 bytes in 63,684 blocks
==6893==   total heap usage: 64,894,930 allocs, 64,831,246 frees, 21,781,345,659 bytes allocated
==6893== 
==6893== LEAK SUMMARY:
==6893==    definitely lost: 285,386 bytes in 409 blocks
==6893==    indirectly lost: 0 bytes in 0 blocks
==6893==      possibly lost: 2,127,644 bytes in 141 blocks
==6893==    still reachable: 130,606,579 bytes in 63,134 blocks
==6893==                       of which reachable via heuristic:
==6893==                         newarray           : 3,989,864 bytes in 4,344 blocks
==6893==         suppressed: 0 bytes in 0 blocks
==6893== Rerun with --leak-check=full to see details of leaked memory
==6893== 
==6893== For counts of detected and suppressed errors, rerun with: -v
==6893== Use --track-origins=yes to see where uninitialised values come from
==6893== ERROR SUMMARY: 10000000 errors from 7 contexts (suppressed: 0 from 0)

How do I skip a block completely if the error occurs in dvd_subtitle_decoder.c but the loop is in general_loop.c?
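
One common pattern for this, sketched under assumptions: the decoder reports failure through its return value, and the loop in the other file checks that value and moves on to the next block. All names below (`decode_status`, `struct packet`, `decode_one`, `process_all`) are hypothetical stand-ins, not the real functions in dvd_subtitle_decoder.c or general_loop.c.

```c
#include <stddef.h>

/* Hypothetical status codes returned by the decoder. */
enum decode_status { DECODE_OK = 0, DECODE_SKIP = -1 };

/* Hypothetical stand-in for one subtitle block. */
struct packet {
    int w;
    int h;
};

/* Decoder side: validate the block and report failure to the caller
 * instead of crashing or returning silently. */
enum decode_status decode_one(const struct packet *p)
{
    if (p == NULL || p->w <= 0 || p->h <= 0)
        return DECODE_SKIP;     /* corrupt or unparsable block */

    /* ... real decoding would happen here ... */
    return DECODE_OK;
}

/* Loop side: the caller decides what "skip" means. Here it drops the bad
 * block and carries on. Returns the number of blocks decoded. */
size_t process_all(const struct packet *packets, size_t count)
{
    size_t decoded = 0;

    for (size_t i = 0; i < count; i++) {
        if (decode_one(&packets[i]) != DECODE_OK)
            continue;           /* skip this block, try the next one */
        decoded++;
    }
    return decoded;
}
```

With this shape the decoder never needs to know about the loop; it only has to return a value that distinguishes "decoded" from "skip this one".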

@cfsmp3
Contributor

cfsmp3 commented Dec 8, 2019 via email

@boog
Author

boog commented Dec 10, 2019

Thanks for your work on this, guys. I just pulled from master and rebuilt on Mac with:
./build.command OCR

Then I ran it against the test file above and I am still getting:

read_pic_data: reached end of bitstream.
PACK header
Subtitle found Stream id:29
PES data read: 912
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
Illegal instruction: 4

@eshandhawan51
Contributor

@cfsmp3 Is this issue still active?

@cfsmp3
Contributor

cfsmp3 commented Jan 12, 2020

@eshandhawan51 I'd say it is if we have a failing assert :-)

@Sudoxo
Contributor

Sudoxo commented Jan 13, 2020

My commit fixed this assert - I've successfully built on Linux. @boog said that there is an error on Mac; I don't own a Mac, so it's hard for me to make further changes.

@NilsIrl
Contributor

NilsIrl commented Jan 21, 2020

@boog could you provide a backtrace?

The failing assert is in tesseract code btw, so we are calling tesseract stuff wrong somewhere.
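
Judging only from the assert text, `!strcmp(locale, "C")`, the library seems to require that the process locale is "C" at the point of the failing call. If that reading is right, one possible workaround is to switch to the "C" locale around the OCR initialisation and restore the previous locale afterwards. The sketch below uses only standard `setlocale()`; the wrapper `call_with_c_locale` and the stand-in callback `locale_is_c` are hypothetical, and this is not claimed to be how the issue was resolved.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: run `init(arg)` with the process locale set to "C",
 * then restore whatever locale was active before.
 * Returns the callback's result, or -1 if the old locale name could not
 * be saved. */
int call_with_c_locale(int (*init)(void *), void *arg)
{
    char *saved = NULL;
    const char *current = setlocale(LC_ALL, NULL);   /* query only */

    if (current != NULL) {
        /* Copy the name: later setlocale() calls may overwrite it. */
        saved = malloc(strlen(current) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, current);
    }

    setlocale(LC_ALL, "C");
    int result = init(arg);

    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return result;
}

/* Stand-in for the OCR initialisation: it only reports whether the
 * current locale is "C" (1) or not (0). */
int locale_is_c(void *unused)
{
    (void)unused;
    const char *name = setlocale(LC_ALL, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

In real use the callback would wrap the actual OCR initialisation call. Note that setlocale() affects the whole process, so this is only safe while no other thread depends on the locale.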

@kdrag0n
Contributor

kdrag0n commented Jan 22, 2020

I tried it on macOS 10.14.5 (Tesseract 4.1.1 installed from Homebrew) with the example.vob file linked above and it worked fine, so I don't think this is an issue anymore.

@NilsIrl
Contributor

NilsIrl commented Jan 22, 2020

@boog can you confirm it is fixed?

@cfsmp3
Contributor

cfsmp3 commented Jan 25, 2020

Closing. @boog Feel free to reopen if you are still having problems.

@cfsmp3 cfsmp3 closed this as completed Jan 25, 2020